What context windows do and do not do
Every language model has a context window — the amount of text it can see at once during a conversation. Current frontier models have context windows of 200,000 to 1,000,000 tokens. Claude Sonnet 4.6 and Opus 4.6 both offer 1 million token context at standard pricing as of March 2026 — enough to hold roughly 700,000 words simultaneously. GPT-5.4 supports 1 million tokens in the API. These are genuinely large.
But context windows are not persistent. They last for the duration of one session. When you start a new conversation, the context is empty. If you have a 200-session history with a chatbot and start a fresh conversation, the model knows nothing from those previous sessions. You are back to zero regardless of how large the context window is.
This is not a limitation that can be fixed by making the context window larger. Even with infinite context, the model would still start each session fresh unless there is a separate memory store that gets loaded into context at session start. That is what persistent memory systems do — they are a layer above and alongside the context window, not a replacement for it.
How persistent memory systems work
A persistent memory system stores information between sessions and retrieves relevant pieces when a new session starts. At its simplest, this looks like a database — a collection of text snippets tagged with metadata — and a retrieval function that queries it based on what the current session needs.
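The shape described above can be sketched in a few lines. This is an illustrative minimal store, not Hermes's actual implementation: memories are text snippets tagged with metadata, persisted to a JSON file, and retrieved by tag overlap with what the current session needs.

```python
# Minimal sketch of a persistent memory store: tagged text snippets
# persisted to disk, plus a retrieval function driven by the current
# session's needs. Class and method names are illustrative.
import json
from dataclasses import dataclass, asdict
from pathlib import Path

@dataclass
class Memory:
    text: str
    tags: list

class MemoryStore:
    def __init__(self, path):
        self.path = Path(path)
        self.memories = []
        if self.path.exists():  # survives across sessions
            self.memories = [Memory(**m) for m in json.loads(self.path.read_text())]

    def add(self, text, tags):
        self.memories.append(Memory(text, tags))
        self.path.write_text(json.dumps([asdict(m) for m in self.memories]))

    def retrieve(self, query_tags, limit=3):
        # Score each memory by how many tags it shares with the query,
        # then keep only those with at least one tag in common.
        scored = sorted(
            self.memories,
            key=lambda m: len(set(m.tags) & set(query_tags)),
            reverse=True,
        )
        return [m.text for m in scored[:limit] if set(m.tags) & set(query_tags)]
```

A new session constructs `MemoryStore` against the same file and immediately sees everything earlier sessions stored, which is the whole point: persistence lives in the store, not in the context window.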
Vector databases are commonly used for this because they allow semantic retrieval: instead of looking up stored memories by exact keyword match, the system can find entries that are conceptually relevant to the current context even when the wording differs. If the agent remembers "the user prefers Python over JavaScript" and the current task involves writing code, that memory gets surfaced even if the new task prompt never mentions Python.
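The mechanism behind semantic retrieval is nearest-neighbor search over embedding vectors. The sketch below uses a toy bag-of-words `embed()` so it runs without any model; a real system would call an embedding model and a vector database instead, but the cosine-similarity ranking is the same idea.

```python
# Sketch of semantic retrieval: memories are indexed as vectors and a
# query retrieves the nearest one by cosine similarity. embed() is a
# stand-in toy vectorizer, NOT a real embedding model; VOCAB is an
# assumption made so the example is self-contained.
import math

VOCAB = ["python", "javascript", "code", "writing", "prefers", "user"]

def embed(text):
    words = text.lower().split()
    return [words.count(w) for w in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

memories = ["the user prefers python over javascript",
            "the user likes concise writing"]
index = [(m, embed(m)) for m in memories]

def retrieve(query):
    qv = embed(query)
    return max(index, key=lambda pair: cosine(qv, pair[1]))[0]
```

Note that a query like "write some code in python" retrieves the Python-preference memory even though the two strings share little exact wording; with a real embedding model, retrieval works even with no word overlap at all.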
More structured approaches use tiered memory: hot memory for things needed in almost every session (your name, your primary projects, standing preferences), warm memory for less frequent but important context (how you solved a problem three months ago), and cold memory for archived history that can be retrieved on demand but does not load automatically.
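The tiering policy can be made concrete with a small sketch. The class and loading rules below are illustrative, following the tier definitions in the text: hot entries load every session, warm entries load only when tagged relevant, and cold entries are returned only on an explicit lookup.

```python
# Sketch of tiered memory. Tier semantics follow the text above:
# hot = loads at every session start, warm = loads when relevant,
# cold = retrieved only on demand. The class itself is illustrative.
HOT, WARM, COLD = "hot", "warm", "cold"

class TieredMemory:
    def __init__(self):
        self.entries = []  # list of (tier, tags, text)

    def add(self, tier, tags, text):
        self.entries.append((tier, set(tags), text))

    def session_context(self, relevant_tags):
        # Hot always loads; warm loads only when its tags match the
        # session; cold never loads automatically.
        out = [text for tier, _, text in self.entries if tier == HOT]
        out += [text for tier, tags, text in self.entries
                if tier == WARM and tags & set(relevant_tags)]
        return out

    def cold_lookup(self, tag):
        # Explicit on-demand retrieval from the archive tier.
        return [text for tier, tags, text in self.entries
                if tier == COLD and tag in tags]
```

The design choice here is about token budget: hot memory is paid for in every session's context, so it stays small, while cold memory can grow without bound because it costs nothing until queried.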
What Hermes stores in memory
Hermes Agent uses three distinct memory types implemented as files and structures the agent reads and writes directly. The user model lives in `USER.md` — a structured document containing your technical background, communication preferences, standing project context, and operational patterns. The agent updates this file as it learns more about you. General learned facts live in `MEMORY.md`, a curated store the agent reads at session start and writes to when it discovers something worth retaining.
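As a sketch of how file-based memory like this gets used, the snippet below reads `USER.md` and `MEMORY.md` at session start and appends newly learned facts. The paths, section labels, and prompt framing are assumptions for illustration, not Hermes internals.

```python
# Sketch of session startup against the file layout described above:
# USER.md and MEMORY.md are read and assembled into a context block.
# Labels and layout are illustrative assumptions.
from pathlib import Path

def build_session_context(workdir="."):
    parts = []
    for name, label in [("USER.md", "User model"), ("MEMORY.md", "Learned facts")]:
        path = Path(workdir) / name
        if path.exists():
            parts.append(f"## {label}\n{path.read_text()}")
    return "\n\n".join(parts)

def remember(fact, workdir="."):
    # Append a newly learned fact so it survives the current session.
    with open(Path(workdir) / "MEMORY.md", "a") as f:
        f.write(f"- {fact}\n")
```

Because both files are plain markdown, the user can also open and edit them directly, which makes this kind of memory inspectable in a way an opaque vector index is not.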
Skill Documents are procedural memory stored in the agentskills.io open format — searchable markdown files encoding how to approach specific task types: the tools used, the decision tree followed, and the failure modes encountered and how they were handled. On future similar tasks, the agent retrieves and loads the relevant Skill Document rather than reasoning from scratch. This is the compounding mechanism: an agent that has done 50 research tasks is materially faster and more reliable than one on its first.
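Retrieval over a directory of skill files can be as simple as ranking documents by word overlap with the task description. The function below is an illustrative sketch of that step; the scoring scheme is an assumption (a real system would more likely use the semantic retrieval described earlier), and the directory name is hypothetical.

```python
# Sketch of Skill Document retrieval: rank markdown files in a skills
# directory by word overlap with the task description and load the
# best match. The crude overlap scoring is an assumption; the files
# themselves are just markdown.
from pathlib import Path

def load_skill(task, skills_dir="skills"):
    task_words = set(task.lower().split())

    def score(path):
        # Count words the document shares with the task description.
        return len(set(path.read_text().lower().split()) & task_words)

    candidates = sorted(Path(skills_dir).glob("*.md"), key=score, reverse=True)
    if candidates and score(candidates[0]) > 0:
        return candidates[0].read_text()
    return None  # no relevant skill: fall back to reasoning from scratch
```

The `None` branch matters: when no stored skill matches, the agent reasons from scratch and, if the task succeeds, can write a new Skill Document — which is exactly how the library compounds over time.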
Event memory is a timestamped log of tasks, decisions, and outcomes. This is what lets the agent tell you what it did last Tuesday, why it made a particular call, and whether a strategy worked. Optional Honcho integration adds cross-session AI-native user modeling as a separate API layer — building a persistent user understanding that carries across different tools, not just Hermes sessions.
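An append-only JSONL file is one simple way to implement a timestamped event log like this. The record fields below are illustrative, not Hermes's schema; the sketch shows the two operations the text describes — recording an outcome and answering "what happened on day X".

```python
# Sketch of event memory as an append-only JSONL log: one timestamped
# record per task or decision, queryable by day. Field names are
# illustrative assumptions.
import json
from datetime import datetime, timezone
from pathlib import Path

def log_event(logfile, kind, detail, outcome):
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "kind": kind,          # e.g. "task" or "decision"
        "detail": detail,
        "outcome": outcome,
    }
    with open(logfile, "a") as f:
        f.write(json.dumps(record) + "\n")

def events_on(logfile, day):
    # day is an ISO date string like "2026-03-17".
    if not Path(logfile).exists():
        return []
    with open(logfile) as f:
        return [rec for rec in map(json.loads, f) if rec["ts"].startswith(day)]
```

Because ISO-8601 timestamps sort and prefix-match lexically, filtering by `startswith(day)` is enough to answer "what did you do last Tuesday" without a database.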
Why this requires a persistent server
Memory stored on your laptop disappears when you close the application, reformat the drive, or switch to a new machine. For memory to be genuinely persistent — accessible from any device, surviving hardware failures, available when the agent is running scheduled tasks while you are offline — it needs to be stored on a server.
This is the infrastructure dependency that makes cloud hosting important for anyone who wants to rely on their agent's memory long-term. A self-hosted setup on a VPS works, but it requires manually configuring backups, volume mounts, and disaster recovery. Managed hosting handles all of that by default.
The practical consequence: the longer an agent runs, the more valuable its accumulated memory becomes. An agent that has been running for six months and has built up thousands of Skill Documents and a detailed user model is meaningfully different from a fresh install. That accumulated state is worth protecting.
What persistent memory does not do
Memory systems do not make agents reliable. An agent with rich memory can still take incorrect actions, misunderstand ambiguous instructions, or apply a past strategy to a situation where it does not fit. Memory helps with context; it does not substitute for good task design and human oversight on consequential actions.
Memory also does not automatically stay accurate. If the agent learns something incorrect — a wrong assumption about how a system works, a miscategorized approach in a Skill Document — that incorrect information persists and gets retrieved on future tasks. Periodic memory review and correction are important for agents doing high-stakes work.