"Can you give the agent memory?" is one of the most common requests we get, and one of the most ambiguous. Three different teams will mean three different things by it. One means the agent forgot what the user said two turns ago. One means the agent should remember a returning customer's preferences across sessions. One means the agent lost track of its own plan halfway through a long task. These are three different problems with three different fixes, and the reason agents have bad memory is usually that someone treated them as one problem.
Memory in an agent is not a single feature you switch on. It is at least three systems, each with its own job, its own storage, and its own failure mode. Get the distinction right and most "memory" complaints resolve cleanly. Get it wrong and you end up with an agent that either forgets what it was doing or drowns in its own history, sometimes both at once.
Short-term memory: the context window, actively managed
Short-term memory is what the agent can see right now: the current context window. The conversation so far, the recent tool outputs, the working state of the task. It is fast, it is immediate, and it is strictly bounded. There is a hard ceiling on how much fits, and everything in it competes for the model's attention.
The most common mistake here is believing that more short-term memory is better, that the fix for forgetting is to keep stuffing the whole conversation back into every call. The opposite is true. A context window packed with ten turns of stale tool output does not make the agent smarter; it makes the current question harder to find. The model attends to everything you give it, including the parts that no longer matter, and accuracy drops as the signal-to-noise ratio falls.
Short-term memory has to be actively managed, not passively accumulated. That means summarizing older turns into compact state, dropping tool outputs that are no longer relevant, and keeping verbatim only what the current step genuinely needs. This is the same discipline as curating the context layer on every step, and it is where the real engineering is. "The agent has a great memory" usually means "the agent is disciplined about what it forgets."
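As a rough illustration of that discipline, here is a minimal Python sketch. The `Turn` type, the `build_context` helper, and the `keep_verbatim` parameter are all hypothetical names invented for this example, and `summarize` is a stand-in for what would more likely be a model call in a real system.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str      # "user", "assistant", "tool", or "system"
    content: str


def summarize(turns: list[Turn]) -> str:
    """Placeholder summarizer (assumption): truncates and joins each turn.

    A production system would probably ask a model for a real summary.
    """
    return " | ".join(f"{t.role}: {t.content[:40]}" for t in turns)


def build_context(history: list[Turn], keep_verbatim: int = 4) -> list[Turn]:
    """Keep the most recent turns verbatim and compress everything older."""
    split = max(len(history) - keep_verbatim, 0)
    older, recent = history[:split], history[split:]

    # Older tool output is dropped outright rather than summarized.
    older = [t for t in older if t.role != "tool"]

    context: list[Turn] = []
    if older:
        summary = "Summary of earlier turns: " + summarize(older)
        context.append(Turn("system", summary))
    context.extend(recent)
    return context
```

The point of the sketch is the shape, not the specific policy: a single compact summary replaces the old turns, stale tool output disappears, and only the tail of the conversation survives word for word.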
Long-term memory: durable facts in external storage
Long-term memory is the thing people usually picture, and it is not a bigger context window. It is external storage of durable facts that outlive a single session: a returning user's preferences, decisions made in past conversations, the current state of an entity the agent tracks, lessons learned that should persist.
The defining property is that it lives outside the context window, in a database or store, and is retrieved selectively when relevant. You do not replay the user's entire history into the prompt at the start of every session. You store it durably and pull in the specific facts that matter for the current interaction. When a returning customer messages, you retrieve "prefers email over phone, on the legacy plan, had a billing issue last month," not the full transcript of every prior conversation.
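A small sketch of selective retrieval, using Python's built-in `sqlite3` module as the external store. The table layout, the `topic` column, and the `open_store`, `remember`, and `recall` helpers are assumptions made for illustration. Filtering by topic is the simplest possible stand-in for relevance; many real systems use embedding search or a hybrid instead.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a durable fact store. ':memory:' is for demos only."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS facts (user_id TEXT, topic TEXT, fact TEXT)"
    )
    return conn


def remember(conn: sqlite3.Connection, user_id: str, topic: str, fact: str) -> None:
    """Persist one durable fact about a user."""
    conn.execute("INSERT INTO facts VALUES (?, ?, ?)", (user_id, topic, fact))
    conn.commit()


def recall(conn: sqlite3.Connection, user_id: str, topics: list[str],
           limit: int = 5) -> list[str]:
    """Return only the facts relevant to the current interaction."""
    if not topics:
        return []
    # Only '?' markers are interpolated into the SQL; values stay parameterized.
    placeholders = ",".join("?" for _ in topics)
    rows = conn.execute(
        f"SELECT fact FROM facts WHERE user_id = ? AND topic IN ({placeholders}) "
        "ORDER BY rowid LIMIT ?",
        (user_id, *topics, limit),
    ).fetchall()
    return [row[0] for row in rows]
```

With a store like this, a session that starts on a plan question would call something like `recall(conn, "u1", ["contact", "plan"])` and inject a couple of short facts into the prompt, leaving the billing history in storage until it is actually needed.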
This is why long-term memory and the knowledge layer are close cousins: both are about retrieving the right durable facts into a bounded context at the right moment. The difference is that long-term memory is facts the agent itself accumulates over time, rather than knowledge you loaded in advance.
Working memory: keeping the plan alive across a long task
The third system is the one teams forget exists. Working memory is the agent's grip on its own task while it is in the middle of executing it: the plan, the progress so far, the sub-goals still outstanding. It physically lives in the same context window as short-term memory, but it has a distinct job and a distinct failure mode, which is why it is worth naming separately.
On a long, multi-step task, the agent's working memory gets crowded out by the accumulating observations from each step. After several tool calls, the immediate results dominate the context and the agent loses the thread of what it was actually trying to accomplish. This is the loop death we described in the difference between ReAct and Planning: the agent is reacting to the latest observation and has forgotten the master goal.
The fix is to give the plan a protected place in the context that does not get crowded out, and to re-surface it on each step. The plan is working memory that must survive the whole task, so it should not be competing on equal footing with transient tool outputs. Keeping the goal explicit and persistent is how you stop an agent from wandering off mid-task.
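One way to picture a protected plan, as a hedged sketch: the plan is rendered at the top of every prompt, while observations are capped to the most recent few. `TaskState`, `build_prompt`, and `max_obs` are hypothetical names for this example, not part of any particular framework.

```python
from dataclasses import dataclass, field


@dataclass
class TaskState:
    goal: str
    steps: list[str]
    done: set[int] = field(default_factory=set)  # indexes of completed steps

    def render(self) -> str:
        """Render the goal and a checklist of steps with completion marks."""
        lines = [f"GOAL: {self.goal}"]
        for i, step in enumerate(self.steps):
            mark = "x" if i in self.done else " "
            lines.append(f"[{mark}] {i + 1}. {step}")
        return "\n".join(lines)


def build_prompt(state: TaskState, observations: list[str], max_obs: int = 3) -> str:
    """The plan is always included in full; observations are capped."""
    recent = observations[-max_obs:] if max_obs > 0 else []
    parts = [state.render(), "RECENT OBSERVATIONS:", *recent]
    return "\n".join(parts)
```

However many tool calls pile up, the goal and the checklist are re-surfaced on every step, and only the transient observations get trimmed.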
The hard part is deciding what to forget
Storing memory is easy. Every framework will happily persist everything for you. The hard part, the part that separates a memory system that helps from one that hurts, is deciding what not to keep.
Three decisions run continuously in a good memory system:
- What to promote to long-term. Not every fact from a session deserves to persist. "The user asked about pricing" is noise; "the user is on the enterprise plan and prefers async updates" is a durable fact. Promoting everything fills long-term storage with junk that pollutes future retrieval.
- What to keep in short-term. Of the current session, what does the next step actually need verbatim, and what can be summarized or dropped? This is the active management that keeps the context window healthy.
- What to drop entirely. Some information is relevant for one step and never again. Holding it costs attention and money for no benefit.
A memory system that remembers everything is not a good memory; it is an expensive way to lose the signal. Human memory works precisely because it forgets aggressively and keeps what matters. Agent memory has to be designed the same way, on purpose.
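The three decisions above can be sketched as a single triage function. This is an illustration under assumptions: the `Item` type and its `durable` and `needed_next_step` flags are hypothetical, and in practice those flags would have to come from rules or a model judgment upstream. Promoting an item here does not stop you from also keeping it in the current context.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PROMOTE = "promote"  # write to long-term storage
    KEEP = "keep"        # stay verbatim in the context window
    DROP = "drop"        # discard entirely


@dataclass
class Item:
    text: str
    durable: bool           # will this still be true and useful next session?
    needed_next_step: bool  # does the very next step need it verbatim?


def triage(item: Item) -> Decision:
    """Decide what happens to one piece of session information."""
    if item.durable:
        return Decision.PROMOTE
    if item.needed_next_step:
        return Decision.KEEP
    return Decision.DROP
```

Under this scheme, "the user asked about pricing" with both flags false is dropped, while "the user is on the enterprise plan and prefers async updates" marked durable is promoted.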
The symptoms, mapped to the system
When a team reports a "memory problem," the specific symptom points to which system is broken:
- Forgets what was said a few turns ago → short-term management. You are probably truncating naively or summarizing away the wrong things.
- Does not recognize a returning user or past context across sessions → no long-term memory, or long-term memory that is not being retrieved at the right moment.
- Loses track of its own task on long jobs → working memory crowded out. The plan is not protected.
- Gets slower and less accurate the longer a session runs → unmanaged short-term memory. You are accumulating history and feeding it all back in.
- Retrieves stale or irrelevant "memories" → long-term memory polluted with things that should never have been promoted.
Each of these has a different fix, which is exactly why "give the agent memory" is the wrong unit of work. You fix a specific system, not "memory" in general.
How the three fit together
A healthy agent runs all three in concert. Long-term memory holds the durable facts in external storage. A retrieval step pulls the relevant subset into short-term memory at the start of an interaction. Working memory keeps the current plan and goal protected throughout the task. And short-term memory is actively pruned so the context window stays full of signal, not history.
A fact flows through this system as follows: it enters short-term memory as it happens, it is evaluated for durability, durable facts are promoted to long-term memory, and in a future session the relevant ones are retrieved back into short-term memory when they matter. Storage is the easy half. The promotion, retrieval, and pruning decisions are where a memory system is actually built, and where we spend most of our design effort. We covered the production-deployment side of this in more depth in making memory work in production agents.
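To make that lifecycle concrete, here is a toy end-to-end sketch across two sessions. Everything in it is an assumption for illustration: the `Memory` class, the caller-supplied `is_durable` and `relevant` predicates, and the in-process dictionary standing in for a real external store.

```python
from collections import defaultdict
from typing import Callable


class Memory:
    def __init__(self, is_durable: Callable[[str], bool]):
        self.is_durable = is_durable         # promotion rule (assumed to be supplied)
        self.long_term = defaultdict(list)   # user_id -> durable facts
        self.short_term: list[str] = []      # current session's context

    def start_session(self, user_id: str, relevant: Callable[[str], bool]) -> None:
        """Retrieval: pull only the relevant durable facts into short-term."""
        self.short_term = [f for f in self.long_term[user_id] if relevant(f)]

    def observe(self, fact: str) -> None:
        """A new fact enters short-term memory as it happens."""
        self.short_term.append(fact)

    def end_session(self, user_id: str) -> None:
        """Promotion: persist durable facts, then clear the session context."""
        for fact in self.short_term:
            if self.is_durable(fact) and fact not in self.long_term[user_id]:
                self.long_term[user_id].append(fact)
        self.short_term = []
```

A session that observes "asked about pricing" and "pref: async updates", with a durability rule that only accepts facts starting with "pref:", ends with one promoted fact. The next session retrieves that fact and nothing else.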
If your agent forgets, or remembers too much
Both complaints usually trace to the same root cause: treating memory as one undifferentiated feature instead of three systems with three jobs. An agent that forgets has a short-term or working-memory problem. An agent that gets slow and confused has an unmanaged-accumulation problem. An agent that does not recognize returning context has a long-term problem. They are fixed separately.
Sapota runs a one-week memory architecture pass that separates the three systems, builds the promotion and retrieval logic between them, and ships it as a working integration. We have done this for support agents that need to recognize returning customers, copilots that run long tasks, and assistants that need to stay coherent across sessions.
Reach out via the AI engineering page with a description of what your agent forgets, or what it cannot stop remembering. The distinction usually points straight at the fix.








