SapotaCorp

Agent memory: short-term, long-term, and the context window in between

"Give the agent memory" usually means one of three different things, and teams conflate them until the agent either forgets what it was doing or drowns in its own history. Short-term, long-term, and working memory are separate systems with separate jobs. Here is how to tell them apart and wire each one correctly.

Key takeaways

  • "Memory" in an agent is three different systems: short-term memory (what is in the context window right now), long-term memory (durable facts in external storage), and working memory (the plan and progress of the task in flight). A retrieval layer decides what to pull from long-term into short-term. Conflating them is the root of most memory bugs.
  • Short-term memory is bounded by the context window and must be actively managed. Letting the conversation grow unbounded does not give the agent a better memory; it degrades accuracy and raises cost as relevant signal gets buried under stale history.
  • Long-term memory is not "a bigger context window." It is external storage of durable facts (preferences, past decisions, entity state) that outlive a single session and are retrieved selectively, not replayed wholesale.
  • The hard part is not storing memory; it is deciding what to promote to long-term, what to keep in short-term, and what to drop. A good memory system forgets on purpose. One that remembers everything is just an expensive way to lose the signal.

"Can you give the agent memory?" is one of the most common requests we get, and one of the most ambiguous. Three different teams will mean three different things by it. One means the agent forgot what the user said two turns ago. One means the agent should remember a returning customer's preferences across sessions. One means the agent lost track of its own plan halfway through a long task. These are three different problems with three different fixes, and the reason agents have bad memory is usually that someone treated them as one problem.

Memory in an agent is not a single feature you switch on. It is at least three systems, each with its own job, its own storage, and its own failure mode. Get the distinction right and most "memory" complaints resolve cleanly. Get it wrong and you end up with an agent that either forgets what it was doing or drowns in its own history, sometimes both at once.

Short-term memory: the context window, actively managed

Short-term memory is what the agent can see right now: the current context window. The conversation so far, the recent tool outputs, the working state of the task. It is fast, it is immediate, and it is strictly bounded. There is a hard ceiling on how much fits, and everything in it competes for the model's attention.

The most common mistake here is believing that more short-term memory is better, that the fix for forgetting is to keep stuffing the whole conversation back into every call. The opposite is true. A context window packed with ten turns of stale tool output does not make the agent smarter; it makes the current question harder to find. The model attends to everything you give it, including the parts that no longer matter, and accuracy drops as the signal-to-noise ratio falls.

Short-term memory has to be actively managed, not passively accumulated. That means summarizing older turns into compact state, dropping tool outputs that are no longer relevant, and keeping verbatim only what the current step genuinely needs. This is the same discipline as curating the context layer on every step, and it is where the real engineering is. "The agent has a great memory" usually means "the agent is disciplined about what it forgets."
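
As a rough illustration of that discipline, here is a minimal sketch of a pruning pass. Everything in it is an assumption made for the example: the `Turn` shape, the `relevant` flag, and the `summarize` placeholder, which in a real agent would be a model call rather than string truncation.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str              # "user", "assistant", "tool", or "system"
    text: str
    relevant: bool = True  # hypothetical flag: does a later step still need this?


def summarize(turns: list[Turn]) -> str:
    """Placeholder summarizer; a real system would call a model here."""
    return " | ".join(f"{t.role}: {t.text[:40]}" for t in turns)


def prune_context(history: list[Turn], keep_verbatim: int = 4) -> list[Turn]:
    """Return a compact context: one summary turn plus the most recent turns."""
    # Drop tool outputs that no later step needs.
    live = [t for t in history if not (t.role == "tool" and not t.relevant)]
    if len(live) <= keep_verbatim:
        return live
    # Older turns are compressed into a summary; recent ones stay verbatim.
    split = len(live) - keep_verbatim
    older, recent = live[:split], live[split:]
    summary = Turn("system", "Summary of earlier turns: " + summarize(older))
    return [summary] + recent
```

The point of the sketch is the shape of the decision, not the heuristics: stale tool output is removed first, older turns are compressed, and only the last few turns survive word for word.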

Long-term memory: durable facts in external storage

Long-term memory is the thing people usually picture, and it is not a bigger context window. It is external storage of durable facts that outlive a single session: a returning user's preferences, decisions made in past conversations, the current state of an entity the agent tracks, lessons learned that should persist.

The defining property is that it lives outside the context window, in a database or store, and is retrieved selectively when relevant. You do not replay the user's entire history into the prompt at the start of every session. You store it durably and pull in the specific facts that matter for the current interaction. When a returning customer messages, you retrieve "prefers email over phone, on the legacy plan, had a billing issue last month," not the full transcript of every prior conversation.
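
A minimal sketch of selective retrieval, assuming an in-memory list as a stand-in for the database and a crude tag-overlap score in place of whatever ranking a real store would use. The `Fact` and `LongTermStore` names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Fact:
    user_id: str
    text: str
    tags: set[str] = field(default_factory=set)


class LongTermStore:
    """In-memory stand-in for a database or vector store."""

    def __init__(self) -> None:
        self._facts: list[Fact] = []

    def remember(self, fact: Fact) -> None:
        self._facts.append(fact)

    def retrieve(self, user_id: str, query: str, limit: int = 3) -> list[str]:
        """Return up to `limit` facts for this user whose tags overlap the query."""
        words = set(query.lower().split())
        scored = [
            (len(f.tags & words), f.text)
            for f in self._facts
            if f.user_id == user_id
        ]
        # Keep only facts with some overlap, highest overlap first.
        ranked = sorted(
            (s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True
        )
        return [text for _, text in ranked[:limit]]


store = LongTermStore()
store.remember(Fact("u1", "prefers email over phone", {"contact", "email", "phone"}))
store.remember(Fact("u1", "on the legacy plan", {"plan", "billing"}))
store.remember(Fact("u1", "had a billing issue last month", {"billing", "invoice"}))
facts = store.retrieve("u1", "question about my billing invoice")
```

Only the billing-related facts come back for a billing question; the contact preference stays in storage until an interaction needs it.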

This is why long-term memory and the knowledge layer are close cousins: both are about retrieving the right durable facts into a bounded context at the right moment. The difference is that long-term memory is facts the agent itself accumulates over time, rather than knowledge you loaded in advance.

Working memory: keeping the plan alive across a long task

The third system is the one teams forget exists. Working memory is the agent's grip on its own task while it is in the middle of executing it: the plan, the progress so far, the sub-goals still outstanding. It is technically part of short-term memory, but it has a distinct failure mode worth naming separately.

On a long, multi-step task, the agent's working memory gets crowded out by the accumulating observations from each step. After several tool calls, the immediate results dominate the context and the agent loses the thread of what it was actually trying to accomplish. This is the loop death we described in our piece on the difference between ReAct and Planning: the agent reacts to the latest observation and forgets the master goal.

The fix is to give the plan a protected place in the context that does not get crowded out, and to re-surface it on each step. The plan is working memory that must survive the whole task, so it should not be competing on equal footing with transient tool outputs. Keeping the goal explicit and persistent is how you stop an agent from wandering off mid-task.
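
One way to picture a protected plan is a prompt builder that always emits the goal and plan first and gives observations a fixed budget. This is a sketch under assumptions: the `build_context` function, its section labels, and the `obs_budget` parameter are invented for illustration.

```python
def build_context(goal: str, plan: list[str], done: int,
                  observations: list[str], obs_budget: int = 3) -> str:
    """Assemble a step prompt with the goal and plan pinned at the top."""
    # The plan is always rendered in full, with completed steps checked off.
    plan_lines = [
        f"[{'x' if i < done else ' '}] {step}" for i, step in enumerate(plan)
    ]
    # Only the most recent observations are included; older ones fall away.
    recent = observations[-obs_budget:] if obs_budget > 0 else []
    sections = [
        "GOAL: " + goal,
        "PLAN:\n" + "\n".join(plan_lines),
        "RECENT OBSERVATIONS:\n" + "\n".join(recent),
    ]
    return "\n\n".join(sections)
```

However many tool calls pile up, the observation section cannot grow past its budget, so the goal and plan are never pushed out of view.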

The hard part is deciding what to forget

Storing memory is easy. Every framework will happily persist everything for you. The hard part, the part that separates a memory system that helps from one that hurts, is deciding what not to keep.

Three decisions run continuously in a good memory system:

  • What to promote to long-term. Not every fact from a session deserves to persist. "The user asked about pricing" is noise; "the user is on the enterprise plan and prefers async updates" is a durable fact. Promoting everything fills long-term storage with junk that pollutes future retrieval.
  • What to keep in short-term. Of the current session, what does the next step actually need verbatim, and what can be summarized or dropped? This is the active management that keeps the context window healthy.
  • What to drop entirely. Some information is relevant for one step and never again. Holding it costs attention and money for no benefit.
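
The three decisions above can be sketched as a small triage function. The `durable` and `needed_next_step` keys are hypothetical; in practice those judgments would come from a model or from rules specific to your domain, and a durable fact might also stay in the context for the current step.

```python
from enum import Enum


class Decision(Enum):
    PROMOTE = "promote to long-term"
    KEEP = "keep in short-term"
    DROP = "drop"


def triage(item: dict) -> Decision:
    """Route one piece of session information.

    `item` uses hypothetical keys: `durable` (still true after this session)
    and `needed_next_step` (the upcoming step needs it verbatim).
    """
    if item.get("durable"):
        return Decision.PROMOTE
    if item.get("needed_next_step"):
        return Decision.KEEP
    # Neither durable nor needed: holding it only costs attention and money.
    return Decision.DROP
```

Note that the default is to drop. Anything that cannot justify a place in long-term or short-term memory is forgotten on purpose.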

A memory system that remembers everything is not a good memory; it is an expensive way to lose the signal. Human memory works precisely because it forgets aggressively and keeps what matters. Agent memory has to be designed the same way, on purpose.

The symptoms, mapped to the system

When a team reports a "memory problem," the specific symptom points to which system is broken:

  • Forgets what was said a few turns ago → short-term management. You are probably truncating naively or summarizing away the wrong things.
  • Does not recognize a returning user or past context across sessions → no long-term memory, or long-term memory that is not being retrieved at the right moment.
  • Loses track of its own task on long jobs → working memory crowded out. The plan is not protected.
  • Gets slower and less accurate the longer a session runs → unmanaged short-term memory. You are accumulating history and feeding it all back in.
  • Retrieves stale or irrelevant "memories" → long-term memory polluted with things that should never have been promoted.

Each of these has a different fix, which is exactly why "give the agent memory" is the wrong unit of work. You fix a specific system, not "memory" in general.

How the three fit together

A healthy agent runs all three in concert. Long-term memory holds the durable facts in external storage. A retrieval step pulls the relevant subset into short-term memory at the start of an interaction. Working memory keeps the current plan and goal protected throughout the task. And short-term memory is actively pruned so the context window stays full of signal, not history.

The flow of a fact through this system is: it enters short-term as it happens, it gets evaluated for whether it is durable, the durable ones get promoted to long-term, and in a future session the relevant ones get retrieved back into short-term when they matter. Storage is the easy half. The promotion, retrieval, and pruning decisions are where a memory system is actually built, and where we spend most of our design effort. We covered the production-deployment side of this in more depth in our article on making memory work in production agents.
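
To make that lifecycle concrete, here is a deliberately small sketch of one fact moving across two sessions. The `long_term` dictionary, `start_session`, and `end_session` are all hypothetical names, and retrieval here returns every stored fact for the user, where a real system would select only the relevant subset.

```python
long_term: dict[str, list[str]] = {}  # user_id -> durable facts (stand-in for a store)


def start_session(user_id: str) -> list[str]:
    """Retrieval: seed short-term memory with this user's durable facts."""
    return list(long_term.get(user_id, []))


def end_session(user_id: str, short_term: list[tuple[str, bool]]) -> None:
    """Promotion: persist only items flagged durable; everything else is dropped."""
    durable = [text for text, is_durable in short_term if is_durable]
    known = long_term.setdefault(user_id, [])
    # Skip facts that are already stored so repeated sessions do not duplicate them.
    known.extend(t for t in durable if t not in known)


# Session 1: two things happen, only one is durable.
end_session("u1", [("asked about pricing", False), ("prefers async updates", True)])

# Session 2: the durable fact comes back; the noise never made it to storage.
seed = start_session("u1")
```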

If your agent forgets, or remembers too much

Both complaints usually trace to the same root cause: treating memory as one undifferentiated feature instead of three systems with three jobs. An agent that forgets has a short-term or working-memory problem. An agent that gets slow and confused has an unmanaged-accumulation problem. An agent that does not recognize returning context has a long-term problem. They are fixed separately.

Sapota runs a one-week memory architecture pass that separates the three systems, builds the promotion and retrieval logic between them, and ships it as a working integration. We have done this for support agents that need to recognize returning customers, copilots that run long tasks, and assistants that need to stay coherent across sessions.

Reach out via the AI engineering page with a description of what your agent forgets, or what it cannot stop remembering. The distinction usually points straight at the fix.

