memnode

Compaction Is Not Memory: What Your Agent Forgets When the Window Fills

Context compaction feels like the agent kept everything and got efficient. It did not. The new compact_20260112 API drops the raw turns and keeps a lossy summary. Why compaction amnesia and context rot are the same mistake, and the clean split between window management and durable memory.

memnode · 7 min read
Tags: context window, compaction, context rot, agent memory, mcp, agents

A long agent session fills its context window, the runtime compacts the history into a summary, and the agent keeps going. It feels like the agent kept everything and just got more efficient. It did not. Compaction is a lossy operation that trades fidelity for headroom, and what it throws away does not come back. The production complaints that go by names like "compaction amnesia" and "context rot" are usually the same mistake stated two ways: treating the context window as if it were memory. It is not. This article covers the difference, why it bites mid-session, and what belongs in durable storage instead.

What compaction actually does

Compaction is now a first-class API feature, not a userland hack. Anthropic ships server-side compaction: include the beta header compact-2026-01-12 and add the compact_20260112 edit type to context_management.edits, supported on Claude Opus 4.6 and later and Sonnet 4.6. When input tokens cross your trigger threshold, the API generates a summary of the conversation so far, emits a compaction block, and continues from there. The mechanism is the important part:

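import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    betas=["compact-2026-01-12"],
    model="claude-opus-4-6",
    max_tokens=4096,  # required by the Messages API; pick a value that fits your task
    messages=messages,  # the running conversation history, built elsewhere
    context_management={"edits": [{"type": "compact_20260112"}]},
)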

On every subsequent request, all content blocks before the compaction block are dropped. The raw turns are gone from what the model sees. What survives is the summary the model wrote about them. That is the entire trade: you get the window back, and in exchange the agent now reasons over a paraphrase of its own past instead of the past itself.
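
The effect on what the model sees can be sketched locally. This is a minimal illustration, not the API's implementation: it assumes the history is a flat list of dict content blocks and that the compaction block is marked with `"type": "compaction"`. The helper name `visible_blocks` and the sample data are hypothetical.

```python
from typing import Any

Block = dict[str, Any]


def visible_blocks(history: list[Block]) -> list[Block]:
    """Return the blocks that remain visible after the latest compaction.

    Everything before the most recent compaction block is dropped. The
    compaction block itself (the summary) and all later blocks are kept.
    """
    last = None
    for i, block in enumerate(history):
        if block.get("type") == "compaction":  # assumed block type name
            last = i
    return history if last is None else history[last:]


history = [
    {"type": "text", "text": "Decision: use UTC timestamps everywhere."},
    {"type": "text", "text": "Long tool output ..."},
    {"type": "compaction", "content": "Summary: discussed timestamp handling."},
    {"type": "text", "text": "Next task: write the migration."},
]

kept = visible_blocks(history)
```

In this toy history the UTC decision exists only in a raw turn before the compaction block. The summary mentions the topic but not the decision, so nothing in `kept` still carries it.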

The sibling feature, context editing, is narrower and cleaner. The clear_tool_uses_20250919 strategy (beta header context-management-2025-06-27) clears the oldest tool results once context grows past a threshold and replaces each one with placeholder text so the model knows something was removed. Old file dumps and search results that the model has already digested stop costing tokens. Both features manage the window. Neither is a place to keep anything.
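
To make the clearing behavior concrete, here is a local simulation of the idea, assuming a flat list of dict blocks where tool outputs carry `"type": "tool_result"`. It is a sketch of the technique rather than how the API does it: the function name `clear_old_tool_results`, the `keep_last` parameter, and the placeholder wording are all hypothetical.

```python
from typing import Any

PLACEHOLDER = "[tool result cleared to save context]"  # hypothetical wording


def clear_old_tool_results(
    blocks: list[dict[str, Any]], keep_last: int
) -> list[dict[str, Any]]:
    """Replace all but the newest `keep_last` tool results with a placeholder."""
    tool_idx = [i for i, b in enumerate(blocks) if b.get("type") == "tool_result"]
    to_clear = set(tool_idx[:-keep_last]) if keep_last > 0 else set(tool_idx)
    # Non-tool blocks and recent tool results pass through unchanged.
    return [
        {**b, "content": PLACEHOLDER} if i in to_clear else b
        for i, b in enumerate(blocks)
    ]


blocks = [
    {"type": "tool_result", "content": "contents of a 2,000-line file"},
    {"type": "text", "text": "The bug is in the parser."},
    {"type": "tool_result", "content": "search results, page 1"},
    {"type": "tool_result", "content": "test run output"},
]

trimmed = clear_old_tool_results(blocks, keep_last=1)
```

The conversation keeps its shape: every block is still present, but the two oldest tool results now hold only the placeholder, while the model's own text and the newest result are untouched.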

Why this is not memory

A summary is not a record. Durable memory has three properties that a compaction summary structurally cannot have:

  • It is lossy and irreversible. The summarizer decides what mattered. If it judged a detail unimportant and dropped it, that detail is not retrievable later. There is no index behind the summary to fall back to.
  • It is scoped to one session. The compaction block lives in this conversation. Start a fresh session tomorrow and it is not there. Cross-session continuity was never what compaction was for.
  • It has no provenance and no correction path. The summary blends facts, speculation, and intermediate reasoning into prose. You cannot ask where a line came from, mark one fact as superseded, or update it when a convention changes. It is a snapshot, not a queryable store.

Persistent memory is the opposite on all three: explicit records you write on purpose, queryable across sessions, each carrying where it came from and whether it is still current. If you want the full taxonomy of what durable memory should be, the long-term memory guide covers the four shapes and the properties that decide them. The point here is narrower: compaction is not one of those shapes. It is window management that happens to produce text.
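
As a rough sketch of what "provenance and a correction path" can look like in code, the example below models a record that knows its source and whether it has been superseded. The `MemoryRecord` and `MemoryLog` names, their fields, and the sample values are illustrative assumptions, not the schema of any particular memory product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class MemoryRecord:
    key: str
    value: str
    source: str  # provenance: where the fact was established
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    superseded_by: Optional["MemoryRecord"] = None

    @property
    def current(self) -> bool:
        return self.superseded_by is None


class MemoryLog:
    """Append-only log: a correction adds a record instead of erasing one."""

    def __init__(self) -> None:
        self.records: list[MemoryRecord] = []

    def write(self, key: str, value: str, source: str) -> MemoryRecord:
        new = MemoryRecord(key, value, source)
        for old in self.records:
            if old.key == key and old.current:
                old.superseded_by = new  # keep the old record, mark it stale
        self.records.append(new)
        return new

    def lookup(self, key: str) -> Optional[MemoryRecord]:
        for rec in reversed(self.records):
            if rec.key == key and rec.current:
                return rec
        return None


log = MemoryLog()
log.write("timestamp_format", "local time", source="session 12, turn 40")
log.write("timestamp_format", "UTC", source="session 15, turn 3")
rec = log.lookup("timestamp_format")
```

Here `rec` is the UTC record, it can say where it came from, and the older "local time" record is still on file but marked as superseded. A prose summary can offer none of those three.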

Compaction amnesia: the failure mode

The thing that surprises people is that the damage starts before the window is full. Chroma's context-rot study evaluated 18 frontier models, including GPT-4.1, Claude 4, Gemini 2.5, and Qwen3, and found that every one of them gets less reliable as input length grows. Crucially, the degradation does not wait for the limit. Models showed measurable trouble at modest input sizes well short of their maximum window, and the gap between a focused prompt and a bloated one was large: on LongMemEval the full inputs averaged about 113k tokens against roughly 300 tokens for the equivalent focused prompt, with a consistent quality gap across model families.

Now stack compaction on top of that. The agent is already reasoning worse as the window fills. Compaction then replaces the high-fidelity prefix with a summary written by a model that was, at the moment it summarized, operating in exactly the degraded regime the study describes. You are asking a tired reader to write the crib notes the next reader will rely on. When the agent later confidently asserts something that contradicts a decision made 80k tokens ago, that is compaction amnesia: the decision was real, it just did not survive the summarizer, and nothing flagged the loss.

The pattern that actually holds up

Keep the two jobs separate. Use compaction and context editing for what they are good at, which is keeping a long session inside the window. Use a memory layer for what compaction cannot do, which is remembering across the boundary. The division of labor that works in practice:

  1. Working context stays in the window. The current task, the files in flight, the last few turns. Let compaction trim this when it overflows. Losing fidelity here is acceptable because it is the disposable scratch space of the session.
  2. Durable facts get written out explicitly. A decision, a convention, a resolved constraint, anything the agent will need next week, gets recorded to a memory store the moment it is established, not left to chance in the transcript. Then it does not matter whether compaction later drops the turn it came from.
  3. Recall is pulled back in deliberately. At session start, or when the agent needs a fact, it queries the memory layer for the small relevant slice rather than re-reading the whole history. This also keeps the window smaller, which directly fights the context rot that made compaction necessary in the first place.
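
The write-out and recall steps above can be sketched with nothing more than the standard library's `sqlite3`. This is a minimal sketch under stated assumptions: the `facts` table layout, the helper names `open_store`, `remember`, and `recall`, and exact-match lookup by topic are all illustrative choices. A real memory layer would likely add semantic search, supersession, and access control.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_store(path: Path) -> sqlite3.Connection:
    """Open (or create) the durable store that outlives any one session."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS facts ("
        "topic TEXT NOT NULL, fact TEXT NOT NULL, source TEXT NOT NULL)"
    )
    return conn


def remember(conn: sqlite3.Connection, topic: str, fact: str, source: str) -> None:
    """Step 2: write a durable fact the moment it is established."""
    with conn:  # commits on success
        conn.execute("INSERT INTO facts VALUES (?, ?, ?)", (topic, fact, source))


def recall(conn: sqlite3.Connection, topic: str, limit: int = 5) -> list[str]:
    """Step 3: pull back only the small relevant slice, newest first."""
    rows = conn.execute(
        "SELECT fact FROM facts WHERE topic = ? ORDER BY rowid DESC LIMIT ?",
        (topic, limit),
    )
    return [row[0] for row in rows]


db_path = Path(tempfile.mkdtemp()) / "memory.db"

# Session 1: a decision is made and written out immediately.
session1 = open_store(db_path)
remember(session1, "timestamps", "Store all timestamps in UTC.", "session 1")
session1.close()

# Session 2: a fresh process with an empty window recalls the fact.
session2 = open_store(db_path)
recalled = recall(session2, "timestamps")
session2.close()

context_prefix = "Known facts:\n" + "\n".join(f"- {fact}" for fact in recalled)
```

The second session never saw the first session's transcript, compacted or otherwise. It gets the decision from the store, and `context_prefix` is a few dozen tokens rather than a replayed history.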

The tell that a team has conflated the two is a system that re-sends the entire conversation every turn and leans on compaction to keep it affordable. That works until the first time the agent acts on something the summarizer quietly dropped. The fix is not a bigger window or a smarter summarizer. It is moving the facts you cannot afford to lose out of the transcript and into storage you can query and correct.

A quick test for what belongs where

For any fact the agent is holding, ask: if compaction fired right now and rewrote the last 100k tokens into three sentences, would losing this be fine? If yes, it is working context; leave it in the window. If no, it is memory, and it should already be written somewhere durable. Anything you would be unhappy to lose to a summary is, by definition, not something a summary should be responsible for.

What to take away

Compaction is a genuinely useful window-management primitive, and the new server-side API makes long sessions practical that previously were not. But it is a garbage collector for the context window, not a memory system. It is lossy, single-session, and unauditable by design. The moment you need a fact to survive a session boundary, a correction, or an audit, you have left compaction's job and entered memory's. Build the memory layer underneath, write the durable facts out on purpose, and let compaction do the one thing it is actually for.

Sources