The Stale Context Problem Explained

In short: The stale context problem is what happens when you resume a session after changing files: the agent reasons from cached tool results captured before the edits, so its advice references code that no longer exists and contradicts the current state. The reliable fixes are targeted re-analysis of the specific files that changed, or a fresh start with an injected summary when the context has degraded too far.

What the stale context problem is

The stale context problem is the most consequential failure mode of resuming a session, and it is worth evaluating carefully rather than memorising a slogan. When an agent reads a file, the result of that read is stored in the conversation history as a tool result, effectively a snapshot of the file at the moment it was read. Resume the session later and that snapshot comes back with it. If the file has changed in the meantime, the agent is now reasoning from a photograph of code that no longer exists, and it has no way to know the photograph is out of date.

The visible symptom is advice that contradicts your current code. The agent confidently references a function you renamed, suggests a fix for a bug you already patched, or warns about a structure you refactored away. Nothing is broken in the model; it is faithfully reasoning over the context it was given. The defect lives in the freshness of that context, which is why this is framed at the evaluate Bloom level: you are expected to judge when context has gone stale and to weigh the competing remedies, not just recite a definition.

Stale tool result: A tool result stored in a session that was accurate when captured but no longer reflects the current state of the system, most commonly a file read taken before the file was edited. Resuming the session replays it as if it were still true.

Why resuming reintroduces the problem

The mechanism is a direct consequence of how sessions persist. A session is the accumulated history of prompts, tool calls, tool results, and responses, written to disk so it can be restored intact. That faithfulness is the feature when nothing has changed, and the bug when something has. Restoring the history restores the stale reads along with everything else, because the session has no notion of which results have since been invalidated by edits on disk.

This is also why forking offers no relief. A fork begins as a copy of the original history and then diverges, so it carries the same stale results into the new branch. People reach for fork expecting a clean baseline and instead clone the pollution. The Agent SDK documentation makes the deeper point explicitly: rather than relying on session resume across changes, it is often more robust to capture the results you actually need as application state and pass them into a fresh session's prompt. In other words, the cure is not a different way of reopening the old transcript; it is choosing what to carry forward.

How a stale tool result poisons resumed advice

Loading diagram...

The cached read from T1 survives the resume; the edit at T2 does not reach the agent, so its reasoning is anchored to stale data.

Evaluating the remedies

There are three responses to a stale session, and an architect is expected to rank them rather than grab the first.

The wrong instinct is to make the agent re-explore everything from scratch, re-reading the entire codebase to be thorough. This is wasteful and, ironically, counterproductive: it floods the already-polluted context with even more tokens, and the relevant signal, what actually changed, drowns in noise. Anthropic's context-engineering guidance frames this directly, noting that context windows of every size are subject to pollution and that the goal is the smallest set of high-signal tokens, not the largest.

The efficient remedy is targeted re-analysis. You tell the agent precisely which files changed and ask it to re-read only those, so it refreshes the specific snapshots that went stale while keeping the still-valid reasoning. This is cheap, surgical, and usually enough when only a handful of files moved.

The most reliable remedy, when the context has degraded badly, is to abandon the transcript and start fresh with a structured summary of the findings and decisions that still hold. This is the approach detailed in summary injection for fresh sessions, and it is the right call when the pollution is so extensive that patching individual reads is no longer trustworthy. Evaluating which remedy fits a given situation, surgical refresh versus clean restart, is exactly the judgment this knowledge point trains.

Snapshot

a tool result is frozen at read time

Targeted

re-analyse only the changed files

Fresh

restart with a summary when degraded

Why tool results behave like snapshots

It is worth being precise about why this happens, because the precision is what lets you reason about edge cases. A tool result is not a live view; it is a value captured at the instant the tool ran and then stored verbatim in the transcript. A file read taken at ten o'clock records the bytes that existed at ten o'clock, full stop. The model never re-runs that read on its own when you resume; it simply sees the stored value and treats it as current, because the transcript carries no timestamp the model reasons over and no signal that the world has moved on. Staleness, in other words, is not the model forgetting. It is the model remembering too faithfully.

Detecting staleness before it bites

Catching the problem early saves the most pain. The clearest tell is a temporal mismatch: you edited files after the agent last read them, and now you are resuming. Other tells are behavioural, such as advice that names a symbol you renamed, a fix for a bug you already closed, or a warning about a structure you removed. There is also an infrastructure variant worth knowing: the Agent SDK notes that if a resume call runs from a different working directory than the original, the SDK looks in the wrong place and can hand back a fresh session instead of the expected history, because sessions are keyed by an encoded cwd. That is not stale context exactly, but it belongs to the same family of "the session is not what you think it is" surprises, and the habit of verifying what you actually resumed guards against both.

When tooling, not you, moves the files

Editing a file by hand is the obvious source of staleness, but it is far from the only one. Switching git branches, pulling a teammate's changes, running a code formatter, or letting a build step regenerate files all move the bytes on disk out from under the agent's cached reads, often without you consciously "editing" anything at all. This is why a resume that felt perfectly safe can still surprise you: the staleness was introduced by tooling, not by a deliberate keystroke. Before resuming a session you care about, it is worth asking not just "did I change these files" but "did anything change them," and refreshing the reads that any of those operations touched.

Stale context versus a context limit

Do not confuse stale context with simply running out of room. A context limit is about quantity, too many tokens to fit, and is handled by compaction or a fresh start that makes space. Staleness is about accuracy, where the tokens that fit are wrong, and is handled by refreshing or replacing the bad ones. The two can co-occur in a long, heavily edited session, which is exactly when a fresh start with a summary earns its place by solving both at once. Keeping the distinction sharp prevents the classic error of reaching for a bigger window when the real problem is that the content of the window no longer matches the code.

A documented case: idle-session degradation

Stale and degraded context is not only a risk in your own sessions; Anthropic has published a postmortem describing the same family of failure inside the tooling itself. To make resuming an idle session faster, the system was designed to clear older reasoning once a session had been inactive for roughly an hour. A bug caused that clearing to keep firing on every subsequent turn rather than just once, so the agent carried on working "without memory" of decisions it had made earlier in the same task, producing exactly the repetitive, forgetful behaviour that long sessions are prone to.

The episode is a useful exam anchor because it shows the failure mode at the platform level: even a carefully engineered runtime can let earlier context silently drop out of a long or resumed session. The architectural lesson is the one this knowledge point keeps returning to, which is to treat a resumed session as a stateful local artifact with an explicit lifecycle rather than as durable memory you can trust to stay faithful indefinitely. When continuity genuinely matters, externalise the decisions that must survive into an injected summary or durable notes instead of assuming the session will remember them for you.

Why this matters for the exam

Stale context is the dark side of the resume convenience, and Task Statement 1.7 tests whether you can spot it. In Scenario 4, Developer Productivity with Claude, a prompt may describe an engineer who edited several files, resumed, and started receiving advice that no longer fits the code. The exam wants you to name the cause, stale tool results in a resumed session, and to pick the proportionate fix. The most tempting distractors dress the symptom up as a model-quality problem, suggesting a larger context window, a lower temperature, or a newer model. Recognising that none of those touch freshness, and that the real levers are targeted re-analysis or a clean restart, is the discriminating insight.

This knowledge point also connects outward. The habit of refreshing only what changed echoes debugging premature loop termination, where the discipline is again to trace the real signal rather than reach for a blunt instrument.

Worked example

A developer resumes a session after rewriting three files and gets advice that contradicts the new code.

A developer spent the morning with Claude Code analysing a service, then over lunch rewrote three files: the request handler, a validation helper, and its test. In the afternoon they resumed the morning session and asked for the next refactoring step. The agent replied with confident guidance that referenced the old validation helper, the version it had read before lunch, and flatly contradicted the new structure.

Evaluating the situation, the developer recognises this as stale context: the morning's tool results are snapshots that predate the rewrite. They reject the urge to ask the agent to re-read the whole repository, since that would bury the change in noise and burn tokens. Instead they apply targeted re-analysis, telling the agent exactly which three files changed and asking it to re-read only those. The agent refreshes the relevant snapshots, reconciles them with the still-valid morning analysis, and produces advice consistent with the current code.

Had the rewrite touched dozens of files and the context window already been bloated, the better call would have been to start fresh and inject a summary of the decisions that survived. The point of the exercise is not a single right answer but the act of weighing how degraded the context is and matching the remedy to the damage.

Common misreadings to avoid

Misconception

When stale context appears, the safest fix is to make the agent re-read the entire codebase from scratch.

What's actually true

Full re-exploration is wasteful and tends to make things worse, because it floods an already-polluted context with low-signal tokens and buries what actually changed. The efficient fix is targeted re-analysis: tell the agent exactly which files changed so it refreshes only those snapshots. Reserve a full restart for badly degraded context.

Misconception

Stale context is a defect in the model, so a newer or larger model will avoid it.

What's actually true

Stale context is not a model-quality issue; it is a consequence of resuming a session whose cached tool results no longer match disk. Anthropic notes that context windows of all sizes are subject to pollution, so a bigger model still inherits stale reads. The remedy is session hygiene, targeted re-analysis or a fresh start, not a different model.

Check your understanding

A developer resumed a Claude Code session after changing three files, and the agent's advice now references code that no longer exists and contradicts the current implementation. What is the most appropriate remedy?

Watch and learn

Official Anthropic Academy lessons first, then hand-picked walkthroughs. Videos load only when you press play.

No videos curated for this concept yet

We are still curating the best official and community videos for this topic.

References & primary sources

Adaptive study

Master this concept with Archie

Practice it inside an adaptive study session. Archie, your Socratic AI tutor, tracks your mastery with Bayesian Knowledge Tracing and schedules the perfect next review.

Start studying

The Stale Context Problem in Resumed Agent Sessions