Summary Injection and /compact

In short: Summary injection means summarising the findings of one phase and carrying that summary into the next, instead of dragging the full transcript forward. The /compact command applies the same idea to a full window: it replaces accumulated history with a condensed summary, freeing token space while preserving the essential thread.

What summary injection means between phases

Many long agent tasks are naturally phased: discover, then plan, then implement. Summary injection is the practice of ending each phase by distilling its findings into a compact summary and starting the next phase from that summary rather than from the full, verbose transcript of everything that came before. The discovery phase might read fifty files; the planning phase does not need those fifty files, it needs the handful of conclusions they produced. Injecting that conclusion-summary at the phase boundary keeps the window lean exactly when it is about to be reused for new work.

This is the proactive cousin of the degradation problem from the prerequisite knowledge point. Instead of waiting for the window to fill and the answers to turn generic, you intervene at a clean seam in the workflow. Anthropic's own guidance describes phase-boundary summarisation as a core context-engineering move, and it pairs especially well with subagent delegation: summarise phase one, then spawn the phase-two subagents against the summary, never against the raw phase-one bulk.

Summary injection: Summarising the findings of a completed phase and carrying that summary, not the full transcript, into the next phase. It keeps the context window focused on conclusions rather than the verbose work that produced them.

What /compact does to a full window

Where summary injection is a workflow discipline you apply at phase seams, /compact is a direct context-reduction operation you apply when the window itself is filling. Running it triggers a summarisation pass: Claude reads the conversation so far, produces a condensed summary, and replaces the existing history with that summary as the new starting context. The Claude Code guidance puts it plainly: "Compact asks the model to summarize the conversation so far, then replaces the history with that summary." The token space the old transcript occupied is freed, and the session continues with the thread intact but the bulk gone.

Anthropic offers a server-side version of the same idea for API workflows. Its compaction documentation describes automatic summarisation that "condenses earlier parts of a conversation, enabling long-running conversations beyond context limits with minimal integration work," triggered when input tokens cross a configurable threshold. Whether you invoke it manually in Claude Code or enable it server-side via the API, the shape is identical: replace accumulated history with a faithful summary so the agent can keep going.

Summary injection at phase seams plus /compact within a phase

Loading diagram...

Summary injection trims at clean phase boundaries; /compact trims within a phase when the window fills. Both free space without ending the session.

Why timing decides whether compaction helps

The apply-level subtlety the exam loves is when to compact. Auto-compaction fires when context approaches the window limit, but by then the model is already operating in a degraded state, so it summarises from a fuzzy starting point and the summary is worse. Manual compaction, invoked proactively while there is still headroom, runs while the model still has clear recall of the full conversation and therefore produces a higher-fidelity summary. The practical rule architects cite is to compact early, around the point the window is comfortably filling rather than nearly full, precisely because a clear-headed model summarises better than a degraded one.

The second subtlety is that compaction is lossy on purpose. Anthropic notes that compaction "distills the contents of a context window" but warns that "overly aggressive approaches risk losing subtle but critical details." That is the bridge back to the rest of Task 5.4: summary injection and /compact keep the window usable, but anything that must remain exact, specific identifiers, amounts, the one race condition you found, belongs additionally in a scratchpad file or a persistent block. Summarise to stay lean; persist to stay exact. The two are complementary, not interchangeable.

Worked example

An agent is running a three-phase database migration, discover the schema, plan the changes, implement them, across one long session.

Phase one explores the schema: forty tables, their constraints, and the foreign keys an unfamiliar legacy system has accreted. That is a lot of verbose output. Before moving on, the agent injects a summary: "12 tables in scope; orders and payments lack a foreign key; users.email is non-unique; migration must run in this order." Phase two, planning, starts from that summary, not from the forty raw table dumps, so the planning window is clean and focused on decisions.

Mid-way through planning, the window starts filling again as the agent reasons through edge cases and pulls in a few more files. With headroom still left, the engineer runs /compact. Claude summarises the discussion so far, replaces the history with that summary, and planning continues in a freshly lean window. Crucially, the agent had already written the load-bearing facts, the missing foreign keys, the ordering constraint, to a scratchpad, so even though /compact condensed the discussion, those exact facts remain recoverable.

The exam trap here is two-fold. First, spawning phase two without summarising phase one, so the planning window inherits all the discovery bulk and degrades immediately. Second, waiting until auto-compact fires at the limit, when the model is already degraded and produces a poor summary. The correct apply-level moves are to summarise at the phase boundary and to compact proactively, with headroom, before degradation sets in.

How /compact differs from clearing and context editing

It is worth separating /compact from the neighbouring operations it is easily confused with, because they make different trade-offs. Clearing the context, the /clear style of reset, wipes the conversation and starts fresh. That maximally frees space, but it also throws away the thread: the agent forgets what it was doing. /compact is the middle path; it frees space while preserving continuity, replacing history with a summary so the agent keeps its bearings. The choice between them is about whether you need the thread to survive. Starting an unrelated new task favours clearing; continuing the same long task favours compaction.

Context editing is a third, finer instrument. Rather than summarising the whole conversation, it surgically removes specific low-value content, clearing old tool results, or stripping thinking blocks, while leaving the rest of the window intact. Anthropic positions tool-result clearing and thinking-block clearing as targeted strategies for agentic workflows where a few categories of output dominate the bulk. The mental model is a spectrum of aggressiveness: context editing trims named categories, /compact condenses everything into a summary, and clearing discards the lot. An architect picks the least aggressive operation that recovers enough space, because each step up the spectrum trades away more of the original detail.

Steering what the summary keeps

A compaction is only as good as what it chooses to preserve, and you are not entirely at its mercy. Anthropic's server-side compaction lets you override the default summarisation prompt with custom instructions, so you can tell it to retain the things your task cannot lose, decisions made, key identifiers, the remaining to-do list, rather than accepting a generic summary. It also exposes a configurable trigger: by default it fires when input crosses roughly 150,000 tokens, with a floor around 50,000, and a pause_after_compaction option that lets recent messages stay verbatim so the freshest context is never summarised away. These knobs turn compaction from a blunt reset into a tunable curation step.

The lesson for the exam is that summarisation is a design decision, not an accident that befalls a full window. You decide when it happens (proactively, with headroom), what it preserves (via steering instructions), and what it must never be trusted to hold (exact facts, which go to a scratchpad or manifest instead). Treating /compact as something you configure and time, rather than something that simply triggers at the limit, is the difference between a session that stays sharp across a long task and one that quietly loses a critical detail in an automatic summary it never reviewed.

It also reframes how the server-side and client-side flavours relate. The Claude Code /compact you type and the API compaction you enable with a beta header are the same idea expressed at two layers: one is an interactive command for a coding session, the other an automatic policy for a programmatic agent. An architect designing a long-running service leans on the configurable server-side trigger so that compaction happens without a human in the loop; an engineer steering an interactive session reaches for the manual command at a moment of their choosing. Recognising that these are one technique with two entry points, rather than two competing features, is part of understanding the mechanism rather than just the keystroke.

How server-side compaction works on the API

For programmatic agents it helps to know the exact shape of the server-side mechanism, not just its effect. You enable it by adding the compaction edit, compact_20260112, to the context_management.edits field of a Messages API request, alongside a trigger that sets the token threshold. From then on the condensing happens automatically: when input crosses the trigger, the API generates a summary, returns a compaction block at the start of the assistant response that holds that summary, and continues the conversation from it.

The handling on later turns is what makes this a summary-and-restart rather than a blind truncation. You append the assistant response as you normally would, and on subsequent requests the API drops every message block that came before the compaction block, so the old transcript stops counting against the window while the summary carries the thread forward. The boundary is explicit and the discard happens only after a faithful summary has taken the bulk's place, which is exactly why an architect can treat compaction as a managed checkpoint and still reach for the timing and steering controls covered above.

How this is tested on the exam

Task 5.4 questions describe a multi-phase or long-running session and ask how to keep the window usable. When the scenario emphasises moving between phases, the answer is summary injection: condense the prior phase and carry the summary forward. When it emphasises a window that is filling during work, the answer is /compact: replace history with a summary to free space. The strongest answers add the timing rule, compact proactively, not at the limit, and the caveat that compaction is lossy, so exact facts need durable storage alongside it.

Distractors exploit the two timing mistakes and the one scope mistake. They may suggest starting the next phase without summarising, waiting for the limit before compacting, or treating compaction as a substitute for persisting exact facts. Recognising that summarisation keeps you lean while persistence keeps you exact, and that proactive beats reactive, is the judgement this knowledge point rewards. Where the better fit is isolating noisy work, that is delegation; where it is surviving a crash, that is a manifest.

Misconception

It is best to wait until auto-compaction triggers at the context limit, since that is when compacting is actually needed.

What's actually true

By the time the window is at its limit the model is already degraded, so it summarises poorly. Compact proactively, while there is still headroom and clear recall, to get a higher-fidelity summary. Earlier is better, not wasteful.

Misconception

Running /compact preserves everything, so there is no need to also save findings elsewhere.

What's actually true

Compaction is lossy by design. It condenses, and aggressive summarisation can drop subtle but critical details. Keep facts that must stay exact in a scratchpad or persistent block so a summary cannot quietly lose them.

Check your understanding

An agent runs a three-phase migration (discover, plan, implement) in one session. Moving from discovery to planning, what keeps the planning window clean, and what should the agent do if that window starts filling mid-phase?

Watch and learn

Official Anthropic Academy lessons first, then hand-picked walkthroughs. Videos load only when you press play.

No videos curated for this concept yet

We are still curating the best official and community videos for this topic.

References & primary sources

Adaptive study

Master this concept with Archie

Practice it inside an adaptive study session. Archie, your Socratic AI tutor, tracks your mastery with Bayesian Knowledge Tracing and schedules the perfect next review.

Start studying

Summary Injection and /compact for Context Management