- In short
- Context management strategy selection is the judgement of matching a technique to a situation: scratchpad files for short sessions that must keep findings, subagent delegation for noisy investigations, summary injection between phases, and /compact plus manifests for long-running, crash-prone work. No single technique fits every case, so the skill is choosing and combining.
What context management strategy selection actually asks
Context management strategy selection is the capstone of Task 5.4, and unlike the four techniques beneath it, it is not a thing you do. It is a judgement you make. The four hard prerequisites each give you a tool, and each tool is correct for a different situation. The evaluate-level skill is reading a scenario, identifying which pressures are actually present, and choosing the technique or combination that fits. A question at this level is rarely "what does /compact do?"; it is "given this session, which approach is right, and why are the others worse here?"
The reason a single answer never suffices is that the techniques solve genuinely different problems. Anthropic frames the whole space as curation: because recall degrades as the window fills, "curating what's in context" is the real job, and curation has several instruments. A scratchpad preserves exact findings. A subagent isolates noisy work. Summary injection and /compact reduce bulk. A manifest survives interruption. Reaching for the wrong instrument (summarising when you needed to isolate, or isolating when you only needed to write a note) wastes effort and can lose the very thing you were trying to protect.
- Strategy selection
- The evaluate-level skill of choosing among context-management techniques based on a session's characteristics (length, investigation complexity, phase structure, and crash risk) rather than applying a single default everywhere.
How to read a session and match the tool
The selection turns on a few diagnostic questions about the session in front of you. Is the session short, but with findings you must not lose? A scratchpad is usually enough. Is the work a noisy investigation whose output would flood the main window? Delegate it to a subagent so the noise stays isolated. Is the task naturally phased? Inject a summary at each boundary so the next phase starts lean. Is the main window filling mid-work? Run /compact, proactively, to reduce it. Is the job long-running and exposed to crashes? Add a manifest so an interruption is recoverable. These are not mutually exclusive; the realistic answer to a complex scenario combines several.
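The diagnostic questions above can be sketched as a small selection function. This is a minimal illustration, not an official rubric: the `SessionProfile` fields and the `select_techniques` name are assumptions introduced here, and each flag simply stands for one of the questions in the paragraph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionProfile:
    """Pressures present in a session. Field names are illustrative assumptions."""
    must_keep_findings: bool = False   # short session, findings must not be lost
    noisy_investigation: bool = False  # output would flood the main window
    phased: bool = False               # task has natural phase boundaries
    window_filling: bool = False       # main window is filling mid-work
    crash_prone: bool = False          # long-running, exposed to interruption


def select_techniques(profile: SessionProfile) -> list[str]:
    """Return one technique per pressure present, in the order the questions are asked."""
    rules = [
        (profile.must_keep_findings, "scratchpad"),
        (profile.noisy_investigation, "subagent delegation"),
        (profile.phased, "summary injection"),
        (profile.window_filling, "/compact"),
        (profile.crash_prone, "manifest"),
    ]
    return [technique for present, technique in rules if present]


# A short lookup with one finding worth keeping stays simple.
print(select_techniques(SessionProfile(must_keep_findings=True)))
# A noisy, phased, crash-prone effort combines several techniques.
print(select_techniques(SessionProfile(noisy_investigation=True, phased=True, crash_prone=True)))
```

Because the rules are independent, the function returns a combination whenever several pressures are present, and an empty list when none are.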
Why combining beats defaulting
The most sophisticated answers at this level layer the techniques, because the techniques are complementary by design. A subagent can write its detailed findings to a scratchpad while returning only a summary to the main agent: isolation plus persistence. Summary injection and /compact keep windows lean but are lossy, so they pair with a scratchpad or manifest that keeps the must-have-exact facts intact. A manifest typically stores summarised state, so it leans on the summarisation skill. Anthropic's own guidance pairs these explicitly: it recommends using compaction and the memory tool together so that "compaction keeps the active context manageable" while "memory persists important information across compaction boundaries so that nothing critical is lost in the summary."
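The isolation-plus-persistence pairing can be sketched in a few lines. This is a simulation under stated assumptions: `run_subagent` is a hypothetical stand-in that makes no real model call, and the scratchpad is assumed to be a JSON file named after the area.

```python
import json
import tempfile
from pathlib import Path


def run_subagent(area: str, raw_findings: list[str], scratch_dir: Path) -> dict:
    """Simulated subagent (no real model call): persist detail, return a summary."""
    scratchpad = scratch_dir / f"{area}.json"
    # Persistence: the exact findings go to disk, not into the coordinator's window.
    scratchpad.write_text(json.dumps(raw_findings, indent=2), encoding="utf-8")
    # Isolation: only a short summary and a pointer travel back to the main agent.
    return {
        "area": area,
        "summary": f"{len(raw_findings)} findings recorded",
        "scratchpad": str(scratchpad),
    }


with tempfile.TemporaryDirectory() as tmp:
    findings = ["billing/tax.py imports legacy_rates", "billing/invoice.py has no tests"]
    report = run_subagent("billing", findings, Path(tmp))
    print(report["summary"])
    # The exact detail is still recoverable on demand from the scratchpad.
    recovered = json.loads(Path(report["scratchpad"]).read_text(encoding="utf-8"))
    print(recovered == findings)
```

The coordinator sees only the returned dictionary, so the lossy summary and the exact record live in different places and neither has to compromise for the other.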
That complementarity is why the evaluate-level trap is so specific: applying one strategy to all situations without considering requirements. An architect who only ever summarises will lose exact identifiers a scratchpad should have held. One who delegates everything pays subagent overhead on trivial lookups. One who writes a manifest for a five-minute task adds ceremony that buys nothing. Good selection is proportionate. It spends complexity only where the session's pressures justify it, and it leaves a short, simple session simple. This same proportionate-judgement pattern recurs across Domain 5, which is why this knowledge point relates to error-propagation strategy design and escalation decision analysis: each is an evaluate-level question about choosing a balanced approach over a one-size default.
Worked example
An architect must plan context management for a multi-day, multi-agent effort that maps and refactors a 2,000-file monolith across discovery, planning, and implementation phases.
This scenario activates almost every pressure at once, so a single technique would be obviously inadequate, and that recognition is the first evaluate-level move. The architect decomposes by pressure. The discovery phase is noisy: dozens of areas, each with dozens of files. That calls for subagent delegation, one isolated investigator per area, each returning a summary so the coordinator's window stays clean. Each subagent also writes its detailed findings to a scratchpad, so the depth is recoverable without entering the main window.
The work is phased, so at the discovery-to-planning boundary the coordinator injects a summary of confirmed findings rather than dragging the raw exploration forward, and within the long planning phase it runs /compact proactively as the window fills. Finally, because the effort spans days and machines, it is crash-prone: each agent exports a manifest continuously, and the coordinator defines a resume protocol that reloads and re-injects them. The result is a layered strategy (isolate, persist, summarise, compact, recover), with each element chosen because a specific pressure is present.
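The manifest export and resume protocol in this example could look roughly like the sketch below. The file layout (one JSON file per agent), the function names, and the state fields are all assumptions made for illustration; the source does not prescribe a manifest format.

```python
import json
import os
import tempfile
from pathlib import Path


def export_manifest(manifest_dir: Path, agent_id: str, state: dict) -> Path:
    """Write one agent's state without ever leaving a half-written manifest in place."""
    target = manifest_dir / f"{agent_id}.json"
    partial = manifest_dir / f"{agent_id}.json.tmp"
    partial.write_text(json.dumps(state), encoding="utf-8")
    os.replace(partial, target)  # swap the complete file in over the previous copy
    return target


def resume(manifest_dir: Path) -> dict[str, dict]:
    """Resume protocol: reload every manifest, keyed by agent id, ready for re-injection."""
    return {
        path.stem: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(manifest_dir.glob("*.json"))
    }


with tempfile.TemporaryDirectory() as tmp:
    manifests = Path(tmp)
    # "Continuously" here means re-exporting after each unit of work.
    export_manifest(manifests, "billing", {"phase": "discovery", "done": ["tax.py"]})
    export_manifest(manifests, "billing", {"phase": "discovery", "done": ["tax.py", "invoice.py"]})
    export_manifest(manifests, "auth", {"phase": "discovery", "done": []})
    # After a crash, the coordinator rebuilds its picture from disk alone.
    print(resume(manifests))
```

Writing to a temporary file and then replacing the target is one common way to make sure an interruption during export leaves the last good manifest readable.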
Now contrast a weaker plan that defaults to one tool: "just run /compact whenever the window fills." It keeps the window lean but isolates nothing, so the coordinator still absorbs all the discovery noise before compacting; it persists nothing exact, so lossy compaction quietly drops specific findings; and it survives no crash, so a day-three reboot wipes everything. The exam rewards the architect who reads each pressure and selects accordingly, and penalises the one who applies a single favourite technique to a situation that needs several.
A rubric across session profiles
Selection becomes concrete when you reason from a few recurring session profiles. A short lookup (answer one question against a small surface) needs almost nothing; if a finding must outlive the session, a single scratchpad covers it. A long single-agent investigation of one area benefits from a scratchpad plus proactive /compact, writing findings durably while keeping the one growing window lean. A multi-phase build (discover, plan, implement) adds summary injection at each phase boundary so a clean window carries only conclusions forward. And a multi-agent, multi-day effort layers the full stack: subagents to isolate noisy discovery, scratchpads for depth, summaries at seams, /compact within phases, and manifests for crash recovery.
A practical way to apply the rubric under exam time pressure is to scan the scenario for trigger words and let each map to its instrument: "hundreds of files" or "verbose output" signals isolation, "must not lose" or "specific values" signals persistence, "phase" or "next stage" signals summary injection, "filling up" or "running out of room" signals /compact, and "overnight," "multi-day," or "crash" signals a manifest. The profile is just the sum of the triggers present. Reading those profiles back to front reveals the principle: complexity in the strategy should track complexity and risk in the session, nothing more. Each pressure you can name in the scenario (noise, exactness, phasing, a filling window, crash exposure) earns exactly one technique, and pressures you cannot name earn nothing. That is why the rubric is additive rather than a fixed recipe: you start from the simplest viable approach and add an instrument only when a specific, identifiable pressure demands it. An architect who can articulate which pressure each chosen technique answers, and can defend the omission of the ones left out, has demonstrated the selection skill the exam is probing.
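As a rough illustration of the trigger-word scan, the sketch below maps the phrases quoted above to their instruments. The `TRIGGERS` table and `scan_scenario` function are hypothetical names, and plain substring matching is an assumption; it is a first pass that still needs the judgement the rest of this section describes.

```python
# Phrases are the ones quoted in the text; the instrument labels are illustrative.
TRIGGERS = {
    "isolation via subagent": ("hundreds of files", "verbose output"),
    "persistence via scratchpad": ("must not lose", "specific values"),
    "summary injection": ("phase", "next stage"),
    "/compact": ("filling up", "running out of room"),
    "manifest": ("overnight", "multi-day", "crash"),
}


def scan_scenario(text: str) -> list[str]:
    """Return every instrument whose trigger phrases appear in the scenario text."""
    lowered = text.lower()
    return [
        instrument
        for instrument, phrases in TRIGGERS.items()
        if any(phrase in lowered for phrase in phrases)
    ]


print(scan_scenario("The agent reads hundreds of files overnight."))
print(scan_scenario("Answer one question about a single config value."))
```

The result is additive in the same way as the rubric: a scenario with no recognisable trigger yields an empty list, which corresponds to starting from the simplest viable approach.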
Where context editing fits the selection
The five instruments in the rubric (scratchpad, subagent, summary injection, /compact, and manifest) are the ones the exam foregrounds, but a complete selection also weighs a sixth, finer option: context editing. Anthropic's context-windows guidance describes editing strategies such as tool-result clearing and thinking-block clearing that surgically remove a named category of content instead of condensing the whole window. Context editing is the least aggressive way to recover space, and knowing when to prefer it is part of the evaluate-level judgement.
The selection cue is the shape of the bulk. When the tokens crowding the window are specifically old, already-digested tool output (long file dumps, search results, verbose command logs), context editing is the precise tool: it clears that one category and leaves the reasoning thread and your conclusions untouched. Reaching for /compact in that situation is heavier than necessary, because summarising everything also condenses the thread you wanted verbatim, and clearing the context is heavier still, because it throws the thread away entirely. Match the instrument to which part of the window is actually the problem.
This is the proportionality principle from the rubric applied one notch finer. The options can be ordered by how much they remove: editing clears a single named category and keeps everything else; /compact rewrites the whole window as a condensed summary; and a full reset abandons the conversation outright. The rule is to choose the gentlest step that recovers enough room: when one identifiable category of output dominates, edit it away; only when the excess is diffuse, spread thinly across the entire conversation, does the heavier summary-and-restart of /compact earn its cost.
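The "gentlest step that recovers enough room" rule can be written as a small decision function. This is a sketch under assumptions: the category names, the token counts, and the `choose_reduction` function are hypothetical, and a full reset is left out because the text gives no criterion for preferring it.

```python
def choose_reduction(
    tokens_by_category: dict[str, int],
    tokens_to_free: int,
    clearable: tuple[str, ...] = ("tool_results", "thinking_blocks"),
) -> str:
    """Pick the least aggressive step that frees at least `tokens_to_free` tokens."""
    if tokens_to_free <= 0:
        return "none"  # nothing needs to be recovered
    # Room that could be recovered by clearing only the named categories.
    reclaimable = sum(tokens_by_category.get(name, 0) for name in clearable)
    if reclaimable >= tokens_to_free:
        return "context editing"  # one identifiable category dominates
    return "/compact"  # excess is diffuse, so condense the whole window


# Old tool output dominates: clearing it alone recovers enough room.
print(choose_reduction({"tool_results": 50_000, "conversation": 20_000}, 30_000))
# The bulk is spread across the conversation: editing would not free enough.
print(choose_reduction({"tool_results": 5_000, "conversation": 60_000}, 30_000))
```

In practice the split between categories would come from whatever accounting the tooling exposes; the point of the sketch is only the ordering of the checks.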
Why this is an evaluate-level skill, not a checklist
It would be tempting to reduce all of this to a lookup table and call it solved, but the exam deliberately places this knowledge point at the evaluate level because the mapping is not mechanical. The same surface scenario can warrant different strategies depending on constraints the prose only hints at: a tight token budget pushes harder toward isolation and compaction; a high cost of failure pushes toward manifests even on a job that might otherwise seem short enough to skip them; a latency-sensitive task may avoid subagent overhead it could technically afford. Judgement means weighing those constraints, not pattern-matching keywords.
This is the same shape of reasoning that recurs across Domain 5's evaluate-level knowledge points. Choosing a context strategy, designing an error-propagation strategy, and analysing an escalation decision all ask the architect to reject a one-size default in favour of a balanced approach justified by the situation. The unifying competency the exam certifies here is not memorising which tool exists, but reading a real situation, naming its competing pressures, and defending a proportionate design against the alternatives. Strategy selection is where the four techniques of Task 5.4 stop being facts to recall and become a decision to argue.
How this is tested on the exam
Because this is an evaluate-level knowledge point, exam questions on Task 5.4 present a richer scenario and ask you to choose and justify a strategy, or to critique a proposed one. The correct answers map specific session characteristics to specific techniques (short-but-must-keep to scratchpad, noisy to subagent, phased to summary injection, filling to /compact, crash-prone to manifest) and, for complex cases, combine them. They also explain why a tempting single-tool answer fails on the dimensions it ignores.
The signature distractor is the one-size-fits-all choice: a plausible technique applied universally, ignoring a pressure it does not address. Spot it by checking each of the scenario's pressures against the proposed strategy and naming what it leaves uncovered. The other distractor over-engineers, applying heavy machinery to a trivial session, and is wrong because selection is proportionate. Hold the mapping and the proportionality principle together and these capstone questions resolve cleanly. They are the synthesis the rest of Task 5.4 was building toward, and they connect outward to the broader decomposition-strategy judgement in Domain 1.
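Both distractor checks can be sketched as one coverage comparison. The `COVERS` table and `critique` function below are hypothetical names built from the mapping in this section: uncovered pressures expose a one-size-fits-all answer, and techniques that answer no named pressure expose over-engineering.

```python
# Which pressure each technique answers, following the mapping in the text.
COVERS = {
    "scratchpad": {"exactness"},
    "subagent": {"noise"},
    "summary injection": {"phasing"},
    "/compact": {"filling window"},
    "manifest": {"crash exposure"},
}


def critique(pressures: set[str], proposed: set[str]) -> dict[str, list[str]]:
    """Name what a proposed strategy leaves uncovered and what it adds without cause."""
    covered = set().union(*(COVERS[technique] for technique in proposed))
    return {
        # Pressures in the scenario that no proposed technique addresses.
        "uncovered": sorted(pressures - covered),
        # Proposed techniques that answer no pressure actually present.
        "unjustified": sorted(t for t in proposed if not (COVERS[t] & pressures)),
    }


monolith = {"noise", "exactness", "phasing", "filling window", "crash exposure"}
# The single-tool plan from the worked example leaves four pressures unanswered.
print(critique(monolith, {"/compact"}))
# A manifest on a short must-keep lookup is ceremony that buys nothing.
print(critique({"exactness"}, {"scratchpad", "manifest"}))
```

A strategy that returns two empty lists is proportionate in the sense used here; it does not settle the constraint trade-offs (budget, cost of failure, latency) that still need judgement.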
Misconception
There is a single best context management technique, so a good architect applies it consistently to every session.
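What's actually true
No single technique fits every case. Each technique solves a different problem, so the skill is choosing and combining them to match the pressures actually present in the session.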
Misconception
More context management is always safer, so every session should use scratchpads, subagents, summaries, and manifests together.
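What's actually true
Good selection is proportionate: it spends complexity only where the session's pressures justify it and leaves a short, simple session simple. A manifest for a five-minute task adds ceremony that buys nothing.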
An architect plans a multi-day, multi-agent effort to map and refactor a 2,000-file monolith across discovery, planning, and implementation. A teammate proposes simply running /compact whenever the window fills. As an evaluation, which response is strongest?