Progressive Summarisation Trap

In short: Progressive summarisation is the practice of repeatedly compressing older conversation turns into a shorter recap so the context window does not overflow. The trap is that compression is lossy: exact amounts, dates, and identifiers get smoothed into vague phrases, so an agent that summarised a transactional conversation can no longer act on the specifics it once had.

What the progressive summarisation trap is

Progressive summarisation is a sensible-sounding answer to a real problem. A customer-support conversation grows turn by turn, the running history climbs toward the context window limit, and so you periodically ask the model to condense everything said so far into a tidy recap. You then carry the recap forward instead of the raw transcript. It works beautifully for chit-chat and general back-and-forth, which is exactly why it is so tempting to apply everywhere.

The trap is that summarisation is lossy on purpose. A good summary keeps the meaning and throws away the texture, and to a summariser an exact dollar figure or an order number looks like texture. So the line "the customer wants a refund of 247.83 dollars for order 8891 placed on 3 March" quietly becomes "the customer wants a refund for a recent order." The conversation still reads coherently, which is what makes the failure so dangerous: nothing looks broken until the agent tries to actually issue the refund and discovers it no longer knows the amount, the order, or the date.

Progressive summarisation trap: Repeatedly compressing older conversation turns to stay within the context window, in a setting where the discarded specifics (amounts, dates, identifiers) are precisely the data the agent still needs to complete its task.

The specific data that gets compressed away

It helps to be concrete about what disappears, because the losses are not random. Summarisers preserve narrative and intent and discard anything that reads as a low-level particular. Four categories are routinely the first to go.

Numerical values such as refund amounts, balances, quantities, and prices. A summary happily reduces "247.83 dollars" to "a refund," because the number feels incidental to the story.
Dates and times like an order date or a delivery window. "Placed on 3 March" becomes "recently," which is unusable the moment a policy depends on a 30-day return window.
Identifiers including order numbers, ticket IDs, and account references. These are high-entropy strings with no narrative meaning, so a recap drops them almost every time.
Stated customer expectations, for example a promised callback time or an agreed resolution. The commitment survives as a vague intention while its enforceable specifics evaporate.

Notice that every one of these is exactly what an agent needs to take a correct, auditable action. The summariser is optimising for readability, but the agent needs precision, and those two goals pull in opposite directions.
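To see the loss mechanically, here is a minimal sketch that compares an original line against its recap and reports which exact values vanished. The regular expressions and the function names (extract_specifics, lost_specifics) are illustrative assumptions, not part of any library, and they only cover the shapes used in the example above.

```python
import re

# Illustrative patterns only; real traffic would need far broader coverage.
PATTERNS = {
    "amount": re.compile(r"\b\d+\.\d{2}\b"),
    "order_id": re.compile(r"\border (\d+)\b", re.IGNORECASE),
    "date": re.compile(
        r"\b\d{1,2} (?:January|February|March|April|May|June|July|"
        r"August|September|October|November|December)\b"
    ),
}


def extract_specifics(text: str) -> dict[str, set[str]]:
    """Return the exact-value tokens found in text, grouped by category."""
    return {name: set(p.findall(text)) for name, p in PATTERNS.items()}


def lost_specifics(original: str, recap: str) -> dict[str, set[str]]:
    """Return every specific present in the original but absent from the recap."""
    before = extract_specifics(original)
    after = extract_specifics(recap)
    lost = {name: before[name] - after[name] for name in before}
    return {name: values for name, values in lost.items() if values}


original = (
    "the customer wants a refund of 247.83 dollars "
    "for order 8891 placed on 3 March"
)
recap = "the customer wants a refund for a recent order"

print(lost_specifics(original, recap))
```

Run on the example sentence, the report lists the amount, the order number, and the date as missing, even though the recap reads as a faithful paraphrase.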

Why this is a context-management failure, not a model failure

It is tempting to blame the model for "forgetting," but the model never forgot anything. It answered faithfully based on the text it was given, and the text it was given was a recap with the specifics already removed. This is the heart of Domain 5: reliability problems usually live in how you curate context, not in the model's raw capability.

Anthropic's own guidance frames compaction (its name for automatic, server-side summarisation near the window limit) as a way to "extend the effective context length for long-running conversations" by condensing older context. That is genuinely useful, but it shifts responsibility onto you to decide what must never be condensed. Summarisation is a tool for managing volume; it is not a memory system, and treating it as one is the mistake the exam wants you to catch.

Figure: How a transactional fact is lost to a recap. Blanket summarisation strips the specifics; extracting them first preserves them through compression.

How the exam frames it

Domain 5, Context Management and Reliability, carries fifteen percent of the exam, and this knowledge point is its entry door. Task statement 5.1 is about preserving critical information across long interactions, and the customer-support resolution scenario is the natural home for it. Questions rarely use the phrase "progressive summarisation" directly. Instead they describe a long support chat where the agent suddenly cannot recall the order number, or a multi-agent research run that condensed its findings and lost the figures, and they ask you to identify the root cause and the right remedy.

The wrong answers usually propose turning a knob, such as raising a token limit or adjusting the model, while the correct answer points at the curation strategy: the agent applied lossy compression to data that had to stay exact. Recognising that the defect is a design choice about what gets summarised, rather than a model shortcoming, is the judgement being assessed.

The fix: extract before you compress

The remedy is not to abandon summarisation, which you still need to control token growth. It is to change the order of operations. Before any recap is produced, extract the hard transactional facts into a separate, structured block, and exclude that block from everything the summariser touches. The recap then compresses only the conversational filler, while the figures, dates, and identifiers ride alongside it untouched.
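The order of operations can be sketched as follows. This is a minimal illustration under stated assumptions: compress_history, toy_extract, and toy_summarise are hypothetical names, the extractor is a pair of toy regular expressions, and the summariser is a stand-in for whatever model call you would actually make.

```python
import re
from typing import Callable

Facts = dict[str, str]


def compress_history(
    turns: list[str],
    pinned: Facts,
    extract: Callable[[str], Facts],
    summarise: Callable[[list[str]], str],
) -> tuple[Facts, str]:
    """Extract exact facts from every turn first, then summarise the turns.

    The facts dict is returned separately and is never passed to summarise,
    so a lossy recap cannot blur it.
    """
    facts = dict(pinned)
    for turn in turns:
        facts.update(extract(turn))  # step 1: pull out the hard facts
    recap = summarise(turns)         # step 2: compress the narrative only
    return facts, recap


def toy_extract(turn: str) -> Facts:
    """Hypothetical extractor: two regexes standing in for a real extraction step."""
    facts: Facts = {}
    if m := re.search(r"(\d+\.\d{2}) dollars", turn):
        facts["amount"] = m.group(1)
    if m := re.search(r"order (\d+)", turn):
        facts["order_id"] = m.group(1)
    return facts


def toy_summarise(turns: list[str]) -> str:
    """Hypothetical summariser: deliberately vague, as a real recap tends to be."""
    return f"Recap of {len(turns)} turns: customer wants a refund for a recent order."


turns = [
    "I was double-charged 247.83 dollars on order 8891, which I placed on 3 March.",
    "Thanks, I can see the duplicate charge.",
]
facts, recap = compress_history(turns, {}, toy_extract, toy_summarise)
print(facts)
print(recap)
```

The design choice that matters is that extraction runs before summarisation and its output travels on a separate path; how good the extractor is becomes a separate question you can improve independently.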

This is the bridge to the next knowledge point, the persistent case facts block, which is the concrete implementation of "extract first." It is also why tool result trimming is a sibling technique: both accept that you cannot keep everything, so both decide deliberately what to keep verbatim rather than letting a lossy process decide for them. The unifying principle is that critical data should be pinned by design, never left to survive a summary by luck.

Worked example

A support agent handles a billing dispute over many turns, and a rolling summariser compresses the history every ten turns to stay within budget.

Early in the conversation the customer says, "I was double-charged 247.83 dollars on order 8891, which I placed on 3 March, and I want a full refund." The agent acknowledges this and continues troubleshooting: it checks the order, confirms a duplicate charge, and explains the refund policy. By turn ten the running transcript is long, and the summariser fires, producing a recap that reads, "The customer reported a duplicate charge on a recent order and would like a refund. The agent confirmed the duplicate and explained the policy."

Everything in that recap is true, and it is also useless for the final action. When the agent reaches the step where it must call the refund tool, it needs an amount, an order identifier, and a date to validate the return window. None of those survive in the recap. The agent now either guesses, asks the customer to repeat information they already gave (a poor experience that erodes trust), or worse, issues a refund for the wrong amount.
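One way to make that failure visible rather than silent is a guard that runs before the refund tool is called. The sketch below is an assumption-laden illustration: RefundRequest and refund_blockers are hypothetical names, the 30-day window is an example policy, and the year 2025 is invented because the example only gives "3 March".

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass
class RefundRequest:
    amount: Optional[Decimal] = None
    order_id: Optional[str] = None
    order_date: Optional[date] = None


def refund_blockers(req: RefundRequest, today: date, window_days: int = 30) -> list[str]:
    """List the reasons a refund call must not go ahead; an empty list means proceed."""
    blockers = [
        f"missing {name}"
        for name in ("amount", "order_id", "order_date")
        if getattr(req, name) is None
    ]
    if req.order_date is not None and (today - req.order_date).days > window_days:
        blockers.append("outside return window")
    return blockers


# What the agent can build from the recap alone: nothing exact survives.
from_recap = RefundRequest()

# What it can build when the specifics were kept (year assumed for illustration).
from_pinned = RefundRequest(
    amount=Decimal("247.83"),
    order_id="8891",
    order_date=date(2025, 3, 3),
)

today = date(2025, 3, 20)
print(refund_blockers(from_recap, today))
print(refund_blockers(from_pinned, today))
```

A guard like this does not recover the lost data, but it turns a wrong refund into an explicit, loggable refusal that points straight at the missing fields.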

Contrast this with the extract-first approach. At the first mention, a small extraction step writes a case facts block: amount 247.83, order 8891, order date 3 March, requested resolution full refund. That block is pinned to every prompt and is never handed to the summariser. The recap still compresses the chatty middle of the conversation, but when the agent finally issues the refund, the exact figures are right there, intact. Same token budget, same summarisation, but the data that mattered was protected because it was removed from the compressible pile before compression ever ran.
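A minimal sketch of such a block and of how it might be pinned to each prompt is shown here. The CaseFacts class, its field names, the section labels, and build_prompt are all illustrative assumptions; the persistent case facts block covered in the next knowledge point may be structured differently.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class CaseFacts:
    # Values are kept as strings so they are carried verbatim, never reformatted.
    amount: Optional[str] = None
    order_id: Optional[str] = None
    order_date: Optional[str] = None
    requested_resolution: Optional[str] = None

    def render(self) -> str:
        """Render the known facts as a block that is copied into every prompt."""
        lines = ["CASE FACTS (verbatim, do not paraphrase):"]
        for f in fields(self):
            value = getattr(self, f.name)
            if value is not None:
                lines.append(f"- {f.name}: {value}")
        return "\n".join(lines)


def build_prompt(facts: CaseFacts, recap: str, recent_turns: list[str]) -> str:
    """Assemble the context: pinned facts, then the lossy recap, then recent raw turns."""
    return "\n\n".join(
        [
            facts.render(),
            "RECAP:\n" + recap,
            "RECENT TURNS:\n" + "\n".join(recent_turns),
        ]
    )


facts = CaseFacts(
    amount="247.83",
    order_id="8891",
    order_date="3 March",
    requested_resolution="full refund",
)
recap = (
    "The customer reported a duplicate charge on a recent order and would like "
    "a refund. The agent confirmed the duplicate and explained the policy."
)
prompt = build_prompt(facts, recap, ["Customer: Can you process the refund now?"])
print(prompt)
```

Only the recap argument ever passes through the summariser, so it can be regenerated as often as the budget demands while the facts block stays byte-for-byte the same.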

Summarising well instead of summarising less

The lesson here is emphatically not that summarisation is bad. A long support conversation genuinely does need its volume controlled, and a recap of the back-and-forth is a perfectly good way to do that. The defect is treating one undifferentiated stream as if every part of it were equally compressible. Once you split the conversation into a compressible narrative and a non-negotiable set of facts, summarisation becomes safe, because the worst it can do is blur the parts that were always meant to be approximate.

A useful test before you compress anything is to ask, for each candidate piece of information, whether an action later in the case will read it back as an exact value. If the answer is yes, it belongs in the pinned record, not the summary. For refund amounts, return-window dates, and order identifiers the answer is a loud yes. For tone, rapport, and the general shape of the discussion the answer is no: nothing downstream will parse the customer's politeness to two decimal places. Sorting information by that single question is most of the skill, and it is why this knowledge point is graded at the understand level rather than mere recall. You are being asked to predict, in advance, which details are load-bearing.
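That sorting question can be written down as a small routine. In this sketch the action names, the field names, and the ACTION_INPUTS mapping are hypothetical; the idea is only that a field is pinned if some downstream action reads it as an exact value.

```python
# Hypothetical map from downstream actions to the fields each one reads exactly.
ACTION_INPUTS: dict[str, set[str]] = {
    "issue_refund": {"refund_amount", "order_id"},
    "check_return_window": {"order_date"},
}


def must_pin(field: str, action_inputs: dict[str, set[str]]) -> bool:
    """True if any later action will read this field back as an exact value."""
    return any(field in inputs for inputs in action_inputs.values())


def sort_fields(
    candidates: list[str], action_inputs: dict[str, set[str]]
) -> tuple[list[str], list[str]]:
    """Split candidate fields into (pinned, compressible), preserving input order."""
    pinned = [f for f in candidates if must_pin(f, action_inputs)]
    compressible = [f for f in candidates if not must_pin(f, action_inputs)]
    return pinned, compressible


candidates = ["refund_amount", "order_date", "order_id", "tone", "rapport"]
pinned, compressible = sort_fields(candidates, ACTION_INPUTS)
print("pin:", pinned)
print("summarise:", compressible)
```

Deriving the pinned set from what the actions consume, rather than from a hand-written list, means that adding a new tool automatically raises the question of which of its inputs must be protected.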

There is also a reliability dividend. A conversation whose facts are pinned can be compressed aggressively without fear, which means you can keep the working transcript short and therefore keep the model's attention focused. Bloated history is not just expensive; it dilutes the model's grip on what matters. By deciding early what must survive verbatim, you simultaneously protect accuracy and earn the freedom to summarise the rest hard.

Spotting the trap in a running system

In production the trap rarely announces itself. The agent keeps responding fluently, customers keep getting answers, and the conversations read fine in a transcript viewer. The damage surfaces only at the moment of action, when a refund is the wrong size or a return is wrongly rejected because the window date had been blurred to a vague "recently." Because the symptom is downstream of the cause by many turns, teams often misattribute it to a tool bug or a flaky model rather than to their own compression step.

A few signals give it away. If error rates climb specifically on longer conversations while short ones stay clean, suspect that something is being lost as history is compressed. If agents start re-asking customers for information that was clearly provided earlier, the history they are reasoning over no longer contains it. And if your logs show that summarisation fired shortly before the failure, you have likely found the culprit. The remedy is the same in every case: identify which fields the failing action needed, confirm they were inside the summarised region, and move them into a pinned record so the next case cannot lose them. Treat each incident as a prompt to grow the set of facts you protect, and the trap closes a little more with every fix.
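The first and third signals can be checked with a few lines over your case logs. The sketch below assumes a hypothetical CaseLog record with three fields; the sample records are made up purely to show the shape of the comparison and do not come from any real system.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Hashable


@dataclass
class CaseLog:
    turns: int              # length of the conversation
    summariser_fired: bool  # did compression run at least once?
    action_failed: bool     # did the final action fail or need a repeat question?


def failure_rates(
    logs: list[CaseLog], key: Callable[[CaseLog], Hashable]
) -> dict[Hashable, float]:
    """Group cases by key and return the failure rate within each group."""
    totals: dict[Hashable, int] = defaultdict(int)
    failures: dict[Hashable, int] = defaultdict(int)
    for log in logs:
        group = key(log)
        totals[group] += 1
        failures[group] += int(log.action_failed)
    return {group: failures[group] / totals[group] for group in totals}


# Made-up sample records, for illustration only.
logs = [
    CaseLog(4, False, False),
    CaseLog(6, False, False),
    CaseLog(8, False, False),
    CaseLog(9, False, False),
    CaseLog(12, True, True),
    CaseLog(18, True, False),
    CaseLog(25, True, True),
    CaseLog(30, True, False),
]

by_length = failure_rates(logs, lambda c: "long" if c.turns > 10 else "short")
by_summary = failure_rates(logs, lambda c: c.summariser_fired)
print(by_length)
print(by_summary)
```

A gap between the groups is a prompt to investigate, not proof: the next step is still to check which fields the failing action needed and whether they sat inside the summarised region.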

Misconceptions this knowledge point corrects

Misconception: Summarising the conversation is safe because the model can still infer the missing details from the recap.

What's actually true: It cannot infer specifics that were never preserved. Once "247.83 dollars" has been compressed to "a refund," the exact figure is gone from the context, and the model has nothing to reconstruct it from. Inference fills gaps in reasoning, not gaps in deleted data.

Misconception: Progressive summarisation is always the right way to handle a context window that is filling up.

What's actually true: It is right for high-volume, low-precision history and wrong for transactional conversations. The correct move is selective: extract the exact facts into a pinned block first, then summarise only the remaining narrative. Volume control and fact preservation are separate jobs.

Where it sits among the context-management skills

Read this knowledge point as the why, and read its neighbours as the how. It establishes that lossy compression and precise action are in tension, which sets up the persistent case facts block as the structural answer and tool result trimming as the same discipline applied to tool outputs. It also connects outward to session techniques like writing findings to a scratchpad, because every one of those patterns is a different way of deciding, in advance, what must outlive a summary. Master this trap and the rest of Domain 5 reads as a set of deliberate answers to a single question: what are you willing to lose when space runs short?

Check your understanding

A customer-support agent resolves long billing cases. To stay within the context window, the team enabled a rolling summariser that condenses the conversation every ten turns. Agents now frequently issue refunds for the wrong amount or ask customers to repeat their order number. What is the root cause and the best fix?
