Upstream Agent Optimisation

In short: Upstream agent optimisation is the practice of redesigning a producer (upstream) agent so it returns compact, structured findings (key facts, citations, relevance scores) rather than verbose content and full reasoning chains. The aim is to protect the limited context budget of the downstream agents that consume its output, so a multi-agent pipeline stays within its windows and remains reliable.

What upstream agent optimisation is

In a multi-agent pipeline, agents hand work to one another. A producer (upstream) agent investigates something and passes its result to a consumer (downstream) agent that synthesises, decides, or acts. Upstream agent optimisation is the practice of shaping what the producer returns so that it respects the consumer's context budget. Instead of returning everything it generated, the upstream agent returns a compact, structured digest: the key facts it found, the citations that support them, and relevance scores that tell the consumer what matters most.

The motivation is a budget that is not yours to spend. When an upstream agent dumps its full output, including the long reasoning chains it produced along the way, that verbose payload lands in the downstream agent's window and eats the room the consumer needs to do its own job. Optimise the upstream contract and the downstream agent receives a small, high-signal input it can actually work with. Anthropic's account of building a multi-agent research system makes the same point: subagents explore in their own context windows and then condense the most important tokens for the lead agent, rather than flooding it with raw exploration.

Upstream agent optimisation: Redesigning a producer agent to return structured findings (key facts, citations, relevance scores) instead of verbose content and reasoning chains, in order to protect the limited context budget of the downstream agents that consume its output.

The downstream budget problem

A coordinator or synthesis agent has a fixed window, and in a multi-agent run it has to fit several producers' outputs into that window at once. If each producer returns a sprawling transcript, the coordinator's window fills before it has even begun to reason. The symptoms are the familiar ones from elsewhere in Domain 5: the synthesis becomes shallow, earlier inputs get crowded out, and the most important details from any single producer risk being buried in the middle of a huge concatenation. That last symptom is the lost-in-the-middle effect at pipeline scale.

The crucial shift in thinking is whose budget is being spent. With tool result trimming, you trim a tool response to protect your own window. Upstream agent optimisation applies the same discipline across an agent boundary: the producer constrains its output to protect someone else's window. That is why it is a harder, apply-level skill. You are no longer tidying your own context; you are designing a contract between agents so the whole system stays within budget.

Verbose hand-off versus optimised structured return

[Diagram not shown] The same investigation, returned as a structured brief instead of a full transcript, frees the coordinator's budget.

The structured return contract

Optimising the upstream agent is mostly about defining a tight output schema and holding the producer to it. Three kinds of field do most of the work. Key facts are the distilled conclusions, each stated once and precisely, with no surrounding narration. Citations attach each fact to its source, a URL, document name, or excerpt, so provenance survives the hand-off and the consumer can trust or verify without re-reading everything. Relevance scores let the producer signal which findings matter most, so the consumer can prioritise and, importantly, front-load the high-value items where downstream attention is strongest.
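The three field kinds described above can be sketched as a small typed schema. This is a minimal illustration, not a prescribed format: the class names `Finding` and `Brief`, the 0 to 1 relevance scale, and the sample data are all assumptions made for the example.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class Finding:
    fact: str         # one distilled conclusion, stated once
    citation: str     # URL, document name, or short excerpt
    relevance: float  # assumed scale: 0.0 (marginal) to 1.0 (critical)

    def __post_init__(self) -> None:
        if not self.fact.strip():
            raise ValueError("fact must not be empty")
        if not self.citation.strip():
            raise ValueError("every fact needs a citation")
        if not 0.0 <= self.relevance <= 1.0:
            raise ValueError("relevance must lie in [0, 1]")


@dataclass(frozen=True)
class Brief:
    producer: str
    findings: tuple[Finding, ...]

    def to_json(self) -> str:
        # Rank most-important-first so the consumer can front-load.
        ranked = sorted(self.findings, key=lambda f: f.relevance, reverse=True)
        return json.dumps(
            {"producer": self.producer, "findings": [asdict(f) for f in ranked]}
        )


# Hypothetical example data, for illustration only.
brief = Brief(
    producer="pricing-subagent",
    findings=(
        Finding("Competitor B discounts annual plans.", "pricing-page.html", 0.4),
        Finding("Segment A churn doubled in Q3.", "churn-report.pdf", 0.9),
    ),
)
payload = json.loads(brief.to_json())
```

Validating in `__post_init__` means a fact without a citation cannot be constructed at all, so provenance is enforced at the producer rather than checked after the hand-off.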

What the contract deliberately omits is just as important. The producer's reasoning chain, its dead ends, its verbose quotations, and its intermediate scratch work all stay in the producer's own context and never cross the boundary. The producer is free to think as expansively as it likes; it simply does not export that thinking. This separation, expansive private reasoning and a compact public return, is the heart of the pattern, and it pairs naturally with platform features like context editing that keep an agent's own working window lean while it operates.
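One way to picture the split between private reasoning and public return is a producer object that keeps two separate stores and exports only one of them. The `ProducerRun` class and its method names are hypothetical, chosen only to illustrate the boundary.

```python
class ProducerRun:
    """Hypothetical wrapper around a single producer agent's run."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._scratch: list[str] = []    # private: reasoning, dead ends, long quotes
        self._findings: list[dict] = []  # public: the only data that crosses the boundary

    def think(self, note: str) -> None:
        # The producer may reason as expansively as it likes here.
        self._scratch.append(note)

    def record(self, fact: str, citation: str, relevance: float) -> None:
        self._findings.append(
            {"fact": fact, "citation": citation, "relevance": relevance}
        )

    def export(self) -> dict:
        # Scratch work is deliberately absent from the returned payload.
        ranked = sorted(self._findings, key=lambda f: f["relevance"], reverse=True)
        return {"producer": self.name, "findings": ranked}


# Hypothetical usage, for illustration only.
run = ProducerRun("market-size-subagent")
run.think("Tried the 2019 survey first; sample too small, discarding.")
run.record("Addressable market is mid-sized.", "industry-report.pdf", 0.8)
exported = run.export()
```

The scratch notes remain available to the producer for its own use, but nothing in `export()` reads them, so they cannot leak into the consumer's window by accident.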

Designing it in practice

Start from the consumer and work backwards. Ask what the downstream agent actually needs to synthesise or decide, express that as a schema, and make the upstream agent populate exactly that schema. Enforce a rough size budget per producer so a single verbose agent cannot monopolise the coordinator's window, and order the returned findings by relevance so the most important ones sit at the top of the consumer's input. When a producer is tempted to include its reasoning to justify a conclusion, prefer a citation over a transcript: the evidence, not the deliberation.
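The per-producer size budget and relevance ordering can be sketched as a single trimming step applied before the brief is returned. The four-characters-per-token estimate is a crude assumption standing in for a real tokenizer, and `fit_to_budget` is a hypothetical helper name.

```python
def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly four characters per token.
    # A real pipeline would use the model's own tokenizer.
    return max(1, len(text) // 4)


def fit_to_budget(findings: list[dict], max_tokens: int) -> list[dict]:
    """Keep the most relevant findings that fit within max_tokens."""
    ranked = sorted(findings, key=lambda f: f["relevance"], reverse=True)
    kept: list[dict] = []
    used = 0
    for finding in ranked:
        cost = estimate_tokens(finding["fact"]) + estimate_tokens(finding["citation"])
        if used + cost > max_tokens:
            break  # stop at the first overflow so the ranking stays strict
        kept.append(finding)
        used += cost
    return kept


# Hypothetical findings; each costs 10 + 2 = 12 estimated tokens.
sample = [
    {"fact": "a" * 40, "citation": "b" * 8, "relevance": 0.9},
    {"fact": "a" * 40, "citation": "b" * 8, "relevance": 0.5},
    {"fact": "a" * 40, "citation": "b" * 8, "relevance": 0.7},
]
trimmed = fit_to_budget(sample, max_tokens=24)
```

Stopping at the first overflow, rather than skipping ahead to smaller items, is a design choice: it guarantees that nothing in the brief outranks something that was dropped.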

This is where upstream optimisation composes with its prerequisites. It assumes you already trim at tool boundaries, because a producer that has not trimmed its own tool exhaust has nothing compact to return. It assumes you understand positional recall, because the whole point of returning ranked, structured findings is to land the valuable ones where the consumer will actually use them. Done well, the pattern lets a pipeline scale: each producer does deep work cheaply in isolation, and the coordinator integrates many small, high-signal briefs instead of drowning in a few giant ones.

Worked example

A multi-agent research system has three subagents investigating different angles of a market question, all reporting to a lead synthesis agent with a fixed window.

In the first design, each subagent returns its complete working transcript: the queries it ran, the full text of the passages it read, its step-by-step reasoning, and finally its conclusions. Each transcript is around twelve thousand tokens. The lead agent must load all three at once, so before it writes a single word of synthesis its window is almost full of raw exploration. The synthesis it produces is thin. The lead agent loses track of the first subagent's findings while reading the third, and the few genuinely important numbers are buried mid-context, where the lead agent reads least reliably.

The team applies upstream agent optimisation. Each subagent is redesigned to reason privately, exactly as before, but to return only a structured brief of about fifteen hundred tokens: a short list of key facts, each with a citation to the source passage and a relevance score, ordered most-important-first. The subagents are doing the same investigation; they are simply not exporting their scratch work.

Now the lead agent loads three compact, ranked briefs that together occupy a fraction of its window. It has ample budget to reason, the high-relevance facts sit at the tops of their briefs where attention is strong, and every claim arrives with a citation so the synthesis is both grounded and verifiable. The pipeline produces a deeper, better-attributed result, and it does so because the optimisation happened at the producers, before their output ever reached the consumer's budget.
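The budget arithmetic behind the worked example is simple enough to check directly. The figures are the approximate ones given above (three subagents, about twelve thousand tokens per transcript, about fifteen hundred per brief); the variable names are illustrative.

```python
SUBAGENTS = 3
TRANSCRIPT_TOKENS = 12_000  # approximate size of one full transcript
BRIEF_TOKENS = 1_500        # approximate size of one structured brief

verbose_total = SUBAGENTS * TRANSCRIPT_TOKENS    # 36000
optimised_total = SUBAGENTS * BRIEF_TOKENS       # 4500
reduction_factor = verbose_total / optimised_total  # 8.0

print(f"verbose hand-off:   {verbose_total} tokens")
print(f"optimised hand-off: {optimised_total} tokens")
print(f"reduction:          {reduction_factor:.0f}x")
```

On these numbers the lead agent's input shrinks from roughly 36,000 tokens to roughly 4,500, an eightfold reduction, before any change is made to the lead agent itself.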

Choosing the schema from the consumer backwards

The hardest part of optimising a producer is deciding what its structured return should contain, and the reliable way to decide is to start from the consumer. Ask what the downstream agent must produce, then ask what minimal set of inputs that output depends on, and let that set define the producer's schema. A synthesis agent writing a market summary needs claims, the evidence behind each claim, and a sense of which claims matter most, so the producer returns facts, citations, and relevance scores and nothing else. Designing forwards from everything the producer happens to know leads straight back to the bloated transcript you were trying to avoid.
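Designing from the consumer backwards implies the consumer can state its required fields and reject anything else. A minimal sketch of such a check follows, assuming briefs arrive as plain dictionaries; `validate_brief` and the key set are hypothetical names for the example.

```python
# The consumer's needs, expressed as the only keys a finding may carry.
REQUIRED_FINDING_KEYS = {"fact", "citation", "relevance"}


def validate_brief(payload: dict) -> list[str]:
    """Return a list of contract violations; an empty list means the brief conforms."""
    findings = payload.get("findings")
    if not isinstance(findings, list):
        return ["'findings' must be a list"]

    problems: list[str] = []
    for i, finding in enumerate(findings):
        if not isinstance(finding, dict):
            problems.append(f"finding {i}: must be an object")
            continue
        keys = set(finding)
        missing = REQUIRED_FINDING_KEYS - keys
        extra = keys - REQUIRED_FINDING_KEYS
        if missing:
            problems.append(f"finding {i}: missing {sorted(missing)}")
        if extra:
            # Extra fields such as exported reasoning are how bloat creeps back in.
            problems.append(f"finding {i}: unexpected {sorted(extra)}")
    return problems


# Hypothetical payloads, for illustration only.
good = {"findings": [{"fact": "F", "citation": "doc.pdf", "relevance": 0.9}]}
bloated = {
    "findings": [
        {"fact": "F", "citation": "doc.pdf", "relevance": 0.9, "reasoning": "..."}
    ]
}
good_problems = validate_brief(good)
bloated_problems = validate_brief(bloated)
```

Treating unexpected keys as violations, rather than silently ignoring them, keeps the "and nothing else" part of the contract enforceable.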

Two schema choices pay for themselves repeatedly. Including a citation with every fact means the consumer can trust or verify a claim without the producer shipping the source text wholesale, turning a potentially huge payload into a short reference. Including a relevance score means the consumer can rank and front-load, landing the important findings where its own attention is strongest. Both are small fields that replace large volumes of raw content, which is the whole economic logic of the pattern: spend a few tokens describing a finding well so you can avoid spending thousands reproducing the material it came from.

Why the optimisation belongs upstream

It is fair to ask why the producer should bear this responsibility rather than letting the consumer clean up whatever it receives. The answer is budget arithmetic. Once several verbose outputs have arrived, the consumer's window may already be full, so any tidying it attempts is both too late and itself lossy. Fixing the contract at the source means the bloat is never created in the first place, and the producer is also the agent best placed to compress its own work, since it alone knows which of its findings were solid and which were dead ends.

This is the same logic as trimming a tool result before appending it rather than after, lifted across an agent boundary. In both cases the cheapest and most reliable place to remove waste is at the point of production, before it propagates anywhere. A pipeline built on that principle scales gracefully: each producer does deep, expensive work privately and emits a small, honest summary, and the coordinator integrates many such summaries within a budget that never has to absorb the raw cost of the work behind them.
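On the coordinator side, integrating many small briefs can be as simple as merging their findings and ranking them globally before synthesis. The sketch below assumes each brief is a dictionary with `producer` and `findings` keys; that layout, the output line format, and the function name `build_synthesis_input` are assumptions for illustration.

```python
def build_synthesis_input(briefs: list[dict]) -> str:
    """Merge findings from several producers, most relevant first."""
    rows = [
        (f["relevance"], f["fact"], f["citation"], brief["producer"])
        for brief in briefs
        for f in brief["findings"]
    ]
    # Global ranking puts the highest-value findings at the top of the input.
    rows.sort(key=lambda row: row[0], reverse=True)
    return "\n".join(
        f"[{relevance:.2f}] {fact} (source: {citation}; via {producer})"
        for relevance, fact, citation, producer in rows
    )


# Hypothetical briefs, for illustration only.
briefs = [
    {
        "producer": "pricing",
        "findings": [{"fact": "Fact P", "citation": "p.pdf", "relevance": 0.6}],
    },
    {
        "producer": "demand",
        "findings": [
            {"fact": "Fact D1", "citation": "d1.html", "relevance": 0.9},
            {"fact": "Fact D2", "citation": "d2.html", "relevance": 0.3},
        ],
    },
]
synthesis_input = build_synthesis_input(briefs)
```

Because every line carries its citation and originating producer, the coordinator can attribute claims in its synthesis without ever seeing the raw work behind them.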

Misconceptions to retire

Misconception

Upstream agents should return their full reasoning so the downstream agent can see how each conclusion was reached.

What's actually true

Exporting full reasoning is what exhausts the downstream budget. The producer should reason privately and return only distilled facts, citations, and relevance scores. If the consumer needs to justify a claim, a citation is far cheaper and more reliable than a transcript of deliberation.

Misconception

The downstream agent can just summarise the verbose inputs once it receives them.

What's actually true

By then the budget is already spent: loading several verbose transcripts can fill the window before synthesis even starts, and summarising after the fact is lossy and late. Constrain the output at the source so the bloat never reaches the consumer in the first place.

How the exam frames it

This is an apply-level knowledge point anchored in the multi-agent research scenario, so a question will typically describe a pipeline where downstream synthesis is degrading because producers return too much, and ask for the best structural fix. The strong answer changes the upstream contract to return compact, structured, ranked findings; the weak answers enlarge the downstream window, summarise after the inputs arrive, or simply run fewer subagents. Showing that you optimise at the source, protecting another agent's budget by design, is the judgement being assessed, and it is the natural culmination of the trimming and positioning skills earlier in task statement 5.1.

Check your understanding

In a multi-agent research pipeline, three subagents each return their full investigation transcript to a lead synthesis agent, whose window fills before it can reason and whose output is shallow and poorly attributed. What is the most effective fix?
