Tool Result Trimming for Claude Agents

In short: Tool result trimming is the practice of reducing a verbose tool response down to only the fields the agent actually needs before that response is appended to the conversation. It treats the tool boundary as a filter, so accumulated tool output cannot quietly exhaust the context budget and trigger the recall degradation known as context rot.

What tool result trimming solves

Agents call tools, and tools are generous. An order lookup, a CRM query, or a search endpoint typically returns a rich object: dozens of fields covering metadata, audit timestamps, internal flags, nested relationships, and pagination cursors, when the agent needed perhaps five of them. Tool result trimming is the discipline of cutting that response down to the fields that matter before the result is appended to the conversation. The tool boundary becomes a filter rather than a firehose.

The reason this is a distinct skill, and not just tidiness, is the compounding cost. Because the Messages API accumulates history, a verbose tool result is not paid for once. It is resent as input on every subsequent turn, so forty unnecessary fields appended early in a long run are re-billed and re-processed dozens of times over. Worse, that bloat consumes the very budget you need for reasoning, and as the window fills, accuracy and recall degrade through what Anthropic calls context rot. Trimming attacks the problem at its source.

Tool result trimming: Reducing a tool response to only the fields the agent requires before appending it to the conversation, so verbose or irrelevant output cannot accumulate across turns and exhaust the context budget.

The forty-fields problem

Picture a get_order tool that returns the full order record: line items, prices, the shipping address, billing address, carrier, tracking number, fulfilment-centre code, internal risk score, a dozen status timestamps, customer marketing preferences, and more. For the task at hand, deciding whether a return is within policy, the agent needs the order ID, the total, the order date, the item count, and the current status. Five fields. The other thirty-five are dead weight.

Append the whole record and three things happen. The immediate request is larger than it needs to be. Every future request in this conversation carries those thirty-five irrelevant fields again. And if the agent makes several such lookups, the bloat stacks, so a run that should fit comfortably starts brushing the window limit purely on tool exhaust. None of this is hypothetical: it is the default outcome of appending raw tool output, and it is why trimming is treated as core context-management hygiene in task statement 5.1.

Trim at the tool boundary before appending

Loading diagram...

The trim step sits between the raw tool response and the conversation, so only relevant fields accumulate.

Trim before you append, not after

The single most important rule is ordering: trim before the result enters the conversation. Once a verbose result has been appended, it is part of the history and it will be resent on every following turn until you actively edit it out. Trimming after the fact means you have already paid for it at least once and you now need a separate clean-up step. Trimming before means the bloat never enters the budget at all.

This is also where tool result trimming differs from its sibling, the progressive summarisation trap. Summarisation compresses conversational narrative after it has accumulated and risks deleting specifics; trimming filters structured tool data at the boundary and keeps the specifics you choose. They are complementary: trimming controls what enters, summarisation controls what remains. At the platform level, Anthropic exposes context editing with tool result clearing for agentic workflows, which removes stale tool output from the window, an automated cousin of the same principle. Doing it at the application boundary, before the append, is the most surgical version because you decide exactly which fields survive.

How to choose which fields to keep

The selection is task-driven, not generic. Ask what the agent will actually do with the result and keep only the inputs to that decision or action. A practical method is to define, per tool, a small projection: a fixed list of fields the agent is allowed to see for a given use, applied as the response comes back. This keeps trimming deterministic and reviewable rather than relying on the model to ignore noise it can still see.

Three guidelines keep the projection honest. Keep anything an action or downstream tool will consume, such as identifiers and amounts. Keep anything a policy check depends on, such as dates and status. Drop everything that is purely descriptive, internal, or duplicative for the current task, even if it might be interesting in the abstract. When a later step genuinely needs a field you trimmed, the fix is to widen that step's projection, not to append everything by default. This mindset, return only what is needed, is exactly what upstream agent optimisation generalises to multi-agent pipelines.

Worked example

A support agent in a long conversation looks up three orders, each via a tool that returns a large record, and the run starts hitting the context limit before resolution.

The agent is helping a customer reconcile three recent purchases. For each, it calls get_order, and each call returns a 42-field record: full addresses, carrier metadata, internal fulfilment codes, marketing flags, and a long array of status-change timestamps. The agent appends each raw record to the conversation. After the third lookup, the history is enormous, dominated by tool exhaust, and the run is brushing the window limit with the actual resolution still ahead. The agent starts to lose track of earlier details, a textbook case of context rot driven entirely by irrelevant tool data.

Now apply trimming. A projection for the reconciliation task keeps five fields per order: order_id, total, order_date, item_count, and status. The trim step runs the instant each tool response returns, so only those five fields are ever appended. Three lookups now add fifteen small fields to the conversation instead of 126 mostly useless ones. The history stays lean, the agent retains its grip on the earlier turns, and there is ample budget left to actually resolve the case.

The customer experience is identical from the outside, because the agent never needed the discarded fields to do its job. What changed is that the team stopped paying, on every single turn, to carry data the task did not use. That is the entire value proposition of trimming: same answer, a fraction of the budget, and far better reliability over a long run.

A checklist for a trim projection

Turning trimming from an instinct into a repeatable design helps it survive contact with a real codebase. A practical projection for a tool is built by answering four questions. Which fields does an action take as input, such as an identifier passed to a refund or a status checked by a policy rule? Which fields does a downstream tool or agent require to continue the workflow? Which fields does a human reviewer need if this case is escalated? And which fields are purely descriptive, internal, or duplicative for the current purpose? The first three define what to keep; the last defines what to drop.

Capture the answer as an explicit, named list per tool and per use, rather than a vague intention to keep things short. A named projection is reviewable in code review, testable, and stable across runs, where leaving the choice to the model on each call is none of those things. When a new step genuinely needs a field you had been dropping, you widen that one projection deliberately, which keeps the decision visible and intentional. The goal is that nobody on the team has to wonder why a particular field is or is not in context, because the projection is the documented answer.

Trimming by age, not just by field

Field projection decides what each tool result contributes; a second axis decides how long it keeps contributing. Even a trimmed result loses relevance once the agent has acted on it, and an early lookup rarely needs to stay at full fidelity dozens of turns later. A complementary pattern is therefore to keep the most recent tool results verbatim while progressively clearing or shrinking older ones once the agent has taken what it needed from them. Anthropic's context editing does exactly this at the platform level, clearing stale tool output from the window so a long run does not drag every early result behind it.

What you keep matters as much as how long you keep it. Prefer stable, semantic identifiers, a slug, an order number, a UUID, over opaque internal references the model cannot reason about later, so a trimmed result still lets the agent re-fetch detail on demand instead of hoarding it just in case. Anthropic's tool-definition guidance makes the same point: design tool responses to return only the high-signal fields the model needs for its next step, built from meaningful keys rather than whatever the upstream system happened to emit.

It also helps to separate two things that sound alike. Constraining a tool's input schema, for example with strict validation, governs what the model may send to the tool; it does nothing to shrink what the tool sends back. Output trimming is a distinct, deliberate step on the return path. Getting both right means valid calls going in and lean results coming out, and neither one substitutes for the other.

How trimming relates to the other context controls

Trimming is one of several levers in this task statement, and it helps to see where it sits relative to the others. It governs what enters the context from tools, deciding at the boundary which structured fields are allowed in. Summarisation governs what remains of the conversational narrative after it has accumulated. Pinning, through a facts block, governs what is protected from compression entirely. The three are complementary rather than competing: a well-run agent trims its tool inputs, pins the exact facts an action will need, and summarises the chatty remainder.

The reason trimming comes first in that sequence is that it is the cheapest and most surgical. Stopping irrelevant data at the door means you never pay to carry it, never have to summarise around it, and never risk it crowding out something important in the middle of a long context. Summarisation and pinning then operate on a stream that is already lean. Architects who internalise this ordering reach for trimming reflexively whenever a tool is verbose, long before the window is anywhere near full, because the cheapest token of all is the one you never admitted to the conversation.

None of this requires exotic machinery. A trim projection is usually just a few lines of code that select fields from a response, and yet it is one of the highest-leverage reliability habits in the whole domain, precisely because it prevents problems instead of cleaning them up after the fact. The discipline is simply to apply it by default, treating every verbose tool as something to filter rather than forward, so that bloat never becomes the conversation's problem to begin with.

Misconceptions to retire

Misconception

It is safest to append the full tool result so the agent has everything in case it needs something later.

What's actually true

Appending everything is the costly default, not the safe one. Every extra field is resent on every later turn and accelerates context rot. Keep only what the current task consumes; if a later step needs more, widen that step's projection rather than carrying all fields by default.

Misconception

You can let the model ignore the irrelevant fields, so trimming is unnecessary.

What's actually true

The model still has to process and re-receive those tokens on every turn, and they still count against the budget and dilute attention. Ignoring is not the same as removing. Trimming at the boundary keeps the noise out of the window entirely.

How the exam frames trimming

This is an apply-level knowledge point in the customer-support and multi-agent research scenarios, so questions present a concrete situation, a run that exhausts its budget or loses earlier context after several verbose tool calls, and ask for the most effective remedy. The strong answer trims tool results to the needed fields before appending; the weak answers reach for a bigger model, summarise the whole history, or simply call the tool less often. Demonstrating that you fix the problem at the tool boundary, before the data ever joins the conversation, is the judgement being tested.

Check your understanding

An agent makes several lookups in a long conversation, each returning a 40-plus-field record that it appends in full. The run exhausts its context budget and the agent loses track of earlier details before it can finish. What is the best fix?

Watch and learn

Official Anthropic Academy lessons first, then hand-picked walkthroughs. Videos load only when you press play.

No videos curated for this concept yet

We are still curating the best official and community videos for this topic.

References & primary sources

Adaptive study

Master this concept with Archie

Practice it inside an adaptive study session. Archie, your Socratic AI tutor, tracks your mastery with Bayesian Knowledge Tracing and schedules the perfect next review.

Start studying

Tool Result Trimming: Stop Verbose Outputs Exhausting Your Context