- In short
- Multi-agent tool design is the practice of building tool interfaces for systems where several Claude agents share or pass tools, using clear input and output contracts and structured error signalling so that downstream agents can process results and failures. Descriptions must stay unambiguous even when multiple agents with different roles can access the same tool.
What changes when more than one agent is involved
Most tool-design lessons assume a single agent: one model reads the descriptions, calls a tool, and consumes the result itself. Multi-agent tool design relaxes that assumption, and the relaxation changes the requirements. In a coordinator-subagent system or a research pipeline, a tool might be selectable by several agents with different roles, and the result of a call might be handed to a different agent than the one that made it. A tool interface that was perfectly adequate for one agent talking to itself can quietly break when those two things become true.
This is an apply-level skill that builds directly on description writing and structured errors. You already know how to write a description that selects reliably and how to attach structured metadata to an error. Multi-agent tool design applies those skills under a harder constraint: the interface has to work for whichever agent calls it and whichever agent reads the result, not just for a single well-understood caller.
- Multi-agent tool design
- Designing tool interfaces for systems where multiple Claude agents share or exchange tools, emphasising clear input and output contracts and structured error signalling so any downstream agent can interpret results and failures, and so descriptions stay unambiguous across roles.
Two requirements single-agent design lets you skip
When one agent owns the whole loop, you can get away with loose design because the same model that calls the tool also interprets the answer, in the same context. Multi-agent systems remove that safety net and surface two requirements.
- Clear input and output contracts. A subagent that calls extract_data_points may pass its result up to a synthesis agent that never saw the document. If the output shape shifts from call to call, the synthesis agent cannot rely on it. The contract, exactly what fields come back, in what shape, in every case, is what makes a result portable between agents.
- Structured error signalling. When a tool fails, the agent that consumes the result has to be able to tell that it failed and why, without reading tone or guessing from prose. That is the role of structured error metadata: an explicit failure signal and category that a downstream agent can branch on, the same discipline covered in structured error metadata.
Both requirements come down to predictability. In a single-agent loop, the model can absorb a bit of ambiguity because it has the full context. Across agents, ambiguity becomes lossy: each handoff is a place where an under-specified output or an unsignalled error gets dropped.
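A minimal sketch of what "the same shape in every case" can look like for the extract_data_points example. The field names (source_id, data_points, complete) and the line-based extraction are assumptions made for illustration, not a schema the source defines; the point is only that an empty result still carries every field.

```python
from dataclasses import dataclass, field, asdict


@dataclass(frozen=True)
class ExtractionResult:
    """Hypothetical output contract: every field is present on every call."""
    source_id: str
    data_points: list = field(default_factory=list)
    complete: bool = True


def extract_data_points(source_id: str, text: str) -> dict:
    """Toy extractor: treats each non-empty line as one data point."""
    points = [line.strip() for line in text.splitlines() if line.strip()]
    # Even when nothing is found, return the full shape instead of omitting keys.
    return asdict(ExtractionResult(source_id=source_id, data_points=points))


empty = extract_data_points("doc-1", "")
full = extract_data_points("doc-2", "revenue: 10\nmargin: 3")
```

Because both calls return the same keys, a synthesis agent that never saw the document can read data_points without first checking whether the field exists.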
Descriptions that hold across roles
There is a third, subtler requirement: descriptions must stay unambiguous even when multiple agents with different roles can access the same tool. A description you tuned while picturing one agent may quietly assume that agent's context. Give the same tool to a second agent with a different job and the description that read clearly for the first can become ambiguous for the second, because the second agent brings different default intentions to the same words.
The discipline from writing effective tool descriptions still applies, but you now write the description for the union of roles that can see the tool, not for one imagined caller. State the boundary in role-neutral terms, name the inputs precisely, and make the purpose narrow enough that no role can reasonably stretch it. Often the cleaner answer is to scope tools to roles so that fewer agents see each tool at all, a distribution decision that interacts with this one and is covered under tool distribution strategy.
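One way to picture scoping tools to roles is a simple allow-list, sketched below. The role names, the mapping, and the description wording are all hypothetical; the descriptions are written in role-neutral terms and state what each tool does not do.

```python
# Hypothetical role-to-tool allow-list; names are illustrative only.
TOOLS_BY_ROLE = {
    "retrieval": {"fetch_source"},
    "extraction": {"extract_data_points"},
    "synthesis": set(),  # consumes results, calls no tools directly
}

TOOL_DEFINITIONS = {
    "fetch_source": {
        "name": "fetch_source",
        "description": (
            "Fetch the full text of one source by URL. "
            "Does not search, summarise, or extract fields."
        ),
    },
    "extract_data_points": {
        "name": "extract_data_points",
        "description": (
            "Extract structured data points from text that is passed in. "
            "Does not fetch documents."
        ),
    },
}


def tools_for_role(role: str) -> list[dict]:
    """Return only the tool definitions this role is allowed to see."""
    allowed = TOOLS_BY_ROLE.get(role, set())
    return [TOOL_DEFINITIONS[name] for name in sorted(allowed)]


retrieval_tools = tools_for_role("retrieval")
synthesis_tools = tools_for_role("synthesis")
```

With fewer roles seeing each tool, each description has fewer default intentions to survive.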
Designing the interface
Putting it together, designing a tool for a multi-agent system means specifying three things up front. First, the input contract: the exact parameters, their formats, and which are required, so any calling agent supplies them correctly. Second, the output contract: the precise shape of a successful result, identical across calls, so any consuming agent can parse it. Third, the error contract: a structured failure signal, at minimum a flag that the result is an error and a category describing what kind, so a downstream agent can decide whether to retry, escalate, or proceed with partial data.
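The input contract can be made concrete as a tool definition plus a check that any caller's arguments satisfy it. The sketch below uses the name / description / input_schema layout of Anthropic tool definitions; the parameters (url, timeout_s) and the check_input helper are assumptions for illustration, and the checker covers only the two JSON types used here.

```python
FETCH_SOURCE_TOOL = {
    "name": "fetch_source",
    "description": "Fetch the full text of one source by URL.",
    "input_schema": {
        "type": "object",
        "properties": {
            "url": {"type": "string", "description": "Absolute http(s) URL."},
            "timeout_s": {"type": "integer", "description": "Seconds to wait."},
        },
        "required": ["url"],
    },
}

# Only the JSON types used by this hypothetical schema are mapped.
_JSON_TYPES = {"string": str, "integer": int}


def check_input(tool: dict, args: dict) -> list[str]:
    """Return a list of contract violations; an empty list means the call is valid."""
    schema = tool["input_schema"]
    problems = [
        f"missing required parameter: {name}"
        for name in schema["required"]
        if name not in args
    ]
    for name, value in args.items():
        spec = schema["properties"].get(name)
        if spec is None:
            problems.append(f"unknown parameter: {name}")
        elif not isinstance(value, _JSON_TYPES[spec["type"]]):
            problems.append(f"wrong type for {name}: expected {spec['type']}")
    return problems


ok = check_input(FETCH_SOURCE_TOOL, {"url": "https://example.com"})
bad = check_input(FETCH_SOURCE_TOOL, {"query": "q3 revenue"})
```

Here bad lists both the missing url and the unknown query parameter, which is the kind of explicit feedback a calling agent of any role can act on.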
These contracts are what turn a tool from something one agent uses into something a system can rely on. Anthropic's tool-use documentation describes the basic loop in which the model emits a structured tool_use request, the application runs the tool, and the outcome is fed back to the model as a tool_result block; in a multi-agent system that structure is doing double duty, because the consumer may be a different agent. Anthropic's guidance on writing effective tools likewise pushes for returning only high-signal, stable information rather than opaque internal identifiers, advice that matters most precisely when a second agent, lacking the first's context, has to make sense of the result.
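As a rough illustration of the result side of that loop, the sketch below builds a tool_result content block of the kind the Messages API accepts, with the failure flag set explicitly. The helper name, the payload fields, and the tool_use_id value are hypothetical. Note that the Messages API spells the flag is_error, while the isError spelling matches MCP tool results.

```python
import json


def tool_result_block(tool_use_id: str, payload: dict, failed: bool) -> dict:
    """Build a tool_result content block to send back in the next user message."""
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,   # must echo the id from the tool_use request
        "content": json.dumps(payload),
        "is_error": failed,           # explicit flag, not implied by the text
    }


# Hypothetical id and payload, for illustration only.
block = tool_result_block(
    "toolu_example_123",
    {"category": "transient", "source_id": "https://example.com/a"},
    failed=True,
)
```

Serialising the payload as JSON keeps the category machine-readable for whichever agent ends up reading the block.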
Worked example
A research system has a retrieval subagent and a synthesis agent. The retrieval subagent calls a fetch_source tool, and the synthesis agent writes the final answer from whatever the subagent returns. Sometimes a fetch fails silently and the synthesis agent fabricates around the gap.
Trace the failure across the boundary. The retrieval subagent calls fetch_source, which on a timeout returns a plain text result like "could not load." The subagent passes its findings up. The synthesis agent receives a bundle of sources, one of which is the string "could not load," and, having no structured signal that this was a failure rather than content, treats it as just another low-information source and writes around it. The defect is not in either agent's reasoning; it is in the tool interface, which signalled a failure as if it were a result.
Redesign the interface for the multi-agent context. Give fetch_source an output contract: on success it returns the source text plus a stable source identifier; on failure it returns a structured error with an isError flag and a category such as transient or not-found, never a bare string. Now the subagent can attempt local recovery on a transient error and, if it still fails, propagate a clearly marked failure with partial results rather than a silent gap. The synthesis agent reads the structured signal, knows that source is missing rather than empty, and either notes the gap or asks for a retry instead of fabricating.
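The redesign can be sketched end to end as follows. This is an illustration under stated assumptions, not the system's actual code: the loader callables stand in for real network access, the URL doubles as the stable source identifier, and mapping TimeoutError to transient and FileNotFoundError to not-found is a choice made for the example.

```python
def fetch_source(url: str, loader) -> dict:
    """Wrap a loader so success and failure share one explicit envelope."""
    try:
        text = loader(url)
    except TimeoutError:
        return {"isError": True, "category": "transient", "source_id": url}
    except FileNotFoundError:
        return {"isError": True, "category": "not-found", "source_id": url}
    return {"isError": False, "source_id": url, "text": text}


def fetch_with_retry(url: str, loader, attempts: int = 2) -> dict:
    """Local recovery: retry transient failures, then propagate the marked error."""
    result = fetch_source(url, loader)
    for _ in range(attempts - 1):
        if not (result["isError"] and result["category"] == "transient"):
            break
        result = fetch_source(url, loader)
    return result


def synthesize(results: list) -> dict:
    """The consumer branches on the flag instead of reading prose."""
    used = [r["source_id"] for r in results if not r["isError"]]
    gaps = [(r["source_id"], r["category"]) for r in results if r["isError"]]
    return {"sources_used": used, "gaps": gaps}


calls = {"n": 0}


def flaky(url):
    """Hypothetical loader that times out once, then succeeds."""
    calls["n"] += 1
    if calls["n"] == 1:
        raise TimeoutError
    return "body text"


def missing(url):
    """Hypothetical loader for a source that does not exist."""
    raise FileNotFoundError(url)


report = synthesize([
    fetch_with_retry("https://example.com/a", flaky),
    fetch_with_retry("https://example.com/b", missing),
])
```

The first source recovers on retry; the second arrives at the synthesis step as a named gap with a category, so there is nothing to mistake for content.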
Notice what carried the fix: not a smarter prompt, but a tool interface designed so that success and failure are both explicit and portable across the handoff. That is multi-agent tool design: contracts and structured errors that let independent agents cooperate without losing information at the seams.
Common misreadings to avoid
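Misconception
A tool that works well for one agent will work just as well when several agents share it.
What's actually true
Sharing changes the requirements. The result may be read by an agent that never saw the caller's context, and a description tuned for one role can become ambiguous for another, so an interface that was adequate for a single agent can quietly break.
Misconception
As long as the tool returns the data, how it signals errors does not really matter.
What's actually true
A failure returned as plain text looks like content to a downstream agent. Without an explicit error flag and category, the consumer cannot tell a failed call from a low-information result and may fabricate around the gap.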
Designing for the consumer you cannot see
The mental shift that multi-agent tool design demands is to stop designing for the agent that calls the tool and start designing for the agent that reads the result. In a single-agent loop those are the same model in the same context, so any informality in the output is absorbed by shared understanding. Across a handoff they are different agents, and the consumer arrives with none of the caller's context. It sees only the bytes the tool returned. Anything the caller just knew that is not encoded in the output is lost at the boundary.
That is why the output contract carries so much weight here. A field that is sometimes present and sometimes absent, a status that is implied by tone rather than stated as a value, an identifier that means something only to the calling agent: each of these is a place where the consumer guesses, and guesses across a handoff become fabrications in the final answer. Returning stable, self-describing values, as Anthropic's tool-writing guidance urges, is not a nicety in a multi-agent system; it is the difference between a result the downstream agent can trust and one it has to invent around. Designing for the invisible consumer is the core habit of this knowledge point.
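On the consuming side, the same habit can be sketched as a reader that refuses to guess. The required field sets and the status labels below are assumptions chosen for the example; the idea is that a result without an explicit status is reported as a contract violation instead of being interpreted.

```python
# Hypothetical field sets for the two halves of the contract.
REQUIRED_SUCCESS_FIELDS = {"isError", "source_id", "text"}
REQUIRED_ERROR_FIELDS = {"isError", "source_id", "category"}


def read_result(result) -> dict:
    """Accept only results that state their status; never infer it from prose."""
    if not isinstance(result, dict) or not isinstance(result.get("isError"), bool):
        return {"status": "contract_violation", "detail": "no explicit isError flag"}
    required = REQUIRED_ERROR_FIELDS if result["isError"] else REQUIRED_SUCCESS_FIELDS
    missing = sorted(required - set(result))
    if missing:
        return {"status": "contract_violation", "detail": f"missing fields: {missing}"}
    return {"status": "failed" if result["isError"] else "ok", "result": result}


bare_string = read_result("could not load")
good = read_result({"isError": False, "source_id": "s1", "text": "body"})
partial = read_result({"isError": True, "source_id": "s1"})
```

A bare string such as "could not load" is flagged at the boundary, which is where the interface defect actually lives.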
The agent-computer interface, tested across roles
Anthropic frames reliable agents as a problem of designing a strong agent-computer interface (ACI): the surface of tool names, descriptions, parameters, and return shapes that the model actually reads. Their guidance is blunt that tool definitions deserve the same prompt-engineering attention as the main prompt, meaning thorough documentation, clear parameter names, worked examples, edge cases, and explicit boundaries between similar tools. In a multi-agent system that rigor compounds, because the ACI is shared across roles: a vague specification that one agent happens to tolerate becomes a misuse vector for the next agent that sees the same tool.
Anthropic's writeup on building a multi-agent research system adds two practices that translate directly to the exam. First, scale the interface to the task: a simple fact-find may need one agent and a handful of tool calls, while complex work needs several subagents with clearly divided responsibilities, so you neither overbuild a heavy interface for a trivial job nor underspecify one for a hard one. Second, test tool usage iteratively, watching for the mistakes the model actually makes (the wrong tool, malformed inputs, a misread result) and tightening the descriptions, parameter names, and contracts in response. Because failures in a coordinated system hide at the seams between agents, this observe-and-tighten loop on the ACI is how you catch interface defects that no single agent's transcript would reveal on its own.
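The observe half of that loop can be as simple as tallying malformed calls per role across recorded runs. The log format, role names, and required-argument table below are hypothetical, invented to show the idea of looking across agents rather than at one transcript.

```python
from collections import Counter

# Hypothetical log of tool calls collected from test runs.
CALL_LOG = [
    {"role": "retrieval", "tool": "fetch_source", "args": {"url": "https://example.com"}},
    {"role": "synthesis", "tool": "fetch_source", "args": {"query": "q3 revenue"}},
    {"role": "synthesis", "tool": "fetch_source", "args": {"query": "margin"}},
]

REQUIRED_ARGS = {"fetch_source": {"url"}}


def tally_misuse(log: list) -> Counter:
    """Count malformed calls per (role, tool) so patterns across roles stand out."""
    counts = Counter()
    for call in log:
        missing = REQUIRED_ARGS[call["tool"]] - set(call["args"])
        if missing:
            counts[(call["role"], call["tool"])] += 1
    return counts


misuse = tally_misuse(CALL_LOG)
```

In this made-up log the misuse clusters on one role, which would suggest that the description reads like a search tool to that role and should be tightened, or that the tool should not be visible to that role at all.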
Grounding it in the exam scenarios
This concept is squarely aimed at the Multi-Agent Research and Customer Support scenarios, where work crosses agent boundaries by design. In the research system, retrieval, extraction, and synthesis agents pass intermediate results along a chain, and any tool whose output or error is loosely shaped corrupts everything downstream of it. In the support setting, a subagent that resolves part of a case hands its findings back to a coordinator that must act on them. The exam tests whether you fix such failures at the interface, by adding an output contract or a structured error signal, rather than by patching one agent's prompt or collapsing the architecture. Recognising the tool interface as shared infrastructure, and contracts plus structured errors as what make independent agents cooperate, is the applied competence the scenario is measuring.
How it shows up on the exam
Multi-agent scenarios (the research and customer-support scenarios in particular) test whether you can design tools that survive handoffs. Expect a system where one agent's tool output feeds another, and a failure or ambiguity that slips through the seam. The credited answer fixes the interface by adding an output contract, a structured error signal, or a role-neutral description, rather than patching one agent's prompt. Distractors will tempt you to make a single agent smarter or to add coordination overhead. The exam rewards recognising that in multi-agent systems the tool interface is shared infrastructure, and that contracts and structured errors are what let independent agents rely on each other.
A compact way to remember the concept under exam pressure: in a single-agent system the tool talks to itself, so informality is forgiven, but in a multi-agent system the tool talks to a stranger, so everything must be stated. Whenever a scenario shows information lost at a handoff, look to the interface that carried it and ask whether its output and error were explicit enough for a stranger to read. That framing points you straight at the contract fix the exam is looking for, instead of toward a prompt patch on one of the agents.
In a research system, a retrieval subagent's fetch tool returns the string 'could not load' on failure, and the synthesis agent that consumes the results fabricates around the gap. What is the best fix?
Watch and learn
Anthropic: How to Build Multi Agent Systems
Why watch: Directly addresses multi-agent architecture and how tools and their interfaces must be designed so multiple coordinating agents can use them reliably.