- In short: Tool splitting is the design move of breaking one generic, multi-purpose tool into several purpose-specific tools, each with a clear input and output contract. It improves selection reliability because each resulting tool has a single, non-overlapping job, though splitting too far can make the toolset needlessly complex.
What tool splitting solves
Some misrouting cannot be fixed by rewording a boundary, because the problem is not two tools competing. It is one tool trying to be several. A tool named analyze_document that "analyses a document and returns insights" is asked to extract structured fields for one request, summarise prose for another, and check a claim against a source for a third. No single description can make those three jobs unambiguous, because they are genuinely different tasks hiding behind one name. Tool splitting is the design response: break the overloaded tool into purpose-specific tools, each owning exactly one job.
This is an apply-level skill. You are not just recognising that descriptions drive selection; you are using that principle to redesign a toolset so that selection becomes reliable. The lever is specificity. When each tool does one thing and says so, its description stops competing with itself, and the model gets a clean match for every request.
- Tool splitting: decomposing one generic tool that performs several jobs into multiple purpose-specific tools, each with a defined input and output contract and a single, non-overlapping purpose. The aim is to make tool selection unambiguous without over-fragmenting the toolset.
The signal that a tool needs splitting
A generic tool gives off recognisable signals. Its description uses umbrella verbs ("analyse," "process," "handle," "manage") that each cover multiple operations. Its output shape changes depending on what the caller wanted, so downstream code has to branch on the request rather than on the tool. And the model invokes it correctly for some intents but gets the wrong kind of result for others, because it picked the right tool for the wrong job. When one tool's single description has to answer "what does this return?" with "it depends," that dependence is the seam to split along.
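The "it depends" signal is easy to see in code. The sketch below is a toy stand-in for an overloaded tool, not a real implementation: the `request` parameter, the function bodies, and `describe_result` are all hypothetical, chosen only to show how the return shape shifts with the request and forces downstream branching.

```python
from typing import Any, Dict, Union


# Hypothetical overloaded tool. The bodies are toy stand-ins for real analysis.
def analyze_document(document: str, request: str) -> Union[Dict[str, Any], str, bool]:
    if request == "extract":
        return {"word_count": len(document.split())}  # structured fields
    if request == "summarise":
        return document[:20]  # prose
    if request == "verify":
        return "revenue" in document.lower()  # verdict
    raise ValueError(f"unknown request: {request}")


def describe_result(request: str, result: Any) -> str:
    # Downstream code must branch on the request, because the tool name
    # alone does not determine the shape of what comes back.
    if request == "extract":
        return f"fields: {sorted(result)}"
    if request == "summarise":
        return f"summary: {result}"
    return f"supported: {result}"


doc = "Revenue grew twelve percent in the third quarter."
shapes = {
    request: type(analyze_document(doc, request)).__name__
    for request in ("extract", "summarise", "verify")
}
print(shapes)
```

One tool name yields three return types here, which is the seam the section describes.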
Contrast this with a misrouting pair, where two distinct tools overlap. There, the fix is to sharpen the boundary between them. Here, there is only one tool and its boundaries are internal. It is overloaded, not overlapping. Telling the two situations apart is part of the diagnostic skill: diagnosing tool misrouting handles the overlap case, and tool splitting handles the overload case.
How to split for specificity
The method is to name the distinct jobs the generic tool is doing and give each one a tool with its own contract. The canonical example takes analyze_document and splits it into three:
- extract_data_points. Input: a document plus the fields to find. Output: those fields as structured values. One job: structured extraction.
- summarize_content. Input: a document and a desired length or focus. Output: a prose summary. One job: condensation.
- verify_claim_against_source. Input: a claim and a source document. Output: supported or not, with the relevant passage. One job: verification.
Each resulting tool has a clear, non-overlapping purpose and a predictable output shape. The descriptions practically write themselves, because there is only one thing to describe. Selection becomes reliable because "summarise this report" can only match summarize_content, and "does this report support the claim that revenue grew?" can only match verify_claim_against_source.
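As a rough illustration, the three tools above could be declared in the name / description / input_schema shape that Claude tool definitions use. The tool names come from the text; the property names (`fields`, `max_words`, `focus`, `source_document`) and the description wording are assumptions made for this sketch.

```python
# Hypothetical tool definitions. Property names and descriptions are illustrative.
TOOLS = [
    {
        "name": "extract_data_points",
        "description": "Extract the requested fields from a document and "
                       "return them as structured values.",
        "input_schema": {
            "type": "object",
            "properties": {
                "document": {"type": "string"},
                "fields": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["document", "fields"],
        },
    },
    {
        "name": "summarize_content",
        "description": "Summarise a document as prose, optionally limited "
                       "to a length or focus.",
        "input_schema": {
            "type": "object",
            "properties": {
                "document": {"type": "string"},
                "max_words": {"type": "integer"},
                "focus": {"type": "string"},
            },
            "required": ["document"],
        },
    },
    {
        "name": "verify_claim_against_source",
        "description": "Check whether a source document supports a claim and "
                       "return the verdict with the relevant passage.",
        "input_schema": {
            "type": "object",
            "properties": {
                "claim": {"type": "string"},
                "source_document": {"type": "string"},
            },
            "required": ["claim", "source_document"],
        },
    },
]

tool_names = [tool["name"] for tool in TOOLS]
print(tool_names)
```

Each description leads with a single concrete verb, so none of the three has to hedge about what it returns.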
Knowing when to stop
Specificity is the goal, not granularity for its own sake. The same logic that says "split an overloaded tool" can be pushed too far, producing a sprawl of micro-tools (extract_first_name, extract_last_name, extract_email) where one extract_data_points tool with a fields parameter would have been clearer. An oversized toolset reintroduces the very problem splitting was meant to cure: the model now has to scan and reason about too many options, which degrades selection on its own. That tension with tool count is the subject of the tool overload problem.
The test is purpose, not size. Split when two operations are genuinely different jobs with different outputs that a single description cannot disambiguate. Do not split when the operations are the same job parameterised differently. That is what input parameters are for. A well-split toolset has the smallest number of tools such that each one has a distinct, clearly describable purpose.
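A minimal sketch of "same job, parameterised differently": one extract_data_points function with a `fields` parameter standing in for the three micro-tools. The extraction logic is a deliberately naive assumption (it only finds `field: value` lines) and is there purely to make the example runnable.

```python
import re
from typing import Dict, List, Optional


def extract_data_points(document: str, fields: List[str]) -> Dict[str, Optional[str]]:
    """Toy extractor: looks for 'field: value' lines (an illustrative assumption)."""
    result: Dict[str, Optional[str]] = {}
    for field in fields:
        match = re.search(
            rf"^{re.escape(field)}:\s*(.+)$",
            document,
            flags=re.MULTILINE | re.IGNORECASE,
        )
        # The output shape is always the same: one key per requested field.
        result[field] = match.group(1).strip() if match else None
    return result


record = "first_name: Ada\nlast_name: Lovelace\nemail: ada@example.com"

# One tool covers what extract_first_name, extract_last_name and
# extract_email would each have done separately.
contact = extract_data_points(record, ["first_name", "email"])
print(contact)
```

The requests differ only in their inputs, and the return shape is the same in every case, so nothing is gained by splitting further.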
Worked example
A multi-agent research system gives every agent one analyze_document tool. The synthesis agent calls it to summarise, the fact-checker calls it to verify a claim, and results come back in inconsistent shapes that break downstream handling.
The symptom is not wrong-tool selection. There is only one tool, and the agents all call it. The symptom is wrong-job selection: the model invokes analyze_document but cannot be steered to do extraction versus summarisation versus verification reliably, and the output shape shifts with each call. A boundary edit cannot help, because there is no neighbour to draw a boundary against. The tool is overloaded.
Apply tool splitting. Replace analyze_document with extract_data_points, summarize_content, and verify_claim_against_source, each with an explicit input and output contract. Now scope them to roles: the synthesis agent gets summarize_content, the fact-checker gets verify_claim_against_source, and the extraction agent gets extract_data_points. Each agent's single, purpose-specific tool maps cleanly to its job, and every output has a predictable shape that downstream code can rely on.
Check that you have not over-split. Could extract_data_points reasonably handle "pull the author and date" and "pull the revenue figures" through a fields parameter? Yes. Those are the same job with different inputs, so they stay one tool. You split where the jobs differed, not where the inputs differed. That restraint is what separates specificity from fragmentation.
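The role scoping in this worked example might be sketched as a simple allow-list. The role keys and the helper names `tools_for` and `is_allowed` are hypothetical; only the tool names and the role-to-tool pairing come from the example.

```python
from typing import Dict, List

# Hypothetical role keys. The pairing mirrors the worked example.
TOOLS_BY_ROLE: Dict[str, List[str]] = {
    "synthesis": ["summarize_content"],
    "fact_checker": ["verify_claim_against_source"],
    "extraction": ["extract_data_points"],
}


def tools_for(role: str) -> List[str]:
    """Return the tools scoped to a role, failing loudly on unknown roles."""
    if role not in TOOLS_BY_ROLE:
        raise ValueError(f"no tools scoped to role {role!r}")
    return list(TOOLS_BY_ROLE[role])


def is_allowed(role: str, tool_name: str) -> bool:
    """True only if the tool is in the role's allow-list."""
    return tool_name in TOOLS_BY_ROLE.get(role, [])


print(tools_for("synthesis"))
print(is_allowed("synthesis", "verify_claim_against_source"))
```

With one tool per role, each agent has nothing left to choose between, so wrong-job selection cannot arise at the agent level.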
Common misreadings to avoid
Misconception
More tools always means more reliable selection, so split every tool as finely as possible.
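What's actually true
Specificity is the goal, not granularity for its own sake. An oversized toolset makes the model scan and reason about too many options, which degrades selection, so split only where the jobs genuinely differ and let input parameters handle the rest.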
Misconception
Tool splitting is the same fix as rewriting overlapping descriptions.
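What's actually true
Rewriting descriptions fixes overlap, where two distinct tools compete for the same requests. Splitting fixes overload, where one tool is asked to do several jobs and no rewording can make its single description unambiguous.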
Overload versus overlap: telling the two apart
The hardest part of applying tool splitting well is recognising when it is the right move at all, because its symptoms can resemble plain misrouting. The distinction is the number of tools involved in the failure. Misrouting is a two-tool problem: two distinct tools compete for the same requests, and the fix sharpens the boundary between them. Overload is a one-tool problem: a single tool is asked to do several jobs, and no boundary edit can help because there is no neighbour to draw a line against.
A quick test settles it. Look at the tool the agent called and ask whether the agent picked the wrong tool or the right tool for the wrong job. If a request that wanted a summary went to a tool that also does extraction and verification, and that tool was the only sensible choice available, you are looking at overload, not misrouting. The model selected correctly but the tool itself cannot commit to one behaviour. That is the signature that calls for splitting. Mistaking overload for overlap leads you to endlessly reword a description that can never be made unambiguous, because the ambiguity is in the tool's job definition, not its wording.
Defining the contracts after a split
Splitting is only half the work; the value comes from the contracts you give each resulting tool. A contract specifies, for one tool, exactly what inputs it requires and exactly what shape it returns, in every case. After splitting the document tool, the extraction tool should commit to returning structured fields, the summarisation tool to returning prose of a requested length, and the verification tool to returning a supported-or-not verdict with the deciding passage. Those commitments are what let the rest of the system rely on the tools without inspecting their internals.
Clear contracts also make the descriptions almost write themselves, which closes the loop back to selection. A tool that does exactly one thing and returns exactly one shape has a narrow, honest description, and narrow descriptions are the easiest of all for the model to match. So splitting improves selection twice over: once by removing the overloaded umbrella that confused the model, and again by producing tools whose single purpose can be described without hedging. In a multi-agent research system, where one agent's output becomes another's input, these contracts are what make the handoffs survivable. A downstream agent can consume a predictable extraction result but not an unpredictable grab-bag from a do-everything tool.
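The output contracts described in this section could be written down as fixed result types, one per tool. This is a sketch under assumptions: the class names, the field names, and the choice to make `passage` optional are illustrative, not taken from the source.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ExtractionResult:
    fields: Dict[str, Optional[str]]  # one entry per requested field


@dataclass(frozen=True)
class SummaryResult:
    text: str  # prose of the requested length


@dataclass(frozen=True)
class VerificationResult:
    supported: bool
    passage: Optional[str]  # the deciding passage (assumed optional here)


def render_verdict(result: VerificationResult) -> str:
    # A downstream consumer can rely on the shape without branching on
    # which job was requested, because this tool has only one job.
    verdict = "supported" if result.supported else "not supported"
    if result.passage is None:
        return verdict
    return f'{verdict}: "{result.passage}"'


print(render_verdict(VerificationResult(True, "Revenue grew 12%.")))
```

Because each tool returns exactly one type, a handoff between agents reduces to passing a value whose shape is known in advance.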
Grounding it in the exam scenarios
Tool splitting surfaces most often in the Multi-Agent Research and Developer Productivity scenarios, where a system has grown a few broad, convenient tools that now hold it back. The exam tends to present a generic tool that has accreted responsibilities and an agent whose behaviour has become unpredictable as a result, then ask for the redesign. The credited path is to split by distinct job into purpose-specific tools with clear contracts, and the seductive wrong answers either keep the overloaded tool and bolt on machinery to steer it, or shatter it into far too many micro-tools. Holding the line between specificity and fragmentation is exactly the judgement the scenario is probing.
How it shows up on the exam
Domain 2 scenarios test tool splitting by describing a single tool that is clearly doing too much (analysing, summarising, and verifying under one name) and asking for the redesign that makes agent behaviour predictable. The credited answer splits it into purpose-specific tools with defined contracts. Watch for distractor answers that over-correct into dozens of micro-tools, or that reach for a routing classifier when the real issue is one overloaded tool. The exam rewards the engineer who applies specificity with judgement: split by distinct job, keep contracts clear, and stop before the toolset becomes its own problem.
A final cue for the exam: watch the verbs in the tool description you are shown. Umbrella verbs such as analyse, process, handle, or manage almost always flag a tool that is doing more than one job and is therefore a candidate for splitting. Single, concrete verbs such as extract, summarise, or verify signal a tool that already owns one purpose. When a scenario hands you a tool whose description leans on an umbrella verb and an agent that behaves unpredictably, the redesign it is fishing for is a split into concrete, single-verb tools with their own contracts, stopping at the point where each tool names exactly one job. Reading the verb is a fast, reliable tell for whether splitting is the move the scenario wants.
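The verb cue can be turned into a rough review heuristic. The sketch below is an assumption-laden toy, not an established linter: it matches word stems for the four umbrella verbs named above, so it will also flag related nouns such as "analysis" or "manager".

```python
import re
from typing import List

# Stems for the umbrella verbs named in the text. Matching on stems is a
# simplification that also catches nouns and other derived forms.
UMBRELLA = re.compile(
    r"\b(analy[sz]\w*|process\w*|handl\w*|manag\w*)\b",
    re.IGNORECASE,
)


def umbrella_verbs(description: str) -> List[str]:
    """Return the umbrella-verb forms found in a tool description, lowercased."""
    return [word.lower() for word in UMBRELLA.findall(description)]


print(umbrella_verbs("Analyses a document and returns insights"))
print(umbrella_verbs("Extracts the requested fields as structured values"))
```

A non-empty result is only a prompt to look closer. Whether the tool really performs several jobs still has to be judged from its output shapes.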
A research agent's single analyze_document tool is used for extraction, summarisation, and claim verification, and its output shape is unpredictable. Which redesign best improves selection reliability without over-fragmenting the toolset?
Watch and learn
How We Build Effective Agents: Barry Zhang, Anthropic
Why watch: Anthropic argues for purpose-specific tools with clear contracts over broad generic tools, the core idea behind splitting a generic tool into focused, non-overlapping ones.