AI Skill Certs
Tool Design & MCP Integration · Task 2.1 · Bloom: apply · Difficulty 3/5 · 8 min read · Updated 2026-06-07

Tool Splitting for Specificity in Claude Tool Design

Design effective tool interfaces with clear descriptions and boundaries

By Solomon Udoh · Reviewed by Solomon Udoh · AI-assisted · human-reviewed
In short
Tool splitting is the design move of breaking one generic, multi-purpose tool into several purpose-specific tools, each with a clear input and output contract. It improves selection reliability because each resulting tool has a single, non-overlapping job, though splitting too far can make the toolset needlessly complex.

What tool splitting solves

Some misrouting cannot be fixed by rewording a boundary, because the problem is not two tools competing. It is one tool trying to be several. A tool named analyze_document that "analyses a document and returns insights" is asked to extract structured fields for one request, summarise prose for another, and check a claim against a source for a third. No single description can make those three jobs unambiguous, because they are genuinely different tasks hiding behind one name. Tool splitting is the design response: break the overloaded tool into purpose-specific tools, each owning exactly one job.

This is an apply-level skill. You are not just recognising that descriptions drive selection; you are using that principle to redesign a toolset so that selection becomes reliable. The lever is specificity. When each tool does one thing and says so, its description stops competing with itself, and the model gets a clean match for every request.

Tool splitting
Decomposing one generic tool that performs several jobs into multiple purpose-specific tools, each with a defined input and output contract and a single, non-overlapping purpose. The aim is to make tool selection unambiguous without over-fragmenting the toolset.

The signal that a tool needs splitting

A generic tool gives off recognisable signals. Its description uses umbrella verbs ("analyse," "process," "handle," "manage") that cover multiple operations. Its output shape changes depending on what the caller wanted, so downstream code has to branch on the request rather than the tool. And the model invokes it correctly for some intents but produces the wrong kind of result for others, because it picked the right tool for the wrong job. When one tool's single description has to answer "what does this return?" with "it depends," that dependence is the seam to split along.
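The first signal, umbrella verbs in a description, is easy to check mechanically. The sketch below is a rough heuristic, not an established lint rule: the function name `umbrella_verbs_in` and the crude stemming are assumptions made for illustration, and the verb list simply reuses the examples named above.

```python
import re

# Assumed list: the umbrella verbs named in the text, plus a spelling variant.
UMBRELLA_VERBS = {"analyse", "analyze", "process", "handle", "manage"}


def umbrella_verbs_in(description: str) -> list[str]:
    """Return the umbrella verbs whose stem appears in a tool description."""
    words = re.findall(r"[a-z]+", description.lower())
    found = []
    for verb in sorted(UMBRELLA_VERBS):
        # Crude stem: drop a trailing "e" so "analyses" and "managing" match.
        stem = verb.rstrip("e")
        if any(word.startswith(stem) for word in words):
            found.append(verb)
    return found


flagged = umbrella_verbs_in("Analyses a document and returns insights")
clean = umbrella_verbs_in("Summarises a document to a requested length")
```

A hit is only a prompt to look closer. The deciding evidence is still whether the tool's output shape depends on what the caller wanted.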

Contrast this with a misrouting pair, where two distinct tools overlap. There, the fix is to sharpen the boundary between them. Here, there is only one tool and its boundaries are internal. It is overloaded, not overlapping. Telling the two situations apart is part of the diagnostic skill: diagnosing tool misrouting handles the overlap case, and tool splitting handles the overload case.

How to split for specificity

The method is to name the distinct jobs the generic tool is doing and give each one a tool with its own contract. The canonical example takes analyze_document and splits it into three:

  • extract_data_points. Input: a document plus the fields to find. Output: those fields as structured values. One job: structured extraction.
  • summarize_content. Input: a document and a desired length or focus. Output: a prose summary. One job: condensation.
  • verify_claim_against_source. Input: a claim and a source document. Output: supported or not, with the relevant passage. One job: verification.

Each resulting tool has a clear, non-overlapping purpose and a predictable output shape. The descriptions practically write themselves, because there is only one thing to describe. Selection becomes reliable because "summarise this report" can only match summarize_content, and "does this report support the claim that revenue grew?" can only match verify_claim_against_source.
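As a minimal sketch, the three tools could be declared as follows. The `name`, `description`, and `input_schema` keys follow the general shape of tool definitions passed to Claude, but the parameter names (`document`, `fields`, `focus`, `claim`, `source`) and the description wording are assumptions chosen for illustration.

```python
# Illustrative tool definitions; parameter names and wording are assumptions.
DOC = {"type": "string", "description": "Full text of the document."}

TOOLS = [
    {
        "name": "extract_data_points",
        "description": (
            "Extract the requested fields from a document and return them as "
            "structured values. Not for summaries or claim checks."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "document": DOC,
                "fields": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Names of the fields to find.",
                },
            },
            "required": ["document", "fields"],
        },
    },
    {
        "name": "summarize_content",
        "description": (
            "Condense a document into a prose summary with the requested "
            "focus. Not for extracting fields or checking claims."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "document": DOC,
                "focus": {
                    "type": "string",
                    "description": "Desired length or focus of the summary.",
                },
            },
            "required": ["document"],
        },
    },
    {
        "name": "verify_claim_against_source",
        "description": (
            "Check whether a source document supports a claim. Returns a "
            "supported-or-not verdict with the relevant passage."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "claim": {"type": "string", "description": "Claim to check."},
                "source": DOC,
            },
            "required": ["claim", "source"],
        },
    },
]
```

Each description names one job and states what the tool is not for, so no two of them compete for the same request.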

Figure: Splitting one overloaded tool into three purpose-specific tools. The generic tool hid three jobs behind one description. Splitting gives each job its own contract, so every request maps to exactly one tool.

Knowing when to stop

Specificity is the goal, not granularity for its own sake. The same logic that says "split an overloaded tool" can be pushed too far, producing a sprawl of micro-tools (extract_first_name, extract_last_name, extract_email) where one extract_data_points tool with a fields parameter would have been clearer. An oversized toolset reintroduces the very problem splitting was meant to cure: the model now has to scan and reason about too many options, which degrades selection on its own. That tension with tool count is the subject of the tool overload problem.

The test is purpose, not size. Split when two operations are genuinely different jobs with different outputs that a single description cannot disambiguate. Do not split when the operations are the same job parameterised differently. That is what input parameters are for. A well-split toolset has the smallest number of tools such that each one has a distinct, clearly describable purpose.
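One rough way to spot over-splitting is to look for tools that share an identical input contract. The sketch below is a hypothetical heuristic, not a rule from the source: `same_contract_groups` and the example schemas are invented for illustration, and identical inputs only hint at "same job, different parameters". The purpose test above still decides.

```python
import json
from collections import defaultdict


def same_contract_groups(tools: list[dict]) -> list[list[str]]:
    """Group tool names whose input schemas are exactly identical."""
    groups = defaultdict(list)
    for tool in tools:
        # Serialise with sorted keys so equal schemas produce equal strings.
        key = json.dumps(tool["input_schema"], sort_keys=True)
        groups[key].append(tool["name"])
    return [names for names in groups.values() if len(names) > 1]


# Hypothetical over-split toolset: three micro-tools with the same inputs.
doc_only = {"type": "object", "properties": {"document": {"type": "string"}}}
doc_and_focus = {
    "type": "object",
    "properties": {"document": {"type": "string"}, "focus": {"type": "string"}},
}
over_split = [
    {"name": "extract_first_name", "input_schema": doc_only},
    {"name": "extract_last_name", "input_schema": doc_only},
    {"name": "extract_email", "input_schema": doc_only},
    {"name": "summarize_content", "input_schema": doc_and_focus},
]

merge_candidates = same_contract_groups(over_split)
```

Here the three extract_* tools land in one group, which suggests folding them into a single extract_data_points tool with a fields parameter.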

Worked example

A multi-agent research system gives every agent one analyze_document tool. The synthesis agent calls it to summarise, the fact-checker calls it to verify a claim, and results come back in inconsistent shapes that break downstream handling.

The symptom is not wrong-tool selection. There is only one tool, and the agents all call it. The symptom is wrong-job selection: the model invokes analyze_document but cannot be steered to do extraction versus summarisation versus verification reliably, and the output shape shifts with each call. A boundary edit cannot help, because there is no neighbour to draw a boundary against. The tool is overloaded.

Apply tool splitting. Replace analyze_document with extract_data_points, summarize_content, and verify_claim_against_source, each with an explicit input and output contract. Now scope them to roles: the synthesis agent gets summarize_content, the fact-checker gets verify_claim_against_source, and the extraction agent gets extract_data_points. Each agent's single, purpose-specific tool maps cleanly to its job, and every output has a predictable shape that downstream code can rely on.
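The role scoping can be sketched as a simple lookup. The role keys and the `tools_for` helper are assumptions made for illustration, and the tool entries are trimmed to their names to keep the example short.

```python
# Assumed role names; each agent sees only the tool that matches its job.
TOOLS_BY_ROLE = {
    "synthesis": ["summarize_content"],
    "fact_checker": ["verify_claim_against_source"],
    "extraction": ["extract_data_points"],
}

# Stand-ins for full tool definitions (descriptions and schemas omitted).
ALL_TOOLS = [
    {"name": "extract_data_points"},
    {"name": "summarize_content"},
    {"name": "verify_claim_against_source"},
]


def tools_for(role: str, all_tools: list[dict]) -> list[dict]:
    """Return only the tool definitions the given role may use."""
    allowed = set(TOOLS_BY_ROLE[role])
    return [tool for tool in all_tools if tool["name"] in allowed]


fact_checker_tools = tools_for("fact_checker", ALL_TOOLS)
```

An unknown role raises a KeyError in this sketch, which is deliberate: an agent with no declared role should not silently receive every tool.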

Check that you have not over-split. Could extract_data_points reasonably handle "pull the author and date" and "pull the revenue figures" through a fields parameter? Yes. Those are the same job with different inputs, so they stay one tool. You split where the jobs differed, not where the inputs differed. That restraint is what separates specificity from fragmentation.

Common misreadings to avoid

Misconception

More tools always means more reliable selection, so split every tool as finely as possible.

What's actually true

Splitting helps only up to the point where each tool has a distinct purpose. Past that, an oversized toolset makes the model scan more options and selection degrades again. Split by job, and use parameters for variations of the same job.

Misconception

Tool splitting is the same fix as rewriting overlapping descriptions.

What's actually true

They address different causes. Rewriting boundaries fixes two distinct tools that compete for the same requests; splitting fixes a single tool that is doing several jobs at once. Diagnose whether you have overlap or overload before choosing the move.

Overload versus overlap: telling the two apart

The hardest part of applying tool splitting well is recognising when it is the right move at all, because its symptoms can resemble plain misrouting. The distinction is the number of tools involved in the failure. Misrouting is a two-tool problem: two distinct tools compete for the same requests, and the fix sharpens the boundary between them. Overload is a one-tool problem: a single tool is asked to do several jobs, and no boundary edit can help because there is no neighbour to draw a line against.

A quick test settles it. Look at the tool the agent called and ask whether the agent picked the wrong tool or the right tool for the wrong job. If a request that wanted a summary went to a tool that also does extraction and verification, and that tool was the only sensible choice available, you are looking at overload, not misrouting. The model selected correctly but the tool itself cannot commit to one behaviour. That is the signature that calls for splitting. Mistaking overload for overlap leads you to endlessly reword a description that can never be made unambiguous, because the ambiguity is in the tool's job definition, not its wording.
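The quick test can be written down as a small decision function. This is only a sketch of the reasoning above: the function name, its arguments, and the three labels are assumptions, and in practice the inputs come from a person reading the trace rather than from code.

```python
def diagnose(called_tool: str, sensible_tools: set[str], did_expected_job: bool) -> str:
    """Classify a failed call as misrouting (overlap) or overload."""
    if called_tool not in sensible_tools:
        # The agent picked the wrong tool: sharpen the boundary between tools.
        return "misrouting"
    if not did_expected_job:
        # Right tool, wrong job: the tool cannot commit to one behaviour.
        return "overload"
    return "ok"


# A summary request went to the only sensible tool but came back as extraction.
verdict = diagnose("analyze_document", {"analyze_document"}, did_expected_job=False)
```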

Defining the contracts after a split

Splitting is only half the work; the value comes from the contracts you give each resulting tool. A contract specifies, for one tool, exactly what inputs it requires and exactly what shape it returns, in every case. After splitting the document tool, the extraction tool should commit to returning structured fields, the summarisation tool to returning prose of a requested length, and the verification tool to returning a supported-or-not verdict with the deciding passage. Those commitments are what let the rest of the system rely on the tools without inspecting their internals.
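One way to make those commitments concrete is to give each tool a typed result. The sketch below assumes the field names (`fields`, `summary`, `supported`, `passage`) and a hypothetical `parse_verification` helper. It shows the idea of one fixed shape per tool, not a prescribed schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExtractionResult:
    # Requested field name mapped to its value, or None if it was not found.
    fields: dict[str, Optional[str]]


@dataclass(frozen=True)
class SummaryResult:
    summary: str  # Prose only, at the requested length or focus.


@dataclass(frozen=True)
class VerificationResult:
    supported: bool  # The supported-or-not verdict.
    passage: str     # The passage that decided the verdict.


def parse_verification(raw: dict) -> VerificationResult:
    """Reject any payload that does not match the verification contract."""
    supported = raw.get("supported")
    passage = raw.get("passage")
    if not isinstance(supported, bool) or not isinstance(passage, str):
        raise ValueError("verification result does not match its contract")
    return VerificationResult(supported=supported, passage=passage)


result = parse_verification({"supported": True, "passage": "Revenue grew 12%."})
```

Downstream code can then depend on `result.supported` and `result.passage` without branching on what the caller originally asked for.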

Clear contracts also make the descriptions almost write themselves, which closes the loop back to selection. A tool that does exactly one thing and returns exactly one shape has a narrow, honest description, and narrow descriptions are the easiest of all for the model to match. So splitting improves selection twice over: once by removing the overloaded umbrella that confused the model, and again by producing tools whose single purpose can be described without hedging. In a multi-agent research system, where one agent's output becomes another's input, these contracts are what make the handoffs survivable. A downstream agent can consume a predictable extraction result but not an unpredictable grab-bag from a do-everything tool.

Grounding it in the exam scenarios

Tool splitting surfaces most often in the Multi-Agent Research and Developer Productivity scenarios, where a system has grown a few broad, convenient tools that now hold it back. The exam tends to present a generic tool that has accreted responsibilities and an agent whose behaviour has become unpredictable as a result, then ask for the redesign. The credited path is to split by distinct job into purpose-specific tools with clear contracts, and the seductive wrong answers either keep the overloaded tool and bolt on machinery to steer it, or shatter it into far too many micro-tools. Holding the line between specificity and fragmentation is exactly the judgement the scenario is probing.

How it shows up on the exam

Domain 2 scenarios test tool splitting by describing a single tool that is clearly doing too much (analysing, summarising, and verifying under one name) and asking for the redesign that makes agent behaviour predictable. The credited answer splits it into purpose-specific tools with defined contracts. Watch for distractor answers that over-correct into dozens of micro-tools, or that reach for a routing classifier when the real issue is one overloaded tool. The exam rewards the engineer who applies specificity with judgement: split by distinct job, keep contracts clear, and stop before the toolset becomes its own problem.

A final cue for the exam: watch the verbs in the tool description you are shown. Umbrella verbs such as analyse, process, handle, or manage almost always flag a tool that is doing more than one job and is therefore a candidate for splitting. Single, concrete verbs such as extract, summarise, or verify signal a tool that already owns one purpose. When a scenario hands you a tool whose description leans on an umbrella verb and an agent that behaves unpredictably, the redesign it is fishing for is a split into concrete, single-verb tools with their own contracts, stopping at the point where each tool names exactly one job. Reading the verb is a fast, reliable tell for whether splitting is the move the scenario wants.

Check your understanding

A research agent's single analyze_document tool is used for extraction, summarisation, and claim verification, and its output shape is unpredictable. Which redesign best improves selection reliability without over-fragmenting the toolset?

People also ask

When should you split a Claude tool into smaller tools?
When one tool is doing several genuinely different jobs that a single description cannot disambiguate, and each job has its own clear input and output. Splitting gives every job a tool with a non-overlapping purpose.
Can a tool be too granular?
Yes. Splitting past the point where each tool has a meaningful distinct purpose creates an oversized toolset that is itself hard to select from. Use parameters for variations of one job rather than a new tool for each.
How does splitting a tool improve selection reliability?
Each resulting tool has a single, clearly described purpose, so the model gets an unambiguous match for each request instead of one umbrella description trying to cover several jobs.

Watch and learn


AI Engineer

How We Build Effective Agents: Barry Zhang, Anthropic

Why watch: Anthropic argues for purpose-specific tools with clear contracts over broad generic tools, the core idea behind splitting a generic tool into focused, non-overlapping ones.
