Diagnosing Tool Misrouting in Claude

In short: Tool misrouting is when Claude consistently selects the wrong tool because two or more tool descriptions overlap or read almost identically. The diagnosis is to compare the competing descriptions for ambiguity, and the first fix is to expand them with explicit boundaries rather than add classifiers or consolidate the tools.

What tool misrouting looks like

Tool misrouting is the failure mode where an agent repeatedly calls a tool that is plausible but wrong for the request. The user asks to cancel an order and the agent fetches the customer profile; the user asks for an invoice and the agent pulls the account summary. The behaviour is inconsistent rather than broken: sometimes the right tool fires, sometimes its neighbour does. That intermittency is the tell. A tool that is always called is well differentiated; a tool that is called sometimes when its sibling should have been is competing with that sibling for the same requests.

Because Claude selects tools from their descriptions, tool misrouting is almost always a description problem before it is anything else. Two tools whose descriptions overlap present the model with a near-tie, and near-ties resolve unpredictably. This knowledge point sits at the analyse level of Bloom's taxonomy for a reason: the skill being tested is not recalling a fact but reading a toolset, locating the ambiguity, and naming the cheapest fix that removes it.

Tool misrouting: A recurring wrong-tool selection caused by two or more tool descriptions being so similar that Claude cannot reliably tell which request belongs to which tool. The diagnosis centres on the overlap between descriptions; the fix is to differentiate them.

How to diagnose tool misrouting

Diagnosis is a comparison exercise. Take the tool the agent should have called and the tool it did call, and put their descriptions next to each other. Ask one question: reading only these two descriptions, could a careful person always tell which one a given request belongs to? If the answer is no, you have found the cause, and you have found it in the prose rather than the model.

The canonical exam example is a pair like get_customer ("Retrieves customer information") and lookup_order ("Retrieves order information"). Both begin "Retrieves ... information." Both take an identifier. Neither states a boundary against the other. A request like "what did I buy last month?" sits in the seam between them, it is about a customer and about orders, so the model splits its bets. The diagnosis is not "the model is unreliable"; it is "these two descriptions overlap and neither one owns the seam."

Diagnosing and fixing a misrouting pair

Loading diagram...

Misrouting is diagnosed by comparing descriptions, not by inspecting the model. The default fix is a boundary edit, then verification.

Why the cheap fix is the right fix

Once you have located the overlap, four candidate fixes usually present themselves. Only one of them addresses the cause directly.

Expand the descriptions with explicit boundaries, the correct first move. Tell each tool when to use it and when to defer to its neighbour. No new infrastructure, no extra tokens per call beyond the description itself.
Add few-shot examples of correct routing. This can nudge behaviour, but it adds token overhead to every request and treats the symptom rather than the ambiguous prose. It is the wrong root-cause fix.
Add a routing classifier, a separate model or rule layer that pre-sorts requests. Over-engineered as a first step: you are building a system to compensate for two sentences you could simply rewrite.
Consolidate the tools into one, sometimes valid as a design choice, but heavy: it changes your tool surface and downstream handlers when the actual problem was wording.

The exam consistently rewards the boundary edit because it is the lowest-effort change that removes the cause. The other three are real techniques with real uses, but reaching for them here means you have misdiagnosed a wording problem as an architecture problem. This ordering is the heart of the low effort, high leverage fix principle, and it is why this concept unlocks that one.

When the boundary edit is not enough

Diagnosis should be honest about its own limits. If you rewrite both descriptions with crisp, non-overlapping boundaries and the agent still misroutes, the problem may not be pure overlap. Perhaps a single tool is genuinely doing two jobs and should be split into purpose-specific tools, which is the move covered in tool splitting for specificity. Perhaps a keyword in the system prompt is pulling the model toward one tool regardless of the descriptions, which is system prompt tool conflicts. Diagnosis means trying the cheapest cause first and escalating only when the evidence forces you to.

Worked example

A support agent has get_customer ('Retrieves customer information') and lookup_order ('Retrieves order information'). Logs show 'Where is my refund?' routes to get_customer about half the time, leaving the agent unable to answer.

Start with the comparison. The request "Where is my refund?" concerns a transaction on an order, but it also concerns the customer's account balance, so it straddles both descriptions. Neither description claims refunds, statuses, or transactions, and neither says "use the other tool for account profiles." The seam is unowned, so the model picks roughly at random.

Now apply the boundary edit. Rewrite lookup_order: "Retrieves an order's status, line items, payments, and refunds by order ID. Use this for any question about a purchase, its delivery, or money returned for it. Do not use it to fetch the customer's saved profile, addresses, or contact details." Rewrite get_customer: "Retrieves a customer's saved profile, contact details, and account settings by customer ID. Use this for who the customer is and how to reach them. Do not use it for order, payment, or refund questions, use lookup_order for those." The word "refunds" now lives unambiguously in one tool, and each description points at the other for its out-of-scope cases.

Re-run the failing requests. "Where is my refund?" now matches lookup_order cleanly. You did not add a classifier, you did not attach examples to every call, and you did not merge the tools. You removed the overlap. The diagnosis pointed at the prose, and the prose is what you changed.

Common misreadings to avoid

Misconception

Persistent wrong-tool calls mean I need a routing classifier or a more capable model.

What's actually true

A classifier is over-engineered as a first response and a model upgrade rarely helps when the descriptions themselves are ambiguous. Diagnose the overlap first and expand the descriptions with explicit boundaries; that targets the cause at a fraction of the effort.

Misconception

Adding a few-shot example of the correct routing is the standard fix for misrouting.

What's actually true

Few-shot examples add token overhead to every request and patch the symptom rather than the ambiguous descriptions that cause it. They are a real technique for output consistency, but for description overlap the root-cause fix is to rewrite the boundaries.

A repeatable diagnostic procedure

Because misrouting is so common, it pays to turn the diagnosis into a fixed routine you can run under exam time pressure or in production. Start by collecting the evidence: which request was sent, which tool the agent chose, and which tool it should have chosen. Misrouting is defined by that gap between intended and actual selection, so you cannot diagnose it without naming both tools.

Next, isolate the competing pair and read their descriptions as the model reads them, together, with no knowledge of your implementation. Ask whether the failing request contains any phrase that clearly belongs to one description and not the other. If it does, the descriptions are probably fine and the cause lies elsewhere, perhaps in the system prompt. If it does not, if the request sits in a seam that neither description claims, you have confirmed overlap as the cause.

Then write the boundary edit and, crucially, verify it. Re-run not just the failing request but a handful of nearby requests, including ones that should route to the other tool, to make sure your edit did not over-correct and start capturing requests that belong to the neighbour. A boundary that fixes one direction while breaking the other is a half-diagnosis. Only when both tools select correctly across the seam is the misrouting genuinely resolved. This collect, compare, edit, and verify loop is what separates a guess from a diagnosis, and it generalises to every misrouting pair you will meet.

Configuration checks beyond the descriptions

Even with crisp, non-overlapping descriptions, a handful of request-level configuration mistakes can imitate misrouting, so a thorough diagnosis rules them out before blaming the prose. First, confirm the tool is actually present in the request's top-level tools array for the agent that should call it. A tool the model never received cannot be selected, and the symptom reads as persistent avoidance rather than overlap.

Second, inspect tool_choice. It defaults to auto when tools are supplied, but an explicit value can distort selection: none blocks tool use entirely, any forces some tool even when none truly fits, and tool pins one specific tool regardless of the request. A stray tool_choice left over from an earlier experiment is a common and easily missed cause of what looks like a routing bug. Third, when the right tool is chosen but with malformed arguments, enabling strict: true on the definition turns on schema validation so bad inputs surface as explicit errors instead of silent misuse. These configuration checks sit alongside the description review, not after it: confirm the request is shaped correctly before concluding the wording is at fault.

Why intermittency is the key clue

The single most useful signal in a misrouting report is that the wrong tool is chosen sometimes, not always. A tool that is never selected when it should be points to a different problem, perhaps it is missing from that agent's toolset, or a system-prompt instruction is steering away from it entirely. A tool that is selected intermittently in place of a sibling is the fingerprint of overlap: the model is resolving a near-tie, and near-ties tip both ways.

This is why experienced architects ask for the distribution of selections, not a single failing example. If the logs show that order-status questions go to lookup_order sixty percent of the time and to get_customer the rest, that split is the overlap made visible. Each request is landing in the contested seam and the model is effectively flipping a weighted coin. Quantifying the split also gives you a verification target: after the boundary edit, the correct tool should win those requests close to every time, and a residual split tells you the seam is not fully closed. Reading misrouting as a probability distribution rather than a binary bug is the analytical habit this knowledge point rewards.

How it shows up on the exam

A Domain 2 misrouting item gives you two suspiciously similar tools and an agent that confuses them, then offers four fixes. The trap answers are the heavier ones, build a classifier, add few-shot examples, consolidate the tools, each of which sounds sophisticated. The credited answer is the boundary edit, because the scenario's cause is overlapping descriptions and the exam rewards the lowest-effort change that removes the cause. Train yourself to diagnose before you design: compare the descriptions, find the unowned seam, and rewrite it before you reach for anything bigger.

One more habit sharpens your instinct here: read the answer options as competing hypotheses about the cause. A misrouting question is really asking you to diagnose, and each option encodes a different theory of why the agent failed. The classifier option theorises that selection needs an external arbiter; the consolidation option theorises that the tools should not be separate at all; the few-shot option theorises that the model needs examples rather than clearer prose. The boundary-edit option theorises that the descriptions overlap. Match each theory against the scenario, and the credited answer is the theory the evidence supports, which for an overlapping pair is almost always the boundary edit. Reading the options this way keeps you from being seduced by the most elaborate-sounding fix and anchors you on the cause the scenario actually describes.

Check your understanding

An agent has get_customer ('Retrieves customer information') and lookup_order ('Retrieves order information') and constantly confuses them on order-status questions. You have limited time. Which action best resolves the misrouting?

Watch and learn

Official Anthropic Academy lessons first, then hand-picked walkthroughs. Videos load only when you press play.

No videos curated for this concept yet

We are still curating the best official and community videos for this topic.

References & primary sources

Adaptive study

Master this concept with Archie

Practice it inside an adaptive study session. Archie, your Socratic AI tutor, tracks your mastery with Bayesian Knowledge Tracing and schedules the perfect next review.

Start studying

Diagnosing Tool Misrouting in Claude Agents