Claude Tool Descriptions: How Selection Works

In short: Claude tool descriptions are the natural-language text in each tool definition that the model reads to decide which tool to call. They are the primary selection signal, not supplementary documentation, so a vague description directly degrades how reliably Claude routes a request to the right tool.

What Claude tool descriptions actually control

When you give Claude a set of tools, each tool is a small JSON object with a name, an input_schema, and a description. It is tempting to treat that description the way you treat a docstring: a courtesy note for the next engineer. That instinct is wrong, and correcting it is the purpose of this knowledge point. Claude tool descriptions are the primary mechanism the model uses to decide which tool to call. The model never sees your implementation; it sees the words you wrote.
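For reference, a tool definition with the three fields named above might look like the following Python dictionary. This is only a sketch: the description wording and the customer_id parameter are illustrative assumptions, not taken from any real system.

```python
# Hypothetical tool definition. The description wording and the
# customer_id parameter are illustrative assumptions.
get_customer_tool = {
    "name": "get_customer",
    "description": (
        "Retrieves a customer's account profile (name, email, plan, "
        "contact details) by customer ID. Use this for questions about "
        "who the customer is or how to reach them."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "customer_id": {
                "type": "string",
                "description": "Internal customer identifier.",
            }
        },
        "required": ["customer_id"],
    },
}

# These three fields are what the model can read; the implementation
# behind the tool is never part of the definition.
print(sorted(get_customer_tool))
```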

Anthropic's own documentation is blunt about this: providing extremely detailed descriptions is "by far the most important factor in tool performance." That single sentence reframes description writing from a documentation chore into a core design activity. The text you put in that field is read at selection time, on every turn, and weighed against the user's request. If the words are thin, the model is guessing.

Tool description as selection mechanism: The natural-language description in a tool definition is the functional input Claude uses to choose between tools. It is read at inference time, not just by humans, so its quality directly determines selection accuracy.

Why the model leans on the description

A Claude tool call is a two-step contract. First the model decides whether to use a tool and which one; only then does it fill in the parameters from the schema. That first decision is made almost entirely from the name and the description. The schema constrains the shape of the arguments, but it rarely disambiguates intent. "Should I call get_customer or lookup_order for this billing question?" is answered by prose, not by JSON types.

This is why a description such as "Retrieves customer information" is dangerous in any toolset with more than one tool. It states what the tool does in isolation but says nothing about when to prefer it over a neighbour, what it returns, or where its boundary lies. The model is left to infer those distinctions, and inference under ambiguity is exactly where misrouting begins. The richer the description, the less the model has to guess.
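One rough way to spot descriptions of this kind before they cause trouble is a small audit script. The sketch below flags descriptions that are very short or that share most of their words with a neighbour. The function name audit_descriptions and both thresholds are assumptions chosen for illustration; word overlap is a crude proxy and says nothing about how Claude itself weighs the text.

```python
import re
from itertools import combinations


def _words(text):
    """Lowercased set of word tokens in a description."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def audit_descriptions(tools, min_words=15, max_overlap=0.5):
    """Flag thin descriptions and pairs with high word overlap.

    min_words and max_overlap are arbitrary illustrative thresholds.
    """
    findings = []
    for tool in tools:
        count = len(tool["description"].split())
        if count < min_words:
            findings.append(f"{tool['name']}: only {count} words")
    for a, b in combinations(tools, 2):
        wa, wb = _words(a["description"]), _words(b["description"])
        union = wa | wb
        # Jaccard similarity: shared words divided by all distinct words.
        overlap = len(wa & wb) / len(union) if union else 0.0
        if overlap >= max_overlap:
            findings.append(
                f"{a['name']} / {b['name']}: overlap {overlap:.2f}"
            )
    return findings


tools = [
    {"name": "get_customer", "description": "Retrieves customer information"},
    {"name": "lookup_order", "description": "Retrieves order information"},
]
findings = audit_descriptions(tools)
for line in findings:
    print(line)
```

A script like this only tells you where to look; the fix is still to rewrite the flagged descriptions by hand.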

Anthropic frames the same idea from the agent designer's side in its guidance on building effective agents: "tool definitions and specifications should be given just as much prompt engineering attention as your overall prompts." The agent-computer interface is a prompt-engineering surface, and the description is its most important field.

What a strong description contains

A description that genuinely drives selection does five things. It states the tool's purpose in plain language. It says when the tool should be used and, just as importantly, when it should not. It explains what each parameter means and how it affects behaviour. It names edge cases and limitations, including what the tool does not return. And it draws an explicit boundary against any similar tool the model might confuse it with.

Purpose: what the tool does, in one or two concrete sentences.
Usage boundary: the situations it is for, and the ones it is not for.
Inputs: each parameter, its format, and its effect on the result.
Edge cases: empty results, unsupported inputs, and known limitations.
Differentiation: how it differs from the nearest neighbouring tool.

Notice that only the first item is "documentation" in the traditional sense. The other four exist purely to make the model's selection decision unambiguous. That is the shift in mindset the exam is testing: descriptions are written for the model's choice, not for a human reader skimming an API reference.
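To make the five components hard to forget, one option is to assemble descriptions from named parts and refuse to render when any part is empty. The sketch below assumes a hypothetical DescriptionParts helper; nothing like it is part of Anthropic's API. The inputs and edge-case sentences in the example are invented for illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DescriptionParts:
    """Hypothetical container mirroring the five components above."""

    purpose: str
    usage_boundary: str
    inputs: str
    edge_cases: str
    differentiation: str

    def render(self) -> str:
        """Join the parts into one description, rejecting empty parts."""
        values = {f.name: getattr(self, f.name).strip() for f in fields(self)}
        missing = [name for name, value in values.items() if not value]
        if missing:
            raise ValueError(f"Missing description parts: {missing}")
        return " ".join(values.values())


description = DescriptionParts(
    purpose=(
        "Retrieves the status, contents, and shipment tracking of a "
        "specific order by order ID."
    ),
    usage_boundary=(
        "Use this for any question about where a package is, when it "
        "will arrive, or what an order contains."
    ),
    # The next two sentences are illustrative assumptions.
    inputs="order_id is the identifier shown on the order confirmation.",
    edge_cases="Returns no tracking data for orders that have not shipped.",
    differentiation=(
        "Do not use it to fetch a customer's account profile or contact "
        "details; use get_customer for that."
    ),
).render()

print(description)
```

The resulting string is what would go in the tool's description field; the helper only enforces that no component was skipped.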

How a description becomes a tool choice

With overlapping descriptions the model guesses; with explicit boundaries it selects confidently. The deciding factor is the prose, not the schema.

Why this is the root of Domain 2

Domain 2 (Tool Design & MCP Integration) carries 18% of the exam, and almost every tool-selection question in it rests on this one idea. When a scenario describes an agent that keeps calling the wrong tool, the exam wants you to reach first for the descriptions, because that is where the cheapest, highest-leverage fix lives. Reaching instead for a routing classifier or a machine-learning router signals that you have missed the mechanism: you cannot fix a selection problem downstream if its cause is upstream in the words.

This knowledge point is the prerequisite for the rest of the task statement. Diagnosing tool misrouting assumes you accept that descriptions drive selection. Writing effective tool descriptions operationalises it. The low-effort, high-leverage fix principle ranks description edits above heavier engineering precisely because they target the mechanism directly. Get this concept wrong and the others collapse.

Worked example

A customer-support agent has two tools: get_customer ('Retrieves customer information') and lookup_order ('Retrieves order information'). For 'Where is my package?' it sometimes calls get_customer and stalls.

Walk through what the model sees. The user intent is "track a shipment." Claude scans the two descriptions. Both say "Retrieves ... information," and neither mentions shipments, tracking, or orders-in-transit. The phrase "customer information" could plausibly cover a customer's orders, so on some turns the model picks get_customer, gets back a profile with no tracking data, and has nowhere to go.

Nothing about the schema would have prevented this. Both tools take an identifier and return a record. The only lever that changes the outcome is the description. Rewrite lookup_order to read: "Retrieves the status, contents, and shipment tracking of a specific order by order ID. Use this for any question about where a package is, when it will arrive, or what an order contains. Do not use it to fetch a customer's account profile or contact details; use get_customer for that." Now the words "shipment tracking," "where a package is," and the explicit boundary against get_customer give the model an unambiguous match for the request.

The fix touched no code, added no classifier, and changed no schema. It changed the selection signal. That is the entire lesson: the description is the mechanism.
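Written out as data, the before and after toolsets from this example could look like the sketch below. The two descriptions are the ones quoted above; the schemas and the parameter names customer_id and order_id are assumptions, since the example only says that each tool takes an identifier. Either list is the kind of value you would pass as the tools parameter of a Messages API request.

```python
def id_schema(field_name):
    """Hypothetical schema: one required string identifier."""
    return {
        "type": "object",
        "properties": {field_name: {"type": "string"}},
        "required": [field_name],
    }


before = [
    {
        "name": "get_customer",
        "description": "Retrieves customer information",
        "input_schema": id_schema("customer_id"),
    },
    {
        "name": "lookup_order",
        "description": "Retrieves order information",
        "input_schema": id_schema("order_id"),
    },
]

# Copy each definition, then change only the lookup_order description.
after = [dict(tool) for tool in before]
after[1]["description"] = (
    "Retrieves the status, contents, and shipment tracking of a specific "
    "order by order ID. Use this for any question about where a package "
    "is, when it will arrive, or what an order contains. Do not use it to "
    "fetch a customer's account profile or contact details; use "
    "get_customer for that."
)

# Names and schemas are untouched; the description is the only difference.
for old, new in zip(before, after):
    assert old["name"] == new["name"]
    assert old["input_schema"] == new["input_schema"]
```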

Common misreadings to avoid

Misconception: Tool descriptions are documentation for developers, so a short label is fine as long as my code works.

What's actually true: The description is read by the model at selection time and is the primary factor in which tool it calls. A short label such as 'Retrieves customer information' may be accurate yet still cause constant misrouting, because the model cannot differentiate it from a similar tool.

Misconception: If the model keeps choosing the wrong tool, the schema or the model is at fault.

What's actually true: The first thing to inspect is the description. Schemas constrain argument shape, not intent; selection is driven by the prose. Anthropic calls detailed descriptions the single most important factor in tool performance, so weak descriptions are the usual root cause.

How the name, schema, and description divide the work

It helps to be precise about which part of a tool definition does what, because conflating them is a common source of confusion. A tool definition exposes three things the model can read: the name, the input schema, and the description. Each carries a different kind of signal, and only one of them is built for disambiguating intent.

The name is a short handle. A well-chosen name such as get_invoice or cancel_subscription gives the model a strong first hint, and Anthropic recommends meaningful namespacing (prefixing names with a service, as in github_list_prs or slack_send_message) so that selection stays clear as the toolset grows. But a name is only a few tokens; it cannot express boundaries, edge cases, or when not to use the tool. The input schema, for its part, defines the shape and types of the arguments. It is essential for the second step of a tool call, filling in parameters, but it says almost nothing about whether this tool is the right one for the request. Two tools can share an identical schema (both take a single string identifier) and still be completely different in purpose.

That leaves the description as the only field designed to carry intent. It is where you state what the tool is for, when to choose it, and how it differs from its neighbours. This is why Anthropic singles it out as the most important factor in tool performance: the name hints, the schema constrains, and the description decides. An architect who understands this division stops trying to fix selection problems by renaming tools or tightening schemas, and goes straight to the field that actually governs the choice.
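The division of labour can be seen in a small sketch with two tools whose schemas are identical. The make_tool helper, the billing prefix, the id parameter, and both description texts are assumptions made up for this illustration.

```python
# Hypothetical shared schema: a single required string identifier.
ID_SCHEMA = {
    "type": "object",
    "properties": {"id": {"type": "string"}},
    "required": ["id"],
}


def make_tool(service, action, description):
    """Build a tool definition with a service-prefixed (namespaced) name."""
    return {
        "name": f"{service}_{action}",
        "description": description,
        "input_schema": ID_SCHEMA,
    }


get_invoice = make_tool(
    "billing",
    "get_invoice",
    "Returns a single invoice by invoice ID, including line items and "
    "payment status. Read-only. Do not use it to change or cancel "
    "anything; use billing_cancel_subscription for cancellations.",
)
cancel_subscription = make_tool(
    "billing",
    "cancel_subscription",
    "Cancels an active subscription by subscription ID. This changes "
    "account state. Do not use it to look up charges; use "
    "billing_get_invoice for that.",
)

# The schema cannot tell these tools apart; only the name hints and the
# description states the difference.
print(get_invoice["input_schema"] == cancel_subscription["input_schema"])
print(get_invoice["name"], cancel_subscription["name"])
```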

Examples sharpen selection and argument accuracy

The description is the primary signal, but Anthropic exposes an additional, optional lever in the tool definition: tool use examples. You can attach example inputs to a tool, documented as an input_examples field, and each example must conform to the tool's input_schema, with any required field appearing in at least one example. Examples earn their place mainly by improving parameter accuracy, showing the model how to populate the schema for realistic requests, and they reinforce selection as a side effect by demonstrating the kinds of request the tool is meant to handle.

Crucially, examples do not replace the description. The description still carries the decision of whether to call a tool and which one; examples refine how the chosen tool is invoked and can clarify a close boundary faster than another paragraph of prose. Reach for them when a tool's parameters are easy to mis-fill, or when two tools sit so close that a worked example settles the ambiguity more cleanly than words. Write the description first, then add examples only where it leaves real parameter or boundary uncertainty, which keeps this technique on the right rung of the leverage ladder.
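As a sketch of what attaching examples could look like, the definition below adds an input_examples list and runs a small consistency check over it. The field name follows the text above; the include_tracking parameter, the order IDs, and the check_examples helper are assumptions for illustration. The check is a simplified stand-in for a real JSON Schema validator, and it is worth consulting the current API documentation before relying on the field.

```python
lookup_order = {
    "name": "lookup_order",
    "description": (
        "Retrieves the status, contents, and shipment tracking of a "
        "specific order by order ID."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "order_id": {"type": "string"},
            "include_tracking": {"type": "boolean"},  # hypothetical
        },
        "required": ["order_id"],
    },
    # Hypothetical example inputs.
    "input_examples": [
        {"order_id": "ORD-1001"},
        {"order_id": "ORD-1002", "include_tracking": True},
    ],
}

# Only these JSON Schema types are checked in this simplified sketch.
PYTHON_TYPES = {"string": str, "boolean": bool}


def check_examples(tool):
    """Return a list of problems found in a tool's input_examples."""
    schema = tool["input_schema"]
    properties = schema.get("properties", {})
    examples = tool.get("input_examples", [])
    problems = []
    for index, example in enumerate(examples):
        for key, value in example.items():
            if key not in properties:
                problems.append(f"example {index}: unknown field '{key}'")
                continue
            expected = PYTHON_TYPES.get(properties[key].get("type"))
            if expected is not None and not isinstance(value, expected):
                problems.append(f"example {index}: '{key}' has the wrong type")
    # Per the text above, each required field must appear in some example.
    for name in schema.get("required", []):
        if not any(name in example for example in examples):
            problems.append(f"required field '{name}' appears in no example")
    return problems


print(check_examples(lookup_order))  # prints []
```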

Grounding it in the exam scenarios

This knowledge point shows up most directly in the Customer Support Resolution Agent and Developer Productivity scenarios, both of which hand an agent a toolset and watch how it chooses. In the support scenario, an agent juggling account, order, billing, and escalation tools lives or dies by whether those descriptions cleanly partition the space of customer requests. A thin description anywhere in that set creates a seam where requests fall through, and the exam will describe the resulting confusion and ask you to name the cause.

In the developer-productivity setting, the same principle governs how Claude Code chooses among its capabilities and any connected MCP tools. A capable tool with a weak description can simply be passed over in favour of a more clearly described alternative, which is the seed of several later knowledge points about enhancing MCP descriptions so they are not ignored. Across both scenarios the throughline is identical: the description is the lever. Recognising that, before you reach for any heavier mechanism, is the understanding this foundational knowledge point is built to instil, and it is what makes every applied tool-design question downstream tractable.

How it shows up on the exam

Expect scenarios, not definitions. You will be shown a toolset and a misbehaving agent and asked for the cause or the best first fix. The correct answer almost always traces to the descriptions: they overlap, they are too thin to differentiate, or they omit the boundary that would have separated two similar tools. Distractors will tempt you with heavier machinery (a routing model, consolidating the tools, adding few-shot examples), but those are answers to a different question. At this Bloom level the exam wants you to understand that the description carries the selection signal, so that later, applied questions about diagnosing and rewriting descriptions have a foundation to stand on.

Check your understanding

An agent has eight tools, several with one-line descriptions like 'Gets data' and 'Fetches records'. The agent frequently calls the wrong tool. A teammate argues the model is simply not capable enough. What is the most accurate diagnosis?
