Writing Effective Tool Descriptions for Claude

In short: Writing effective tool descriptions means composing the description field so it states what the tool does, the inputs it expects and their formats, example queries it handles, edge cases and limitations, and explicit boundaries against similar tools. Done well, the description makes Claude's tool selection unambiguous; Anthropic recommends at least three to four sentences per tool.

What writing effective tool descriptions involves

Earlier knowledge points establish that descriptions drive tool selection and show how to diagnose selection failures. This one covers the constructive skill: writing effective tool descriptions from scratch so that selection never becomes a problem in the first place. It is an apply-level competency: you take the principle "the description is the selection mechanism" and turn it into a repeatable authoring method that produces descriptions a model can act on without guessing.

The bar is set by Anthropic's own documentation, which instructs designers to "provide extremely detailed descriptions" and calls this "by far the most important factor in tool performance." The same guidance recommends aiming for at least three to four sentences per tool, more for complex ones. So an effective description is not a label. It is a small, structured paragraph engineered to remove ambiguity.

Effective tool description: A tool description that states the tool's purpose, expected inputs and formats, example queries it handles, edge cases and limitations, and explicit boundaries against similar tools, written in enough detail (typically three to four sentences or more) that Claude can select and use the tool unambiguously.

The five elements of a strong description

A reliable description answers five questions in order. Treat them as a checklist you run every time you author a tool.

Purpose: what does the tool do, concretely? Not "handles documents" but "extracts named fields from a PDF and returns them as structured values."
Inputs: what does each parameter mean, what format does it take, and how does it change the result? Inputs described in prose reinforce the schema and remove guesswork about formats.
Example queries: what kinds of user request should map to this tool? A couple of example intents teach the model the tool's territory faster than abstract description alone.
Edge cases and limitations: what does the tool not do, and what does it return when there is no result? Stating limits prevents the model from reaching for the tool in situations it cannot handle.
Boundaries: how does this tool differ from the nearest similar tool? An explicit "use X for this; use Y for that" is what separates two otherwise-confusable tools.

The first three make the tool usable; the last two make it selectable among siblings. Most weak descriptions cover only purpose and inputs and skip the boundary, which is exactly why they misroute. Anthropic's contrast between a good and a poor get_stock_price description makes the point: the good one explains what it does, when to use it, what it returns, and what each parameter means; the poor one says "Gets the stock price for a ticker" and leaves the model with open questions.
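For reference, here is that contrast written as two tool definitions in the name / description / input_schema shape that the Messages API tools parameter accepts. The wording paraphrases Anthropic's example rather than quoting it verbatim, and the sentence_count helper is a hypothetical rough check against the three-to-four-sentence guideline, not a measure of quality.

```python
import re

# Thin: accurate, but leaves open which tickers, which currency, and when to call it.
poor_tool = {
    "name": "get_stock_price",
    "description": "Gets the stock price for a ticker.",
    "input_schema": {
        "type": "object",
        "properties": {"ticker": {"type": "string"}},
        "required": ["ticker"],
    },
}

# Detailed: purpose, input format, return value, when to use it, and what it does not do.
# Paraphrased after the example in Anthropic's tool-use documentation.
good_tool = {
    "name": "get_stock_price",
    "description": (
        "Retrieves the current stock price for a given ticker symbol. "
        "The ticker must be a valid symbol for a publicly traded company on a major "
        "US stock exchange such as NYSE or NASDAQ. "
        "Returns the latest trade price in USD. "
        "Use it when the user asks about the current or most recent price of a specific stock. "
        "It does not provide any other information about the stock or company."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "ticker": {
                "type": "string",
                "description": "The stock ticker symbol, e.g. AAPL for Apple Inc.",
            }
        },
        "required": ["ticker"],
    },
}


def sentence_count(text: str) -> int:
    """Rough sentence count: split after '.', '!' or '?' when followed by whitespace."""
    return len([s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s])


for label, tool in (("poor", poor_tool), ("good", good_tool)):
    n = sentence_count(tool["description"])
    print(f"{label}: {n} sentence(s), meets 3+ guideline: {n >= 3}")
```

Note that the detailed version also explains its parameter inside the schema, so the format guidance appears in two places the model reads.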

The anatomy of an effective tool description


The five elements combine into one description: purpose, inputs, and examples make the tool usable; edge cases and boundaries make it selectable among siblings.

A template you can reuse

Concretely, an effective description tends to follow a shape like this, with each bracketed slot replaced by real content: "[Purpose: what it does and returns.] Use this when [example intents]. [Input notes: what each key parameter means and its format.] Do not use it for [out-of-scope cases]; use [sibling tool] instead. [Limitation: what it returns when there is no match.]"

Run that template on a refund-lookup tool and you get: "Retrieves an order's status, line items, payments, and refunds by order ID. Use this for any question about a purchase, its delivery, or money returned for it, such as 'where is my refund?' or 'what did I order?'. The order_id parameter must be the internal order identifier, not the customer ID. Do not use it for account profile or contact-detail questions; use get_customer for those. If no order matches the ID, it returns an empty result rather than an error." Every one of the five elements is present, and the description now does real selection work.
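As a tool definition, the refund-lookup example might look like the following. The tool name lookup_order and the exact schema are assumptions for illustration; only the description text and the get_customer sibling come from the example above.

```python
lookup_order = {
    "name": "lookup_order",  # hypothetical name for the refund-lookup tool
    "description": (
        "Retrieves an order's status, line items, payments, and refunds by order ID. "
        "Use this for any question about a purchase, its delivery, or money returned for it, "
        "such as 'where is my refund?' or 'what did I order?'. "
        "The order_id parameter must be the internal order identifier, not the customer ID. "
        "Do not use it for account profile or contact-detail questions; use get_customer for those. "
        "If no order matches the ID, it returns an empty result rather than an error."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "order_id": {
                "type": "string",
                # Repeats the format note so it also sits right next to the parameter.
                "description": "Internal order identifier (not the customer ID).",
            }
        },
        "required": ["order_id"],
    },
}

print(lookup_order["name"], lookup_order["input_schema"]["required"])
```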

This authoring discipline is the same whether you are defining a native API tool or an MCP tool. For MCP servers it matters even more, because a weak MCP description can lose out to a built-in tool the model already knows, the concern addressed in MCP tool description enhancement. Strong description writing is also the foundation for the broader fix-ordering principle and for multi-agent tool design.
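One way to see that the discipline carries over: the same definition can be re-shaped for an MCP server's tool listing, where the MCP specification names the schema field inputSchema rather than input_schema. The to_mcp_tool helper is a hypothetical sketch, not part of any SDK; the description string passes through unchanged.

```python
def to_mcp_tool(api_tool: dict) -> dict:
    """Re-key a Messages API style tool definition into the MCP tool shape.

    Only the schema key changes (input_schema -> inputSchema); the description,
    which carries the selection work, is copied verbatim.
    """
    return {
        "name": api_tool["name"],
        "description": api_tool["description"],
        "inputSchema": api_tool["input_schema"],
    }


api_tool = {
    "name": "lookup_order",  # hypothetical tool from the refund example
    "description": (
        "Retrieves an order's status, line items, payments, and refunds by order ID. "
        "Do not use it for account profile questions; use get_customer for those."
    ),
    "input_schema": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}

mcp_tool = to_mcp_tool(api_tool)
print(sorted(mcp_tool))
```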

Worked example

You are adding a verify_claim_against_source tool to a research agent that already has summarize_content. You want the description to guarantee correct selection from day one.

Walk the five elements rather than writing the first sentence that comes to mind. Purpose: it checks whether a specific claim is supported by a specific source document and returns "supported" or "not supported" with the relevant passage. Inputs: a claim string and a source document reference; the claim should be a single assertion, not a question. Example queries: "does this report support the statement that revenue grew?", "is this quote actually in the source?". Edge cases: if the source does not address the claim at all, it returns "not supported" with no passage rather than fabricating one. Boundary: do not use it to condense a document (that is summarize_content) or to pull structured fields (that is extract_data_points).

Assemble them into a description: "Checks whether a single factual claim is supported by a given source document and returns 'supported' or 'not supported' along with the passage that decides it. Use this to fact-check a specific assertion against a specific source, for example 'does this report support that revenue grew?'. The claim parameter must be one assertion, not a question or a topic. Do not use it to summarise a document (use summarize_content) or to extract fields (use extract_data_points); if the source does not address the claim, it returns 'not supported' with no passage."
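Written out as a definition, the verifier might look like this. The source_id parameter name and its format are assumptions (the worked example only says "a source document reference"); the description is the four-sentence paragraph above.

```python
verify_claim_against_source = {
    "name": "verify_claim_against_source",
    "description": (
        "Checks whether a single factual claim is supported by a given source document and "
        "returns 'supported' or 'not supported' along with the passage that decides it. "
        "Use this to fact-check a specific assertion against a specific source, for example "
        "'does this report support that revenue grew?'. "
        "The claim parameter must be one assertion, not a question or a topic. "
        "Do not use it to summarise a document (use summarize_content) or to extract fields "
        "(use extract_data_points); if the source does not address the claim, it returns "
        "'not supported' with no passage."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "claim": {
                "type": "string",
                "description": "One declarative assertion, e.g. 'Revenue grew 12% year on year.'",
            },
            "source_id": {
                # Hypothetical parameter: how sources are referenced depends on your system.
                "type": "string",
                "description": "Identifier of the source document to check the claim against.",
            },
        },
        # Both values are genuinely needed to run a check.
        "required": ["claim", "source_id"],
    },
}
```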

That paragraph is four sentences and covers all five elements. Drop it next to summarize_content and a request to "summarise the methodology section" can only match the summariser, while "is the 12% growth figure actually supported?" can only match the verifier. You engineered selection at authoring time instead of debugging it later. That is the payoff of writing effective tool descriptions deliberately.

Common misreadings to avoid

Misconception

A tool description just needs to state accurately what the tool does.

What's actually true

Stating what the tool does is necessary but not sufficient. Without explicit boundaries against similar tools, even an accurate description misroutes. Effective descriptions also cover inputs, example queries, edge cases, and how the tool differs from its neighbours.

Misconception

Shorter descriptions are better because they save tokens and keep the prompt lean.

What's actually true

Anthropic identifies detailed descriptions as the single most important factor in tool performance and recommends at least three to four sentences. Trimming a description to save tokens trades a small saving for unreliable selection, which costs far more in wasted tool calls.

From rubric to review checklist

The five-element model is a writing aid, but it doubles as a review checklist for descriptions you did not write. When you inherit a toolset, you can audit each description by asking whether it answers all five questions: purpose, inputs, example queries, edge cases, and boundary. A description that scores well on purpose and inputs but says nothing about edge cases or differentiation is the most common kind of near-miss: accurate, readable, and still prone to misrouting the moment a sibling tool exists. Scoring descriptions this way turns a vague sense that something is a bit thin into a specific, fixable gap.
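A first pass of that audit can be automated with keyword heuristics, sketched below. The patterns are assumptions chosen to match phrasing used in this article's examples ("Use this", "Do not use", "If no ..."); they flag likely gaps for a human to read and are easily fooled by differently worded descriptions.

```python
import re


def audit_description(tool: dict) -> dict:
    """Score one tool definition against the five elements using keyword heuristics.

    True means the element looks present; False marks a gap worth reading closely.
    """
    desc = tool.get("description", "")
    lower = desc.lower()
    params = tool.get("input_schema", {}).get("properties", {})
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", desc.strip()) if s]
    return {
        # Purpose: a first sentence with some substance rather than a bare label.
        "purpose": bool(sentences) and len(sentences[0].split()) >= 6,
        # Inputs: each parameter is explained in the prose or in its own schema description.
        "inputs": all(name in desc or bool(spec.get("description"))
                      for name, spec in params.items()),
        # Example queries: an explicit "use this ..." or a worked example.
        "examples": bool(re.search(r"\buse this\b|\bfor example\b|\bsuch as\b", lower)),
        # Edge cases and limits: behaviour on no match, or what the tool does not do.
        "edge_cases": bool(re.search(r"\bif no\b|\bempty\b|\bdoes not\b|\bwill not\b", lower)),
        # Boundary: an explicit pointer away from this tool toward a sibling.
        "boundary": bool(re.search(r"\bdo not use\b|\binstead\b", lower)),
    }


thin = {
    "name": "get_customer",
    "description": "Retrieves customer information.",
    "input_schema": {"type": "object",
                     "properties": {"customer_id": {"type": "string"}},
                     "required": ["customer_id"]},
}
good = {
    "name": "lookup_order",  # hypothetical refund-lookup tool
    "description": (
        "Retrieves an order's status, line items, payments, and refunds by order ID. "
        "Use this for any question about a purchase or money returned for it. "
        "The order_id parameter must be the internal order identifier, not the customer ID. "
        "Do not use it for account profile questions; use get_customer for those. "
        "If no order matches the ID, it returns an empty result rather than an error."
    ),
    "input_schema": {"type": "object",
                     "properties": {"order_id": {"type": "string"}},
                     "required": ["order_id"]},
}

for tool in (thin, good):
    report = audit_description(tool)
    gaps = [element for element, ok in report.items() if not ok]
    print(tool["name"], "gaps:", gaps or "none")
```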

The checklist also tells you where to spend effort. Simple, standalone tools with no near-neighbours can lean on purpose and inputs alone, because there is nothing to differentiate from. Tools that live in a crowded set, or that a model has historically confused, earn their keep from the boundary sentence above all. Spending your description-writing budget where the selection risk actually is, on the tools with confusable siblings, is the efficient version of this skill, rather than padding every description to the same length regardless of need.
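Targeting that effort can be sketched the same way: given a map of which tools are confusable (an assumed input, for example compiled from past misroutes or a reviewer's judgement), flag only the crowded tools whose descriptions never name their neighbour. missing_boundaries and the sample tools are hypothetical.

```python
def missing_boundaries(tools: list, confusable: dict) -> list:
    """Return, sorted, the names of tools whose description names none of their confusable siblings.

    Tools absent from `confusable` are treated as standalone and never flagged. A name in
    `confusable` with no tool definition is flagged too, since it has no description at all.
    """
    descriptions = {t["name"]: t["description"] for t in tools}
    flagged = []
    for name, siblings in confusable.items():
        text = descriptions.get(name, "")
        if not any(sibling in text for sibling in siblings):
            flagged.append(name)
    return sorted(flagged)


tools = [
    {"name": "verify_claim_against_source",
     "description": "Checks whether a single claim is supported by a source document. "
                    "Do not use it to summarise a document (use summarize_content)."},
    {"name": "summarize_content",
     "description": "Condenses a document into a short summary of its main points."},
    {"name": "get_current_time",  # standalone: nothing to confuse it with
     "description": "Returns the current UTC time as an ISO 8601 string."},
]
confusable = {
    "verify_claim_against_source": ["summarize_content"],
    "summarize_content": ["verify_claim_against_source"],
}
print(missing_boundaries(tools, confusable))
```

Only summarize_content is flagged: it sits in a crowded pair but never points at its neighbour, while the standalone tool is left alone.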

Why detail beats brevity here

It is worth dwelling on the counterintuitive economics, because the instinct to keep prompts short is strong and usually correct. Everywhere else in prompt design, brevity is a virtue; with tool descriptions it is a trap. Anthropic's guidance is explicit that detailed descriptions are by far the most important factor in tool performance and recommends several sentences per tool. Tool definitions are sent with every request, so a longer description costs tokens each time, but it pays that cost back in avoided wrong calls. A wrong tool call is expensive: it consumes a full round trip, produces a useless result, and often triggers a retry, so the handful of tokens spent making a description unambiguous is usually repaid many times over.

This reframes the length question entirely. You are not deciding between a short description and a long one; you are deciding between a few extra tokens now and repeated failed calls later. The architect who internalises that trade stops trimming descriptions to look tidy and starts investing in them as the cheapest reliability mechanism available, which is precisely the posture the exam rewards when it asks you to pick the description that will select most reliably rather than the one that reads most economically.
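The trade can be made concrete with a back-of-the-envelope break-even check. All figures here are hypothetical placeholders, not measurements; plug in your own token counts. The model assumes the extra description tokens are paid on every request and that each avoided misroute saves one wasted round trip.

```python
def break_even_reduction(extra_tokens_per_request: int, tokens_per_wasted_call: int) -> float:
    """Misroute-rate reduction (fraction of requests) at which extra description tokens
    exactly pay for themselves.

    Cost of detail per request:  extra_tokens_per_request
    Saving per request:          reduction * tokens_per_wasted_call
    Break-even:                  reduction = extra_tokens_per_request / tokens_per_wasted_call
    """
    return extra_tokens_per_request / tokens_per_wasted_call


# Hypothetical figures: 80 extra description tokens per request; one wasted call
# (request, useless result, retry) costing 2,000 tokens in total.
needed = break_even_reduction(80, 2_000)
print(f"Detail pays off if it cuts misroutes by more than {needed:.0%} of requests.")
```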

Let the schema back up the description

A description does its job only if the tool's input_schema agrees with it. The description tells Claude when to call the tool and what the parameters mean; the schema decides which of those parameters the model is required to supply. When the two drift apart, even a strong description can still produce bad calls. The most common version of this is over-requiring fields: marking a parameter as required when the underlying source may not contain it. Faced with a schema that demands a value, the model will often invent one to satisfy the contract, so a required field quietly becomes an instruction to fabricate.

The fix is to reserve required fields for the values the tool genuinely cannot run without, and to make everything else optional or nullable. A description that says "order_id is optional; omit it to search by customer email instead" is only trustworthy if the schema actually allows the omission. Aligning the two (prose that explains each input, and a schema that demands only what is essential) is what makes the description's promises enforceable. It also keeps selection clean, because a model that is not being pushed to fill imaginary parameters can spend its attention on choosing the right tool rather than on satisfying an overstrict contract.
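A sketch of keeping the two in step: the hypothetical search_orders tool below requires only customer_email, makes order_id optional and nullable (via a JSON Schema type array), and a small check flags any parameter the description calls optional while the schema lists it as required. The regex is an assumption tuned to phrasing like "order_id is optional" and will miss other wordings.

```python
import copy
import re

search_orders = {
    "name": "search_orders",  # hypothetical tool
    "description": (
        "Finds a customer's orders by email address, optionally narrowed to a single order. "
        "Use this when the user asks about their orders and may not know an order number. "
        "order_id is optional; omit it to search by customer email alone. "
        "Returns an empty list if nothing matches."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "customer_email": {"type": "string", "description": "The customer's account email."},
            "order_id": {
                "type": ["string", "null"],  # nullable: an explicit "no value" is allowed
                "description": "Internal order identifier, if the user supplied one.",
            },
        },
        # Only what the tool truly cannot run without.
        "required": ["customer_email"],
    },
}


def optional_but_required(tool: dict) -> list:
    """Return parameters the description calls optional but the schema marks required."""
    desc = tool["description"]
    required = set(tool["input_schema"].get("required", []))
    return [
        name
        for name in tool["input_schema"]["properties"]
        if name in required
        and re.search(rf"\b{re.escape(name)}\b[^.;]*\boptional\b", desc)
    ]


# Simulate drift: someone tightens the schema without touching the prose.
drifted = copy.deepcopy(search_orders)
drifted["input_schema"]["required"].append("order_id")

print(optional_but_required(search_orders))
print(optional_but_required(drifted))
```

A check like this could run in CI over every tool definition, so a schema change that contradicts the prose is caught before the model starts inventing values.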

How it shows up on the exam

Applied Domain 2 items ask you to choose or improve a description, or to predict which of several descriptions will select reliably. The credited answers are the ones that go beyond stating function to include boundaries, inputs, and edge cases. Distractors are typically accurate-but-thin descriptions ("Retrieves customer information") that read fine in isolation and fail the moment a sibling tool exists. Keep the five-element checklist in mind and you can both write a strong description and recognise a weak one on sight, which is exactly the judgement the exam is sampling.

As a closing discipline, treat the description as the last thing you tune before shipping a tool and the first thing you inspect when one misbehaves. Writing it well at authoring time is cheaper than debugging selection later, and reviewing it first during an incident is faster than rebuilding the toolset. The five elements give you both a composition order and an inspection order: draft purpose, inputs, examples, edge cases, and boundary in turn, then, when something breaks, walk the same five and find the missing one. An architect who makes the description the centre of gravity for both authoring and debugging rarely needs the heavier fixes at all, which is the quiet efficiency this skill buys.

