Structured Error Metadata for MCP Tools

In short: Structured error metadata is the machine-readable shape of a failed tool response: an errorCategory label, an isRetryable boolean, and a human-readable description. The agent reads these fields to decide recovery automatically, instead of parsing free text or guessing whether to retry.

What goes into structured error metadata

Structured error metadata is the deliberate, machine-readable shape you give a failed tool response so the agent can act on it without interpreting prose. Where the previous knowledge points established that a failure happened (the error flag) and which kind it is (the four categories), this concept is about packaging that knowledge into fields the model can branch on directly. The canonical three fields are an errorCategory label, an isRetryable boolean, and a description written for a human to read.

The shift here is from free text to a contract. A tool that merely returns "something went wrong" forces the model to guess. A tool that returns errorCategory: "transient", isRetryable: true, and a one-line description hands the model a decision it can make mechanically. That is why structured error metadata is an apply-level skill: you are designing the response payload, not just recognising failure.

Structured error metadata: The fielded form of a failed tool result: an errorCategory, an isRetryable boolean, and a human-readable description. It turns a failure into data the agent can branch on, rather than text it must parse.
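As a rough illustration of that definition, the three fields can be written down as a typed shape. This is a minimal sketch, not a prescribed schema: the names ToolError and ErrorCategory are hypothetical, and only the three field names and the four category values come from this section.

```python
from typing import Literal, TypedDict

# The four categories named in this section.
ErrorCategory = Literal["transient", "validation", "business", "permission"]


class ToolError(TypedDict):
    """Hypothetical name for the three-field error payload."""

    errorCategory: ErrorCategory  # selects the recovery strategy
    isRetryable: bool             # could the identical call succeed if repeated?
    description: str              # instructive, human-readable text


# One failure expressed as data the agent can branch on.
timeout_error: ToolError = {
    "errorCategory": "transient",
    "isRetryable": True,
    "description": "Payments service did not respond in 5s; safe to retry.",
}
```

A type checker can then flag a misspelled category or a missing field before the tool ever ships.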

errorCategory: the label that selects a strategy

The errorCategory field carries one of the four classifications (transient, validation, business, or permission) into the response itself, so the agent does not have to infer it from a message. By naming the category at the source, the tool author moves the classification decision to where the most context lives: inside the tool, which actually knows whether a 503 was a timeout or whether a rule deliberately refused the request.

This matters because the model's view is limited to what comes back. If the tool keeps the category in its own head and emits only "failed," the model has to reconstruct intent from wording. Putting errorCategory in the payload removes that ambiguity and makes the rest of the metadata coherent: the category and the retryability flag should always agree.

isRetryable: the boolean that ends the guessing

The isRetryable boolean is the field that most directly drives behaviour. It states, in one unambiguous bit, whether attempting the same call again could succeed. Transient failures set it true; validation, business, and permission failures generally set it false, because none of them is fixed by repeating the identical request. The exam treats omitting this flag as a defect: without it, the agent is left to guess whether to retry, which is exactly the uncertainty structured metadata exists to remove.

A subtle but important rule is that isRetryable must stay consistent with errorCategory. A response that says errorCategory: "business" but isRetryable: true is self-contradictory and will produce wrong behaviour. The boolean is not a second opinion; it is the category's retry implication made explicit so the agent can read one field and move.
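One way to keep the two fields from drifting apart is to derive the boolean from the category instead of setting both by hand. The sketch below assumes the default pairing described above (transient is retryable, the other three are not); make_error and RETRYABLE_BY_CATEGORY are hypothetical names.

```python
# Assumed default pairing of category to retryability, per the text above.
RETRYABLE_BY_CATEGORY = {
    "transient": True,
    "validation": False,
    "business": False,
    "permission": False,
}


def make_error(category: str, description: str) -> dict:
    """Build an error payload whose isRetryable always follows its category."""
    if category not in RETRYABLE_BY_CATEGORY:
        raise ValueError(f"unknown errorCategory: {category!r}")
    return {
        "errorCategory": category,
        "isRetryable": RETRYABLE_BY_CATEGORY[category],
        "description": description,
    }
```

Because callers never pass isRetryable themselves, a business error flagged as retryable cannot be constructed through this helper.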

Grounding the fields in Anthropic's API error types

It is worth being precise about provenance: errorCategory and isRetryable are fields you define in your tool's response, not fields the Anthropic API returns. The platform does not emit an errorCategory or an isRetryable of its own. What it does emit is a documented error type alongside an HTTP status, and that taxonomy is usually what your own classification maps from. An Anthropic API error arrives as JSON with an error object carrying a type and a message, plus a request_id for tracing, which is a useful anchor when you are deciding how to label a failure you are wrapping.

The official types make the mapping concrete. A 429 rate_limit_error, a 500 api_error, a 504 timeout_error, and a 529 overloaded_error are the transient cases, and these are the ones your tool would surface as errorCategory: "transient" with isRetryable: true. By contrast, 400 invalid_request_error, 401 authentication_error, 403 permission_error, and 404 not_found_error are not fixed by repeating the identical call, so they map to validation, permission, or business categories with isRetryable: false. Designing your metadata to mirror this real taxonomy keeps the boolean honest rather than arbitrary.
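The mapping can be sketched as a lookup table. The transient entries follow the list above directly. The specific category chosen for each non-retryable type is an assumption made for illustration, since the text only says they fall under validation, permission, or business. The function name classify_api_error is hypothetical, and the input is assumed to be the already-parsed JSON error body.

```python
# Error type -> errorCategory. Non-transient assignments are illustrative
# assumptions; adjust them to what each failure means for your tool.
ANTHROPIC_TYPE_TO_CATEGORY = {
    "rate_limit_error": "transient",
    "api_error": "transient",
    "timeout_error": "transient",
    "overloaded_error": "transient",
    "invalid_request_error": "validation",
    "not_found_error": "validation",
    "authentication_error": "permission",
    "permission_error": "permission",
}


def classify_api_error(body: dict) -> dict:
    """Translate a parsed API error body into the three-field metadata."""
    error_type = body["error"]["type"]
    if error_type not in ANTHROPIC_TYPE_TO_CATEGORY:
        # Refuse to guess: an unmapped type needs a deliberate decision.
        raise ValueError(f"unmapped error type: {error_type!r}")
    category = ANTHROPIC_TYPE_TO_CATEGORY[error_type]
    return {
        "errorCategory": category,
        "isRetryable": category == "transient",
        "description": body["error"]["message"],
    }
```

Passing the upstream message through as the description is the simplest choice; a production tool would likely rewrite it into more instructive text.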

The lesson for an apply-level architect is that structured metadata is a translation layer. Upstream services, including Anthropic's own API, speak in status codes and type strings; your tool distils that into the small, decision-ready vocabulary the agent consumes. The closer your errorCategory tracks the genuine retryability of the underlying failure, the less the agent has to second-guess the flag.

description: instructive, not a stack trace

The third field is a human-readable description. Anthropic's guidance is to make error text instructive: say what went wrong and what to try next. For example, prefer "Rate limit exceeded. Retry after 60 seconds." to a bare "failed." For business errors that will ultimately reach a person, the description doubles as a customer-friendly explanation the agent can relay almost verbatim, such as "Refunds above $500 require supervisor approval."

Two audiences read this field: the model, which uses it to phrase its next action, and sometimes the end user, who hears a softened version of it. A raw stack trace serves neither. The description is where the tool's authorial care shows up, and it is what keeps the agent's recovery both correct and humane.

How the agent consumes structured error metadata


The boolean gates the fast path; category and description guide the rest.
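That flow can be sketched as a small dispatch function. The action names ("retry", "fix_input", and so on) are hypothetical labels for whatever the agent framework does next, and the choice to escalate permission errors and unknown categories is an assumption made for this sketch.

```python
# Hypothetical next steps for non-retryable failures, keyed by category.
NON_RETRYABLE_ACTIONS = {
    "validation": "fix_input",       # correct the arguments, then call again
    "business": "explain_to_user",   # relay the description, offer escalation
    "permission": "escalate",        # assumption: hand off to someone with access
}


def choose_recovery(error: dict) -> str:
    """Pick a recovery action from structured error metadata."""
    # Fast path: the boolean alone decides whether to repeat the call.
    if error["isRetryable"]:
        return "retry"
    # Otherwise the category selects the strategy; unknown ones escalate.
    return NON_RETRYABLE_ACTIONS.get(error["errorCategory"], "escalate")
```

Nothing in this routine reads the description; that field shapes how the agent phrases its next message, not which branch it takes.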

Worked example

You are designing the error responses for an issue_refund MCP tool used by a customer-support agent.

A refund can fail in several ways, and your job is to make each failure self-describing. You decide that every error result the tool returns will carry the same three fields.

When the payments service times out, the tool returns errorCategory: "transient", isRetryable: true, and description: "Payments service did not respond in 5s; safe to retry." The agent reads the boolean, retries once with a short backoff, and usually succeeds.

When the caller passes a negative amount, the tool returns errorCategory: "validation", isRetryable: false, and description: "amount must be greater than 0; received -50." The agent does not blindly retry; it corrects the amount and issues a fresh call.

When the refund exceeds the self-service limit, the tool returns errorCategory: "business", isRetryable: false, and description: "Refunds over $500 require supervisor approval." The agent surfaces that exact sentence to the customer and offers to escalate, never retrying.

Notice what the metadata bought you. The agent's recovery code is now a short, boring branch on isRetryable and errorCategory, with no string matching and no heuristics. The intelligence lives in the tool's response shape, and the agent simply obeys it. That is the payoff of structured error metadata: predictable recovery, designed once at the tool, reused on every call.
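The three failure paths in this example might look like the following on the tool side. This is a sketch under stated assumptions: send_to_payments is a hypothetical callable standing in for the payments client, it is assumed to raise the built-in TimeoutError when the service does not respond, and the success shape ({"refundId": ...}) is invented for illustration.

```python
SELF_SERVICE_LIMIT = 500  # dollars; the limit used in the example above


def issue_refund(amount: float, send_to_payments) -> dict:
    """Return a refund result, or three-field error metadata on failure."""
    if amount <= 0:
        return {
            "errorCategory": "validation",
            "isRetryable": False,
            "description": f"amount must be greater than 0; received {amount:g}.",
        }
    if amount > SELF_SERVICE_LIMIT:
        return {
            "errorCategory": "business",
            "isRetryable": False,
            "description": "Refunds over $500 require supervisor approval.",
        }
    try:
        refund_id = send_to_payments(amount)
    except TimeoutError:
        return {
            "errorCategory": "transient",
            "isRetryable": True,
            "description": "Payments service did not respond in 5s; safe to retry.",
        }
    return {"refundId": refund_id}
```

Each branch decides the category where the context is richest: the tool knows a negative amount is a caller mistake and a timeout is not.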

Common misreadings to avoid

Misconception

If the description text explains the failure, the agent can work out retryability from it, so isRetryable is redundant.

What's actually true

Asking the model to infer retryability from prose reintroduces exactly the guesswork structured metadata removes. The isRetryable boolean is an explicit, machine-readable field the agent branches on directly; omitting it is treated as a defect because it forces the agent to guess whether to retry.

Misconception

Error metadata should expose the full internal exception so engineers can debug from the transcript.

What's actually true

The description is read by the model and sometimes by the customer, not a debugger. It should be instructive and, for business errors, customer-friendly. Dumping a raw stack trace bloats context, leaks internals, and gives the agent nothing actionable to do next.

Putting the fields on the actual response

It is worth being concrete about where these fields live, because the abstraction can float free of the wire format. An MCP tool that fails still returns a normal result with its error flag set; the structured metadata rides inside that result's content. In practice the tool serialises an object with errorCategory, isRetryable, and description fields into the content, and when the tool declares an output schema it can also surface the same object as structured content. The error flag says "this is a failure"; the metadata object says "and here is everything you need to handle it."

This layering is why structured metadata complements, rather than competes with, the error flag from earlier in the task statement. The error flag is the coarse signal that something failed at all. The metadata is the fine-grained description of which failure occurred and what to do about it. A tool that sets the flag but ships a bare message is signalling failure without structuring it; a tool that ships rich metadata but forgets the flag risks having the failure read as a successful result. Mature tools do both: flag plus fielded metadata, every time.
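A minimal sketch of that layering, written as a plain dictionary instead of any particular SDK's types: the keys isError, content, and structuredContent reflect my reading of the MCP tool-result shape, so check them against the spec version you target. The helper name to_mcp_error_result and its has_output_schema parameter are hypothetical.

```python
import json


def to_mcp_error_result(error: dict, has_output_schema: bool = False) -> dict:
    """Wrap three-field metadata in a tool result with the error flag set."""
    result = {
        # Coarse signal: this call failed.
        "isError": True,
        # Fine-grained signal: the metadata, serialised into text content.
        "content": [{"type": "text", "text": json.dumps(error)}],
    }
    if has_output_schema:
        # Same object again, as structured content for schema-aware clients.
        result["structuredContent"] = error
    return result
```

Keeping the wrapper separate from the tool logic means no tool can ship metadata while forgetting the flag.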

Consistency rules the agent relies on

Structured metadata only helps if its fields tell a single coherent story. The cardinal rule is that errorCategory and isRetryable must never contradict each other. Transient pairs with true; validation, business, and permission pair with false. A response that claims a business category but flags isRetryable: true is internally inconsistent, and an agent that trusts one field over the other will behave unpredictably depending on which it reads first.

The description has to stay consistent too. If the category is permission and the boolean is false, the description should explain an access problem, not a timeout. When the three fields agree, the agent can read any one of them and reach the same conclusion, which is exactly the robustness you want. When they disagree, you have built a response that is worse than a plain message, because it actively misleads. Treat the trio as one assertion expressed three ways, not three independent opinions.
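The mechanical part of these rules can be checked in a test suite. The sketch below covers only what code can verify: that all three fields are present, that the boolean matches the category under the pairing stated above, and that the description is not blank. Whether the description's wording matches the category still needs a human reviewer. The name find_contradictions is hypothetical.

```python
# Pairing stated in this section: only transient failures are retryable.
EXPECTED_RETRYABLE = {
    "transient": True,
    "validation": False,
    "business": False,
    "permission": False,
}


def find_contradictions(error: dict) -> list:
    """Return a list of consistency problems; an empty list means coherent."""
    problems = [
        f"missing field: {field}"
        for field in ("errorCategory", "isRetryable", "description")
        if field not in error
    ]
    if problems:
        return problems
    category = error["errorCategory"]
    if category not in EXPECTED_RETRYABLE:
        problems.append(f"unknown errorCategory: {category}")
    elif error["isRetryable"] is not EXPECTED_RETRYABLE[category]:
        problems.append(
            f"{category} errors must have isRetryable={EXPECTED_RETRYABLE[category]}"
        )
    if not error["description"].strip():
        problems.append("description is empty")
    return problems
```

Running a check like this over every error a tool can emit catches the self-contradictory business-but-retryable response before an agent ever sees it.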

Designing the schema once, reusing it everywhere

The real leverage of structured error metadata appears when every tool on a server shares the same error shape. If lookup_order, issue_refund, and update_address all return failures in the identical errorCategory / isRetryable / description form, the agent needs exactly one recovery routine to handle all of them. Uniformity turns error handling from a per-tool special case into a single, well-tested branch that every tool benefits from.

That uniformity is also a maintenance win. When a new tool joins the server, it inherits the established error contract and the agent handles its failures correctly on day one, with no new recovery code. Designing the error schema deliberately and once, rather than improvising a different shape per tool, is the difference between a server whose failures are predictable and one whose every tool surprises the agent in a new way.
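One way to get that inheritance is a shared decorator that converts a common exception type into the common error shape. Everything named here is hypothetical: the ToolFailure exception, the with_error_contract decorator, the {"ok": ...} envelope, and the toy tool bodies exist only to show the pattern.

```python
import functools


class ToolFailure(Exception):
    """Raised by any tool on this server to signal a classified failure."""

    def __init__(self, category: str, description: str):
        super().__init__(description)
        self.category = category
        self.description = description


def with_error_contract(tool):
    """Give a tool function the server-wide error shape."""

    @functools.wraps(tool)
    def wrapper(*args, **kwargs):
        try:
            return {"ok": True, "result": tool(*args, **kwargs)}
        except ToolFailure as failure:
            return {
                "ok": False,
                "error": {
                    "errorCategory": failure.category,
                    "isRetryable": failure.category == "transient",
                    "description": failure.description,
                },
            }

    return wrapper


@with_error_contract
def lookup_order(order_id: str) -> dict:
    if not order_id.startswith("ord_"):
        raise ToolFailure("validation", "order_id must start with 'ord_'.")
    return {"orderId": order_id, "status": "shipped"}


@with_error_contract
def update_address(order_id: str, address: str) -> dict:
    if not address.strip():
        raise ToolFailure("validation", "address must not be empty.")
    return {"orderId": order_id, "address": address}
```

A new tool added later only needs the decorator and a ToolFailure at each failure point; the agent's single recovery routine already understands the result.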

What to leave out

Just as important as the required fields is restraint about the optional ones. Structured metadata should carry what the agent needs to decide, not everything the tool happens to know. Internal exception classes, database identifiers, full stack traces, and server-side retry counters do not help the agent choose a recovery; they only bloat the context and risk leaking internals. The guiding principle from Anthropic's tool guidance applies: return high-signal information and omit the rest.

A practical test is to ask, for each field, "what would the agent do differently if this value changed?" If the answer is nothing, the field does not belong in the response. Retry bookkeeping, for instance, is the agent's or framework's concern, not something the tool should narrate. Keeping the metadata lean keeps recovery decisions sharp and the transcript clean.

Leanness also has a token-cost dimension. Every field you add to an error response is re-sent and re-processed on each subsequent turn of the agentic loop, because the failure becomes part of the running history the model carries forward. A verbose error object that dumps internals therefore taxes every later turn, not just the one where the failure occurred. Three tight, decision-relevant fields cost almost nothing and say everything the agent needs; a sprawling diagnostic blob quietly inflates context for the rest of the conversation while adding no recovery value.
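The trimming step can be sketched as a whitelist applied just before the response leaves the tool. The verbose record below is made up for illustration (exceptionClass, serverRetryCount, and stackTrace are hypothetical internal fields), and serialised character count is used only as a rough stand-in for token cost.

```python
import json

# Only the fields the agent branches on or relays.
DECISION_FIELDS = ("errorCategory", "isRetryable", "description")


def to_lean_error(internal_error: dict) -> dict:
    """Keep the three decision fields and drop everything else."""
    return {field: internal_error[field] for field in DECISION_FIELDS}


# Hypothetical internal record with diagnostic extras attached.
verbose_error = {
    "errorCategory": "transient",
    "isRetryable": True,
    "description": "Payments service did not respond in 5s; safe to retry.",
    "exceptionClass": "payments.client.GatewayTimeout",
    "serverRetryCount": 3,
    "stackTrace": "Traceback (most recent call last): ...",
}

lean_error = to_lean_error(verbose_error)

# Characters that no longer ride along on every later turn.
saved_chars = len(json.dumps(verbose_error)) - len(json.dumps(lean_error))
```

The dropped fields can still go to server-side logs, where engineers debug, without ever entering the agent's context.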

How this is tested

This knowledge point sits at the apply level, so exam items ask you to design or critique a response shape, not merely recall the four categories. A stem might show a tool that returns only a message string and ask which field is missing; the answer is the isRetryable boolean, because without it the agent cannot decide whether to retry. Other items present a business error and ask how the response should be shaped, expecting isRetryable: false paired with a customer-friendly explanation. The throughline is that recovery is automatic only when the metadata is complete and internally consistent.

Check your understanding

A reviewer audits an MCP tool whose every error returns a single field: a free-text message like 'could not complete request.' The agent built on it sometimes retries refusals it should not, and sometimes gives up on timeouts it should retry. Which change most directly fixes the behaviour?

