AI Skill Certs
Tool Design & MCP Integration · Task 2.2 · Bloom: evaluate · Difficulty 4/5 · 9 min read · Updated 2026-06-07

Error Response Scenario Analysis: Category and Recovery

Implement structured error responses for MCP tools

By Solomon Udoh · Reviewed by Solomon Udoh · AI-assisted · human-reviewed
In short
Error response scenario analysis is the evaluative skill of taking a described tool failure, determining its category and whether it is retryable, and selecting the correct recovery and customer communication. It is where the isError flag, the four categories, structured metadata, and the access-versus-empty distinction converge into one judgement.

What error response scenario analysis asks of you

Error response scenario analysis is the capstone skill of Task Statement 2.2: given a described failure in a realistic setting, evaluate it and choose the single correct response. It is graded at the evaluate level because there is rarely a fact to recall; instead, you weigh the situation, decide what kind of failure it is, judge whether retrying could help, and select both a recovery strategy and the message the user should hear. Every earlier knowledge point in this task statement feeds into this one judgement.

The reason error response scenario analysis earns its own knowledge point is that the building blocks are easy to state and easy to misapply under pressure. Knowing the four categories does not guarantee you will spot that a "customer not found" during an outage is really an access failure, or that a refund refusal is a business error rather than something to retry. The skill is the disciplined application of the parts to a concrete case.

A decision procedure you can run every time

The most reliable way to handle these items is to run the same short procedure rather than reacting to surface wording. First, ask what actually failed: did the tool reach its data source at all? If not, you may be looking at an access failure masquerading as an empty result. Second, classify the failure into one of the four categories. Third, read off retryability from the category: transient is retryable, validation is retryable only after correcting the input, business and permission are not. Fourth, choose the recovery the category implies. Fifth, decide what the user should be told.

Running this procedure turns a tricky judgement into a checklist. The wrong answers in scenario questions are usually plausible reactions (retry, escalate, apologise) that are correct for some category but not the one in front of you. Walking the steps keeps you from grabbing the recovery that merely feels right.
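The five steps can be sketched in code. This is a minimal illustration, not a reference implementation: the recovery labels, the Decision type, the analyse function, and the placeholder user messages are all hypothetical names chosen for this example. Only the four categories and their retryability come from the text.

```python
from dataclasses import dataclass

# Recovery implied by each of the four categories (labels are illustrative).
RECOVERY_BY_CATEGORY = {
    "transient": "retry_with_backoff",
    "validation": "fix_input_then_retry",
    "business": "route_to_alternative_workflow",
    "permission": "escalate_for_access",
}


@dataclass
class Decision:
    source_reached: bool
    category: str
    retryable: bool
    recovery: str
    tell_user: str


def analyse(source_reached: bool, category: str, description: str) -> Decision:
    # Step 2: classify. Here the category arrives in the structured metadata.
    if category not in RECOVERY_BY_CATEGORY:
        raise ValueError(f"unknown category: {category}")
    # Step 3: read retryability off the category.
    # Validation counts as retryable only once the input has been corrected.
    retryable = category in ("transient", "validation")
    # Step 4: the category implies the recovery.
    recovery = RECOVERY_BY_CATEGORY[category]
    # Steps 1 and 5: an unreached source must never be reported as "nothing found",
    # and a non-retryable failure must not hint that another attempt could help.
    if not source_reached:
        tell_user = "I could not reach that system, so I cannot confirm the result yet."
    elif retryable:
        tell_user = "One moment while I sort that out."
    else:
        tell_user = description
    return Decision(source_reached, category, retryable, recovery, tell_user)


decision = analyse(True, "business", "Refunds above $500 require supervisor approval.")
```

In this sketch the category alone determines the recovery, while the source check and retryability only shape the message, which mirrors the idea that category drives the action.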

The scenario-analysis decision procedure
Same five questions, every scenario. The wrong answers are usually the right recovery for the wrong category.

Communication is part of the answer

What distinguishes this knowledge point from pure classification is that the message matters as much as the mechanism. For non-retryable failures especially, the agent owes the user a clear, customer-friendly explanation. Anthropic's guidance to write instructive error text applies here at the human layer: a refund that policy forbids should produce "Refunds over $500 need a supervisor's approval, and I have flagged this for one," not a silent retry loop or an opaque "request failed."

Good communication also prevents a subtler failure: implying that retrying might help when it cannot. Telling a customer "let me try that again" on a business error sets a false expectation and erodes trust when the same refusal returns. Scenario analysis is judged partly on whether the chosen response tells the user the truth about what is and is not possible.

Where the earlier knowledge points converge

This page is deliberately the meeting point of the whole task statement. The isError flag is what made the failure visible in the first place. The four categories are the vocabulary you classify with. Structured metadata (errorCategory, isRetryable, and a description) is the form that makes the classification reliable rather than guessed. The access-failure-versus-empty-result distinction is the trap you check for before you even classify. Scenario analysis is simply using all of them together, fast and correctly, on a case you have not seen before.
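As a rough illustration of how those pieces fit in one tool result, the sketch below builds a failure response that sets isError and carries the three metadata fields. The field names come from the text; serialising them as JSON inside a text content block is an assumption made for this example, and error_result is a hypothetical helper.

```python
import json


def error_result(category: str, retryable: bool, description: str) -> dict:
    """Build a tool result that flags the failure and carries structured metadata."""
    payload = {
        "errorCategory": category,
        "isRetryable": retryable,
        "description": description,
    }
    return {
        "isError": True,  # makes the failure visible to the model
        # Assumption: metadata travels as JSON text in the content array.
        "content": [{"type": "text", "text": json.dumps(payload)}],
    }


result = error_result("business", False, "Refunds above $500 require supervisor approval.")
metadata = json.loads(result["content"][0]["text"])
```

With the metadata parsed back out, the agent classifies from explicit fields instead of guessing from free text.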

That convergence is also why the exam saves the hardest items for here. A single stem can require you to notice a disguised access failure, classify the underlying error, reject an inviting retry, and pick a customer message: four decisions that each map back to a different prerequisite knowledge point.

Classify: category before action
isRetryable: gate the retry decision
Reroute: business / permission paths
Explain: honest customer message

Worked example

A support agent is handling a customer who wants a $1,200 refund. The agent calls issue_refund, which returns a failure: errorCategory business, isRetryable false, description 'Refunds above $500 require supervisor approval.'

Walk the procedure. First, did the tool reach its data source? Yes, the refund service ran and applied a rule, so this is not a disguised access failure. Second, classify it: the metadata says business, and the description confirms a policy deliberately refused the amount. Third, retryability: isRetryable is false, and that is consistent with a business error, so retrying the identical $1,200 refund is off the table. Fourth, recovery: business errors need an alternative workflow, so the agent routes the request to a supervisor who can approve refunds over the limit. Fifth, communication: the agent tells the customer plainly, "Refunds above $500 need a supervisor's sign-off. I have sent this to one and they will follow up shortly."

Now consider the tempting wrong moves the procedure rules out. Retrying the refund would just re-trigger the same rule. Apologising and dropping it would abandon a customer who has a legitimate path forward. Escalating as if it were a permission problem would misroute it to access provisioning rather than refund approval. Only the business-error path (reroute plus an honest, specific message) survives the five questions. That is error response scenario analysis: not picking a reaction, but reasoning to the one response the situation actually warrants.
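The worked example can be traced in a few lines. This is a sketch under stated assumptions: issue_refund here is a hypothetical stand-in that applies the $500 rule locally, and handle_refund with its action labels is invented for illustration. The error fields and the customer message are taken from the scenario.

```python
def issue_refund(amount: float) -> dict:
    """Hypothetical stand-in for the refund tool, applying the $500 rule."""
    if amount > 500:
        return {
            "isError": True,
            "errorCategory": "business",
            "isRetryable": False,
            "description": "Refunds above $500 require supervisor approval.",
        }
    return {"isError": False, "refunded": amount}


def handle_refund(amount: float) -> dict:
    result = issue_refund(amount)
    if not result["isError"]:
        return {"action": "done", "message": f"Your ${amount:,.0f} refund has been issued."}
    if result["errorCategory"] == "business" and not result["isRetryable"]:
        # No retry: the same rule would refuse the same amount again.
        return {
            "action": "route_to_supervisor",
            "message": (
                "Refunds above $500 need a supervisor's sign-off. "
                "I have sent this to one and they will follow up shortly."
            ),
        }
    raise NotImplementedError("other categories are out of scope for this sketch")


outcome = handle_refund(1200)
```

Note that the handler never calls issue_refund a second time on the business error, so the retry distractor is ruled out by construction.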

Common misreadings to avoid

Misconception

When in doubt on a scenario question, choosing 'retry the call' is the safe default.

What's actually true

Retry is correct only for transient failures, and for validation failures after the input is fixed. Defaulting to retry fails every business and permission scenario, and it fails any access failure that needs escalation. The safe move is to run the classification procedure, not to reach for a universal retry.

Misconception

Once you have chosen the right recovery action, the wording of the user-facing message is a separate, lower-stakes concern.

What's actually true

At the evaluate level the message is part of the correct answer. Non-retryable failures require a clear, customer-friendly explanation that sets accurate expectations. A response that picks the right action but tells the user 'let me try again' on a non-retryable error is still wrong, because it misrepresents what is possible.

Read the scenario for the cause, not the symptom

The most common way to miss these items is to react to the symptom in the stem rather than diagnosing the cause beneath it. "The call failed" is a symptom shared by every category; "the agent retried and escalated" is a behaviour that can be right or wrong depending on what failed. The evaluative move is to look past what happened on the surface and ask why it happened: was a service briefly down, was the input malformed, did a rule refuse the request, was access missing, or did the tool never reach its source at all?

Anchoring on cause also immunises you against emotionally loaded phrasing. A stem may stress that a customer is "frustrated" or that a deadline is "urgent," nudging you toward a hasty retry or escalation. Urgency does not change a failure's category. A business error is still non-retryable when the customer is impatient; an access failure still needs an honest message when the clock is ticking. Evaluate the cause coldly first, then let empathy shape the wording of the response, not the choice of recovery.

Classify by type, and capture the request ID for escalation

Two practical details separate a clean scenario answer from a sloppy one. The first is that the canonical classifier is the error type, not the raw HTTP status. Anthropic's API returns errors as JSON with an error.type field, and that string (rate_limit_error, permission_error, overloaded_error, and so on) is what you actually reason from. Several stems lean on this: a status code alone can be ambiguous, but the type tells you whether you are looking at a transient overload to back off from or a permission wall to escalate. Read the type, then decide the recovery.

The second detail belongs to the recovery half of the answer, and it matters most when escalation is the right move. When a scenario routes a failure to a human or a support queue (an internal 500 api_error, a persistent 529 overloaded_error, or a billing block the agent cannot clear), the response should capture the request identifier so the failure can be traced afterwards. Anthropic returns a request_id in the error body and a request-id header on every response; on Claude running on AWS, responses also carry an x-amzn-requestid. An escalation that names the request ID is actionable, while one that says only "it failed" forces whoever picks it up to start from nothing.

On the exam this surfaces as the gap between a response that merely picks the right action and one that also closes the loop. The strongest answer classifies by type, respects retryability, routes to the correct workflow, and, when it hands the problem onward, carries the identifier that lets the next responder resume exactly where the agent left off.
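Both details can be sketched together: read error.type to classify, then pull the request identifier for the escalation. The body shape follows the JSON error format described above; the CATEGORY_BY_TYPE mapping, the build_escalation helper, and the sample request ID are assumptions made for this example, and types not in the mapping are deliberately left unclassified.

```python
# Assumed mapping from error.type to the categories used on this page.
CATEGORY_BY_TYPE = {
    "rate_limit_error": "transient",
    "overloaded_error": "transient",
    "permission_error": "permission",
}


def build_escalation(body: dict, headers: dict) -> dict:
    """Classify by error.type and capture an identifier for whoever picks this up."""
    lowered = {key.lower(): value for key, value in headers.items()}
    error = body.get("error", {})
    error_type = error.get("type", "unknown")
    # Prefer the ID in the error body, then fall back to the response headers.
    request_id = (
        body.get("request_id")
        or lowered.get("request-id")
        or lowered.get("x-amzn-requestid")
    )
    return {
        "error_type": error_type,
        "category": CATEGORY_BY_TYPE.get(error_type, "unclassified"),
        "request_id": request_id,
        "summary": error.get("message", ""),
    }


# Sample values are made up for illustration.
sample_body = {
    "type": "error",
    "error": {"type": "overloaded_error", "message": "Overloaded"},
    "request_id": "req_example123",
}
note = build_escalation(sample_body, {"Request-Id": "req_example123"})
```

An escalation built this way names both the type and the request ID, so the next responder does not have to reconstruct what failed.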

The distractors are recoveries for the wrong category

A structural insight that makes these questions far easier: in a well-built scenario item, the wrong options are usually correct recoveries for the wrong category. One distractor retries (right for transient, wrong here), another escalates as permission (right for permission, wrong here), another reports an empty result (right for a true no-match, wrong here). Each is a plausible action lifted from a different branch of the decision procedure. Recognising this turns the question into a matching exercise: identify the actual category, then find the option whose recovery matches it.

This is why running the procedure beats pattern-matching on keywords. If you classify first and only then scan the options, the distractors lose their pull, because you already know which category's recovery you are looking for. Skip the classification and every option looks individually reasonable, which is exactly the confusion the item is engineered to create.
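The matching exercise can be made concrete with a toy item. The option wording and the pick_option helper below are hypothetical; the point is only that each option is tagged with the category it would be correct for, and the answer falls out once the actual category is known.

```python
# Each option pairs a recovery with the category it is correct for.
OPTIONS = {
    "A": ("retry the call", "transient"),
    "B": ("escalate for access", "permission"),
    "C": ("report that no record exists", "empty_result"),
    "D": ("route to the workflow policy allows", "business"),
}


def pick_option(actual_category: str) -> str:
    """Return the label of the single option whose recovery fits the category."""
    matches = [label for label, (_, fits) in OPTIONS.items() if fits == actual_category]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one match, found {len(matches)}")
    return matches[0]


answer = pick_option("business")
```

Classifying first is what makes the lookup trivial; without the category, all four options look equally defensible.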

Edge cases the capstone loves

Because this is the evaluate-level apex of the task statement, it gravitates toward the trickiest combinations. Expect the disguised access failure (a tool reporting "no records" during an outage), where the correct first move is to recognise that the source was never reached, before any classification of the underlying error. Expect the exhausted-retry case, where a genuinely transient failure has already consumed its retry budget and the right answer has shifted from "retry" to "escalate," even though the category is still transient. Expect the ambiguous 429, where you must judge whether it is a momentary spike (transient) or an exhausted quota (effectively a limit that retrying will not clear).

Each of these rewards the same habit: do not stop at naming the category; also check whether retryability still applies in this state. A transient failure that has burned its budget no longer warrants another attempt; an access failure must be unmasked before it can be classified at all. The capstone is testing whether you can hold several prerequisite ideas at once and apply them to one unfamiliar case without dropping any of them.
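The state check can be sketched as a small gate that runs after classification. The next_step function, its parameters, and the action labels are hypothetical, and how a caller detects an exhausted quota is left as an assumption outside this example.

```python
def next_step(category: str, attempts_made: int, max_attempts: int,
              quota_exhausted: bool = False) -> str:
    """Return the next move for a failure, given the state it is in."""
    if category != "transient":
        # Validation, business, and permission follow their own recoveries.
        return "follow_category_recovery"
    if quota_exhausted:
        # A 429 caused by a spent quota will not clear on another attempt.
        return "report_limit"
    if attempts_made >= max_attempts:
        # Still transient, but the retry budget is gone.
        return "escalate"
    return "retry_with_backoff"


first_try = next_step("transient", attempts_made=1, max_attempts=3)
budget_spent = next_step("transient", attempts_made=3, max_attempts=3)
```

The category stays "transient" in both calls; only the state changes the answer from retrying to escalating.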

Tying recovery to communication

Finally, remember that at this level the response is judged as a whole: recovery and message together. Choosing the correct action but pairing it with a misleading message is still a wrong answer. A non-retryable failure must be communicated as such: clearly, specifically, and without implying that another attempt might help. The best responses name what cannot be done, state what will happen instead, and do so in language a customer can act on.

That coupling is the difference between a technically correct agent and a trustworthy one. The exam, like a real support desk, rewards the response that both does the right thing and tells the user the truth about it. Treat the message as part of the answer, not a flourish added after the real decision is made.

How this is tested

These are the toughest items in Domain 2 because they combine several decisions into one. A stem will describe a failure with enough texture to tempt at least two plausible recoveries, and your task is to evaluate which is actually correct for the category in play. Expect business-error scenarios that bait a retry, access failures dressed as empty results, and permission errors that look transient. The scoring rewards the response that classifies correctly, respects retryability, routes to the right workflow, and communicates honestly: the same five-step procedure, applied to an unfamiliar case.

Check your understanding

A customer asks an agent to change the email on an account. The agent calls update_email, which returns: errorCategory permission, isRetryable false, description 'This action requires account-owner verification, which has not been completed.' The customer is impatient. What is the best response?

People also ask

How do you analyse an error-handling scenario?
Check whether the tool reached its data source, classify the failure into one of the four categories, read off retryability, choose the matching recovery, and decide what the user should be told. Category drives the action; retryability gates whether you try again.
What recovery does a business error need in a scenario question?
An alternative workflow, not a retry. Route to the path policy allows, such as escalating to a supervisor, and relay a clear explanation of the limit instead of repeating the refused request.
How should non-retryable errors be communicated?
With a plain, customer-friendly explanation of what cannot be done and what happens next, setting accurate expectations rather than exposing internal codes or implying that a retry might help.

Watch and learn

Official Anthropic Academy lessons first, then hand-picked walkthroughs.

AI Engineer

Claude Agent SDK [Full Workshop]: Thariq Shihipar, Anthropic

Why watch: Demonstrates how tool errors are passed back to Claude and how to handle them differently per situation, which is the practical skill behind choosing the right recovery strategy for a given error scenario.
