Confidence Based Routing for AI Review

In short: Confidence based routing asks the model to attach a self-reported confidence to each finding, then routes findings below a threshold to a human reviewer while letting high-confidence findings through automatically. The threshold is not guessed; it is calibrated against a labelled validation set so the chosen confidence level corresponds to a known accuracy.

What confidence based routing is

Confidence based routing is a way to spend limited human review time where it matters. Instead of sending every finding to a person or trusting every finding blindly, you ask the model to attach a confidence to each finding it produces, and then you route. Findings the model is confident about flow through automatically; findings it is unsure about are escalated to a human reviewer. The effect is a triage system: the cheap, abundant resource (model inference) handles the easy cases, and the expensive, scarce resource (human judgement) is reserved for the cases most likely to be wrong. This knowledge point sits in Domain 4 and is tested at the apply level, because the exam wants you to wire the routing up correctly rather than merely describe the idea.

The pattern is an instance of the broader routing workflow that Anthropic describes for agentic systems, where an input is classified and directed to a specialised follow-up. Here the classification is the model's own confidence, and the two follow-ups are automatic acceptance or human review. The power of the pattern is that it scales: as long as the confidence signal is trustworthy, you can automate the bulk of the work and still guarantee a human looks at the riskiest fraction.

Confidence based routing: A review pattern where the model self-reports confidence per finding and the system routes each finding by that score: confident findings are accepted automatically, low-confidence findings go to a human. The deciding threshold is set by calibration against labelled data, not by guesswork.

Confidence-based versus rule-based routing

It helps to place this pattern next to the alternative it improves on. Rule-based routing sends an item down a path using fixed, deterministic conditions: escalate this finding because it touches a file matching a security path, or because it contains a banned keyword. Those rules are transparent and cheap, but they only catch the cases you anticipated and wrote a rule for, and they say nothing about how likely a given finding is to be wrong.

Confidence-based routing replaces the hand-written condition with a graded signal. Instead of a yes-or-no rule, each finding carries a continuous confidence and a single calibrated threshold decides the path. That lets the system triage cases nobody enumerated in advance: a novel but uncertain finding still falls below the threshold and reaches a human, where a purely rule-based system would have waved it through because no rule matched it. The two approaches are not exclusive, and a mature pipeline often keeps a few hard rules for known-critical categories while using calibrated confidence to handle everything else, getting deterministic guarantees where they matter and graded triage everywhere else.
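The hybrid arrangement described above can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the `Finding` type, the `CRITICAL_PATH_PREFIXES` tuple, and the threshold value of 0.9 are all hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical values for illustration; neither comes from a real pipeline.
CRITICAL_PATH_PREFIXES = ("auth/", "payments/")
CALIBRATED_THRESHOLD = 0.9


@dataclass
class Finding:
    path: str          # file the finding refers to
    message: str       # what the reviewing model reported
    confidence: float  # self-reported score in [0, 1]


def route(finding: Finding) -> str:
    """Return "human" or "auto" for a single finding."""
    # Hard rule: known-critical paths always get a human, whatever the score.
    if finding.path.startswith(CRITICAL_PATH_PREFIXES):
        return "human"
    # Graded triage: everything else is decided by the calibrated threshold.
    return "auto" if finding.confidence >= CALIBRATED_THRESHOLD else "human"
```

The rule is checked first so that a high score can never override a deterministic guarantee; the threshold only decides the cases no rule claims.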

Why a raw confidence score cannot be trusted

The seductive mistake is to treat the number the model reports as if it were a true probability. It usually is not. Language models, like many neural systems, tend to be poorly calibrated and systematically overconfident: a model that says it is ninety percent sure may be right far less than ninety percent of the time. If you route on that raw number, you will wave through a stream of confident-but-wrong findings and undermine the entire point of the system. The exam treats trusting an uncalibrated confidence score as a recognisable error, because the danger is invisible until something confidently wrong slips past.

Calibration is what converts a vibe into a usable signal. A confidence score is well calibrated when its stated level matches reality: among all findings the model marks at a given confidence, roughly that fraction should actually be correct. Until you have measured that relationship, you do not know whether the model's ninety means ninety, seventy, or fifty, and so you cannot responsibly decide what to automate.
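One way to check whether a stated level matches reality is to group findings by reported confidence and compare each group's average stated confidence with its observed accuracy. The sketch below assumes you already have `(confidence, correct)` pairs from labelled data; the function name `reliability_table` and the choice of ten equal-width bins are illustrative assumptions.

```python
from collections import defaultdict


def reliability_table(records, n_bins=10):
    """Group (confidence, correct) pairs into equal-width bins.

    Returns a list of (mean_confidence, observed_accuracy, count) for each
    non-empty bin, ordered from lowest to highest confidence.
    """
    bins = defaultdict(list)
    for confidence, correct in records:
        # Clamp so that a confidence of exactly 1.0 lands in the top bin.
        index = min(int(confidence * n_bins), n_bins - 1)
        bins[index].append((confidence, correct))

    table = []
    for index in sorted(bins):
        members = bins[index]
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        table.append((mean_conf, accuracy, len(members)))
    return table
```

In a well-calibrated model the first two columns stay close to each other in every bin; a bin where the mean confidence sits well above the observed accuracy is the overconfidence the text warns about.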

Calibrating the threshold with labelled data

Calibration requires ground truth. You take a labelled validation set (examples where you already know the correct answer), run the model over it, and record both its confidence and whether each finding was actually right. Now you can see the real accuracy at each confidence level and choose a threshold deliberately. Anthropic's evaluation guidance is built on exactly this foundation: define measurable success criteria and test against a held-out set so that your numbers reflect real performance rather than hope. Generating and labelling those datasets is itself a task you can accelerate with the model, but the labels must be trustworthy because every routing decision rests on them.
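The recording step can be sketched as a join between model output and labels. The data shapes here are assumptions made for the example: predictions keyed by a hypothetical item id, each holding a predicted value and a confidence, and labels keyed by the same id. Exact equality is used as the correctness test, which a real pipeline would likely replace with a task-appropriate comparison.

```python
def build_calibration_records(predictions, labels):
    """Pair each prediction's confidence with whether it was actually right.

    predictions: dict mapping item id -> (predicted_value, confidence)
    labels:      dict mapping item id -> known-correct value
    Returns a list of (confidence, correct) tuples.
    """
    records = []
    for item_id, (predicted, confidence) in predictions.items():
        if item_id not in labels:
            # No ground truth for this item, so it cannot inform calibration.
            continue
        # Assumption: exact match defines "correct" for this sketch.
        records.append((confidence, predicted == labels[item_id]))
    return records
```

The resulting list of pairs is the raw material for everything that follows: measuring accuracy per confidence level and choosing a threshold.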

Choosing the threshold is a trade-off you control. Set it low and more findings clear the bar automatically, raising throughput but letting more borderline errors through. Set it high and more findings are escalated to humans, catching more errors but spending more human time and slowing the pipeline. There is no universally correct value; there is only the value that matches your tolerance for missed errors versus your budget for human review. Because the relationship between confidence and accuracy can drift, you recalibrate whenever the prompt, the model, or the input distribution changes, since a threshold tuned on last quarter's data can quietly become wrong.
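The trade-off becomes concrete if you sweep candidate thresholds over the labelled records and look at two numbers for each: how much would be auto-accepted, and how accurate that auto-accepted portion would be. The following is a sketch under stated assumptions: the function names are hypothetical, and picking the lowest candidate that meets a target accuracy is one reasonable policy among several.

```python
def sweep_thresholds(records, candidates):
    """For each candidate threshold, report what auto-acceptance would look like.

    records: non-empty list of (confidence, correct) pairs from the labelled set.
    Returns a list of (threshold, auto_accept_rate, auto_accept_accuracy);
    accuracy is None when nothing clears the threshold.
    """
    if not records:
        raise ValueError("need at least one labelled record")
    rows = []
    for threshold in candidates:
        accepted = [ok for conf, ok in records if conf >= threshold]
        rate = len(accepted) / len(records)
        accuracy = sum(accepted) / len(accepted) if accepted else None
        rows.append((threshold, rate, accuracy))
    return rows


def pick_threshold(records, candidates, target_accuracy):
    """Lowest candidate whose auto-accepted findings meet the target, else None."""
    for threshold, _rate, accuracy in sweep_thresholds(records, sorted(candidates)):
        if accuracy is not None and accuracy >= target_accuracy:
            return threshold
    return None
```

Reading the sweep as a table shows the dial directly: lower thresholds raise the auto-accept rate and usually lower its accuracy, higher thresholds do the reverse. With a small validation set the accuracy figures are noisy, so the chosen value is best treated as an estimate to revisit.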

self-report: the model attaches a confidence to each finding

threshold: calibrated on labelled validation data

route: low confidence goes to a human, high confidence is auto-accepted

The routing flow end to end

Putting the pieces together gives a small pipeline. The model produces findings, each tagged with a confidence. A router compares each confidence to the calibrated threshold. Anything at or above the threshold is accepted automatically and recorded; anything below is queued for a human, who makes the final call and whose decisions can feed back into the next round of calibration. The human queue is deliberately the minority of cases, which is what makes the system affordable, and the labelled outcomes it produces are exactly the data you need to keep the threshold honest over time.
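The flow just described can be sketched as a small class. Everything here is illustrative: `ReviewPipeline`, its field names, and the string return values are hypothetical, and persistence, concurrency, and the reviewer interface are left out.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewPipeline:
    threshold: float                                 # calibrated, not guessed
    accepted: list = field(default_factory=list)     # auto-accepted finding ids
    human_queue: list = field(default_factory=list)  # (id, confidence) awaiting review
    feedback: list = field(default_factory=list)     # (confidence, correct) pairs

    def submit(self, finding_id: str, confidence: float) -> str:
        """Route one finding and return the path it took."""
        if confidence >= self.threshold:
            self.accepted.append(finding_id)
            return "auto"
        self.human_queue.append((finding_id, confidence))
        return "human"

    def record_human_decision(self, finding_id: str, was_correct: bool) -> None:
        """Close out a queued finding and keep the label for recalibration."""
        for i, (queued_id, confidence) in enumerate(self.human_queue):
            if queued_id == finding_id:
                del self.human_queue[i]
                self.feedback.append((confidence, was_correct))
                return
        raise KeyError(f"{finding_id} is not in the human queue")
```

A short usage example: `ReviewPipeline(threshold=0.9).submit("f1", 0.95)` returns `"auto"`, while a score of 0.4 would place the finding in `human_queue` until a reviewer's decision moves it into `feedback`.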

Figure: Confidence based routing pipeline. High-confidence findings are automated; low-confidence findings are escalated, and human decisions feed recalibration.

Confidence routing and the rest of the review architecture

Confidence based routing does not replace independent review or multi-pass review; it composes with them. The findings being routed should come from an independent instance rather than the model grading its own work, and on a large change set they should come from a multi-pass review so that depth is consistent before confidence is even measured. Routing then sits on top as the human-attention allocator. This layering is why the knowledge point is a prerequisite for the full pipeline design that follows: you need the confidence dial in your toolkit before you can reason about an end-to-end review system that balances automation against human oversight.

How this is tested on the Claude Certified Architect exam

This knowledge point appears in Scenario 5, Claude Code for Continuous Integration, and in Scenario 6, Structured Data Extraction, wherever a workflow must decide which model outputs a human should see. A common exam setup describes a team that automated review using the model's confidence scores directly and was burned by confident mistakes, then asks what they should have done. The answer is to calibrate the confidence against a labelled validation set and route on the calibrated threshold, not to trust the raw score and not to abandon automation entirely.

Watch for distractors that propose escalating everything (which defeats the purpose), trusting the score as-is (the central trap), or fixing calibration by lowering temperature (a sampling change, not a calibration of the confidence-to-accuracy mapping). The exam rewards the architect who treats confidence as a signal to be validated and the threshold as a dial to be tuned against ground truth.

Worked example

A data-extraction service routes low-confidence field extractions to human reviewers, but humans are overwhelmed with correct extractions while genuinely wrong ones still pass automatically.

The team wired routing to the model's raw confidence: anything below 0.8 went to a human. In practice the model was overconfident, so many wrong extractions carried scores like 0.9 and sailed through, while plenty of correct extractions landed at 0.7 and clogged the human queue. The routing was busy and still leaking errors, because the number it routed on did not mean what the team assumed.

They fixed it with calibration. Using a labelled validation set of several hundred documents with known-correct field values, they measured the actual accuracy at each confidence level and discovered that real ninety-percent accuracy only began around a reported score of 0.97. They reset the routing threshold to that calibrated point. Now confident-but-wrong extractions fell below the new bar and reached a human, while only the truly safe extractions cleared it, so the auto-accepted stream met a known accuracy and reviewer time went to the dangerous cases.

Nothing about the model changed; the team simply stopped trusting the score and started trusting the calibration. They also scheduled a recalibration whenever the extraction prompt or the document mix changed, because the confidence-to-accuracy mapping is not fixed. That discipline, calibrate then route, is the whole skill this knowledge point assesses.

Asking the model for confidence well

How you elicit the confidence matters as much as how you use it. A bare instruction to attach a number invites the model to emit a round, overconfident figure with little grounding, so well-designed prompts ask the model to consider specific reasons a finding might be wrong before committing to a score, and to express confidence on a defined scale tied to concrete meanings rather than a vague feeling. Asking for a short justification alongside the score also gives a human reviewer something to act on when a finding is escalated, turning the queue from a list of bare verdicts into a set of reasoned cases.
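As an illustration of these elicitation ideas, the sketch below pairs a prompt fragment with a parser for the reply. The prompt wording, the scale anchors, and the JSON key names are all assumptions made for the example, not a recommended or official template, and no model API is called here.

```python
import json

# Hypothetical prompt wording; the scale anchors are illustrative only.
CONFIDENCE_PROMPT = """\
For each finding:
1. List specific reasons the finding might be wrong.
2. Then give a confidence from 0.0 to 1.0, where
   0.5 means you would expect to be wrong about half the time and
   0.9 means you would expect to be wrong about one time in ten.
3. Give a one-sentence justification a human reviewer could act on.

Respond with only a JSON array of objects with the keys
"finding", "reasons_it_might_be_wrong", "confidence", "justification".
"""


def parse_findings(raw_response: str) -> list:
    """Parse the model's JSON reply and reject out-of-range confidences."""
    findings = json.loads(raw_response)
    for item in findings:
        confidence = float(item["confidence"])
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"confidence out of range: {confidence}")
        item["confidence"] = confidence
    return findings
```

Asking for the reasons before the score is the design choice that matters: the model commits to a number only after it has considered how the finding could fail, and the justification travels with the finding into the human queue.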

None of this removes the need for calibration, because even a carefully elicited score still has to be measured against ground truth, but a better-elicited score tends to calibrate to a cleaner mapping and to separate likely-right from likely-wrong findings more sharply. The exam expects you to treat confidence as something you design for, not just a field you read off the output.

Where confidence routing goes wrong

Two failure modes recur, and the exam draws distractors from both. The first is over-trust: routing on the raw score, accepting confident findings that calibration would have shown to be only modestly accurate, and quietly shipping errors. The second is over-escalation: either setting the threshold too cautiously or routing on an uncalibrated, and therefore meaningless, score, so that nearly everything lands in the human queue and the automation saves no effort at all. Both come from the same root, a confidence signal that has not been tied to real accuracy, and both are fixed by the same discipline of calibrating against labelled data and choosing the threshold deliberately.

A third, subtler failure is letting a once-calibrated threshold drift. A prompt change, a model upgrade, or a shift in the input mix can move the confidence-to-accuracy mapping, so a number that was right last quarter silently becomes wrong. Scheduling recalibration and watching the human queue for sudden changes in volume are how a mature pipeline keeps the routing honest over time.
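Watching the human queue for sudden changes in volume can be approximated by comparing the recent escalation rate with the rate observed at calibration time. This is a rough sketch: the function names are hypothetical, and the tolerance of 0.10 is an arbitrary illustrative default rather than a recommended value.

```python
def escalation_rate(confidences, threshold):
    """Fraction of findings that fall below the threshold and go to a human."""
    if not confidences:
        raise ValueError("need at least one confidence score")
    return sum(1 for c in confidences if c < threshold) / len(confidences)


def needs_recalibration(baseline_rate, recent_confidences, threshold,
                        tolerance=0.10):
    """Flag when the recent escalation rate moves away from the baseline.

    tolerance is an absolute difference in rate; 0.10 is an arbitrary
    illustrative default.
    """
    recent_rate = escalation_rate(recent_confidences, threshold)
    return abs(recent_rate - baseline_rate) > tolerance
```

A shift in escalation rate does not prove the confidence-to-accuracy mapping has moved, only that the score distribution has; it works as a cheap trigger for rerunning the labelled calibration, alongside recalibration on any prompt or model change.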

Misconceptions to avoid

Misconception: If the model reports high confidence, the finding is reliable enough to accept automatically.

What's actually true: Models are frequently overconfident, so a high raw score does not guarantee high accuracy. Only after calibrating the score against labelled ground truth does a confidence level correspond to a known accuracy you can route on.

Misconception: Confidence based routing means sending more cases to humans, so it always reduces throughput.

What's actually true: The threshold is a dial in both directions. A well-calibrated threshold sends only the genuinely uncertain minority to humans and automates the confident majority, which can increase throughput while still catching the riskiest findings.

Check your understanding

A review pipeline routes findings below a fixed confidence of 0.8 to humans, but confident-yet-wrong findings keep getting auto-accepted while reviewers drown in correct ones. What is the best fix?
