Semantic Errors in Structured Output

In short: Semantic errors in structured output are mistakes in the meaning of extracted values that still satisfy the JSON schema: line items that do not sum to the stated total, values placed in the wrong field, and fabricated entries for required fields. Because a schema validates shape and type rather than truth, these errors pass validation and must be caught with validation-and-retry loops or cross-checks, not by editing the schema.

What semantic errors in structured output are

Semantic errors in structured output are mistakes that survive every validation a JSON schema can perform. The JSON parses, every field is the declared type, every enum value is allowed, and the record is still wrong, because the meaning of the values does not match the source. This is the analyse-level heart of Task 4.3: the skill is not knowing that schemas help, which is easy, but diagnosing precisely the failures they cannot touch.

This topic is the natural sequel to the syntax guarantee. Once a tool schema has removed malformed JSON from your worries, the errors that remain are all semantic, and they are harder because nothing throws an exception. A parse failure is loud; a number in the wrong field is silent. The official structured-outputs documentation draws this line explicitly, noting that the guarantees cover field presence, types, and required fields, and that you still need your own validation to ensure the values make business sense.

Semantic error: An extracted value that conforms perfectly to the JSON schema yet misrepresents the source: a value in the wrong field, a figure that contradicts other figures, or an invented entry for a required field. It passes schema validation because the schema checks structure, not truth.

Why semantic errors in structured output slip past schemas

A schema is a grammar for shape. It can say this object has a total that is a number and a line_items array of objects, and it can insist they are present and well typed. What it cannot say is that the total must equal the sum of the line_items, because that is a relationship between values, and JSON Schema has no vocabulary for cross-field arithmetic or for whether a particular string belongs in a particular slot. The constraint you actually care about lives one level above anything the schema can express.
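To make the gap concrete, here is a minimal sketch. It uses a hand-rolled shape check in place of a real JSON Schema validator, purely to keep the example dependency-free. The field names `total` and `line_items` follow the prose; the `amount` key and both function names are assumptions for illustration.

```python
from numbers import Number


def shape_is_valid(record: dict) -> bool:
    """Mimic what a schema can check: presence and types only."""
    if not isinstance(record.get("total"), Number):
        return False
    items = record.get("line_items")
    if not isinstance(items, list):
        return False
    return all(
        isinstance(item, dict) and isinstance(item.get("amount"), Number)
        for item in items
    )


def total_matches_items(record: dict) -> bool:
    """The cross-field relationship a schema has no vocabulary for."""
    return sum(item["amount"] for item in record["line_items"]) == record["total"]


record = {
    "total": 100,
    "line_items": [{"amount": 30}, {"amount": 30}, {"amount": 30}],
}
shape_ok = shape_is_valid(record)         # every field is present and well typed
meaning_ok = total_matches_items(record)  # the items add up to 90, not 100
```

The record passes the shape check and fails the arithmetic check, which is the whole point: the second function is ordinary code, not a schema rule.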

That gap produces three recurring failure modes the exam returns to again and again:

Mathematical inconsistency. Line items of 30, 30, and 30 alongside a stated total of 100. Every number is a valid number; their relationship is wrong.
Field placement errors. The vendor name extracted into the customer field and the customer into vendor. Both are valid strings, so the schema is satisfied while the record is inverted.
Fabrication. A required field with no source value is filled with a confident invention. You defend against this failure mode at design time with nullable fields, but you must also detect it at run time.

Each of these is invisible to type checking, which is exactly why naming them as a distinct class is the skill being tested.

The diagnostic move: schema problem or value problem

The analytical reflex to build is a fork. When something is wrong with extracted data, ask first whether the defect is about the shape of the JSON or the truth of its values. If the JSON failed to parse or a field is the wrong type, that is structural, and a schema or tool_choice change is the right tool. If the JSON is impeccable but a total is off or a field is inverted, that is semantic, and no schema edit can ever fix it. Candidates who skip this fork reach for the schema reflexively and pick the wrong answer, because rewriting a schema to chase a semantic bug is effort spent on the one thing that cannot help.
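The fork can be sketched as a small triage function. This is an illustrative sketch only: `triage`, its return labels, and the idea of passing semantic checks as callables are assumptions, not part of any library.

```python
import json
from typing import Callable, List


def triage(raw: str, expected_types: dict, semantic_checks: List[Callable[[dict], bool]]) -> str:
    """Classify a defect as structural (shape) or semantic (truth)."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return "structural"  # did not parse: a schema or tool_choice problem
    if not isinstance(record, dict):
        return "structural"
    for field, expected in expected_types.items():
        if not isinstance(record.get(field), expected):
            return "structural"  # missing or wrongly typed field
    for check in semantic_checks:
        if not check(record):
            return "semantic"  # impeccable JSON, wrong meaning
    return "ok"


expected = {"total": (int, float), "line_items": list}
checks = [lambda r: sum(r["line_items"]) == r["total"]]

truncated = triage('{"total": 100', expected, checks)
wrong_type = triage('{"total": "100", "line_items": [30]}', expected, checks)
bad_sum = triage('{"total": 100, "line_items": [30, 30, 30]}', expected, checks)
```

Only the first two results call for a schema or `tool_choice` change; the third calls for validation logic.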

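Figure: Triaging an extraction defect (diagram not available). The fork splits a defect into a structural branch on the left and a semantic branch on the right. A schema can only ever resolve the left branch; the right branch needs validation logic the schema cannot express.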

Detecting and resolving a non-summing invoice

Worked example

An accounts-payable pipeline extracts invoices with a strict schema. Every record is schema valid, but the finance team finds that on roughly one invoice in fifteen the line items do not add up to the extracted invoice_total.

Because the schema is satisfied, nothing in the pipeline complains, and the bad totals only surface when a human spot-checks a payment. An engineer's first instinct is to tighten the schema, perhaps by adding constraints to the number fields, but no number constraint can encode the relationship "the line items must sum to the total", so that effort goes nowhere. The defect is semantic, and the schema was never the right place to address it.

The working fix lives outside the schema. The pipeline adds a validation step that re-adds the line items in code and compares the result to invoice_total. On a mismatch it does not silently accept or silently drop the record; it sends the original document back to the model along with the failed extraction and the specific discrepancy, asking it to reconcile the figures. Often the model corrects a misread digit on the retry. A stronger variant has the schema itself carry both a stated_total and a calculated_total so the model is nudged to compute the sum, and a conflict_detected flag the pipeline can act on, turning the cross-check into part of the contract.
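The recompute-and-compare step and the feedback it sends back might look like the sketch below. It assumes amounts are carried as decimal strings so that `Decimal` arithmetic is exact; the `amount` key, the function names, and the wording of the retry message are all hypothetical.

```python
import json
from decimal import Decimal
from typing import Optional


def total_discrepancy(record: dict) -> Optional[str]:
    """Re-add the line items in code and describe any mismatch with invoice_total."""
    computed = sum(
        (Decimal(item["amount"]) for item in record["line_items"]), Decimal("0")
    )
    stated = Decimal(record["invoice_total"])
    if computed == stated:
        return None
    return (
        f"Line items sum to {computed} but invoice_total is {stated} "
        f"(difference {stated - computed})."
    )


def build_retry_message(document_text: str, failed_record: dict, discrepancy: str) -> str:
    """Bundle the source, the failed extraction, and the specific discrepancy."""
    return (
        "The previous extraction failed a consistency check.\n"
        f"Problem: {discrepancy}\n"
        f"Previous extraction: {json.dumps(failed_record)}\n"
        "Re-read the source document below and reconcile the figures.\n\n"
        f"{document_text}"
    )


record = {
    "invoice_total": "100.00",
    "line_items": [{"amount": "30.00"}, {"amount": "30.00"}, {"amount": "30.00"}],
}
problem = total_discrepancy(record)
```

Naming the exact discrepancy in the retry message gives the model something specific to reconcile, rather than a generic "try again".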

Crucially, the team also defines a stopping point. If the discrepancy persists because the source invoice is itself internally inconsistent, retrying forever is pointless, so after a bounded number of attempts the record is routed to a human. The semantic error is caught not by a cleverer schema but by computation over the values plus a retry loop with an exit, which is the architecture this knowledge point is pointing you toward.
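A retry loop with an exit can be sketched as follows. The `extract` callable is a hypothetical stand-in for the model call (it receives the document and the previous error, if any), `validate` returns an error string or `None`, and the status labels and default of three attempts are assumptions.

```python
from typing import Callable, Optional


def extract_with_bounded_retries(
    document: str,
    extract: Callable[[str, Optional[str]], dict],
    validate: Callable[[dict], Optional[str]],
    max_attempts: int = 3,
) -> dict:
    """Retry with feedback, then route to a human once attempts run out."""
    feedback: Optional[str] = None
    record: dict = {}
    for attempt in range(1, max_attempts + 1):
        record = extract(document, feedback)  # feedback is None on the first try
        feedback = validate(record)
        if feedback is None:
            return {"status": "accepted", "record": record, "attempts": attempt}
    # Never silently accept or drop: hand the record and the error to a person.
    return {
        "status": "needs_human_review",
        "record": record,
        "error": feedback,
        "attempts": max_attempts,
    }
```

The escalation result keeps the last record and the last error together so the reviewer sees what failed, not just that something did.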

Shape: what a schema can validate.
Meaning: what only your code can validate.
Recompute: the move that surfaces a bad total.

Common misconceptions to avoid

Misconception: If extracted values come out wrong, I should be able to constrain the JSON schema to reject them.
What's actually true: Schemas express shape and type, not relationships between values. A constraint like 'line items must sum to the total' or 'this string belongs in the vendor field' cannot be written in JSON Schema, so semantic errors must be caught by validation logic, not schema rules.

Misconception: A semantic error means the model is unreliable and the answer is to retry until it gets it right.
What's actually true: Retry helps only when the information exists and was misread or misplaced. When the source genuinely lacks the data, retrying reproduces the same gap, so the loop needs a bound and a human-escalation exit rather than infinite attempts.

A closer look at the three failure modes

Each semantic failure mode has a characteristic signature worth recognising on sight. Mathematical inconsistency shows up wherever a structure contains both parts and a whole: line items and a total, sub-scores and an overall score, quantities and a sum. The parts are individually valid and collectively wrong, and only arithmetic over the values exposes it. Field placement shows up wherever two fields share a type and a plausible content, such as two party names, two dates, or two amounts; the values are valid and merely swapped, so detection needs a check that ties each value back to its role in the source rather than just confirming its type.
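One crude way to tie a value back to its role is to check that it appears next to the label that defines the role in the source text. The sketch below assumes the document uses explicit labels such as "Vendor:" and "Bill to:"; that assumption, the field names, and both function names are hypothetical, and real documents will often need something more robust.

```python
import re
from typing import Dict, List


def value_follows_label(source_text: str, label: str, value: str) -> bool:
    """True if `value` appears directly after `label` in the source."""
    pattern = re.escape(label) + r"\s*:?\s*" + re.escape(value)
    return re.search(pattern, source_text, flags=re.IGNORECASE) is not None


def misplaced_fields(source_text: str, record: dict, labels: Dict[str, str]) -> List[str]:
    """Fields whose value is not found next to the label for that role."""
    return [
        field
        for field, label in labels.items()
        if not value_follows_label(source_text, label, record[field])
    ]


source = "Vendor: Acme Supplies\nBill to: Globex Corp\n"
labels = {"vendor": "Vendor", "customer": "Bill to"}
swapped = {"vendor": "Globex Corp", "customer": "Acme Supplies"}
flagged = misplaced_fields(source, swapped, labels)
```

Both strings in the swapped record are valid, so a type check passes; only the comparison against each label's position in the source flags them.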

Fabrication is the third and the most insidious, because it manufactures a value where the source had none, leaving nothing inside the record to compare against. Catching it usually requires either a nullable schema that would have let the gap show as a null in the first place, or a confidence signal, or a back-reference to the source text. The common thread across all three modes is that the evidence needed to detect the error lives outside the field's own type, which is precisely why a type-checking schema is structurally blind to them and why a separate validation layer is not optional.
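A back-reference check can be sketched as a search for each extracted value in the source text, with nulls treated as honest gaps. This is a deliberately simple illustration: verbatim substring matching is an assumption that breaks for normalised values such as reformatted dates or amounts, and the field names are hypothetical.

```python
from typing import List


def unsupported_fields(source_text: str, record: dict) -> List[str]:
    """Fields whose non-null value never appears in the source text."""
    haystack = " ".join(source_text.lower().split())  # collapse whitespace
    missing = []
    for field, value in record.items():
        if value is None:
            continue  # a null shows the gap instead of hiding it
        needle = " ".join(str(value).lower().split())
        if needle not in haystack:
            missing.append(field)
    return missing


source = "Invoice 4471 from Acme Supplies, due 2024-03-01."
record = {
    "vendor": "Acme Supplies",
    "invoice_number": "4471",
    "po_number": "PO-99812",  # not present anywhere in the source
    "notes": None,
}
suspect = unsupported_fields(source, record)
```

The invented purchase order number is flagged because nothing in the document supports it, while the null `notes` field is left alone.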

The fixable and the unfixable

Not every semantic error responds to the same treatment, and conflating the two wastes effort. Some errors are fixable on a retry because the information was present and merely misread or misplaced: a transposed digit, a swapped field, a misparsed date. Feeding the document back with the specific error often yields a correct answer the second time. Other errors are unfixable by retry because the information is genuinely absent from the source: no amount of re-asking will conjure a purchase order number that the document never contained. A retry loop that cannot tell these apart will hammer the model forever on the unfixable cases.

The architectural consequence is that a validation loop needs an exit, not merely a check. After a bounded number of attempts, or as soon as the error is recognised as absence rather than misreading, the record should be routed to a human instead of retried. The maximum retry count is a safety guard against runaway loops, not the primary control; the primary control is distinguishing a fixable error from an unfixable one. Designing that distinction into the loop is what separates a robust pipeline from one that silently burns tokens chasing data that was never there.
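The ordering of the two controls can be sketched as a small decision function. How an error gets classified as misread or absent is left to the caller here; the `ErrorKind` enum, the function name, and the action labels are assumptions for illustration.

```python
from enum import Enum


class ErrorKind(Enum):
    MISREAD = "misread"  # information present but misread or misplaced
    ABSENT = "absent"    # information genuinely missing from the source


def next_action(kind: ErrorKind, attempt: int, max_attempts: int = 3) -> str:
    """Decide whether to retry or escalate after a failed validation."""
    if kind is ErrorKind.ABSENT:
        return "escalate"  # primary control: re-asking cannot conjure missing data
    if attempt >= max_attempts:
        return "escalate"  # safety guard against runaway loops
    return "retry"
```

Note that an absent value escalates on the first attempt, before the retry budget is ever consulted.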

Encoding the check into the contract

A particularly elegant technique folds the validation back into the extraction itself. Instead of trusting a single stated total, you ask the schema for two values: the total as printed in the document and the total as computed from the line items. When the two disagree, the discrepancy is visible in the record without any external arithmetic, and you can add a boolean such as conflict_detected that the model sets when the source data is internally inconsistent. This turns the model into a partner in catching its own semantic errors, surfacing exactly the mismatches that a naive single-value extraction would have buried behind one confident number.
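A schema along these lines, with a pipeline check that acts on it, might look like the sketch below. The field names `stated_total`, `calculated_total`, and `conflict_detected` come from the text; the rest of the schema, the `needs_review` helper, and the tolerance are assumptions. The helper also re-adds the line items itself, on the assumption that the model's own sum is an extracted value like any other and worth verifying.

```python
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "amount": {"type": "number"},
                },
                "required": ["description", "amount"],
                "additionalProperties": False,
            },
        },
        "stated_total": {"type": "number", "description": "Total as printed in the document."},
        "calculated_total": {"type": "number", "description": "Sum of the line item amounts."},
        "conflict_detected": {"type": "boolean", "description": "True if the source is internally inconsistent."},
    },
    "required": ["line_items", "stated_total", "calculated_total", "conflict_detected"],
    "additionalProperties": False,
}


def needs_review(record: dict, tolerance: float = 0.005) -> bool:
    """Flag a record when the model reports a conflict or the totals disagree."""
    recomputed = sum(item["amount"] for item in record["line_items"])
    model_sum_wrong = abs(recomputed - record["calculated_total"]) > tolerance
    totals_disagree = abs(record["stated_total"] - record["calculated_total"]) > tolerance
    return record["conflict_detected"] or model_sum_wrong or totals_disagree
```

With both totals in the record, a disagreement is visible even if the model forgets to set the flag.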

Constraints Claude's structured outputs do not support

There is a sharper version of the schema-cannot-validate-meaning lesson that the documentation makes explicit: some JSON Schema constraints are not supported by structured outputs at all and must be removed from the schema you send. Anthropic calls out minimum, maximum, minLength, and maxLength among them, and advises moving the intent of such a constraint into the field's description rather than expressing it as an enforced rule. So even a single-field bound, such as an amount that must be positive or a code that must be five characters long, is not something the schema will police for you; it becomes guidance for the model, not a guarantee from the platform.
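Moving a constraint's intent into the description can be sketched as a small transformation over one property definition. The helper name and the wording of the appended note are assumptions, and the keyword list covers only the four constraints named above, not the full set of unsupported features.

```python
UNSUPPORTED_KEYWORDS = ("minimum", "maximum", "minLength", "maxLength")


def move_constraints_to_description(prop: dict) -> dict:
    """Strip the named keywords and restate them as guidance in the description."""
    cleaned = {k: v for k, v in prop.items() if k not in UNSUPPORTED_KEYWORDS}
    hints = [f"{k}={prop[k]}" for k in UNSUPPORTED_KEYWORDS if k in prop]
    if hints:
        note = "Expected constraints: " + ", ".join(hints) + "."
        cleaned["description"] = (cleaned.get("description", "") + " " + note).strip()
    return cleaned


amount = move_constraints_to_description(
    {"type": "number", "minimum": 0, "description": "Invoice amount."}
)
code = move_constraints_to_description(
    {"type": "string", "minLength": 5, "maxLength": 5}
)
```

After the transformation the bound is only a hint to the model, so any guarantee still has to come from application code.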

That reframes where range and length checks belong. If your pipeline must be certain an amount is non-negative or a quantity falls within plausible limits, that check lives in your application validation alongside the cross-field arithmetic this page already covers, not in the schema. The same logic explains a couple of mode incompatibilities worth knowing: citations and message prefilling are not compatible with JSON outputs, so an architecture that depends on either cannot also rely on strict JSON output mode for the same response. The throughline is unchanged and now even firmer: the schema constrains shape and type within a supported subset, and everything about plausibility, range, and meaning remains your code's job.
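Application-level range and length checks can be as plain as the sketch below. The field names `amount`, `quantity`, and `code`, and the quantity limits, are hypothetical; real limits depend on the business domain.

```python
from typing import List


def range_and_length_errors(record: dict) -> List[str]:
    """Checks the schema will not police: ranges and lengths of single fields."""
    errors = []
    if record["amount"] < 0:
        errors.append("amount must be non-negative")
    if not 1 <= record["quantity"] <= 10_000:  # assumed plausibility limits
        errors.append("quantity outside plausible limits")
    if len(record["code"]) != 5:
        errors.append("code must be exactly five characters")
    return errors


errors = range_and_length_errors({"amount": -12.5, "quantity": 3, "code": "AB12"})
```

These checks can run in the same validation step as the cross-field arithmetic, so one failed record yields one consolidated list of problems.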

How it shows up on the exam

This is an analyse-level knowledge point, so the Domain 4 questions are diagnostic rather than definitional. A Scenario 6 stem will describe a record that passed schema validation and was still wrong, then offer you four plausible-sounding remedies. The most tempting wrong answer is almost always a schema change, because it pattern-matches to the words JSON and validation in the stem. The discriminating skill is recognising that a total that does not sum or a value in the wrong field is semantic, which puts the correct answer in the territory of cross-field validation, a recompute-and-compare check, or a retry loop with feedback. If you can articulate why the schema is structurally incapable of catching the described error, you will reliably pick the option that validates meaning instead of shape.

A useful habit for the exam is to mentally label every described defect as either a parse problem or a truth problem before you even read the options. The moment you classify a non-summing total or a swapped field as a truth problem, every answer that edits the schema, changes a model setting, or rewrites the prompt for format can be set aside, because none of them validate meaning. What remains is the option that computes, compares, or cross-references the values, and that is the one to choose. The discipline of classifying first and matching second keeps you from being pulled toward the schema by the surface vocabulary of the stem.

Check your understanding

A contract-extraction service returns records that always parse and always match the schema, yet legal review keeps finding the counterparty_name and client_name fields swapped on documents where the parties are introduced in an unusual order. What is the correct analysis and fix?
