- In short
- A JSON schema design review is the evaluative judgement of whether an extraction schema will hold up against varied real documents. It inspects for three recurring pitfalls: all-required fields that force fabrication when data is missing, enums with no unclear value that force incorrect classification, and rigid category sets with no other option that silently discard edge cases.
What a JSON schema design review evaluates
A JSON schema design review is the act of judging, before deployment, whether an extraction schema will survive contact with real documents. This is the evaluate-level capstone of Task 4.3: you are no longer designing fields or diagnosing a failure, you are forming a defensible verdict on someone else's schema and justifying it. The exam frames it exactly that way, handing you a finished schema and asking whether you would approve it, and the credited answer always rests on how the schema behaves under the documents it has not yet seen.
The mental discipline is adversarial imagination. A schema looks fine against the tidy example its author had in mind. The reviewer's job is to picture the document that breaks it: the invoice with no purchase order number, the support ticket that fits two categories, the contract structured unlike any in the test set. A schema that has no honest response to those cases is not merely imperfect; it is a fabrication generator, because every gap it cannot represent becomes a value the model invents.
- JSON schema design review
- An evaluative inspection of an extraction schema that predicts its behaviour on varied and edge-case documents. It flags the structural choices, chiefly all-required fields, enums with no unclear value, and rigid category sets, that will force the model to fabricate or misclassify rather than report honestly.
A JSON schema design review checklist
Three questions catch the large majority of fragile schemas, and the exam draws its distractors from all three.
- Which fields are required, and could a real document omit any of them? Every required field is a promise that the value always exists. For fields that are genuinely universal, such as an identifier the format guarantees, that is fine. For anything a document might lack, required is a fabrication instruction in disguise. The reviewer flags required fields that should be nullable.
- Do the enums have an escape hatch for ambiguity? A category enum without an unclear value forces a definite label onto indefinite data. The reviewer checks whether genuinely ambiguous inputs have an honest home, and if not, recommends adding unclear.
- Can the category set represent the unforeseen? A fixed list of categories assumes you enumerated reality correctly. The reviewer asks what happens to a value that fits none of them, and if the answer is "it gets forced into the nearest category", recommends an other value with a freeform detail string so edge cases are preserved rather than mislabelled.
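The three questions above can be roughed out as a mechanical first pass. The sketch below is illustrative only: `review_schema`, `is_nullable`, and the `guaranteed` parameter are hypothetical names, and the check is deliberately coarse. It cannot know which fields real documents omit, so its findings are prompts for the reviewer's judgement, not verdicts.

```python
from typing import Any

ESCAPE_VALUES = {"unclear", "other"}  # assumed naming convention for escape hatches


def is_nullable(prop: dict[str, Any]) -> bool:
    """True if the property admits null via a type array or an anyOf branch."""
    declared = prop.get("type")
    if declared == "null" or (isinstance(declared, list) and "null" in declared):
        return True
    return any(is_nullable(branch) for branch in prop.get("anyOf", []))


def review_schema(schema: dict[str, Any], guaranteed: frozenset = frozenset()) -> list[str]:
    """Flag required non-nullable fields and enums with no escape value.

    `guaranteed` lists fields the reviewer has judged to be truly universal.
    """
    findings = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        prop = props.get(name, {})
        # Enums are checked separately below; guaranteed fields are exempt.
        if name in guaranteed or "enum" in prop or is_nullable(prop):
            continue
        findings.append(f"{name}: required but not nullable")
    for name, prop in props.items():
        values = prop.get("enum")
        if values is not None and not ESCAPE_VALUES.intersection(values):
            findings.append(f"{name}: enum has no unclear/other escape value")
    return findings
```

Whether a flagged enum needs unclear, other, or both is still a judgement call about the data, which is why the sketch reports the gap rather than proposing the fix.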
A thorough review adds a fourth question that connects to the limits of schemas: does the surrounding pipeline include validation for the semantic errors the schema cannot catch, such as totals that must reconcile? A schema can be well shaped and still ship bad data if nothing downstream checks meaning.
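As a minimal sketch of the kind of downstream check this fourth question has in mind, the function below tests whether line items reconcile with a stated total. The field names `line_items`, `amount`, and `total`, and the one-cent tolerance, are assumptions made for illustration; they do not come from any particular schema.

```python
from decimal import Decimal


def totals_reconcile(record: dict, tolerance: Decimal = Decimal("0.01")) -> bool:
    """Return True when the line-item amounts sum to the stated total.

    A schema can require that `total` is a number, but it cannot express
    that the number must equal the sum of the line items.
    """
    line_sum = sum(
        (Decimal(str(item["amount"])) for item in record["line_items"]),
        Decimal("0"),
    )
    return abs(line_sum - Decimal(str(record["total"]))) <= tolerance
```

A record that fails this check is perfectly valid against its schema, which is exactly why the check has to live in the pipeline.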
Trade-offs a reviewer has to weigh
Evaluation is not a checklist applied blindly; it is judgement about trade-offs. Loosening every field to nullable can hollow out a contract that downstream systems depend on, so the reviewer weighs the cost of an honest null against the cost of a fabricated value, and for extraction from varied sources the honest null almost always wins. Likewise, adding unclear and other shifts work onto a review queue, which is a real operational cost, but it is a cost you can see and manage, unlike silent misclassification, which corrupts your data invisibly. The strong reviewer makes these trade-offs explicit rather than reaching for a reflexive "make everything optional".
Reviewing a colleague's extraction schema
Worked example
A teammate submits a schema to extract medical-device complaints. Every field is required, the severity field is an enum of low, medium, and high, and the device_category field is an enum of five fixed device types.
The schema is clean, well named, and would pass any superficial look, so the temptation is to approve it. The reviewer instead runs each field through the stress test. The required serial_number field is the first flag: many complaint reports arrive without a serial number, and because the field is required the model will manufacture one, seeding the safety database with fabricated identifiers. That alone is grounds to reject the schema as written.
The severity enum is the second flag. Complaints frequently describe ambiguous or mixed-severity events, and with only low, medium, and high the model must force a definite call on indefinite data, distorting exactly the signal a safety team relies on. The reviewer recommends an unclear value so genuinely ambiguous reports are routed to a human rather than silently bucketed. The device_category enum is the third flag: five fixed types cannot cover a long tail of devices, and without an other value plus a detail string those reports will be mislabelled as the nearest type, quietly losing the very edge cases a regulator cares about.
The verdict is a reasoned reject-with-changes, not a vague "this could be better". Make serial_number and other sometimes-absent fields nullable, add unclear to severity, and add other with a detail string to device_category. The reviewer also notes that the pipeline needs a validation step for semantic consistency, since no version of this schema can guarantee the narrative matches the structured fields. That combination of specific flags, concrete remedies, and an explicit trade-off rationale is what an evaluate-level answer looks like.
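One way the requested changes might look is sketched below as a Python dict. The worked example does not name the five device types, so `device_type_1` through `device_type_5` are placeholders, and `device_category_detail` is a hypothetical name for the detail string. The `accepts` helper is a toy checker that handles only the string fields used here; it is not a JSON Schema validator.

```python
revised_schema = {
    "type": "object",
    "properties": {
        # Nullable: many complaint reports arrive without a serial number.
        "serial_number": {"type": ["string", "null"]},
        # "unclear" gives ambiguous or mixed-severity reports an honest home.
        "severity": {"type": "string", "enum": ["low", "medium", "high", "unclear"]},
        # Placeholder device types plus an "other" bucket for the long tail.
        "device_category": {
            "type": "string",
            "enum": ["device_type_1", "device_type_2", "device_type_3",
                     "device_type_4", "device_type_5", "other"],
        },
        # Freeform detail that preserves what "other" actually was.
        "device_category_detail": {"type": ["string", "null"]},
    },
    "required": ["serial_number", "severity", "device_category",
                 "device_category_detail"],
}


def accepts(schema: dict, record: dict) -> bool:
    """Toy check: every key present, nulls only where allowed, enums respected."""
    for name, prop in schema["properties"].items():
        if name not in record:
            return False
        value = record[name]
        types = prop["type"] if isinstance(prop["type"], list) else [prop["type"]]
        if value is None:
            if "null" not in types:
                return False
            continue
        if not isinstance(value, str):
            return False
        if "enum" in prop and value not in prop["enum"]:
            return False
    return True


# An honest extraction from a report with no serial number and an unlisted device.
honest_record = {
    "serial_number": None,
    "severity": "unclear",
    "device_category": "other",
    "device_category_detail": "device not in the fixed list",
}
```

Every key stays required in this sketch, so the record shape is still predictable; what changes is that a null or an escape value is now a legal, honest answer.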
Common misconceptions to avoid
Misconception
A schema where every field is required is the safest design because it guarantees complete records.
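What's actually true
All-required fields guarantee complete-looking records, not accurate ones. When a document lacks a value, a required field forces the model to invent one, so fields a real document might omit should be nullable.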
Misconception
If the schema parses and matches the example document, it is ready to approve.
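What's actually true
Parsing and matching the author's example shows only that the schema handles the tidy case. Approval depends on how it behaves when a document is missing a field, ambiguous about it, or shaped unlike anything in the test set.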
What separates a strong review from a weak one
A weak review reads the schema for tidiness: are the names sensible, are the types reasonable, does it match the example. A strong review reads the schema for behaviour under stress: what does each field do when the document is missing it, ambiguous about it, or shaped unlike anything in the test set. The difference is the direction of attention. Tidiness looks at the schema in isolation; behaviour looks at the schema in collision with the messy world it will actually meet. An evaluator who only checks tidiness will approve fragile schemas all day, because fragility is invisible until a real document arrives to expose it.
The strongest reviews also produce specific, actionable verdicts rather than vague impressions. Saying "the schema could be more robust" is nearly useless to the author; saying "make serial_number nullable because a third of reports omit it, and add an unclear value to severity for mixed cases" gives them a concrete change and the reason behind it. An evaluate-level answer on the exam mirrors this: it names the flaw, names the fix, and explains the consequence of leaving it unaddressed, because that triplet is exactly what a real design review delivers.
The cost of over-loosening
Good evaluation is not a one-way push toward making everything optional, and a sophisticated reviewer guards against over-correction. A schema in which every field is nullable and every enum has an escape hatch is permissive to the point of weakness, because downstream systems lose the guarantees they were relying on and the review queue fills with cases the pipeline could have handled. The reviewer weighs each loosening against its cost: a null is honest but it pushes work downstream, an unclear value is honest but it routes a record to a human. The aim is a contract that is exactly as strict as the data supports, no stricter and no looser.
This is why the skill is genuinely evaluative rather than mechanical. There is no rule that says "make field X nullable"; there is only judgement about whether real documents of this kind omit field X often enough to justify the loosening. A reviewer who applies the patterns without weighing the trade-offs will swing from one failure mode to the other, replacing a fabrication risk with a contract too weak to be useful. Calibration, not reflex, is the mark of a strong evaluation.
Evaluating the pipeline around the schema
Finally, a complete review looks past the schema to the system that surrounds it. A perfectly shaped schema still ships bad data if nothing downstream validates the semantic constraints it cannot express, so the reviewer asks whether totals are reconciled, whether nulls are acted upon rather than ignored, and whether unclear and other values are actually routed somewhere useful. A schema and its pipeline are evaluated together, because the schema's honest gaps are only valuable if something consumes them. Approving a schema while ignoring whether the pipeline handles its outputs is half a review, and the exam rewards the candidate who sees the whole loop rather than the schema in isolation.
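A rough sketch of what consuming the schema's honest gaps could mean in practice follows. The `route` function, the queue names `human_review` and `auto_accept`, and the rule that every null or escape value goes to a human are all assumptions for illustration; a real pipeline would calibrate which gaps actually need review.

```python
def route(record: dict, nullable_fields: list[str]) -> tuple[str, list[str]]:
    """Decide where an extracted record goes and explain why.

    Nulls in the listed fields and any "unclear" or "other" value are
    surfaced as reasons instead of being silently passed through.
    """
    reasons = []
    for field in nullable_fields:
        if record.get(field) is None:
            reasons.append(f"{field} is null")
    for field, value in record.items():
        if value in ("unclear", "other"):
            reasons.append(f"{field} is {value}")
    if reasons:
        return ("human_review", reasons)
    return ("auto_accept", [])
```

Returning the reasons alongside the destination keeps the cost of each escape hatch visible, which is what lets a team see when the review queue is filling with cases the pipeline could have handled.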
Reviewing for compilation cost, not just fabrication risk
A complete schema review weighs a second axis alongside fabrication risk: whether the schema will compile efficiently under Claude's structured outputs. The same constrained-decoding mechanism that guarantees valid shape has to turn the schema into a grammar, and Anthropic is explicit that some constructs are expensive. Union types, written as anyOf or as type arrays such as ["string", "null"], are called out as particularly costly, and every optional parameter enlarges the grammar's state space. There is a documented complexity ceiling with a 180-second compilation timeout behind it, so a schema that is honest but baroque can still be rejected or slow to compile. The reviewer therefore balances the fabrication-resistant patterns this page champions against their compile cost, reaching for nullable unions and optional fields where the data genuinely needs them rather than sprinkling them everywhere.
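To make that second axis concrete, a reviewer could count the constructs named above before arguing about them. The sketch below is a hypothetical helper, `complexity_profile`, that tallies union types and optional properties; it walks only properties, items, and anyOf branches. It applies no thresholds, because the actual limits are not stated here, so the numbers are only a basis for comparing two drafts of a schema.

```python
from typing import Any


def complexity_profile(schema: dict[str, Any]) -> dict[str, int]:
    """Count union types (anyOf or type arrays) and optional properties."""
    counts = {"union_types": 0, "optional_properties": 0}

    def walk(node: Any) -> None:
        if not isinstance(node, dict):
            return
        branches = node.get("anyOf", [])
        if branches:
            counts["union_types"] += 1
        if isinstance(node.get("type"), list):
            counts["union_types"] += 1
        props = node.get("properties", {})
        required = set(node.get("required", []))
        # A property is optional when its object does not list it as required.
        counts["optional_properties"] += sum(1 for name in props if name not in required)
        children = list(props.values()) + list(branches)
        if isinstance(node.get("items"), dict):
            children.append(node["items"])
        for child in children:
            walk(child)

    walk(schema)
    return counts
```

If the revised draft of a schema doubles either count, that is a cue to ask which of the new unions and optional fields the data genuinely needs.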
Two further checks round out a production-grade review. First, reserve strict: true for the tools where a schema violation is materially harmful, since marking everything strict spends compile budget and offers little where a violation would be harmless. Second, treat the schema as data and not just structure: protected health information must never appear in schema metadata such as property names, enum values, const values, or pattern regular expressions, because the compiled schema is handled differently from message content. A reviewer who flags only fabrication risk has done half the job; the other half is confirming the schema is lean enough to compile and clean enough to be safe.
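For the second check, a reviewer needs to see every string that lives in schema metadata. The sketch below, with the hypothetical name `schema_metadata_strings`, simply collects property names, enum values, const values, and patterns for a human to read. It does not detect protected health information; deciding whether any listed string is sensitive remains the reviewer's job.

```python
from typing import Any


def schema_metadata_strings(schema: dict[str, Any]) -> list[tuple[str, str]]:
    """List (kind, text) pairs for strings embedded in schema metadata."""
    found: list[tuple[str, str]] = []

    def walk(node: Any) -> None:
        if not isinstance(node, dict):
            return
        for name, child in node.get("properties", {}).items():
            found.append(("property_name", name))
            walk(child)
        for value in node.get("enum", []):
            if isinstance(value, str):
                found.append(("enum", value))
        if isinstance(node.get("const"), str):
            found.append(("const", node["const"]))
        if isinstance(node.get("pattern"), str):
            found.append(("pattern", node["pattern"]))
        for branch in node.get("anyOf", []):
            walk(branch)
        if isinstance(node.get("items"), dict):
            walk(node["items"])

    walk(schema)
    return found
```

Printing this list next to the schema during review makes it hard to miss a real name or identifier that was pasted into an enum or a regular expression.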
How it shows up on the exam
Because this knowledge point is pitched at the evaluate level, its Domain 4 questions ask for a judgement and its rationale. A Scenario 6 stem typically presents a schema, often described as having all required fields and tidy enums, and asks whether to approve it or what most needs changing. The seductive distractor praises the schema's completeness or strictness, which is precisely the flaw. The credited answer identifies that all-required fields invite fabrication on varied documents, that the enums need an unclear escape hatch, or that the categories need an other option, and it frames the change as protecting data honesty. Read any all-required, no-escape-hatch schema as a fabrication risk and you will consistently choose the option that values honest gaps over a contract that forces the model to guess.
Under exam conditions the fastest tell is the adjective the stem uses to praise the schema. Words like complete, strict, or every field required are written to sound reassuring, and they are the exact properties a seasoned reviewer treats as warning signs on varied real documents. When you notice the stem complimenting a schema for its rigidity, flip the compliment into a question: what happens here when a document omits a field, fits no category, or arrives in an unexpected shape? The option that answers that question with nullable fields, an unclear value, or an other bucket is the defensible verdict, and the option that simply admires the strictness is the trap.
You are asked to approve a schema for extracting job postings from thousands of varied career pages. It marks salary_range, years_experience, and remote_policy all as required, and employment_type is an enum of full_time and part_time only. What is the strongest evaluation?
Watch and learn
Tool use with the Claude 3 model family
Why watch: Anthropic's official walkthrough of defining tool input_schemas, the foundation a learner must understand before evaluating whether a schema's required fields and enums are well designed.