Concept deep dive · 10 min read · 20 June 2026

Claude Jarman: Prompt Patterns for Strict Structured Output

Claude Jarman techniques for reliable structured output: XML tags, schema design, tool_use validation, and retry patterns. Mapped to CCA-F Domain 4 (20% of the exam).

By Solomon Udoh · AI Architect & Certification Lead

The phrase "claude jarman" circulates in engineering forums as shorthand for a disciplined, craft-first approach to prompting Claude: precise constraints, explicit schema contracts, and no wishful thinking about what the model will infer. Whether you encountered it in a Slack thread or a GitHub discussion, the underlying question is the same: how do you make Claude produce structured output that is reliable enough to wire directly into a production pipeline? This post answers that question with patterns that map directly to Prompt Engineering & Structured Output, Domain 4 of the CCA-F exam, which carries 20% of the total exam weight.

What does "structured output" actually mean in a Claude agent?

Structured output means Claude returns data in a machine-readable format, typically JSON, that your code can parse without a second LLM call to clean it up. It is not the same as "output that looks tidy." A response can be beautifully formatted prose and still be unstructured from a pipeline perspective.

In practice, three mechanisms produce structured output with Claude:

  1. Constrained prompting - you instruct Claude to return JSON and describe the schema in the system prompt.
  2. tool_use with a JSON schema - you define a tool whose input schema is the structure you want; Claude is forced to populate it.
  3. Post-processing validation with retry - you parse the output, catch schema violations, and re-prompt with the error.

Each mechanism has a different reliability profile. Understanding when to use which one is the core of what practitioners mean by the claude jarman approach.

Which mechanism is most reliable for strict schema adherence?

tool_use is the most reliable mechanism for strict schema adherence when the schema is non-trivial. When you define a tool with a JSON Schema, Claude must produce a syntactically valid tool call; the API enforces this at the transport layer. You do not get free-form text that happens to contain JSON: you get a structured tool_use block.

json
{
  "name": "extract_order",
  "description": "Extract structured order data from the user message.",
  "input_schema": {
    "type": "object",
    "properties": {
      "order_id": { "type": "string" },
      "line_items": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "sku": { "type": "string" },
            "quantity": { "type": "integer", "minimum": 1 }
          },
          "required": ["sku", "quantity"]
        }
      },
      "status": {
        "type": "string",
        "enum": ["pending", "confirmed", "shipped", "cancelled"]
      }
    },
    "required": ["order_id", "line_items", "status"]
  }
}

Constrained prompting alone is less reliable for complex schemas. Claude will usually comply, but "usually" is not a contract. For high-stakes pipelines, the CCA-F exam consistently rewards deterministic solutions over probabilistic ones, which means tool_use over "please return JSON."

That said, tool_use is not a silver bullet. Semantic errors, wrong enum values, or logically inconsistent field combinations can still slip through. Validation and retry remain necessary.
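To make that gap concrete, here is a minimal sketch of a semantic check that could run after schema validation on the extract_order input above. The function name and both business rules (at least one line item, no repeated SKU) are our assumptions for illustration; they are not part of any Anthropic API.

```python
ALLOWED_STATUSES = {"pending", "confirmed", "shipped", "cancelled"}


def semantic_errors(order: dict) -> list[str]:
    """Return human-readable problems that a shape-only check can miss."""
    errors = []

    # Enum check is case-sensitive: "Shipped" is not "shipped".
    status = order.get("status")
    if status not in ALLOWED_STATUSES:
        errors.append(f"status {status!r} is not an allowed value")

    line_items = order.get("line_items", [])
    # Hypothetical business rule: an order needs at least one line item.
    if not line_items:
        errors.append("line_items is empty")

    # Hypothetical business rule: each SKU appears at most once.
    seen = set()
    for item in line_items:
        sku = item.get("sku")
        if sku in seen:
            errors.append(f"duplicate sku {sku!r}")
        seen.add(sku)

    return errors
```

An empty list means the order passed. A non-empty list is exactly the kind of structured error message worth feeding back to Claude on a retry.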

What schema design choices reduce failures before they happen?

Schema design is upstream of prompting. A poorly designed schema produces failures that no prompt technique can fully prevent.

| Design choice | Lower failure rate | Higher failure rate |
| --- | --- | --- |
| Field names | Descriptive (invoice_total_usd) | Ambiguous (total, amount) |
| Enums | Tight, exhaustive list | Open string with "any value" |
| Required vs optional | Mark truly optional fields optional | Mark everything required |
| Nullable fields | Use sparingly, only when absence is meaningful | Overuse as a catch-all |
| Catch-all pattern | "category": "other", "category_detail": "..." | Single free-text field |
| Nesting depth | Flat or one level deep | Three or more levels |

The "other + detail" pattern deserves particular attention. When you have a classification field with a finite enum, add a companion detail field that is only required when category is "other". This prevents Claude from forcing a poor fit into a constrained enum while still keeping the primary field machine-filterable.

json
{
  "category": {
    "type": "string",
    "enum": ["billing", "technical", "shipping", "other"]
  },
  "category_detail": {
    "type": "string",
    "description": "Required when category is 'other'. Describe the category in plain text."
  }
}
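The fragment above states the conditional requirement only in a description, which Claude will usually follow but which a validator does not enforce. One option is to check the rule in code after extraction. The sketch below assumes a hypothetical helper named check_category; JSON Schema's if/then keywords can express the same rule where your validator supports them.

```python
def check_category(record: dict) -> list[str]:
    """Enforce: category_detail must be non-empty when category is 'other'."""
    errors = []
    category = record.get("category")
    # Treat a missing, null, or whitespace-only detail as absent.
    detail = (record.get("category_detail") or "").strip()
    if category == "other" and not detail:
        errors.append("category_detail is required when category is 'other'")
    return errors
```

Returning a list of messages rather than raising keeps this check composable with other validators in the same retry loop.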

For exam purposes, note that schema design questions appear in both Domain 4 (Prompt Engineering & Structured Output, 20%) and Domain 2 (Tool Design & MCP Integration, 18%). The overlap is intentional: the exam treats schema design as a cross-cutting concern.

Which prompt patterns actually improve structured output quality?

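Four techniques do most of the work. We list them roughly by general leverage, not by complexity; few-shot examples, third on the list, become the dominant technique when the inputs are ambiguous.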

1. XML tags for section demarcation

Claude's training includes a large volume of XML-structured content. Wrapping input sections in semantic tags reduces the chance that Claude conflates source material with output instructions.

text
<source_document>
{{document_text}}
</source_document>
<task>
Extract the fields defined in the tool schema. Do not infer values not present in the source document. If a required field is absent, set it to null only if the schema permits; otherwise return an error flag.
</task>

This is especially valuable in retrieval-augmented pipelines where the retrieved context is long and potentially noisy. Without demarcation, Claude can "leak" retrieved text into output fields.
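A minimal sketch of building that prompt in code follows. The function name build_extraction_prompt is hypothetical, and escaping a literal closing tag inside the document is our own defensive assumption, so that untrusted text cannot end the source section early.

```python
def build_extraction_prompt(document_text: str) -> str:
    """Wrap source material and instructions in separate semantic tags."""
    # Assumption: neutralise any literal closing tag inside the document.
    safe_text = document_text.replace(
        "</source_document>", "&lt;/source_document&gt;"
    )
    return (
        "<source_document>\n"
        f"{safe_text}\n"
        "</source_document>\n"
        "<task>\n"
        "Extract the fields defined in the tool schema. "
        "Do not infer values not present in the source document.\n"
        "</task>"
    )
```

Keeping the template in one function also gives you a single place to version and test the prompt.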

2. Explicit negative constraints

Tell Claude what not to do, not just what to do. "Return only the JSON object" is weaker than "Return only the JSON object. Do not include explanatory prose, markdown fences, or commentary before or after the object."

text
Return a single JSON object matching the tool schema.
Do not wrap it in markdown code fences.
Do not add keys not defined in the schema.
Do not hallucinate values for fields absent from the source.
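Negative constraints lower the rate of stray fences and commentary in prompt-only output, but they do not remove it. A defensive parser on the receiving side is a cheap complement. The sketch below uses a hypothetical helper, parse_json_reply, that tolerates one wrapping code fence and rejects everything else.

```python
import json

FENCE = "`" * 3  # three backticks


def parse_json_reply(reply: str) -> dict:
    """Parse a reply that should be a single JSON object."""
    text = reply.strip()
    if text.startswith(FENCE) and text.endswith(FENCE):
        body = text[len(FENCE):-len(FENCE)]
        # Drop the first line, which holds an optional language label.
        first_newline = body.find("\n")
        text = body[first_newline + 1:] if first_newline != -1 else body
        text = text.strip()
    parsed = json.loads(text)  # raises json.JSONDecodeError on prose
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object at the top level")
    return parsed
```

Anything that still fails here is a real compliance failure and belongs in the retry path, not in more string surgery.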

3. Few-shot examples in the system prompt

Few-shot examples are the highest-leverage technique for ambiguous edge cases. A single well-chosen example of a hard case often does more than three paragraphs of instruction. Keep examples in the system prompt, not the human turn, so they persist across multi-turn conversations.

text
<example>
<input>Customer said: "I got the wrong size, want to swap for a medium."</input>
<output>{"intent": "exchange", "reason": "wrong_size", "requested_variant": "medium"}</output>
</example>
<example>
<input>Customer said: "Just checking where my order is."</input>
<output>{"intent": "status_inquiry", "reason": null, "requested_variant": null}</output>
</example>

Note the second example explicitly shows null for optional fields. This teaches Claude the null pattern without requiring prose explanation.
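Hand-typed examples drift and occasionally contain invalid JSON. One way to avoid that, sketched below, is to render the example block from the same labelled data you use for testing. The function name render_examples and the (input, expected) pair format are our assumptions.

```python
import json


def render_examples(examples: list[tuple[str, dict]]) -> str:
    """Render (customer_text, expected_output) pairs as <example> blocks."""
    blocks = []
    for customer_text, expected in examples:
        blocks.append(
            "<example>\n"
            # json.dumps quotes the text and writes Python None as JSON null.
            f"<input>Customer said: {json.dumps(customer_text)}</input>\n"
            f"<output>{json.dumps(expected)}</output>\n"
            "</example>"
        )
    return "\n".join(blocks)
```

Because the output side goes through json.dumps, every example Claude sees is guaranteed to be parseable JSON with null written the way the schema expects.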

4. Self-check instruction

For high-value extractions, append a self-check step before the final output:

text
Before returning the JSON, verify:
- Every required field is populated.
- Enum values match the allowed list exactly (case-sensitive).
- No field contains a value inferred beyond what the source states.
If any check fails, correct the output before returning it.

This adds tokens but reduces downstream validation failures. On the CCA-F exam, proportionate fixes are rewarded: use self-check for high-stakes extractions, not for every trivial classification.

Prompts should be treated as code: version-controlled, tested against a regression suite, and reviewed before deployment to production agents.

Anthropic, Claude Documentation (Prompt Engineering Overview)

When should a tool do the final transform instead of the prompt?

This is one of the sharper design questions in agentic systems. The answer depends on whether the transformation is deterministic.

If the transform is deterministic (for example, converting a date string to ISO 8601, or summing line items), implement it in code. Do not ask Claude to do arithmetic or string normalisation that a function handles perfectly. The Tool Design & MCP Integration domain covers this boundary explicitly: tools should do what code does better; prompts should do what language understanding does better.

If the transform requires judgment (for example, classifying a free-text complaint into a taxonomy, or extracting implicit intent), that is a prompt task. The tool schema constrains the output shape; the prompt provides the judgment.

A common anti-pattern is asking Claude to both extract and transform in a single step when the transform is deterministic. Split the work:

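python
# Step 1: Claude extracts (judgment task).
# call_claude_with_tool and extract_tool_schema stand in for your own
# extraction helper and tool definition.
raw = call_claude_with_tool(document, extract_tool_schema)

# Step 2: Code transforms (deterministic task).
# parse_date stands in for whatever date parser your pipeline uses.
normalised = {
    "order_id": raw["order_id"].upper().strip(),
    "total_usd": round(sum(i["unit_price"] * i["quantity"] for i in raw["line_items"]), 2),
    "created_at": parse_date(raw["created_at"]).isoformat(),
}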

This pattern also makes regression testing easier: you can test the Claude extraction step and the transform step independently.
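As a sketch of what testing the transform step on its own can look like, the version below needs no model call at all: the test feeds it a hand-written dict shaped like an extraction result. The DD/MM/YYYY input format and the names normalise_order and test_normalise_order are assumptions for illustration.

```python
from datetime import datetime


def normalise_order(raw: dict) -> dict:
    """Deterministic clean-up of an already-extracted order."""
    total = sum(i["unit_price"] * i["quantity"] for i in raw["line_items"])
    return {
        "order_id": raw["order_id"].strip().upper(),
        "total_usd": round(total, 2),
        # Assumes the extraction step returns dates as DD/MM/YYYY strings.
        "created_at": datetime.strptime(raw["created_at"], "%d/%m/%Y").date().isoformat(),
    }


def test_normalise_order():
    raw = {
        "order_id": " ab-104 ",
        "line_items": [
            {"unit_price": 10.5, "quantity": 2},
            {"unit_price": 5.0, "quantity": 1},
        ],
        "created_at": "20/06/2026",
    }
    assert normalise_order(raw) == {
        "order_id": "AB-104",
        "total_usd": 26.0,
        "created_at": "2026-06-20",
    }
```

A failure in this test can only mean the transform is wrong, which is the point of splitting the steps.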

How do you build a validation and retry loop that actually works?

Validation without retry is just error logging. Retry without structured error feedback is just burning tokens. The effective pattern combines both.

python
import json

import anthropic
from jsonschema import validate, ValidationError

client = anthropic.Anthropic()


def extract_with_retry(document: str, tool: dict, schema: dict, max_retries: int = 2) -> dict:
    """Force a tool call, validate its input, and re-prompt with the error on failure."""
    messages = [{"role": "user", "content": document}]
    for attempt in range(max_retries + 1):
        response = client.messages.create(
            model="claude-opus-4-5",
            max_tokens=1024,
            tools=[tool],
            tool_choice={"type": "any"},
            messages=messages,
        )
        # Pick out the first tool_use block from the response content.
        tool_block = next(
            (b for b in response.content if b.type == "tool_use"), None
        )
        if tool_block is None:
            raise RuntimeError("No tool_use block in response")
        extracted = tool_block.input
        try:
            validate(instance=extracted, schema=schema)
            return extracted
        except ValidationError as e:
            # Out of retries: surface the validation error to the caller.
            if attempt == max_retries:
                raise
            # Feed the error back as a tool result so Claude sees what went wrong
            messages += [
                {"role": "assistant", "content": response.content},
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "tool_result",
                            "tool_use_id": tool_block.id,
                            "is_error": True,
                            "content": f"Schema validation failed: {e.message}. Please correct and retry.",
                        }
                    ],
                },
            ]
    raise RuntimeError("Unreachable")

Key points in this implementation:

  • tool_choice: {"type": "any"} forces Claude to use a tool rather than responding in prose.
  • The validation error is fed back as a tool_result with is_error: true, which matches the MCP isError flag pattern the exam tests.
  • We cap retries at two. Beyond two retries, the failure is usually a schema design problem, not a prompting problem.

When a tool call fails validation, return the error in a tool_result block with is_error set to true. This preserves the conversation structure and gives the model the context it needs to self-correct.

Anthropic, Claude Tool Use Documentation

How do you handle structured output in long-context or retrieved-context tasks?

Long context introduces two failure modes: the attention dilution problem and grounding drift. Attention dilution means Claude's adherence to output constraints weakens when the context window is very full. Grounding drift means Claude starts synthesising or inferring rather than extracting.

Three mitigations:

  1. Place the schema and output instructions at the end of the system prompt, not the beginning. Claude attends more strongly to recent content. If your schema is buried under 2,000 tokens of background, it will be underweighted.

  2. Use XML tags to isolate retrieved chunks. Each retrieved document gets its own <source id="1">...</source> wrapper. This makes it easier for Claude to attribute extracted values to specific sources and reduces hallucination.

  3. Request provenance fields in the schema. Add a source_id or evidence_quote field alongside each extracted value. This forces Claude to ground its output in the retrieved text and makes validation easier.

json
{
  "findings": [
    {
      "claim": "Revenue grew 12% year-over-year.",
      "source_id": "3",
      "evidence_quote": "Total revenue increased from $4.2M to $4.7M in FY2025."
    }
  ]
}
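Mitigations 2 and 3 pair naturally in code: wrap each chunk with its id on the way in, then check each quote against the cited chunk on the way out. The sketch below assumes retrieved chunks arrive as a dict of id to text and treats a quote as grounded only if it is a verbatim substring. Both function names are hypothetical, and you may want whitespace normalisation before comparing.

```python
def wrap_sources(chunks: dict[str, str]) -> str:
    """Give each retrieved chunk its own <source id="..."> wrapper."""
    return "\n".join(
        f'<source id="{source_id}">\n{text}\n</source>'
        for source_id, text in chunks.items()
    )


def ungrounded_findings(findings: list[dict], chunks: dict[str, str]) -> list[dict]:
    """Return findings whose evidence_quote is not found in the cited source."""
    bad = []
    for finding in findings:
        # Unknown or missing source ids resolve to empty text, so they fail.
        source_text = chunks.get(finding.get("source_id"), "")
        quote = finding.get("evidence_quote") or ""
        if not quote or quote not in source_text:
            bad.append(finding)
    return bad
```

The strict substring test will reject paraphrased quotes. That is usually what you want in an extraction pipeline, since a paraphrase is already a step toward synthesis.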

For multi-step agent workflows, consider structured context passing between pipeline stages rather than passing raw text. Each stage receives only the structured output of the previous stage, which keeps context windows lean and output grounding tight.

How should teams test structured output prompts for regressions?

Prompt regression testing is not optional for production agentic systems. A prompt change that improves average-case output can silently break edge cases that your users encounter regularly.

A minimal regression harness has three components:

| Component | What it checks |
| --- | --- |
| Schema validation suite | Every example in the test set passes JSON Schema validation |
| Semantic accuracy suite | Extracted values match ground-truth labels for a stratified sample |
| Edge case suite | Known hard cases (ambiguous input, missing fields, adversarial phrasing) produce correct handling |
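A minimal sketch of such a harness follows. It takes the extraction function and the schema check as parameters, so the same code can run against a stub locally and against the real Claude call in CI. The case format, the function name run_regression, and the stubs are all assumptions for illustration; the edge case suite is simply a tagged subset of the same cases.

```python
def run_regression(cases, extract, is_valid):
    """cases: list of {"input": str, "expected": dict}."""
    report = {"total": len(cases), "schema_failures": [], "mismatches": []}
    for index, case in enumerate(cases):
        output = extract(case["input"])
        # Schema suite: shape must be valid before values are compared.
        if not is_valid(output):
            report["schema_failures"].append(index)
            continue
        # Semantic suite: compare each labelled field to ground truth.
        wrong = {
            field: {"expected": want, "got": output.get(field)}
            for field, want in case["expected"].items()
            if output.get(field) != want
        }
        if wrong:
            report["mismatches"].append((index, wrong))
    return report


# Stubs standing in for the real extraction call and schema validator.
def stub_extract(text):
    return {"intent": "exchange"} if "swap" in text else {"intent": "unknown"}


def stub_is_valid(output):
    return output.get("intent") in {"exchange", "status_inquiry", "address_change"}


cases = [
    {"input": "I want to swap for a medium", "expected": {"intent": "exchange"}},
    {"input": "Where is my order?", "expected": {"intent": "status_inquiry"}},
    {"input": "Can I swap my delivery address?", "expected": {"intent": "address_change"}},
]
report = run_regression(cases, stub_extract, stub_is_valid)
```

In CI, fail the build when either list in the report is non-empty, or when it grows relative to the last accepted baseline.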

Run the harness in CI before any prompt change reaches production. For MCP integrations, include at least one test that exercises the is_error retry path: a malformed input that should trigger validation failure and self-correction.
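One way to exercise the retry path without a network call is to inject the model as a function and script its replies. The sketch below is a simplified stand-in for the loop earlier in this post: it passes feedback as a plain string rather than building tool_result messages, and every name in it is hypothetical.

```python
def extract_with_feedback(call_model, find_error, max_retries=2):
    """call_model(feedback) returns a dict; find_error(output) returns str or None."""
    feedback = None
    for attempt in range(max_retries + 1):
        output = call_model(feedback)
        error = find_error(output)
        if error is None:
            return output, attempt
        feedback = f"Schema validation failed: {error}. Please correct and retry."
    raise ValueError(f"still invalid after {max_retries} retries: {error}")


def test_retry_path():
    # Scripted replies: first one has a wrongly cased enum, second is valid.
    replies = iter([{"status": "Shipped"}, {"status": "shipped"}])
    seen_feedback = []

    def fake_model(feedback):
        seen_feedback.append(feedback)
        return next(replies)

    def find_error(output):
        allowed = {"pending", "confirmed", "shipped", "cancelled"}
        if output.get("status") not in allowed:
            return f"status {output.get('status')!r} is not an allowed value"
        return None

    output, attempts = extract_with_feedback(fake_model, find_error)
    assert output == {"status": "shipped"}
    assert attempts == 1
    assert seen_feedback[0] is None
    assert "not an allowed value" in seen_feedback[1]
```

The test asserts on the feedback the fake model received, not only on the final output, so it fails if someone removes the error message from the retry prompt.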

The CCA-F exam tests this mindset directly. Domain 3 (Claude Code Configuration & Workflows, 20%) includes task statements about CI integration and test-driven iteration. The exam rewards candidates who treat prompts as artefacts that require the same engineering discipline as code.

For candidates preparing for the exam, our concept library at /concepts covers all 174 atomic concepts mapped to the five domains and 30 task statements, including the structured output patterns discussed here.

What does this mean for CCA-F exam preparation?

The patterns above map to three of the five exam domains:

| Domain | Weight | Relevant patterns |
| --- | --- | --- |
| Domain 2: Tool Design & MCP Integration | 18% | tool_use schema design, is_error retry, tool vs prompt boundary |
| Domain 3: Claude Code Configuration & Workflows | 20% | CI regression testing, test-first task setup |
| Domain 4: Prompt Engineering & Structured Output | 20% | XML tags, few-shot examples, self-check, negative constraints |

Together these three domains account for 58% of the exam. A candidate who can reason fluently about schema design, validation loops, and prompt regression testing is well-positioned across more than half the question set.

The exam consistently presents scenario questions where you must choose between a prompt-only solution and a tool-enforced solution. The decision rule is straightforward: when the cost of a schema violation is high (downstream code breaks, data is written to a database, a financial transaction is triggered), use tool_use with validation and retry. When the cost is low (a summary that a human will review), constrained prompting is proportionate.

AI Skill Certs is an independent prep platform and is not affiliated with or endorsed by Anthropic. Our practice exams are 60 questions scored on the same 100-to-1000 scale as the real exam, with 720 as the passing bar, so you know exactly where you stand before exam day.

Frequently asked questions

What is the claude jarman approach to structured output?
The claude jarman approach refers to a disciplined, constraint-first prompting style: define an explicit JSON schema, use tool_use to enforce it at the API layer, write negative constraints alongside positive ones, and validate with structured retry rather than hoping the model self-corrects. It treats prompts as engineering artefacts, not natural-language wishes.
Is tool_use always better than constrained prompting for JSON output?
For non-trivial schemas in production pipelines, yes. tool_use enforces syntactic validity at the transport layer, which constrained prompting cannot guarantee. For simple, low-stakes classifications that a human will review, constrained prompting is proportionate and avoids the overhead of defining a full tool schema.
How many retries should a structured output validation loop attempt?
Two retries is a reasonable ceiling. The first retry catches transient compliance failures; the second catches cases where the error feedback itself needed clarification. Beyond two retries, the failure is almost always a schema design problem or an ambiguous prompt, not a model reliability issue, and further retries waste tokens without fixing the root cause.
Which CCA-F exam domains cover structured output and schema design?
Primarily Domain 4 (Prompt Engineering & Structured Output, 20%) and Domain 2 (Tool Design & MCP Integration, 18%). Domain 3 (Claude Code Configuration & Workflows, 20%) is also relevant for CI-based prompt regression testing. Together these three domains account for 58% of the CCA-F exam weight.
How do you prevent Claude from hallucinating values in a retrieval-augmented extraction pipeline?
Three mitigations work together: wrap each retrieved document in a labelled XML tag so Claude can attribute values to specific sources; add a provenance field (such as source_id or evidence_quote) to the schema so Claude must ground each claim; and place the output schema instructions at the end of the system prompt where attention is strongest.
What is the 'other + detail' enum pattern and when should you use it?
The 'other + detail' pattern adds a companion free-text field alongside a constrained enum. When the primary classification field is set to 'other', the detail field captures the actual value in plain text. Use it when your taxonomy is finite but not exhaustive, so Claude is never forced to misclassify an edge case into a poor-fit enum value.

People also ask

What is claude jarman used for in AI engineering?
Claude jarman refers to a disciplined prompting methodology for Claude that prioritises explicit schema contracts, tool_use enforcement, and validation-with-retry over relying on the model to infer output structure. It is used in production pipelines where schema violations cause downstream failures, such as database writes or API integrations.
Does Claude support strict JSON schema output natively?
Claude enforces syntactic JSON validity when you use tool_use with a defined input_schema. The API guarantees a well-formed tool call block. Semantic correctness, such as correct enum values or logically consistent fields, still requires prompt design and validation. Constrained prompting alone does not provide the same transport-layer guarantee.
How do few-shot examples improve Claude structured output?
Few-shot examples in the system prompt show Claude the exact output shape for ambiguous edge cases, which outperforms prose instructions for hard cases. A single example demonstrating null handling for optional fields is more reliable than a paragraph explaining when to use null. Keep examples in the system prompt so they persist across multi-turn conversations.
What prompt engineering techniques does the CCA-F exam test?
The CCA-F exam's Domain 4 (Prompt Engineering & Structured Output, 20%) tests XML tag demarcation, few-shot example design, negative constraints, self-check instructions, and the decision between tool_use and constrained prompting. The exam rewards proportionate solutions: tool enforcement for high-stakes output, constrained prompting for low-stakes tasks.
How should you test Claude prompts for regressions in production?
Run a three-part harness in CI: a schema validation suite confirming every test case passes JSON Schema validation; a semantic accuracy suite comparing extracted values to ground-truth labels; and an edge case suite covering known hard inputs. For MCP integrations, include at least one test that exercises the is_error retry path.

About the author

Solomon Udoh

AI Architect & Certification Lead

Solomon Udoh is an AI Architect who designs and ships production agent systems on the Claude API and Claude Code. He built AI Skill Certs' adaptive engine and authored its 174-concept knowledge graph, mapping every Claude Certified Architect - Foundations objective to hands-on, exam-aligned practice.

  • Designs production multi-agent systems on the Claude API and Agent SDK
  • Author of the AI Skill Certs knowledge graph (174 mapped exam concepts)
  • Builds with MCP, Claude Code, structured outputs, and agentic loops daily
  • Reviews every concept page against the official Anthropic exam guide
