Concept deep dive · 10 min read · 20 June 2026

Claude Jarman: Prompt Patterns for Strict Structured Output

Claude Jarman techniques for reliable structured output: XML tags, schema design, tool_use validation, and retry patterns. Mapped to CCA-F Domain 4 (20% of the exam).

By Solomon Udoh · AI Architect & Certification Lead

The phrase "claude jarman" circulates in engineering forums as shorthand for a disciplined, craft-first approach to prompting Claude: precise constraints, explicit schema contracts, and no wishful thinking about what the model will infer. Whether you encountered it in a Slack thread or a GitHub discussion, the underlying question is the same: how do you make Claude produce structured output that is reliable enough to wire directly into a production pipeline? This post answers that question with patterns that map directly to Prompt Engineering & Structured Output, Domain 4 of the CCA-F exam, which carries 20% of the total exam weight.

What does "structured output" actually mean in a Claude agent?

Structured output means Claude returns data in a machine-readable format, typically JSON, that your code can parse without a second LLM call to clean it up. It is not the same as "output that looks tidy." A response can be beautifully formatted prose and still be unstructured from a pipeline perspective.

In practice, three mechanisms produce structured output with Claude:

Constrained prompting - you instruct Claude to return JSON and describe the schema in the system prompt.
tool_use with a JSON schema - you define a tool whose input schema is the structure you want; when tool_choice requires a tool call, Claude has to populate it.
Post-processing validation with retry - you parse the output, catch schema violations, and re-prompt with the error.

Each mechanism has a different reliability profile. Understanding when to use which one is the core of what practitioners mean by the claude jarman approach.

Which mechanism is most reliable for strict schema adherence?

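tool_use is the most reliable mechanism for strict schema adherence when the schema is non-trivial. When you define a tool with a JSON Schema and require a tool call, the response carries a structured tool_use block whose input is already a parsed object, not free-form text that happens to contain JSON. Treat the schema as a strong constraint rather than an absolute guarantee: unless your API version offers a strict schema-enforcement option and you have enabled it, the input can still omit a required field or carry a wrong type, so validate it before use.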

json

{
  "name": "extract_order",
  "description": "Extract structured order data from the user message.",
  "input_schema": {
    "type": "object",
    "properties": {
      "order_id": { "type": "string" },
      "line_items": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "sku": { "type": "string" },
            "quantity": { "type": "integer", "minimum": 1 }
          },
          "required": ["sku", "quantity"]
        }
      },
      "status": {
        "type": "string",
        "enum": ["pending", "confirmed", "shipped", "cancelled"]
      }
    },
    "required": ["order_id", "line_items", "status"]
  }
}

Constrained prompting alone is less reliable for complex schemas. Claude will usually comply, but "usually" is not a contract. For high-stakes pipelines, the CCA-F exam consistently rewards deterministic solutions over probabilistic ones, which means tool_use over "please return JSON."

That said, tool_use is not a silver bullet. Semantic errors, wrong enum values, or logically inconsistent field combinations can still slip through. Validation and retry remain necessary.
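As a rough illustration of the gap between a schema-valid and a logically valid result, the sketch below checks two business rules against the extract_order shape shown earlier. Both rules and the function name `semantic_errors` are hypothetical, chosen only to show the kind of check a JSON Schema cannot express.

```python
def semantic_errors(order: dict) -> list[str]:
    """Collect rule violations in an order that already passed schema validation.

    The two rules below are illustrative assumptions, not rules from any real system.
    """
    errors = []
    skus = [item["sku"] for item in order["line_items"]]

    # Hypothetical rule 1: a confirmed or shipped order must contain something.
    if not skus and order["status"] in ("confirmed", "shipped"):
        errors.append(f"status '{order['status']}' requires at least one line item")

    # Hypothetical rule 2: each SKU should appear on one line only.
    duplicates = sorted({sku for sku in skus if skus.count(sku) > 1})
    if duplicates:
        errors.append(f"duplicate sku(s): {', '.join(duplicates)}")

    return errors


# Schema-valid (an empty array is allowed) but logically inconsistent.
suspicious = {"order_id": "A-1001", "line_items": [], "status": "shipped"}
print(semantic_errors(suspicious))
```

Returning a list of messages rather than raising on the first problem means every issue can be fed back to Claude in a single retry.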

What schema design choices reduce failures before they happen?

Schema design is upstream of prompting. A poorly designed schema produces failures that no prompt technique can fully prevent.

Design choice	Lower failure rate	Higher failure rate
Field names	Descriptive (`invoice_total_usd`)	Ambiguous (`total`, `amount`)
Enums	Tight, exhaustive list	Open string with "any value"
Required vs optional	Mark truly optional fields optional	Mark everything required
Nullable fields	Use sparingly, only when absence is meaningful	Overuse as a catch-all
Catch-all pattern	`"category": "other", "category_detail": "..."`	Single free-text field
Nesting depth	Flat or one level deep	Three or more levels

The "other + detail" pattern deserves particular attention. When you have a classification field with a finite enum, add a companion detail field that is only required when category is "other". This prevents Claude from forcing a poor fit into a constrained enum while still keeping the primary field machine-filterable.

json

{
  "category": {
    "type": "string",
    "enum": ["billing", "technical", "shipping", "other"]
  },
  "category_detail": {
    "type": "string",
    "description": "Required when category is 'other'. Describe the category in plain text."
  }
}
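Note that the fragment above states the conditional rule only in a description, which a schema validator will not enforce. One way to close that gap is a small check in code after extraction. The following is a minimal sketch; the function name `category_errors` is an assumption for illustration.

```python
ALLOWED_CATEGORIES = {"billing", "technical", "shipping", "other"}


def category_errors(record: dict) -> list[str]:
    """Enforce the 'other + detail' rule that the description only documents."""
    errors = []
    category = record.get("category")
    if category not in ALLOWED_CATEGORIES:
        errors.append(f"category must be one of {sorted(ALLOWED_CATEGORIES)}")

    # The detail field only becomes mandatory when the enum falls back to "other".
    detail = record.get("category_detail")
    has_detail = isinstance(detail, str) and bool(detail.strip())
    if category == "other" and not has_detail:
        errors.append("category_detail is required when category is 'other'")

    return errors


print(category_errors({"category": "other"}))
print(category_errors({"category": "other", "category_detail": "gift wrapping request"}))
```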

For exam purposes, note that schema design questions appear in both Domain 4 (Prompt Engineering & Structured Output, 20%) and Domain 2 (Tool Design & MCP Integration, 18%). The overlap is intentional: the exam treats schema design as a cross-cutting concern.

Which prompt patterns actually improve structured output quality?

Four techniques have measurable impact. We rank them by leverage, not by complexity.

1. XML tags for section demarcation

Claude's training includes a large volume of XML-structured content. Wrapping input sections in semantic tags reduces the chance that Claude conflates source material with output instructions.

text

<source_document>
{{document_text}}
</source_document>

<task>
Extract the fields defined in the tool schema. Do not infer values not present in the source document. If a required field is absent, set it to null only if the schema permits; otherwise return an error flag.
</task>

This is especially valuable in retrieval-augmented pipelines where the retrieved context is long and potentially noisy. Without demarcation, Claude can "leak" retrieved text into output fields.
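In code, the template above might be filled in as follows. This is a sketch under stated assumptions: `build_user_message` is a hypothetical helper, and neutralising a stray closing tag inside the document is one possible precaution rather than an established requirement.

```python
TASK = (
    "Extract the fields defined in the tool schema. "
    "Do not infer values not present in the source document."
)

CLOSING_TAG = "</source_document>"


def build_user_message(document_text: str) -> str:
    """Wrap the source text and the task in separate XML tags."""
    # Assumption: if the document itself contains the closing tag, neutralise it
    # so the source section cannot end early and bleed into the task section.
    safe_text = document_text.replace(CLOSING_TAG, "&lt;/source_document&gt;")
    return (
        "<source_document>\n"
        f"{safe_text}\n"
        f"{CLOSING_TAG}\n\n"
        "<task>\n"
        f"{TASK}\n"
        "</task>"
    )


print(build_user_message("Order A-1001: 2 x SKU-9, status confirmed."))
```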

2. Explicit negative constraints

Tell Claude what not to do, not just what to do. "Return only the JSON object" is weaker than "Return only the JSON object. Do not include explanatory prose, markdown fences, or commentary before or after the object."

text

Return a single JSON object matching the tool schema.
Do not wrap it in markdown code fences.
Do not add keys not defined in the schema.
Do not hallucinate values for fields absent from the source.
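When you rely on constrained prompting rather than tool_use, some of these negative constraints can also be checked mechanically before the output goes downstream. The sketch below uses only the standard library; `constraint_violations` and the sample keys are illustrative assumptions.

```python
import json

FENCE = "`" * 3  # the markdown code fence marker


def constraint_violations(raw: str, allowed_keys: set[str]) -> list[str]:
    """Report which of the negative constraints a text response breaks."""
    violations = []
    text = raw.strip()

    # Simple heuristic: a fence marker inside a string value would also trigger this.
    if FENCE in text:
        violations.append("markdown code fence present")

    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        parsed = None
    if not isinstance(parsed, dict):
        violations.append("response is not a bare JSON object")
        return violations

    extra = sorted(set(parsed) - allowed_keys)
    if extra:
        violations.append(f"unexpected keys: {', '.join(extra)}")
    return violations


print(constraint_violations('{"intent": "exchange", "mood": "angry"}', {"intent"}))
```

The hallucination constraint is not covered here because it cannot be checked from the output alone; it needs a comparison against the source.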

3. Few-shot examples in the system prompt

Few-shot examples are the most effective technique for ambiguous edge cases. A single well-chosen example of a hard case often outperforms three paragraphs of instruction. Keep examples in the system prompt, not the user turn, so they persist across multi-turn conversations.

text

<example>
<input>Customer said: "I got the wrong size, want to swap for a medium."</input>
<output>{"intent": "exchange", "reason": "wrong_size", "requested_variant": "medium"}</output>
</example>

<example>
<input>Customer said: "Just checking where my order is."</input>
<output>{"intent": "status_inquiry", "reason": null, "requested_variant": null}</output>
</example>

Note the second example explicitly shows null for optional fields. This teaches Claude the null pattern without requiring prose explanation.
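If the examples live in a test fixture or a labelled dataset, they can be rendered into this format programmatically so the prompt and the regression suite stay in sync. A minimal sketch follows; `render_examples` is a hypothetical helper, and using `json.dumps` is what turns Python's `None` into the `null` shown above.

```python
import json


def render_examples(examples: list[tuple[str, dict]]) -> str:
    """Render (customer text, expected output) pairs as <example> blocks."""
    blocks = []
    for customer_text, expected in examples:
        blocks.append(
            "<example>\n"
            f"<input>Customer said: {json.dumps(customer_text)}</input>\n"
            f"<output>{json.dumps(expected)}</output>\n"
            "</example>"
        )
    return "\n\n".join(blocks)


EXAMPLES = [
    (
        "I got the wrong size, want to swap for a medium.",
        {"intent": "exchange", "reason": "wrong_size", "requested_variant": "medium"},
    ),
    (
        "Just checking where my order is.",
        {"intent": "status_inquiry", "reason": None, "requested_variant": None},
    ),
]

print(render_examples(EXAMPLES))
```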

4. Self-check instruction

For high-value extractions, append a self-check step before the final output:

text

Before returning the JSON, verify:
- Every required field is populated.
- Enum values match the allowed list exactly (case-sensitive).
- No field contains a value inferred beyond what the source states.
If any check fails, correct the output before returning it.

This adds tokens but reduces downstream validation failures. On the CCA-F exam, proportionate fixes are rewarded: use self-check for high-stakes extractions, not for every trivial classification.

Prompts should be treated as code: version-controlled, tested against a regression suite, and reviewed before deployment to production agents.

Anthropic, Claude Documentation (Prompt Engineering Overview)

When should a tool do the final transform instead of the prompt?

This is one of the sharper design questions in agentic systems. The answer depends on whether the transformation is deterministic.

If the transform is deterministic (for example, converting a date string to ISO 8601, or summing line items), implement it in code. Do not ask Claude to do arithmetic or string normalisation that a function handles perfectly. The Tool Design & MCP Integration domain covers this boundary explicitly: tools should do what code does better; prompts should do what language understanding does better.

If the transform requires judgment (for example, classifying a free-text complaint into a taxonomy, or extracting implicit intent), that is a prompt task. The tool schema constrains the output shape; the prompt provides the judgment.

A common anti-pattern is asking Claude to both extract and transform in a single step when the transform is deterministic. Split the work:

python

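# call_claude_with_tool and extract_tool_schema are placeholders for your own
# helper and tool definition. The schema is assumed to expose unit_price and
# created_at in addition to the fields shown earlier.
from dateutil.parser import parse as parse_date  # one option; any date parser works

# Step 1: Claude extracts (judgment task)
raw = call_claude_with_tool(document, extract_tool_schema)

# Step 2: Code transforms (deterministic task)
normalised = {
    "order_id": raw["order_id"].upper().strip(),
    "total_usd": round(sum(i["unit_price"] * i["quantity"] for i in raw["line_items"]), 2),
    "created_at": parse_date(raw["created_at"]).isoformat(),
}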

This pattern also makes regression testing easier: you can test the Claude extraction step and the transform step independently.
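To illustrate the independent-testing point, the transform can be pulled into a function and exercised with a hand-written fixture, with no model call involved. This is a sketch under assumptions: `normalise_order` is a hypothetical name, the day/month/year input format is invented for the example, and `Decimal` is swapped in for the float arithmetic to avoid rounding surprises with currency.

```python
from datetime import datetime
from decimal import Decimal


def normalise_order(raw: dict) -> dict:
    """Deterministic transform applied to an already-extracted order."""
    total = sum(
        Decimal(str(item["unit_price"])) * item["quantity"]
        for item in raw["line_items"]
    )
    return {
        "order_id": raw["order_id"].strip().upper(),
        "total_usd": float(round(total, 2)),
        # Assumed input format for this sketch: DD/MM/YYYY.
        "created_at": datetime.strptime(raw["created_at"], "%d/%m/%Y").isoformat(),
    }


# Hand-written fixture standing in for the extraction step's output.
fixture = {
    "order_id": " ab-1001 ",
    "line_items": [
        {"sku": "S-1", "unit_price": 9.99, "quantity": 2},
        {"sku": "S-2", "unit_price": 5.99, "quantity": 1},
    ],
    "created_at": "20/06/2026",
}

assert normalise_order(fixture) == {
    "order_id": "AB-1001",
    "total_usd": 25.97,
    "created_at": "2026-06-20T00:00:00",
}
```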

How do you build a validation and retry loop that actually works?

Validation without retry is just error logging. Retry without structured error feedback is just burning tokens. The effective pattern combines both.

python

import anthropic
import json
from jsonschema import validate, ValidationError

client = anthropic.Anthropic()

def extract_with_retry(document: str, tool: dict, schema: dict, max_retries: int = 2) -> dict:
    messages = [{"role": "user", "content": document}]
    
    for attempt in range(max_retries + 1):
        response = client.messages.create(
            model="claude-opus-4-5",
            max_tokens=1024,
            tools=[tool],
            tool_choice={"type": "any"},
            messages=messages
        )
        
        tool_block = next(
            (b for b in response.content if b.type == "tool_use"), None
        )
        if tool_block is None:
            raise RuntimeError("No tool_use block in response")
        
        extracted = tool_block.input
        
        try:
            validate(instance=extracted, schema=schema)
            return extracted
        except ValidationError as e:
            if attempt == max_retries:
                raise
            # Feed the error back as a tool result so Claude sees what went wrong
            messages += [
                {"role": "assistant", "content": response.content},
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "tool_result",
                            "tool_use_id": tool_block.id,
                            "is_error": True,
                            "content": f"Schema validation failed: {e.message}. Please correct and retry."
                        }
                    ]
                }
            ]
    
    raise RuntimeError("Unreachable")

Key points in this implementation:

tool_choice: {"type": "any"} forces Claude to use a tool rather than responding in prose.
The validation error is fed back as a tool_result with is_error: true, which matches the MCP isError flag pattern the exam tests.
We cap retries at two. Beyond two retries, the failure is usually a schema design problem, not a prompting problem.

When a tool call fails validation, return the error in a tool_result block with is_error set to true. This preserves the conversation structure and gives the model the context it needs to self-correct.

Anthropic, Claude Tool Use Documentation
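The control flow of the loop can also be exercised offline by injecting the model call and the validator as plain functions. The sketch below is a transport-agnostic reduction of the same idea, not the SDK code above: `retry_with_feedback`, `stub_model`, and `status_errors` are all hypothetical names, and the stub simply pretends that Claude corrects itself once it receives feedback.

```python
from typing import Callable


def retry_with_feedback(
    call_model: Callable[[list[str]], dict],
    find_errors: Callable[[dict], list[str]],
    max_retries: int = 2,
) -> dict:
    """Call the model, validate, and feed error text back until valid or exhausted."""
    feedback: list[str] = []
    for _ in range(max_retries + 1):
        candidate = call_model(feedback)
        errors = find_errors(candidate)
        if not errors:
            return candidate
        # Structured feedback: the next call sees exactly what failed.
        feedback.append("; ".join(errors))
    raise ValueError(f"still invalid after {max_retries} retries: {feedback[-1]}")


ALLOWED_STATUS = {"pending", "confirmed", "shipped", "cancelled"}


def status_errors(candidate: dict) -> list[str]:
    if candidate.get("status") in ALLOWED_STATUS:
        return []
    return [f"status must be one of {sorted(ALLOWED_STATUS)}"]


def stub_model(feedback: list[str]) -> dict:
    # Stand-in for the API call: wrong case first, corrected after feedback.
    return {"status": "shipped"} if feedback else {"status": "Shipped"}


print(retry_with_feedback(stub_model, status_errors))
```

Keeping the loop free of network calls makes the retry path cheap to cover in a unit test.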

How do you handle structured output in long-context or retrieved-context tasks?

Long context introduces two failure modes: attention dilution and grounding drift. Attention dilution means Claude's adherence to output constraints weakens when the context window is very full. Grounding drift means Claude starts synthesising or inferring rather than extracting.

Three mitigations:

Place the schema and output instructions at the end of the system prompt, not the beginning. Claude attends more strongly to recent content. If your schema is buried under 2,000 tokens of background, it will be underweighted.
Use XML tags to isolate retrieved chunks. Each retrieved document gets its own <source id="1">...</source> wrapper. This makes it easier for Claude to attribute extracted values to specific sources and reduces hallucination.
Request provenance fields in the schema. Add a source_id or evidence_quote field alongside each extracted value. This forces Claude to ground its output in the retrieved text and makes validation easier.

json

{
  "findings": [
    {
      "claim": "Revenue grew 12% year-over-year.",
      "source_id": "3",
      "evidence_quote": "Total revenue increased from $4.2M to $4.7M in FY2025."
    }
  ]
}
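Provenance fields are what make the validation step mechanical: each quote can be looked up in the source it claims to come from. The following is a minimal sketch with a hypothetical `provenance_errors` helper. It uses exact substring matching, which is a deliberately strict assumption, and it confirms only that the quote exists, not that the claim follows from it.

```python
def provenance_errors(findings: list[dict], sources: dict[str, str]) -> list[str]:
    """Check that every evidence_quote appears verbatim in its cited source."""
    errors = []
    for index, finding in enumerate(findings):
        source_id = finding["source_id"]
        source_text = sources.get(source_id)
        if source_text is None:
            errors.append(f"finding {index}: unknown source_id {source_id!r}")
        elif finding["evidence_quote"] not in source_text:
            errors.append(f"finding {index}: evidence_quote not found in source {source_id!r}")
    return errors


sources = {
    "3": "Annual summary. Total revenue increased from $4.2M to $4.7M in FY2025. Costs were flat.",
}
findings = [
    {
        "claim": "Revenue grew 12% year-over-year.",
        "source_id": "3",
        "evidence_quote": "Total revenue increased from $4.2M to $4.7M in FY2025.",
    }
]

print(provenance_errors(findings, sources))
```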

For multi-step agent workflows, consider structured context passing between pipeline stages rather than passing raw text. Each stage receives only the structured output of the previous stage, which keeps context windows lean and output grounding tight.

How should teams test structured output prompts for regressions?

Prompt regression testing is not optional for production agentic systems. A prompt change that improves average-case output can silently break edge cases that your users encounter regularly.

A minimal regression harness has three components:

Component	What it checks
Schema validation suite	Every example in the test set passes JSON Schema validation
Semantic accuracy suite	Extracted values match ground-truth labels for a stratified sample
Edge case suite	Known hard cases (ambiguous input, missing fields, adversarial phrasing) produce correct handling

Run the harness in CI before any prompt change reaches production. For MCP integrations, include at least one test that exercises the is_error retry path: a malformed input that should trigger validation failure and self-correction.
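A harness along these lines can be quite small. The sketch below is one possible shape, with several assumptions: `run_regression` and `stub_extract` are hypothetical names, the schema check is reduced to a required-keys test (a full JSON Schema validator would replace it in practice), and the stub stands in for the real Claude call so the harness itself can run offline.

```python
from typing import Callable


def run_regression(
    extract: Callable[[str], dict],
    cases: list[dict],
    required_keys: set[str],
) -> dict:
    """Run every case and separate shape failures from wrong-value failures."""
    shape_failures, value_failures = [], []
    for case in cases:
        output = extract(case["input"])
        if not required_keys <= set(output):
            shape_failures.append(case["name"])
        elif output != case["expected"]:
            value_failures.append(case["name"])
    return {
        "total": len(cases),
        "shape_failures": shape_failures,
        "value_failures": value_failures,
    }


def stub_extract(text: str) -> dict:
    # Stand-in for the model call; swap in the real extraction function in CI.
    if "swap" in text:
        return {"intent": "exchange", "reason": "wrong_size", "requested_variant": "medium"}
    return {"intent": "status_inquiry", "reason": None, "requested_variant": None}


CASES = [
    {
        "name": "exchange_wrong_size",
        "input": "I got the wrong size, want to swap for a medium.",
        "expected": {"intent": "exchange", "reason": "wrong_size", "requested_variant": "medium"},
    },
    {
        "name": "status_inquiry",
        "input": "Just checking where my order is.",
        "expected": {"intent": "status_inquiry", "reason": None, "requested_variant": None},
    },
]

report = run_regression(stub_extract, CASES, {"intent", "reason", "requested_variant"})
assert not report["shape_failures"] and not report["value_failures"], report
```

Separating shape failures from value failures shows at a glance whether a prompt change broke the contract or only the accuracy.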

The CCA-F exam tests this mindset directly. Domain 3 (Claude Code Configuration & Workflows, 20%) includes task statements about CI integration and test-driven iteration. The exam rewards candidates who treat prompts as artefacts that require the same engineering discipline as code.

For candidates preparing for the exam, our concept library at /concepts covers all 174 atomic concepts mapped to the five domains and 30 task statements, including the structured output patterns discussed here.

What does this mean for CCA-F exam preparation?

The patterns above map to three of the five exam domains:

Domain	Weight	Relevant patterns
Domain 2: Tool Design & MCP Integration	18%	`tool_use` schema design, `is_error` retry, tool vs prompt boundary
Domain 3: Claude Code Configuration & Workflows	20%	CI regression testing, test-first task setup
Domain 4: Prompt Engineering & Structured Output	20%	XML tags, few-shot examples, self-check, negative constraints

Together these three domains account for 58% of the exam. A candidate who can reason fluently about schema design, validation loops, and prompt regression testing is well-positioned across more than half the question set.

The exam consistently presents scenario questions where you must choose between a prompt-only solution and a tool-enforced solution. The decision rule is straightforward: when the cost of a schema violation is high (downstream code breaks, data is written to a database, a financial transaction is triggered), use tool_use with validation and retry. When the cost is low (a summary that a human will review), constrained prompting is proportionate.

AI Skill Certs is an independent prep platform and is not affiliated with or endorsed by Anthropic. Our practice exams are 60 questions scored on the same 100-to-1000 scale as the real exam, with 720 as the passing bar, so you know exactly where you stand before exam day.

Frequently asked questions

What is the claude jarman approach to structured output?

The claude jarman approach refers to a disciplined, constraint-first prompting style: define an explicit JSON schema, use tool_use to constrain the output shape at the API layer, write negative constraints alongside positive ones, and validate with structured retry rather than hoping the model self-corrects. It treats prompts as engineering artefacts, not natural-language wishes.

Is tool_use always better than constrained prompting for JSON output?

For non-trivial schemas in production pipelines, yes. tool_use returns a structured block whose input is already parsed, which constrained prompting cannot guarantee, although the input should still be validated against the schema before use. For simple, low-stakes classifications that a human will review, constrained prompting is proportionate and avoids the overhead of defining a full tool schema.

How many retries should a structured output validation loop attempt?

Two retries is a reasonable ceiling. The first retry catches transient compliance failures; the second catches cases where the error feedback itself needed clarification. Beyond two retries, the failure is almost always a schema design problem or an ambiguous prompt, not a model reliability issue, and further retries waste tokens without fixing the root cause.

Which CCA-F exam domains cover structured output and schema design?

Primarily Domain 4 (Prompt Engineering & Structured Output, 20%) and Domain 2 (Tool Design & MCP Integration, 18%). Domain 3 (Claude Code Configuration & Workflows, 20%) is also relevant for CI-based prompt regression testing. Together these three domains account for 58% of the CCA-F exam weight.

How do you prevent Claude from hallucinating values in a retrieval-augmented extraction pipeline?

Three mitigations work together: wrap each retrieved document in a labelled XML tag so Claude can attribute values to specific sources; add a provenance field (such as source_id or evidence_quote) to the schema so Claude must ground each claim; and place the output schema instructions at the end of the system prompt where attention is strongest.

What is the 'other + detail' enum pattern and when should you use it?

The 'other + detail' pattern adds a companion free-text field alongside a constrained enum. When the primary classification field is set to 'other', the detail field captures the actual value in plain text. Use it when your taxonomy is finite but not exhaustive, so Claude is never forced to misclassify an edge case into a poor-fit enum value.
