AI Skill Certs
Prompt Engineering & Structured Output · Task 4.2 · Bloom: apply · Difficulty 3/5 · 8 min read · Updated 2026-06-07

Few-Shot Prompting for Data Extraction Quality

Apply few-shot prompting to improve output consistency and quality

By Solomon Udoh · Reviewed by Solomon Udoh · AI-assisted · human-reviewed
In short
Few-shot prompting for extraction means seeding the prompt with a few examples that each handle a different document structure, such as an inline citation, an end bibliography, a table, or a narrative mention, so Claude maps every layout onto the same output schema. It is the fix for extraction that works on tidy documents but returns empty or wrong fields on differently formatted ones.

What few-shot prompting for data extraction fixes

Few-shot prompting for data extraction is the technique of placing a small number of worked examples in the prompt that, between them, demonstrate how to pull the same target fields out of structurally different documents. It directly addresses the failure mode where an extraction prompt performs well on the documents you tested it against and then produces empty fields, partial records, or misplaced values the moment a differently formatted document arrives.

The exam frames this under scenario 6, structured data extraction, where the corpus is rarely uniform. Real document sets mix papers with inline citations, reports with end-of-document bibliographies, spreadsheets exported as tables, and prose that mentions the very same facts in passing. An instruction-only prompt encodes one mental picture of the document, and the model quietly underperforms on every layout that does not match that picture.

Few-shot extraction prompting
Seeding an extraction prompt with examples that each cover a different document structure, so Claude learns to map inline citations, bibliographies, tables, and narrative mentions onto a single consistent output schema.

Why varied document structures defeat instruction-only prompts

Write the clearest extraction instruction you can, "pull every cited source with its author, year, and title," and it still leaves the model guessing about layout. In one document a source is a tidy row in a references table. In another it is a parenthetical "(Smith, 2021)" buried mid-sentence with the title nowhere nearby. In a third it is a footnote. The instruction describes what to extract; it says nothing about where it hides in each structure.

Faced with a layout it has not been shown, Claude tends to take the conservative path and emit nothing for that field rather than risk a wrong guess. From the outside this looks like the model failing to find data that is obviously present. The root cause is not comprehension; it is that the prompt never demonstrated that this layout also contains the field. That is precisely the gap a small, format-spanning example set closes.

Covering the format space with a handful of examples

The design move is to treat your examples as a map of the structural territory rather than a stack of similar cases. If your documents come in four meaningfully different shapes, your two-to-four examples should sample across those shapes, not cluster in the most common one.

Figure: One schema, many document layouts. Each example anchors a different layout to the same target fields, so no format is left undemonstrated.

Notice the contrast with the previous knowledge point. There the examples taught a judgement rule by exposing reasoning. Here they teach a recognition skill by exposing variety. The same few-shot mechanism serves both, but for extraction the lever you pull is breadth of structure: an example that shows the messy parenthetical citation resolving into the same author, year, and title fields teaches Claude that the field survives even when the layout disguises it.
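A coverage check makes the "map of the territory" idea concrete. The sketch below is illustrative only: the layout labels are hypothetical and are assumed to come from a manual audit of the corpus, which is not shown here.

```python
from collections import Counter

# Hypothetical layout labels for a sample of corpus documents; in practice
# these would come from a manual audit, which is not shown here.
corpus_layouts = [
    "references_table", "references_table", "inline_citation",
    "inline_citation", "footnote", "narrative_mention",
]

# Layouts that the current few-shot examples demonstrate (assumed).
example_layouts = ["references_table", "references_table", "inline_citation"]


def undemonstrated_layouts(corpus, examples):
    """Return corpus layouts that no example covers, most frequent first."""
    demonstrated = set(examples)
    counts = Counter(layout for layout in corpus if layout not in demonstrated)
    return [layout for layout, _ in counts.most_common()]


def clustered_layouts(examples):
    """Return layouts that appear more than once in the example set."""
    return sorted(name for name, n in Counter(examples).items() if n > 1)


gaps = undemonstrated_layouts(corpus_layouts, example_layouts)
duplicates = clustered_layouts(example_layouts)
print("No example yet for:", gaps)
print("Clustered examples:", duplicates)
```

Here two of the three examples sit in the most common layout while footnotes and narrative mentions have none, which is the clustering the section warns against.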

Empty fields are a diagnostic, not a defect

The most useful habit in extraction work is to read null fields as evidence. When a field comes back empty, ask first whether the information truly was absent or whether it was present in a structure your examples never covered. If a spot check shows the data was there in a format you did not demonstrate, you have found your next example rather than a model limitation.
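One way to turn that habit into a routine is to tally false nulls by layout during a spot check. This is a minimal sketch with made-up spot-check records; the field names `layout`, `returned_null`, and `present_in_source` are assumptions for illustration, not part of any library.

```python
from collections import defaultdict

# Hypothetical spot-check results: the layout of each document, whether the
# model returned null for the field, and whether a reviewer found the value.
spot_checks = [
    {"layout": "references_table", "returned_null": False, "present_in_source": True},
    {"layout": "inline_citation", "returned_null": True, "present_in_source": True},
    {"layout": "inline_citation", "returned_null": True, "present_in_source": True},
    {"layout": "footnote", "returned_null": True, "present_in_source": False},
]


def false_nulls_by_layout(checks):
    """Count nulls where the reviewer confirmed the value was in the source."""
    counts = defaultdict(int)
    for check in checks:
        if check["returned_null"] and check["present_in_source"]:
            counts[check["layout"]] += 1
    return dict(counts)


false_nulls = false_nulls_by_layout(spot_checks)
print(false_nulls)
```

In this sample the inline-citation layout accounts for every false null, so it is the next format to demonstrate. The footnote null is not counted, because the reviewer found the value truly absent.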

This reframing matters on the exam. A tempting wrong answer to an "empty fields" symptom is to tighten the instruction or lower a threshold. The knowledge point says otherwise: inconsistent or missing extraction across document types is a signal to add examples that cover those types. The model already understands the field; it just has not been shown that this layout contains it.

Giving the model a clean way to say nothing

The diagnostic habit above assumes the value is present. The opposite case, where a field is genuinely absent, needs its own design, and Anthropic's extraction examples handle it explicitly. The guidance for extraction prompts is to have Claude return complete, valid JSON and nothing else, with no reasoning or commentary wrapped around it, and to emit a defined sentinel when there is nothing to extract: an empty object {} or an explicit null rather than a prose line such as "no data found."

That sentinel does real work. It gives your downstream parser a single, predictable shape to handle, and it lets your own code separate a legitimate empty result from the false null described above. A false null means the value was there in an undemonstrated layout, and the fix is another example. A true empty {} means the field was genuinely absent, and no example will conjure it. Without an agreed empty value, those two very different situations look identical at the output, and the pipeline cannot tell a clean miss from a recoverable one.

So a strong example set demonstrates both halves of the contract: the same target fields recovered from every layout that contains them, and a strict, valid empty result for the documents that do not. Showing the empty case at least once is what stops the model from narrating its failures in prose instead of reporting them in the schema your consumer expects.
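On the consuming side, the contract might be enforced with a small classifier like the one below. It is a sketch under stated assumptions: the function name and the three status labels are hypothetical, and only the two sentinels named above, {} and null, count as a legitimate empty result.

```python
import json


def classify_extraction(raw_output):
    """Classify a model response string as 'records', 'empty', or 'invalid'."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        # Prose such as "no data found" lands here instead of in the schema.
        return "invalid", None
    if parsed is None or parsed == {}:
        # The agreed sentinel: nothing to extract from this document.
        return "empty", None
    return "records", parsed


for raw in ['{"author": "Lee and Park", "year": 2019}', "{}", "null", "no data found"]:
    status, _ = classify_extraction(raw)
    print(repr(raw), "->", status)
```

An "invalid" status signals that the model narrated instead of reporting, which points back to the example set lacking a demonstrated empty case.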

How this is tested

Domain 4 questions in the extraction scenario like to describe a pipeline that extracts cleanly from structured sources and then degrades on a new batch of differently formatted documents, with reviewers noticing blank fields where data clearly exists. The strongest answer adds few-shot examples drawn from the failing formats. Weaker answers reach for more elaborate instructions, a larger model, or a confidence cutoff, none of which address the real problem, which is undemonstrated layout variety.

This builds directly on constructing effective few-shot examples, and it pairs with the judgement of when few-shot is the right technique at all, since some extraction problems are actually structural (needing tool use) or semantic (needing validation) rather than format-driven.

Worked example

A research-extraction pipeline pulls citations from PDFs. It works on journal articles with a references table but returns empty author and year fields on conference papers that cite inline.

Your prompt instructs Claude to extract each cited source as an object with author, year, and title. On the journal set it is near-perfect, because every source sits in a clean references table that maps one-to-one to your schema. On the conference set, the same facts appear as inline parentheticals, "as shown by Lee and Park (2019) in their work on retrieval," and the title is mentioned a paragraph earlier. The pipeline returns author and year as null for most conference papers.

Rather than rewriting the instruction, you add two examples. The first shows a references-table entry resolving into the schema, confirming the behaviour you already have. The second shows an inline parenthetical: the input prose contains "Lee and Park (2019)" and an earlier sentence naming the paper, and the demonstrated output correctly fills author as "Lee and Park," year as 2019, and title from the earlier sentence, with a short note that inline citations often separate the title from the author and year.

With both layouts demonstrated, the model now recognises that the author and year fields can live inside running prose, not only in a table cell. Extraction accuracy on the conference set improves, and a third format you add later, footnoted citations, slots in the same way: one example anchoring the new layout to the existing schema. The schema never changed; what changed is that every format the corpus throws at Claude has now been shown to contain the target fields.
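The two-example prompt from this scenario could be assembled roughly as follows. Treat it as a sketch: the paper titles, the table row, and the tag names are invented placeholders, and the helper functions are hypothetical. The note from the inline example is kept outside the demonstrated output so the output itself stays pure JSON.

```python
import json

INSTRUCTION = (
    "Extract each cited source as a JSON object with the keys "
    "author, year, and title. Return valid JSON and nothing else."
)

# Illustrative examples only: the titles and the table row are invented
# placeholders, not taken from a real corpus.
EXAMPLES = [
    {
        "document": "| Author | Year | Title |\n| Smith | 2021 | Sparse Indexing Methods |",
        "note": "",
        "output": [{"author": "Smith", "year": 2021, "title": "Sparse Indexing Methods"}],
    },
    {
        "document": (
            "The most cited study here is Dense Retrieval at Scale. "
            "As shown by Lee and Park (2019) in their work on retrieval, "
            "recall improves with index size."
        ),
        "note": "Inline citations often separate the title from the author and year.",
        "output": [{"author": "Lee and Park", "year": 2019, "title": "Dense Retrieval at Scale"}],
    },
]


def render_example(example):
    """Render one example as a document, an optional note, and a JSON output."""
    parts = ["<example>", f"<document>\n{example['document']}\n</document>"]
    if example["note"]:
        parts.append(f"<note>{example['note']}</note>")
    parts.append(f"<output>\n{json.dumps(example['output'])}\n</output>")
    parts.append("</example>")
    return "\n".join(parts)


def build_prompt(document):
    """Combine the instruction, both examples, and the new document."""
    examples = "\n".join(render_example(e) for e in EXAMPLES)
    return (
        f"{INSTRUCTION}\n\n<examples>\n{examples}\n</examples>\n\n"
        f"<document>\n{document}\n</document>"
    )


prompt = build_prompt("Footnote 3: see Okafor (2020).")
print(prompt)
```

Adding the footnote layout later would mean appending one more entry to `EXAMPLES` with the same output keys, leaving the instruction untouched.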

Inline citations versus bibliographies in practice

It helps to make the format contrast concrete, because the exam scenarios are concrete. A bibliography is a structurally generous format: each source is a self-contained record with author, year, and title sitting side by side, which maps almost mechanically onto an extraction schema. An inline citation is structurally hostile: the author and year are fused into a parenthetical mid-sentence, the title may appear paragraphs earlier or nowhere at all, and the same source may be cited several times in different abbreviated forms. A prompt tuned on bibliographies has simply never been shown that a lone parenthetical can carry two of your schema fields at once.

Tables and narrative prose pose the mirror images of this problem. A table makes structure explicit but can bury meaning in column headers the model must interpret; narrative prose makes meaning explicit but scatters the structure across whole sentences. An example for each teaches Claude that the target fields persist through every one of these disguises, and that its job is to recover them no matter how the document chose to present them. The breadth of the example set, not its size, is what carries the lesson.

Keeping the schema stable as the formats change

A subtle discipline underpins all of this: the output schema must stay identical across every example, no matter how different the inputs look. If one example emits a year as a number and another as a string, or one nests the title and another flattens it, you teach Claude that the output shape is negotiable, which quietly reintroduces the very inconsistency you set out to remove. The inputs are allowed to vary wildly; the output must not budge.

Treat the schema as the fixed point and the examples as varied routes to it. Each demonstration takes a structurally different document and lands on the same record shape, so the model learns that format is purely an input concern and never an output one. That separation, messy and diverse inputs all converging on one clean, invariant schema, is the heart of robust extraction and the thing reviewers are really checking when they audit your pipeline.
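A quick lint over the example outputs can catch this kind of drift before the prompt ships. The sketch below assumes each demonstrated output is a plain dictionary; the helper names and sample records are hypothetical. It compares keys and value types only, so it would also flag an example whose field is null, and null fields need separate handling.

```python
def shape_of(value):
    """Describe a JSON-like value by its keys and the type names of its leaves."""
    if isinstance(value, dict):
        return {key: shape_of(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        return [shape_of(value[0])] if value else []
    return type(value).__name__


def inconsistent_examples(outputs):
    """Return indexes of outputs whose shape differs from the first one."""
    reference = shape_of(outputs[0])
    return [i for i, out in enumerate(outputs) if shape_of(out) != reference]


# Invented sample outputs: the second and third drift from the first.
example_outputs = [
    {"author": "Smith", "year": 2021, "title": "Sparse Indexing Methods"},
    {"author": "Lee and Park", "year": "2019", "title": "Dense Retrieval at Scale"},  # year as string
    {"author": "Okafor", "year": 2020, "source": {"title": "Footnote Parsing"}},  # title nested
]

drifted = inconsistent_examples(example_outputs)
print("Examples that drift from the schema:", drifted)
```

Both flagged examples carry the right information; the problem is only that they teach Claude the output shape is negotiable.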

Misconceptions that cost marks

Misconception

If extraction returns empty fields, the prompt instructions are not detailed enough, so I should describe the fields more precisely.

What's actually true

More description rarely helps when the data exists but sits in an undemonstrated layout. The fix is to add examples that show the same field appearing in the failing document structures, which teaches Claude to recognise it there.

Misconception

Examples from one representative document are enough; the model will transfer to any other format on its own.

What's actually true

Examples drawn from a single layout tend to fail on structurally different ones. Effective extraction examples deliberately span the format variety in your corpus, because each distinct structure can hide the target field in a different place.

Diversity without drift

A risk lurks inside the push for format diversity: if your examples vary too freely, Claude may latch onto an incidental pattern instead of the lesson you intend. The guard is to vary only the dimension that matters, document structure, while holding everything else steady. Same fields, same schema, same depth of reasoning, same ordering of the output. When the single thing that changes from one example to the next is the layout of the source, the model correctly concludes that layout is the variable it must learn to see through, and nothing else. This is the same instinct that keeps the schema invariant, applied one level up: control the variation so that every difference the model observes is a difference you meant to teach. Diversity here is a scalpel, not a firehose. You are sampling the structural space on purpose, not tossing in arbitrary variety and hoping the model sorts it out for you.

A practical recipe

Audit a sample of failing documents, group them by structural shape, and make sure your small example set has at least one demonstration per distinct shape. Keep the output schema identical across every example so Claude sees one stable target. Then watch the null fields: each persistent blank that turns out to contain real data is a pointer to the next format you should demonstrate.

1 schema: every example maps to the same output shape
Per format: span distinct layouts, not similar samples
Null = clue: empty fields flag an undemonstrated layout

Check your understanding

A structured-extraction pipeline reliably pulls product specs from supplier datasheets formatted as tables, but returns empty fields for suppliers who describe the same specs in narrative paragraphs. Reviewers confirm the data is present in the narratives. What is the most effective fix?

People also ask

Why does Claude leave extraction fields empty when the data exists?
Because the value sits in a document layout the prompt never demonstrated. The model recognises the field in formats it has seen and conservatively returns null for formats it has not, even when a human can spot the value.
How do few-shot examples improve data extraction?
They show the same target field appearing across different document structures and how each maps to the output schema, turning format variety from a cause of inconsistency into something Claude has been explicitly taught to handle.
Do I need a separate example for every document format?
Not one per document, but your small example set should span the structurally distinct formats you expect, because an example from only one layout typically fails on the others.

Watch and learn

Official Anthropic Academy lessons first, then hand-picked walkthroughs.

Anthropic

Prompting 101 | Code w/ Claude

Why watch: Uses a real extraction task (analysing varied Swedish car-accident reports) to show how few-shot examples covering different document structures dramatically improve extraction quality across diverse inputs.
