Claim Source Mapping in Claude Synthesis

In short: A claim source mapping is a structured record that binds every individual finding to the evidence behind it: the claim, the source URL, the document name, the exact excerpt, and the publication date. Carrying that record alongside the text is what lets attribution survive when downstream agents summarise and merge results.

What a claim source mapping actually is

A claim source mapping is the small unit of bookkeeping that keeps a multi-source synthesis honest. Instead of letting findings float as free prose, you attach to every single claim a record of where it came from. In its canonical Domain 5 form that record has five fields: the claim, the source URL, the document name, the relevant excerpt, and the publication date. When an agent reports that a market grew by twelve percent, the mapping is what lets a reader pull up the exact sentence in the exact document, dated, that the number rests on.

The reason this is a distinct knowledge point rather than an afterthought is that synthesis is lossy by nature. Each time an agent rewrites, condenses, or merges material, the cheapest thing to drop is the provenance, because provenance is not part of the sentence a reader sees. The mapping defends against that loss by making attribution a structured payload that rides alongside the prose rather than living inside it.

Claim source mapping: A structured record that pairs an individual claim with its evidence (source URL, document name, exact excerpt, and publication date) so that attribution can be carried, merged, and verified through every downstream synthesis step.

The five fields every finding should carry

Each field in the mapping earns its place by answering a different verification question, and dropping any one of them quietly weakens the whole record.

Claim: the specific assertion being made, stated narrowly enough that a reader can check it against the excerpt.
Source URL or identifier: where the evidence lives, so the trail does not dead-end at a document name nobody can locate.
Document name: a human-readable handle for the source, useful when several findings come from the same report.
Excerpt: the exact span of text the claim rests on. This is the field that turns "I read it somewhere" into "here is the sentence".
Publication date: when the source was published or the data collected, which later lets you tell a genuine contradiction from two figures measured at different times.

5 fields: claim, URL, document, excerpt, date
Per claim: granularity of the mapping
Survives merge: what good attribution guarantees

The excerpt and the date are the two fields novices skip and experts insist on. A synthesis can usually reconstruct a URL or a document name after the fact, but if the exact excerpt was never captured, nobody can confirm that the claim was faithful to the source rather than an over-confident paraphrase.
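One way to picture the five-field record is as a small immutable data class that refuses to be created with a blank field. This is a minimal sketch, not a prescribed schema: the class name `ClaimSourceMapping`, its field names, and every value in the example (URL, document name, excerpt, date) are placeholders invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ClaimSourceMapping:
    """One finding plus the evidence behind it (hypothetical schema)."""

    claim: str
    source_url: str
    document_name: str
    excerpt: str
    publication_date: date

    def __post_init__(self) -> None:
        # Reject blank text fields so the excerpt cannot be skipped silently.
        for name in ("claim", "source_url", "document_name", "excerpt"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty")
        # Reject a missing or untyped date for the same reason.
        if not isinstance(self.publication_date, date):
            raise ValueError("publication_date must be a datetime.date")


# All values below are placeholders, not real sources.
finding = ClaimSourceMapping(
    claim="The market grew by twelve percent.",
    source_url="https://example.com/market-report",
    document_name="Example Market Report",
    excerpt="The market expanded 12% year over year.",
    publication_date=date(2024, 1, 1),
)
```

Making the excerpt and date mandatory at creation time is a design choice: it moves the failure to the moment the source is still open, rather than to the moment a reviewer asks for it.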

Why attribution dies during summarisation

The failure this knowledge point guards against is silent. An agent retrieves a dozen passages, each cleanly attributed, and then produces a tidy executive summary. The summary is accurate, but it reads as a single confident voice with no markers showing which sentence came from which source. Attribution did not get corrupted; it simply was not carried forward, and a downstream reader now has a polished paragraph they cannot verify.

This is why the mapping has to be a first-class structure rather than a stylistic preference. If provenance lives only as inline phrases like "according to one report", the next summarisation pass will smooth those phrases away as redundant. If provenance lives as a structured mapping attached to each finding, the summariser has something concrete to preserve and merge. The exam frames this as a design choice: produce synthesis with the mapping intact, or produce synthesis without source attribution and accept that the result is unverifiable.
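The two designs can be contrasted in a few lines. The sketch below is illustrative only: `Finding`, `Synthesis`, and the three functions are hypothetical names, and joining claims with spaces stands in for whatever summarisation the real agent performs.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    claim: str
    source_url: str
    document_name: str
    excerpt: str
    publication_date: str  # ISO date string, kept simple for the sketch


@dataclass
class Synthesis:
    text: str
    findings: list[Finding] = field(default_factory=list)


def summarise_flat(findings: list[Finding]) -> str:
    # Prose only: every link back to a source is dropped at this step.
    return " ".join(f.claim for f in findings)


def summarise_with_mapping(findings: list[Finding]) -> Synthesis:
    # Same prose, but the records travel alongside it as a payload.
    return Synthesis(text=" ".join(f.claim for f in findings),
                     findings=list(findings))


def merge(a: Synthesis, b: Synthesis) -> Synthesis:
    # A later merge concatenates the payloads instead of discarding them.
    return Synthesis(text=f"{a.text} {b.text}",
                     findings=a.findings + b.findings)
```

The flat version returns a string, so nothing downstream can recover a source from it. The mapped version returns the same text plus the records, which is what gives a later summariser something to preserve.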

How Claude's Citations feature operationalises the mapping

You do not have to invent this machinery by hand. Anthropic's Citations feature implements the claim-to-source binding directly in the Messages API. When you enable citations on the documents you supply, Claude returns its answer as a series of text blocks where each block can carry a list of citations. Every citation points to an exact location in a named source: a cited text span, a document index identifying which document, and an index range that is character-based for plain text, page-based for PDFs, or content-block-based for custom content.

Two details matter for the exam. First, the returned cited text is guaranteed to be a valid pointer into the documents you provided, because it is extracted rather than generated, which removes the risk of a hallucinated quotation. Second, the cited text does not count against output tokens, so faithful attribution is cheap rather than something you trade away under cost pressure. The feature is, in effect, a managed claim source mapping: the model supplies the claim and the binding, and your code keeps the URL, document name, and date metadata alongside it.
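The division of labour described above can be sketched without calling the API at all. The example below builds a document block with citations enabled and converts a response-shaped list of content blocks into five-field records. The JSON field names (`citations`, `cited_text`, `document_index`, `document_title`, `char_location`) follow Anthropic's published Citations documentation as best I understand it, so check them against the current API reference. The helper names, the `metadata` list, and the sample response are assumptions written by hand for illustration, not real API output.

```python
def text_document(text: str, title: str) -> dict:
    """Build a plain-text document block with citations enabled."""
    return {
        "type": "document",
        "source": {"type": "text", "media_type": "text/plain", "data": text},
        "title": title,
        "citations": {"enabled": True},
    }


def extract_findings(content_blocks: list[dict], metadata: list[dict]) -> list[dict]:
    """Turn cited text blocks into five-field records.

    metadata[i] holds the URL, name, and date that our own code tracks
    for document i; the response itself does not supply URL or date.
    """
    findings = []
    for block in content_blocks:
        if block.get("type") != "text":
            continue
        for cite in block.get("citations") or []:
            meta = metadata[cite["document_index"]]
            findings.append({
                "claim": block["text"],
                "source_url": meta["url"],
                "document_name": cite.get("document_title") or meta["name"],
                "excerpt": cite["cited_text"],
                "publication_date": meta["date"],
            })
    return findings


# Hand-written sample in the documented response shape (not real output).
sample_content = [
    {"type": "text", "text": "According to the report, "},
    {
        "type": "text",
        "text": "the market grew by twelve percent.",
        "citations": [{
            "type": "char_location",
            "cited_text": "The market grew 12% last year.",
            "document_index": 0,
            "document_title": "Example Report",
            "start_char_index": 0,
            "end_char_index": 30,
        }],
    },
]
sample_metadata = [
    {"url": "https://example.com/report", "name": "Example Report", "date": "2024-01-01"},
]
records = extract_findings(sample_content, sample_metadata)
```

Text blocks without citations, such as the connective phrase in the sample, produce no record. That is intentional: only assertions that rest on a source need a mapping.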

The shape of a finding object

It helps to picture the mapping as a small object that flows through the system rather than as prose. The diagram below shows a single finding moving from retrieval, through the model that grounds a claim in an excerpt, into a structured record that downstream agents can merge without losing the link.

Diagram: A finding carries its provenance as structured data

Provenance is a payload that travels with the claim, not a sentence that gets summarised away.

Worked example: a research assistant that has to stay verifiable

A research assistant agent is asked to summarise three industry reports on electric-vehicle adoption for a partner who will quote the result in a board memo.

The agent retrieves passages from each report and is tempted to write straight to a clean three-paragraph summary. If it does, the board memo will contain numbers nobody can stand behind. Instead, the agent builds a finding record for every assertion before it writes a word of prose.

For the claim that adoption rose to eighteen percent of new sales, it stores the claim, the report URL, the document name, the exact sentence it found that figure in, and the report's publication month. It does the same for each of the other findings. Only then does it compose the summary, and it keeps the records attached so that the synthesis step has something to merge rather than something to flatten.

When the partner later asks where the eighteen percent came from, the answer is one lookup away: the named report, the dated edition, and the quoted sentence. Had the agent skipped the mapping and gone straight to prose, that same question would have triggered a frantic re-retrieval, and there would be no guarantee the rediscovered source actually said eighteen rather than a number the model rounded in passing.
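The "one lookup away" step might look like the sketch below. Everything in it is a stand-in: the function name `where_from` is hypothetical, and the report name, URL, excerpt, and month are placeholders, since the worked example does not name real reports.

```python
# Placeholder record; none of these values refer to a real report.
findings = [
    {
        "claim": "Adoption rose to eighteen percent of new sales.",
        "source_url": "https://example.com/ev-report-a",
        "document_name": "EV Report A",
        "excerpt": "EVs accounted for 18% of new vehicle sales.",
        "publication_date": "2024-03",
    },
]


def where_from(findings: list[dict], phrase: str) -> list[dict]:
    """Return the provenance of every finding whose claim mentions phrase."""
    needle = phrase.lower()
    keys = ("document_name", "publication_date", "excerpt", "source_url")
    return [
        {k: f[k] for k in keys}
        for f in findings
        if needle in f["claim"].lower()
    ]


answer = where_from(findings, "eighteen percent")
```

The lookup returns the named report, the dated edition, and the quoted sentence without touching retrieval again, which is the whole point of building the record before writing prose.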

Common misreadings to avoid

Misconception

If the final summary is accurate, source attribution is just extra clutter I can leave out.

What's actually true

Accuracy and verifiability are different properties. A correct-sounding summary with no claim source mapping cannot be checked, corrected, or defended, and on the exam an unattributed synthesis is treated as a failure regardless of whether the numbers happen to be right.

Misconception

Naming the source once at the end of the document is enough provenance.

What's actually true

A single trailing source list cannot tell a reader which sentence rests on which document, and it is the first thing a later summarisation pass discards. Attribution has to be bound per claim (claim, URL, document, excerpt, and date) so the link survives merging.

Bind the mapping to the smallest checkable claim

Granularity decides whether a claim source mapping is genuinely useful or merely decorative. If you attach one source to a whole paragraph, a reader who doubts the third sentence still has to read the entire cited document to find the part that supports it, and the binding has bought them almost nothing. The discipline is to bind attribution to the smallest claim a reader might want to check on its own, which in practice usually means a single sentence-sized assertion rather than a block.

Anthropic's Citations feature reflects this instinct directly. By default it chunks plain-text and PDF documents into sentences, so a citation can point at one sentence or chain a few consecutive sentences together for a longer claim. When you need even finer or differently-shaped control (bullet points, transcript turns, or pre-chunked retrieval results), custom content documents let you define the chunk boundaries yourself, so the granularity of the mapping matches the granularity of the claims you actually make. The lesson for the exam is that a mapping is only as trustworthy as it is specific: a coarse binding invites the same doubt an unattributed claim does.
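A custom content document can be sketched as follows. The block shape (a `source` of type `content` holding a list of text blocks) and the `content_block_location` citation with `start_block_index` and an exclusive `end_block_index` reflect my reading of the Citations documentation and should be verified against the current API reference. The helper names and the sample chunks and citation are hypothetical.

```python
def custom_content_document(chunks: list[str], title: str) -> dict:
    """Build a document whose citable units are exactly the given chunks."""
    return {
        "type": "document",
        "source": {
            "type": "content",
            "content": [{"type": "text", "text": chunk} for chunk in chunks],
        },
        "title": title,
        "citations": {"enabled": True},
    }


def cited_chunks(chunks: list[str], citation: dict) -> list[str]:
    """Map a content_block_location citation back to the original chunks."""
    # Assumes the end index is exclusive, like a Python slice.
    return chunks[citation["start_block_index"]:citation["end_block_index"]]


# Placeholder transcript turns, one chunk per turn.
turns = [
    "Speaker A: Sales were flat in the first quarter.",
    "Speaker B: They picked up in the second quarter.",
]
document = custom_content_document(turns, "Example Transcript")

# Hand-written citation in the documented shape (not real output).
sample_citation = {
    "type": "content_block_location",
    "document_index": 0,
    "start_block_index": 1,
    "end_block_index": 2,
}
```

Because you choose the chunk boundaries, each citation resolves to exactly one transcript turn here rather than to a sentence fragment or a whole page.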

What a missing excerpt actually costs

Among the five fields, the excerpt is the one that quietly does the most work, and it is worth being concrete about what its absence costs. With the exact quoted span captured, a reader can confirm in seconds that the claim is a faithful reading of the source rather than an over-eager paraphrase. Without it, even a perfect URL and document name leave a gap: the reader knows where to look but not what they are looking for, and they have to take on faith that the agent did not round, reframe, or subtly overstate what the source said.

That gap is exactly where synthesis errors hide. A model under instruction to be concise will compress a hedged statement into a flat assertion, and a claim that read "may be associated with" in the source can quietly become "causes" in the summary. The captured excerpt is the antidote, because it freezes the original wording next to the claim and makes any drift between them visible. This is also why Anthropic notes that its citations return extracted cited text rather than generated text: an extracted span cannot be a hallucinated quotation, so the excerpt is a guaranteed-faithful pointer rather than the model's paraphrase of itself.
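With the excerpt captured, two cheap checks become possible: confirming the excerpt really appears in the source, and flagging hedging words that the claim dropped. The sketch below is a rough heuristic under stated assumptions, not a faithful-paraphrase detector: the `HEDGES` list is an arbitrary sample, the function names are hypothetical, and the example sentences are invented.

```python
import re

# Arbitrary sample of hedging phrases; a real list would be longer.
HEDGES = ("may", "might", "could", "suggests", "associated with", "appears")


def _normalise(text: str) -> str:
    # Collapse whitespace and case so line breaks do not cause false alarms.
    return re.sub(r"\s+", " ", text).strip().lower()


def excerpt_is_verbatim(source_text: str, excerpt: str) -> bool:
    """True if the excerpt occurs in the source, ignoring case and spacing."""
    needle = _normalise(excerpt)
    return bool(needle) and needle in _normalise(source_text)


def dropped_hedges(excerpt: str, claim: str) -> list[str]:
    """Hedging phrases present in the excerpt but absent from the claim."""
    e, c = _normalise(excerpt), _normalise(claim)

    def has(phrase: str, text: str) -> bool:
        return re.search(rf"\b{re.escape(phrase)}\b", text) is not None

    return [h for h in HEDGES if has(h, e) and not has(h, c)]


source = "In this cohort, exposure may be associated with higher risk."
excerpt = "exposure may be associated with higher risk"
claim = "Exposure causes higher risk."

verbatim = excerpt_is_verbatim(source, excerpt)
drift = dropped_hedges(excerpt, claim)
```

A non-empty `drift` list does not prove the claim is wrong. It marks the finding for a human or a second pass to compare claim and excerpt side by side.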

Provenance as a contract between agents

There is one more way to think about the mapping that pays off later in Task Statement 5.6. A claim source mapping is a promise the producing agent makes to every consumer downstream: handle this finding and you will always be able to say where it came from. In a single-agent setting the promise is easy to keep. In a multi-agent pipeline it becomes a contract that every intermediate agent has to honour, and the whole knowledge point of attribution preservation through the pipeline is about what happens when one agent quietly breaks it. Getting the mapping right here, at the point of creation, is what makes that later guarantee even possible.
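The contract can be made checkable at each hand-off. The sketch below assumes findings are passed between agents as dictionaries with the five fields, and treats the pair of URL and excerpt as the identity of a piece of evidence. Both assumptions, and the name `broken_contract`, are illustrative.

```python
REQUIRED = ("claim", "source_url", "document_name", "excerpt", "publication_date")


def broken_contract(received: list[dict], emitted: list[dict]) -> list[str]:
    """Return problems in an agent's output; an empty list means the contract held."""
    problems = []
    # Evidence the agent was actually given, keyed by (URL, excerpt).
    known = {(f.get("source_url"), f.get("excerpt")) for f in received}
    for i, finding in enumerate(emitted):
        missing = [key for key in REQUIRED if not finding.get(key)]
        if missing:
            problems.append(f"emitted[{i}] missing {', '.join(missing)}")
        elif (finding["source_url"], finding["excerpt"]) not in known:
            # A complete record that points at evidence nobody supplied.
            problems.append(f"emitted[{i}] cites evidence not present upstream")
    return problems


# Placeholder records for illustration.
upstream = [{
    "claim": "Claim one.",
    "source_url": "https://example.com/a",
    "document_name": "Doc A",
    "excerpt": "Excerpt one.",
    "publication_date": "2024-01-01",
}]
rewritten = [dict(upstream[0], claim="Claim one, reworded.")]
stripped = [{"claim": "Claim one, reworded."}]
```

An intermediate agent is free to reword the claim, as in `rewritten`, but not to emit a finding whose evidence fields are gone, as in `stripped`.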

How this shows up on the exam

Domain 5 carries fifteen percent of the exam, and Task Statement 5.6 is its provenance backbone. Questions on this knowledge point rarely ask you to define the term. Instead they describe an agent producing a confident synthesis and ask you to spot what is missing, or they hand you two candidate designs and ask which one keeps attribution alive through summarisation. The right answer always reduces to the same instinct: attach the five-field mapping to each finding so that provenance is structured data, not a sentence waiting to be edited out. Get this foundation right and the harder 5.6 knowledge points (conflict handling, temporal awareness, and pipeline preservation) have somewhere solid to build.

Check your understanding

A research agent returns a polished one-paragraph summary of three reports. A reviewer asks which report a key statistic came from, and nobody can answer without re-running the retrieval. What design change best prevents this?
