- In short
- The full conversation history requirement is the rule that every request to the Messages API must carry the complete accumulated transcript of prior turns, because the API holds no state between calls. Each user message and assistant reply is appended to a growing list and resent in full, and omitting earlier turns strips the context the model needs to stay coherent.
The rule in one sentence
The full conversation history requirement is the most foundational fact in Domain 5: to continue a conversation, you must hand the model the entire accumulated transcript on every request, because nothing about the previous turns is stored for you. The model you are calling is not resuming a session it remembers; it is reading, fresh, whatever list of messages you chose to include this time. If a turn is not in that list, for the purposes of this call it never happened.
Anthropic's documentation describes the mechanic plainly: as a conversation advances through turns, each user message and assistant response accumulates within the context window, and previous turns are preserved completely. The key word is accumulate. You are not updating a remote conversation; you are growing a local list and resending it. The transcript is your responsibility, and the linear growth of that transcript is the price of a clean, stateless interface.
- Full conversation history requirement
- The rule that each Messages API request must include the complete accumulated transcript of prior turns, since the API stores no state between calls and the model only sees what the current request contains.
What full history actually means
Full history means every prior turn the model needs in order to reason correctly, in order, in the messages list you send. Concretely, after the model replies, you append that assistant reply to your list, and when the user speaks again you append their new message too, then send the whole thing. Turn three carries turns one and two; turn ten carries all nine that came before. The list only ever grows (until you deliberately manage it), and the most recent request always contains the longest transcript.
This is why people sometimes find the behaviour surprising. There is no flag that switches on server-side memory and no session identifier that quietly threads calls together. Coherence across a multi-turn exchange is something you construct by faithfully carrying the transcript forward, not something the platform maintains on your behalf. The Building with the Claude API course demonstrates exactly this in its multi-turn lessons: a working chatbot is, at heart, a loop that appends to a message list and resends it every time the user says something.
Two rules for assembling the array
The messages list holds the turns, but two of its structural rules trip up newcomers and show up on the exam. The first concerns the system prompt. Standing instructions, the agent's role, its policies, the output format you want, do not belong as the first entry in the messages array; they go in the separate top-level system field. Anthropic's guidance is explicit that a system message should not be the first item in messages, because system content is configuration that applies from the start, not a conversational turn that accumulates alongside the transcript.
The second rule concerns the other end of the list. You can seed the assistant's next reply by placing a partial assistant message in the final position of messages, a technique called prefill that nudges the model toward a particular shape or opening. It is useful but model-dependent: some models reject a prefilled request with a 400 error, and Anthropic advises using structured outputs or a clear system instruction instead when you need to guarantee a format. Treat prefill as an optimisation to validate against your target model, not a universal lever.
Neither rule changes the headline requirement, since you still resend the accumulated turns on every call, but both shape how you build the request around that history. The transcript is the body of the conversation, the system field is the standing brief that frames it, and an optional prefill is a hint about how the next turn should begin. Keeping the three roles distinct is what turns a correct mental model of statelessness into a request the API will actually accept.
The failure mode: an agent with amnesia
The signature failure is instantly recognisable. A developer, trying to save tokens or simply misunderstanding the contract, sends only the newest user message on each call. The result is an assistant that cannot follow a conversation: it asks for information the user gave a moment ago, contradicts something it just said, and generally behaves as though every turn is the first. Users describe it as forgetful, but the model is not forgetting. It is answering correctly given an input that contains no history.
Because this knowledge point is the remember-level root of task statement 5.1, the exam often dresses it in a support-agent scenario and asks you to name the cause. The correct diagnosis is always the same: the prior turns were not included in the request, so the model had no context to be coherent with. The remedy is equally simple: accumulate and resend. Recognising this instantly, and not reaching for temperature, model size, or an imagined session flag, is the whole point.
How it connects to the rest of context management
If full history is required, then a long conversation inevitably grows toward the window limit, and that is precisely what the rest of Domain 5 manages. The dependency runs one way: you cannot intelligently shrink a transcript until you accept that you must carry one. That is why this page is foundational to the progressive summarisation trap, which is what happens when you compress that growing history badly, and to the persistent case facts block, which protects specifics while the rest of the history is curated.
The right mental model is a balance. You must include enough history for coherence (this requirement) while keeping the transcript lean enough to stay accurate and affordable (summarisation, trimming, and pinning). The two pressures are not in conflict; they are the two halves of context management. Skip the first and the agent is incoherent. Skip the second and the agent runs out of room. Domain 5 is, in large part, the craft of holding both at once.
Worked example
A team ships a customer-support chatbot and, to reduce cost, configures each call to send only the customer's latest message.
The first message works fine. A customer writes, "I want to return order 8891," and the assistant responds helpfully, asking which items they want to return. The customer replies, "Just the headphones." On this second call the app sends only that latest line, "Just the headphones," with no prior turns attached. The model receives a single, contextless sentence. It has no idea what order, what return, or what the headphones relate to, so it responds with a confused clarifying question, as if the conversation had only just begun.
From the user's seat this looks like the bot forgetting what they said ten seconds ago, and trust evaporates immediately. But the model behaved correctly: handed one isolated sentence, it produced a reasonable response to one isolated sentence. The defect is entirely in how the request was assembled. The app discarded the history that gave the latest message its meaning.
The fix requires no new model and no special setting. On the second call, the app sends the accumulated list: the original return request, the assistant's clarifying reply, and then the new line about the headphones. Now the model sees the whole thread, understands that the headphones are an item on order 8891 being returned, and continues coherently. The only change was resending the full transcript, which is exactly what the requirement demands. Cost grows with the conversation, but coherence is non-negotiable, and curating that growth is a later, separate problem.
The cost of carrying everything
Because the transcript grows with every exchange and is resent in full, the number of input tokens you pay for climbs steadily as a conversation lengthens. Each turn re-bills the entire history plus the newest message, so a fiftieth turn is processing the accumulated weight of the forty-nine that preceded it. This is not a defect; it is the direct and predictable consequence of a stateless design that asks you to supply context explicitly. But it does mean that long conversations carry a real and rising cost, and that cost is what the rest of context management exists to contain.
Understanding this growth is what separates an architect who is surprised by their bill from one who planned for it. The pattern is roughly linear in the simplest case: every turn adds about the size of one exchange to what you resend next time. Tool results, if appended without restraint, make the slope steeper, which is one more reason trimming them matters. The point of this knowledge point is not to optimise that cost yet; it is to understand why the cost exists at all. Once you accept that coherence requires carrying the history, the natural follow-up question is how to carry it efficiently, and that question is answered by summarisation, pinned facts, and trimming working together on a transcript you have already committed to sending.
Managing a history you must keep
Accepting the full-history requirement does not mean letting the transcript grow without limit. It means that any technique for controlling its size has to preserve coherence while it works. That is the constraint every other Domain 5 method respects. Summarisation replaces older turns with a faithful recap so the thread still makes sense; it does not simply delete them. A pinned facts block keeps the exact specifics available even as the surrounding turns are compressed. Trimming keeps verbose tool output from inflating the very history you are obliged to resend.
The mental model that ties these together is a single growing list that you are responsible for curating, not a remote session that the platform maintains for you. You decide what goes in, you decide what gets condensed, and you decide what is protected from condensation. The requirement on this page is the first of those responsibilities and the precondition for the rest: you cannot meaningfully curate a transcript until you have accepted that you must assemble and send one on every call. Get that reflex right and the more advanced techniques have a stable foundation to stand on; get it wrong and no amount of clever summarisation can rescue a conversation whose history was never sent in the first place.
In practice this responsibility is freeing rather than burdensome. Because you own the list, you can shape it to the task: keep a verbatim window of the most recent turns where precision matters most, summarise the older middle, and pin the handful of facts an action will need. The platform imposes the requirement to send history; it does not dictate how you assemble it, and that latitude is exactly where good context engineering lives. Two architects can satisfy the same coherence requirement with very different transcripts, one bloated and one lean, and the gap in cost and reliability between them is entirely a product of how deliberately each curated the list they were always obliged to send.
Misconceptions to retire
Misconception
There must be a setting or session flag that makes the API remember earlier turns so I do not have to resend them.
What's actually true
Misconception
Sending only the latest message is a valid way to save tokens on long conversations.
What's actually true
Why this is worth getting reflexively right
Although this is the easiest knowledge point in the cluster, it is load-bearing. Every more advanced context-management technique presupposes that you carry the transcript and then shape it. Get this reflex wrong and you will misdiagnose a whole class of coherence bugs as model failures. Get it right and the rest of Domain 5, tool result trimming included, reads as a set of strategies for managing a history you have already accepted you must send.
A support chatbot replies sensibly to the first message but then keeps asking customers to repeat what they just said. The team configured each API call to send only the customer's most recent message to save cost. What is the cause and fix?
People also ask
Does the Claude API remember previous messages?
Do you have to send the full conversation history on every request?
What happens if you only send the latest message?
Watch and learn
Official Anthropic Academy lessons first, then hand-picked walkthroughs. Videos load only when you press play.
Multi-Turn conversations
Why watch: Demonstrates that each new turn must resend the entire prior exchange because the API is stateless, which is the core requirement of this KP.
More videos for this concept
References & primary sources
Master this concept with Archie
Practice it inside an adaptive study session. Archie, your Socratic AI tutor, tracks your mastery with Bayesian Knowledge Tracing and schedules the perfect next review.