Claude Messages API Request-Response Cycle

In short: The Claude Messages API is a stateless HTTP endpoint: you send the full conversation as a list of role-tagged messages and receive back a response made of content blocks plus a stop_reason. Because the endpoint keeps no memory between calls, every request must carry the entire history the model needs to reason about.

What the Claude Messages API actually is

The Claude Messages API is the single HTTP endpoint your code calls to talk to the model. You POST a request describing the conversation so far, and you get back one assistant response. That is the whole contract. Everything more sophisticated, tool use, multi-agent orchestration, long-running autonomous workflows, is built by calling this one endpoint repeatedly and changing what you send each time.

Two properties of the cycle do most of the work on the Claude Certified Architect exam. First, the request is structured as a list of messages, not a single prompt string. Second, the response is structured as a list of content blocks with metadata, not a single block of text. Internalising those two shapes is the difference between reasoning about agents correctly and guessing.

Messages API request-response cycle: One round trip to the /v1/messages endpoint: you submit a list of role-tagged messages, and the model returns an assistant message composed of content blocks plus a stop_reason describing why it halted.

The shape of a request

A request carries a model, a max_tokens ceiling, and a messages array. Each entry in that array has a role, either user or assistant, and a content field. The model is trained on alternating user and assistant turns, so the history reads like a transcript: the human speaks, the assistant replies, the human speaks again.

Crucially, content can be a plain string or an array of typed blocks. A short question can be a string; a turn that includes an image, a document, or a tool result must be an array of blocks. Understanding that content is polymorphic is what later lets you append tool results correctly, because a tool result is just another kind of content block inside a user-role message.

role, who is speaking on this turn: user or assistant.
content, a string for simple text, or an array of content blocks for anything richer.
model and max_tokens, top-level fields that apply to the whole request, not to a single message.

Where system instructions live

One part of the request does not belong to any single turn: the top-level system parameter. System instructions sit outside the messages array entirely, as a sibling of model and max_tokens, and they apply from the very first turn of the conversation onward. This is where you put the persona, the standing rules, and the global context you want in force for every response, rather than burying that guidance inside a user message.

A common mistake is to treat system as just another entry in the messages list, tagged with a role of system. The API does not work that way: there is no system role inside messages, and a system-style instruction cannot be the first item in that array. Because the model is trained on alternating user and assistant turns, the conversation must open with a user turn; the global instructions ride alongside it in the dedicated system field instead.

Keeping that separation straight matters for agent design, because the system prompt is the most reliable place to shape behaviour across an entire loop. Since the endpoint keeps no memory and you resend everything each turn, the system field travels with every request, steering each fresh response without ever being mistaken for conversational content. Later domains build on this when they steer tool use and enforce guardrails through the system prompt.

The shape of a response

When the endpoint replies, you do not get a string back. You get an assistant message whose content is an array of blocks. The most common block is a text block, but when tools are configured the response can also contain tool_use blocks, and it can contain both at once, narrating in text while also requesting a tool.

Alongside the content, the response carries a stop_reason (covered in depth in stop_reason field inspection) and a usage object reporting input and output token counts. Treating the response as a rich object rather than a string is the habit that prevents a whole category of agent bugs, because the signal your loop needs to act on lives in those structured fields, not in the prose.

messages[]

request is a list of turns

content[]

response is a list of blocks

stateless

no memory between calls

Why statelessness changes how you design

The endpoint remembers nothing. Call it twice and the second call has no idea the first ever happened unless you tell it. That is the single most consequential fact about the cycle, and it is the most common trap the exam sets for this knowledge point.

Because the API is stateless, you own the conversation history. After each response, your code must append the assistant's reply to your running list of messages, and on the next turn you send the whole list again. There is no session you resume, no hidden server-side thread. If a multi-turn agent suddenly behaves as though it has amnesia, the cause is almost always history that was not carried forward.

One turn of the request-response cycle

Loading diagram...

Each call is independent; your code is the only thing that remembers the conversation.

Reading content blocks in practice

Because the response content is an array, your code has to iterate it rather than reach for a single field. A response might contain only a text block, only a tool_use block, or a text block followed by a tool_use block. Each block announces its own type, and your handling branches on that type. Text blocks expose a text field; tool_use blocks expose a name, an input object, and an id that you will need later to match the result back to the request.

This is why experienced architects never write response.content[0].text and assume it is the whole answer. Index zero might be a narration block while the meaningful action sits in a later tool_use block. Walking every block, checking each one's type, and collecting what you need is the disciplined pattern, and it generalises cleanly to richer turns that mix text, tool calls, and other block types in a single response.

What the cycle deliberately does not do

It is just as useful to know the boundaries of the request-response cycle. The endpoint does not execute tools for you: when the model emits a tool_use block, it is requesting that your code run the function and report back. The endpoint does not maintain conversation state, schedule retries, or decide when an agent is finished. All of that orchestration lives in your application around the call.

That division of labour is intentional and is exactly what makes the API composable. Because the endpoint does one narrow thing, turn a history into one more assistant turn, you can wrap it in any control flow you like: a simple while loop, a multi-agent coordinator, or a long-running autonomous workflow. The model supplies judgement on each turn; your code supplies the loop, the memory, and the tool execution.

How this becomes an agentic loop

A single request-response cycle answers one question. An agent answers an open-ended goal, and it does that by running the cycle in a loop. The pattern is: send the history, read the response, decide whether more work is needed, optionally append something new, and send again. Anthropic's own guidance describes agents as systems where the model "dynamically directs its own processes and tool usage" across exactly this kind of repeated loop.

So the request-response cycle is not a small implementation detail you can skip past. It is the atomic operation that every other Domain 1 pattern composes. Once you can describe precisely what goes up and what comes down on a single call, the stop_reason field tells you when to loop again, tool result appending tells you what to add before you do, and the anti-patterns of loop control tell you which shortcuts to avoid. None of those concepts make sense until the cycle underneath them is clear.

Worked example

A customer-support agent answers a billing question across two turns, with a get_invoice tool available.

You build a messages array containing one user message: "Why was I charged twice in May?" You POST it with the get_invoice tool defined. The response comes back as an array of blocks, a short text block plus a tool_use block requesting get_invoice, and a stop_reason.

Now statelessness bites. To continue, you cannot just send the tool's output on its own. You must rebuild the history: the original user message, then the assistant message you just received (with its tool_use block), then a new user message carrying the tool result. That full list goes back to the endpoint. The model reads the whole transcript afresh, sees the invoice data, and produces the final billing explanation. Two cycles, one growing history, and nothing remembered by the server.

Notice what your code had to do between the two calls: inspect the response blocks to find the tool request, run the function, and assemble a new message list. The endpoint did none of that for you. If you had instead grabbed the first content block as text and shown it to the customer, you would have surfaced a half-formed narration and never run the invoice lookup at all. The cycle is simple, but the responsibilities it leaves to your code are precisely where careful agent design lives.

Common misreadings to avoid

The exam rewards architects who can spot where a design quietly assumes server-side memory. The two failure modes below are both rooted in misunderstanding the cycle.

Misconception

The Claude Messages API remembers previous turns, so I only need to send the newest message.

What's actually true

The endpoint is stateless. It has no memory of earlier calls. Each request must include the entire conversation history, prior user and assistant messages, or the model loses all context from earlier turns.

Misconception

A response from the Messages API is a string of text I can use directly.

What's actually true

A response is an assistant message whose content is an array of blocks (text and/or tool_use), accompanied by a stop_reason and token usage. Reading only response.content[0].text discards the structured signals an agent loop depends on.

Token usage and the cost of carrying history

The stateless design has a practical consequence worth understanding early: because you resend the whole conversation on every turn, the input token count grows with each round trip. The usage object in each response reports input_tokens and output_tokens, and on a long agentic run the input side dominates, since you are re-submitting an ever-lengthening transcript. This is not a flaw, it is the price of a clean, stateless contract, but it is the reason later Domain 5 knowledge points care so much about context management, summarisation, and trimming history that is no longer relevant.

For the foundations exam you do not need to optimise token spend here; you need to understand why it grows. Every turn you add the model's previous reply plus any new tool results, and that accumulated history is what you pay to process again next time. Architects who grasp the cycle can predict this growth; those who imagine a stateful session are surprised by it.

How it shows up on the exam

Domain 1 (Agentic Architecture & Orchestration) is the most heavily weighted domain at 27% of the exam, and this knowledge point sits at its root. Questions rarely ask you to recite the endpoint name; instead they describe an agent that "forgets" earlier context, a developer surprised that the API did not retain a conversation, or a response that was read as a string and lost its tool call, and ask you to name the cause. The correct answer always traces back to the same two facts: the cycle is stateless so the caller owns the history, and the response is structured so you must read its blocks. Master those and the rest of the agentic-loop knowledge points have somewhere solid to stand.

Check your understanding

A developer builds a multi-turn assistant by sending only the user's latest message on each Messages API call. Users report that the assistant constantly forgets what was said moments earlier. What is the root cause?

Watch and learn

Official Anthropic Academy lessons first, then hand-picked walkthroughs. Videos load only when you press play.

No videos curated for this concept yet

We are still curating the best official and community videos for this topic.

References & primary sources

Adaptive study

Master this concept with Archie

Practice it inside an adaptive study session. Archie, your Socratic AI tutor, tracks your mastery with Bayesian Knowledge Tracing and schedules the perfect next review.

Start studying

Claude Messages API Request-Response Cycle Explained