AI Skill Certs
Prompt Engineering & Structured Output·Task 4.5·Bloom: apply·Difficulty 3/5·9 min read·Updated 2026-06-07

Synchronous vs Batch API: The Decision Rule

Design efficient batch processing strategies

SUBy Solomon UdohReviewed by Solomon UdohAI-assisted · human-reviewed
In short
The synchronous versus batch decision rule says to match the delivery model to who is waiting: use the synchronous API for blocking workflows where a person or a downstream step is held up until Claude answers, and use the Message Batches API for latency-tolerant bulk work that can finish any time inside a 24-hour window. The half-price batch discount is only a benefit when nothing is blocked on the result.

The synchronous vs batch API choice in one sentence

The synchronous vs batch API choice is not a cost optimisation puzzle; it is a question about who is waiting. If a person is staring at a screen, or a pipeline step cannot advance until Claude answers, the work is blocking and it must run synchronously. If the output can be produced any time over the next several hours and nothing downstream is stalled in the meantime, the work is latency-tolerant and it belongs on the Message Batches API, where it costs half as much. Everything else in this knowledge point is an application of that single test.

This is an apply-level skill, which is why the exam rarely asks you to define the two APIs. Instead it hands you a workflow and expects you to route it correctly. Getting there reliably means resisting the pull of the 50% discount and asking the blocking question first. The discount is real, but it is worthless on a workflow that cannot tolerate a 24-hour window. The constraints behind that window are covered in Message Batches API constraints; here we turn them into a routing decision.

Blocking vs latency-tolerant
A blocking workflow holds a person or a downstream step idle until Claude responds, so it needs the synchronous API. A latency-tolerant workflow can accept its results any time within hours, so it can use the asynchronous Message Batches API and earn the 50% discount.

Map the workflow to who is waiting

The cleanest way to apply the rule is to picture the moment Claude is called and ask what happens next in the real system. In a blocking workflow, the next thing that happens is waiting: a developer sits on a pull request until a review check returns, a support agent holds the customer while a draft is generated, a build step refuses to advance until a generated test compiles. None of those can absorb a delay measured in hours, so synchronous is the only correct answer regardless of cost.

In a latency-tolerant workflow, the next thing that happens is nothing: the job was kicked off by a scheduler at 2 a.m., the results land in a table, and a human reads them the following morning. A nightly summary of yesterday's tickets, a weekly compliance audit over a document corpus, an overnight regeneration of product descriptions, a large evaluation sweep over a test set. Nobody is blocked, so the half-price asynchronous path is the obvious win.

  • Synchronous fits: interactive chat, pre-merge CI checks a developer waits on, live agent assistance, anything user-facing.
  • Batch fits: overnight report generation, scheduled audits, nightly test generation, bulk enrichment, offline evaluations.
Routing a workload by who is waiting
Loading diagram...
The blocking question is the only branch that matters; cost is a consequence of the answer, not the question.

The classic exam trap: batch everything

The signature question for this knowledge point, mirrored in the official sample exam, presents a manager who has noticed the 50% saving and proposes moving every Claude call to batch. The proposal is wrong, and explaining why is the whole skill. Batching the latency-tolerant jobs is correct and should be done. But the same change applied to blocking workflows is a regression: a pre-merge check that previously returned in seconds would now take up to 24 hours, so developers could not merge; a customer-facing reply would arrive a day after the customer left. The discount does not compensate for a workflow that no longer functions.

The disciplined answer is therefore never "batch everything" and never "keep everything synchronous." It is to split the portfolio: route blocking work to the synchronous API, route latency-tolerant work to batch, and capture the saving exactly where it does no harm. An architect who can articulate that split, rather than reaching for a single global setting, is demonstrating the judgment the exam is checking for.

Edge cases that sharpen the rule

A few situations look ambiguous until you re-apply the blocking test. "Near-real-time" dashboards that refresh every few minutes are still blocking, because a 24-hour window cannot feed a few-minute refresh. A bulk job with a hard contractual deadline in two hours is not safely batchable, because batch only guarantees the 24-hour ceiling, not a two-hour one. Conversely, a job a user triggers but does not wait on, such as "email me this 5,000-row analysis when it is ready," is latency-tolerant even though a person initiated it, because nobody is held idle. The trigger does not decide the routing; the waiting does.

The interactive-loop limitation reinforces the rule from the other direction. Even a cost-sensitive agent that must call a tool, read the result, and continue cannot be batched, because a batch request is a single shot. So some workloads are forced synchronous by shape, not just by latency. When you suspect a workload might be blocked by either timing or shape, default to synchronous and look for a different cost lever.

Some capabilities force the synchronous path

Latency is the usual deciding factor, but a few synchronous-only capabilities can force a workflow onto the live API even when nothing is waiting on the result. The batch path does not support response streaming, so any experience that renders Claude output token by token as it is produced cannot run as a batch; the asynchronous model only ever hands back a finished result file. The low-latency fast speed mode is likewise unavailable on batch, as are the live conversation-threading fields, because none of them carry meaning in an offline, scheduled run.

This sharpens the decision rule in a useful way. Before you route a latency-tolerant job to batch for the discount, confirm it does not depend on a synchronous-only feature. A workload that streams partial output to a dashboard, or that leans on fast mode to hit a tight interactive target, is pinned to the synchronous API by its required capabilities, not merely by who is waiting. When capability and latency disagree, capability wins: a feature the batch path cannot offer is a hard constraint, while latency tolerance is only a permission.

Worked example

A platform team runs two Claude workloads: a pre-merge code-review check developers wait on before merging, and a nightly job that generates regression tests for the next day. Finance asks them to cut the Claude bill in half by switching to batch.

Take each workload through the blocking test separately, because the right answer is different for each.

The pre-merge review is blocking by definition: a developer opens a pull request and cannot merge until the check returns. If it moved to batch, the developer would wait up to 24 hours for a result that used to arrive in seconds, stalling every merge. No discount justifies that, so the review stays synchronous.

The nightly test-generation job is latency-tolerant: a scheduler starts it after hours, and the tests only need to exist before the team arrives in the morning. The 24-hour window sits comfortably inside that gap, and nobody is blocked overnight, so this job moves to batch and earns the 50% saving.

The correct recommendation to finance is not "yes, switch everything" or "no, we cannot save anything." It is "we will batch the nightly job, which is where the volume and the saving actually are, and keep the pre-merge check synchronous so developers are not blocked." That split honours the budget without breaking the workflow people wait on, and it is exactly the reasoning the exam rewards.

Run the test in your head in seconds

In an exam setting you do not have time to draw a decision matrix, so it helps to rehearse the test as a quick internal monologue. Read the scenario and immediately ask, in your mind, who is waiting here? If the answer names a person at a keyboard or a pipeline step that cannot move on, you have a blocking workflow and the synchronous API is the answer before you read another word about cost. If the answer is nobody, it just needs to be done by morning, you have a latency-tolerant workflow and batch is on the table. The cost details, the volumes, and the discount are all secondary; they only tell you how much you gain once the blocking question has already settled which side you are on.

This habit matters because exam questions are deliberately written to tempt you with the discount before you have asked who is waiting. The figure of 50% will be dangled in front of a blocking workflow precisely to see whether you reach for it reflexively. An architect who has internalised the monologue is immune to the bait: they answer the blocking question first, every time, and only then let cost break ties among the workflows that are genuinely latency-tolerant.

What the discount is really buying

It is worth being precise about what the saving represents, because that precision sharpens the decision. The discount is not a reward for using a particular endpoint; it is compensation for surrendering control over timing. When you batch, you give the provider the freedom to schedule your work whenever capacity is convenient, and that flexibility is what makes the work cheaper to serve. Seen that way, the rule is almost a tautology: you can only sell timing flexibility you actually have. A blocking workflow has no timing flexibility to sell, so it cannot earn the discount, no matter how much you would like it to.

This framing also guards against a subtler mistake, treating batch as a way to dodge rate limits on urgent work. Batch does raise throughput, but it does so by spreading work across a long window, not by accelerating any single result. If the urgency is real, the long window is exactly what you cannot afford, and the apparent throughput win is illusory for that workload. Matching the delivery model to the timing you genuinely have, rather than the price you wish you could pay, is the whole of the skill. The same logic answers the tempting middle-ground question of whether a workflow that is only mildly time-sensitive can be batched to split the difference. It cannot, because batch offers no middle setting: there is no fast-batch tier, only the asynchronous window with its 24-hour ceiling. A workflow either tolerates that window or it does not, and pretending a half-urgent job will reliably finish in minutes is just the blocking trap wearing a more reasonable disguise.

Why the rule generalises

The synchronous vs batch API decision is one instance of a broader architectural habit: match the delivery mechanism to the consumption pattern, not to the price tag. The same instinct shows up when you choose deterministic enforcement over a probabilistic prompt for a high-stakes enforcement decision. In both cases the cheap or convenient default is wrong for the cases that actually matter, and the architect's value is in spotting which cases those are. Once you can reliably split a portfolio into blocking and latency-tolerant work, you are ready to design the end-to-end pipeline that the batch processing strategy design knowledge point assembles.

Misconception

The Batch API is just a cheaper version of the synchronous API, so defaulting everything to batch saves money with no downside.

What's actually true

Batch is asynchronous with up to a 24-hour window and no latency SLA. Routing blocking work to it breaks that work: developers and users would wait up to a day for answers they used to get instantly. The discount only helps on latency-tolerant work.

Misconception

If a user starts the job, it must be synchronous; if a scheduler starts it, it must be batch.

What's actually true

What decides the routing is whether anyone is blocked waiting on the result, not who triggered it. A user can trigger a job they do not wait on, like an emailed report, which is latency-tolerant; and a scheduled near-real-time refresh can still be blocking.
Check your understanding

A manager sees the 50% batch discount and directs the team to move all Claude API calls to the Message Batches API, including the live customer-support reply generator and the overnight analytics summariser. What is the best response?

People also ask

When should I use the Batch API instead of the synchronous API?
When the work is latency-tolerant and high volume and nothing is blocked waiting on each result, such as overnight reports, scheduled audits, and nightly enrichment.
Can the Batch API be used for real-time responses?
No. It is asynchronous with up to a 24-hour window and no latency guarantee, so it cannot serve chat replies, live checks, or anything a person is waiting on.
Is it a good idea to batch every Claude request to save money?
No. Batching blocking workflows breaks them by introducing up to a 24-hour delay. Split the portfolio: batch the latency-tolerant work and keep blocking work synchronous.
What workflows are a good fit for batch processing?
Overnight report generation, weekly compliance audits, nightly test generation, bulk dataset enrichment, and large offline evaluation runs.

Watch and learn

Official Anthropic Academy lessons first, then hand-picked walkthroughs. Videos load only when you press play.

Big Data Landscape

How to Reduce Claude API Costs with Batch Processing

Why watch: Frames when batch processing is the right choice versus single-request synchronous calls, reinforcing the decision rule of using batch for latency-tolerant bulk work and synchronous for blocking workflows.

References & primary sources

Adaptive study

Master this concept with Archie

Practice it inside an adaptive study session. Archie, your Socratic AI tutor, tracks your mastery with Bayesian Knowledge Tracing and schedules the perfect next review.

Start studying