Parallel Subagent Spawning in Claude Code

In short: Parallel subagent spawning is emitting multiple Task tool calls inside a single coordinator response so the subagents execute concurrently rather than one after another. For independent subtasks this collapses total latency to roughly the time of the slowest worker instead of the sum of all of them.

What parallel subagent spawning means

Parallel subagent spawning is a latency technique, and the mechanism is simpler than it sounds: the coordinator emits more than one Task tool call in the same response. When it does, Claude Code launches those subagents concurrently, each in its own isolated context, and collects their summaries when they finish. The alternative is sequential: spawn one subagent, wait for its summary, then spawn the next on a later turn. That approach pays the full duration of every worker, added together.

The official guidance states the payoff plainly: multiple subagents can run concurrently, so independent subtasks finish in the time of the slowest one rather than the sum of all of them. That is the whole idea. If you have three research jobs that do not depend on each other, you do not have to suffer three round-trips of waiting. You fan them out in a single turn and wait once.
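Stated as formulas, let t_i be the duration of independent worker i out of n. This notation is introduced here for illustration, and the approximation absorbs the launch and summary-collection overhead that the text does not quantify:

```latex
T_{\text{sequential}} = \sum_{i=1}^{n} t_i
\qquad\qquad
T_{\text{parallel}} \approx \max_{1 \le i \le n} t_i
```

Adding one more independent worker raises the sequential total by that worker's full duration. It raises the parallel total only if the new worker is slower than the current slowest one.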

Parallel subagent spawning: Issuing multiple Task tool calls within one coordinator turn so the spawned subagents execute at the same time. It reduces end-to-end latency for independent subtasks from the sum of their durations to approximately the longest single duration.

Why one turn is the trigger

The detail that earns marks is that concurrency is bound to the single response. Subagents run in parallel because the coordinator asked for all of them at once, in one turn. Emit the calls across separate turns and you have serialised them by construction, even if each task could have run alongside the others. So the practical instruction is to recognise independence first, then batch every independent spawn into the same coordinator response.
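A toy latency model makes the turn boundary concrete. It is a sketch under a stated assumption, not a description of Claude Code internals: calls inside one coordinator turn overlap completely, and turns run strictly one after another. The function name `schedule_latency` and the durations are illustrative.

```python
# Minimal latency model (an illustrative assumption, not Claude Code internals):
# a schedule is a list of coordinator turns, and each turn lists the durations,
# in seconds, of the subagents spawned in that one response.

def schedule_latency(turns: list[list[float]]) -> float:
    """Calls inside a turn overlap, so a turn costs its slowest call; turns add up."""
    return sum(max(turn) for turn in turns if turn)


durations = [8.0, 5.0, 3.0]

batched = [durations]                      # all three Task calls in one response
one_per_turn = [[d] for d in durations]    # one Task call per response

print(schedule_latency(batched))       # 8.0
print(schedule_latency(one_per_turn))  # 16.0
```

Moving any call into a later turn adds its duration back in. The mixed schedule `[[8.0, 5.0], [3.0]]` costs 11 seconds rather than 8.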

This builds directly on the Task tool and AgentDefinition: each parallel call still names a defined subagent, and the coordinator still needs the spawning tool among its allowed tools. Parallelism does not change what a spawn is; it changes how many you issue at the same moment. And because each subagent runs in isolated context, the workers' contexts do not interfere with one another while they run.
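A batched spawn is still an ordinary spawn, so the same checks apply to every call in the batch. This sketch uses hypothetical `Coordinator`, `TaskCall`, and `validate_batch` names. They are not real Claude Code or Agent SDK types. `"Task"` is simply the tool name this text uses.

```python
from dataclasses import dataclass


# Hypothetical stand-ins, not Claude Code or Agent SDK types.
@dataclass
class Coordinator:
    allowed_tools: set[str]
    subagents: set[str]          # names of the subagents defined for this run


@dataclass
class TaskCall:
    subagent: str                # which defined subagent to spawn
    prompt: str                  # the briefing for that worker


def validate_batch(coord: Coordinator, batch: list[TaskCall]) -> list[str]:
    """Check a batch of spawns destined for one turn; an empty list means valid.
    The rules are the same as for a single spawn: batching changes only the count."""
    problems = []
    if "Task" not in coord.allowed_tools:
        problems.append("coordinator lacks the Task tool in its allowed tools")
    for call in batch:
        if call.subagent not in coord.subagents:
            problems.append(f"unknown subagent: {call.subagent}")
    return problems


coord = Coordinator(allowed_tools={"Task", "Read"}, subagents={"auditor"})
batch = [TaskCall("auditor", f"Audit service {s}") for s in "ABC"]
print(validate_batch(coord, batch))  # []
```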

When parallelism is the wrong call

Concurrency is only free when the tasks are truly independent. Two constraints flip the decision. First, dependency: if subagent B needs subagent A's output, they cannot run at the same time, because B would start before A's result exists. That work is inherently sequential, and trying to parallelise it just produces a B that ran on missing information. Second, shared writes: subagents that edit the same file should never run in parallel, because their changes can collide. Read-only fan-out is the safe, common case; concurrent writers to one resource is the dangerous one.
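Both constraints can be checked mechanically if each subtask declares what it reads and what it writes. This minimal sketch assumes such declarations exist. `Subtask` and `conflict` are hypothetical names, and real tasks rarely announce their inputs this neatly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Subtask:
    name: str
    reads: frozenset[str]     # inputs it needs: files, or other tasks' outputs
    writes: frozenset[str]    # resources it modifies or produces


def conflict(a: Subtask, b: Subtask) -> Optional[str]:
    """Return why a and b must not run concurrently, or None if they are independent."""
    if a.writes & b.reads or b.writes & a.reads:
        return "dependency"    # one reads what the other produces
    if a.writes & b.writes:
        return "shared write"  # both modify the same resource and could collide
    return None


audit_a = Subtask("audit-a", frozenset({"services/a"}), frozenset())
audit_b = Subtask("audit-b", frozenset({"services/b"}), frozenset())
print(conflict(audit_a, audit_b))  # None
```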

The exam trap on this knowledge point cuts both ways. Spawning independent tasks sequentially wastes time, and that is the failure the knowledge point names directly. But blindly parallelising dependent or conflicting work is the opposite mistake. An architect reasons about the dependency graph first and only fans out the parts that have no edges between them.

1 turn: all parallel Task calls share one coordinator response.

≈ slowest: the total latency of independent workers.

Sequential: how dependent or same-file tasks must run.

Scaling beyond a handful

Parallel spawning works well for a few delegated tasks per turn. When a job needs to coordinate dozens or hundreds of agents, the recommended tool changes: a workflow construct moves the orchestration into a script the runtime executes outside the conversation, which keeps the coordinator context from drowning in fan-out bookkeeping. For the foundations exam you mainly need the core case, a small number of independent subagents launched together, but it is worth knowing that the pattern has a ceiling and a successor.

Sequential versus parallel spawning of independent subagents

Independence is the gate: only independent subtasks should be fanned out into one turn.

Reading the dependency graph before you fan out

The decision to parallelise is really a graph question, and learning to read that graph quickly is the practical skill. Sketch the subtasks as nodes and draw an edge wherever one task needs another's output. Tasks with no edges between them are independent and can share a turn; any chain of edges must run in order. A clean fan-out, one coordinator feeding several leaf workers that never reference each other, is the ideal shape for parallelism, because every leaf can start immediately with what it was given.
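Reading the graph can be sketched as level-by-level topological grouping. Each group contains only tasks whose prerequisites have all finished, so the group is a candidate for a single coordinator turn. The `waves` function and the task names are hypothetical. This is one standard way to layer a dependency graph, not a documented Claude Code feature.

```python
def waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into batches. `deps` maps each task to the tasks whose output it
    needs. Every task in a batch has all its prerequisites in earlier batches, so
    each batch is a candidate for one coordinator turn."""
    remaining = {task: set(pre) for task, pre in deps.items()}
    for pre in deps.values():
        for p in pre:
            remaining.setdefault(p, set())   # prerequisites not listed as keys are leaves
    done: set[str] = set()
    batches: list[list[str]] = []
    while remaining:
        ready = sorted(t for t, pre in remaining.items() if pre <= done)
        if not ready:
            raise ValueError("dependency cycle: no task can start")
        batches.append(ready)
        done.update(ready)
        for t in ready:
            del remaining[t]
    return batches


plan = waves({
    "research-x": set(),
    "research-y": set(),
    "draft": {"research-x", "research-y"},
    "review": {"draft"},
})
print(plan)  # [['research-x', 'research-y'], ['draft'], ['review']]
```

The two research tasks share a turn, while `draft` and `review` form the chain that must run in order.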

The shared-resource edge is the one people forget to draw. Two tasks can be logically independent yet still collide if they write the same file, so a write to a shared resource counts as a dependency for the purpose of this decision even when neither task reads the other's result. Read-only work over distinct inputs is almost always safe to fan out; write work over a common target is not. Training yourself to add those hidden edges before batching is what separates a correct parallel design from one that races and corrupts state.
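The hidden edges can be derived from the same kind of read and write declarations. In this sketch, which uses a hypothetical `derive_edges` helper and illustrative file names, a read-after-write overlap produces a data-dependency edge. A write-write overlap produces the shared-resource edge, oriented by list order because neither task needs the other's result.

```python
def derive_edges(tasks: list[tuple[str, set[str], set[str]]]) -> set[tuple[str, str]]:
    """`tasks` holds (name, reads, writes) in preferred run order. Returns edges
    (before, after): read-after-write edges are data dependencies, and write-write
    edges are the hidden shared-resource edges, oriented by list position."""
    edges: set[tuple[str, str]] = set()
    for i, (a, a_reads, a_writes) in enumerate(tasks):
        for b, b_reads, b_writes in tasks[i + 1:]:
            if a_writes & b_reads:
                edges.add((a, b))            # b needs what a produces
            if b_writes & a_reads:
                edges.add((b, a))            # a needs what b produces
            if a_writes & b_writes and (b, a) not in edges:
                edges.add((a, b))            # same target: serialise, no data flow needed
    return edges


tasks = [
    ("docs-a", {"a.py"}, {"README.md"}),
    ("docs-b", {"b.py"}, {"README.md"}),
    ("lint-c", {"c.py"}, set()),
]
print(derive_edges(tasks))  # {('docs-a', 'docs-b')}
```

`docs-a` and `docs-b` never read each other's output, yet the shared `README.md` still forces an edge between them. `lint-c` remains free to run alongside either one.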

Isolating shared-file work with worktrees

The same-file constraint is real, but it is not always the end of the discussion. When two tasks are logically independent and collide only because they would write the same files, the documented remedy is to give each worker its own isolated checkout, a worktree, so the writes land in separate working trees instead of on top of one another. The Claude Code guidance is explicit that when delegated tasks touch the same files you should isolate them with worktrees rather than assume concurrent writers will be safe. With that isolation in place, work that looked sequential purely because of file contention can run in parallel after all, and you merge the results afterward.
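Setting up one worktree per worker uses the standard `git worktree add` command. This sketch only builds the command lines. The directory layout and the `agent/<name>` branch naming are hypothetical conventions, and the sketch does not show how Claude Code itself provisions worktrees for subagents.

```python
from pathlib import Path


def worktree_commands(repo: Path, workers: list[str], base: str = "HEAD") -> list[list[str]]:
    """Build, without running, one `git worktree add` command per worker so each
    writes into its own checkout on its own branch. The sibling-directory layout
    and the agent/<name> branch scheme are hypothetical conventions."""
    commands = []
    for name in workers:
        path = repo.parent / f"{repo.name}-{name}"   # e.g. /work/app-docs-a
        branch = f"agent/{name}"                     # one new branch per worker
        commands.append(
            ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]
        )
    return commands


for cmd in worktree_commands(Path("/work/app"), ["docs-a", "docs-b"]):
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` to create the checkouts. After the workers finish, merging each `agent/<name>` branch back surfaces overlapping edits as merge conflicts rather than silent overwrites. `git worktree remove` then cleans up the extra directories.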

This sharpens the dependency check from the previous section, because a shared-write edge is really two different things wearing one label. If task B needs the content task A produces, that is a genuine logical dependency and no amount of isolation removes it, since B cannot start before A's result exists. But if A and B merely happen to write the same path while needing nothing from each other, that is a contention problem, and a worktree per worker dissolves it. Reading which kind you have is the judgement: logical dependencies stay sequential, while pure contention can be isolated and then parallelised.

For the foundations exam the point to carry is that "they edit the same file" is a reason to isolate, not automatically a reason to serialise. The default still holds that concurrent writers to one shared target are unsafe, so you never simply hope the writes interleave cleanly. You either sequence the work or you give each worker an isolated worktree, and only then do you fan the contention-free tasks out into a single coordinator turn.

What parallel spawning does not fix

It is just as important to know the limits of the technique, because the exam offers parallelism as a tempting non-answer to problems it cannot solve. Parallelising does not reduce total token cost: the same workers still run to completion, only at the same time, so you pay for all of them, just sooner. It does not improve correctness, because running wrong work concurrently simply produces wrong answers faster. And it adds coordination overhead, since the coordinator must still gather, reconcile, and make sense of several summaries arriving together.
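The cost point is easy to check with a little arithmetic. The hypothetical `compare_runs` below takes illustrative per-worker figures of seconds and tokens. It shows that fanning out changes only the wall-clock time. The token total is identical because every worker still runs in full.

```python
def compare_runs(workers: dict[str, tuple[float, int]]) -> dict[str, tuple[float, int]]:
    """workers maps name -> (seconds, tokens). Returns (latency, total tokens)
    for a one-per-turn run and for a single-turn fan-out of the same workers."""
    seconds = [s for s, _ in workers.values()]
    tokens = sum(t for _, t in workers.values())   # every worker still runs in full
    return {
        "sequential": (sum(seconds), tokens),
        "parallel": (max(seconds), tokens),        # only the wall-clock time shrinks
    }


# Illustrative figures only.
runs = compare_runs({"audit-a": (8.0, 12_000), "audit-b": (8.0, 9_000), "audit-c": (6.0, 7_000)})
print(runs)  # {'sequential': (22.0, 28000), 'parallel': (8.0, 28000)}
```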

So parallel subagent spawning is precisely a latency optimisation for independent work, and nothing more. When a scenario's real problem is cost, the answer lies in smaller models or less redundant work; when the problem is accuracy, it lies in better prompts or context; when the problem is a dependency, the honest answer is that the work must stay sequential. Matching the technique to the kind of problem it actually addresses, and refusing it where it does not apply, is the disciplined judgement this knowledge point is testing.

How this is tested on the exam

Task statement 1.3 frames this as a design judgement under a latency constraint. A scenario describes a multi-agent system that feels slow, gives you the shape of its subtasks, and asks how to speed it up without breaking correctness. When the subtasks are independent, the answer is to emit the Task calls together in one turn so they run concurrently. The wrong answers usually involve a bigger model or a bigger context window, neither of which addresses the real bottleneck: serial waiting.

The harder variant tests restraint. The scenario describes tasks that share a dependency or write the same file, and a tempting distractor suggests simply parallelising them. The correct response is that dependent tasks must remain sequential. Tasks that merely write the same file must either stay sequential or be isolated in separate worktrees before they are fanned out. The skill being assessed is reading the dependency structure and applying concurrency only where it is genuinely safe.

Worked example

A coordinator audits three independent microservices for outdated dependencies. Each audit takes about eight seconds, and the whole run currently takes around twenty-four seconds.

The first implementation spawns the auditors one at a time. The coordinator launches the service-A auditor, waits for its summary, then launches the service-B auditor on the next turn, waits again, then service C. Three eight-second jobs run back to back, so the user stares at a spinner for roughly twenty-four seconds even though the three audits never touch each other.

Reading the dependency graph reveals there are no edges: service A, service B, and service C share nothing, and each auditor only reads its own service. That independence is the green light for concurrency.

The fix is to emit all three Task calls inside a single coordinator response. Claude Code launches the three auditors concurrently in their own isolated contexts, they finish at about the same time, and their summaries return together. Total wall-clock time falls to roughly eight seconds, the duration of the slowest single auditor, a three-fold speed-up with no change to what each worker does. Had the audits instead fed into one another, this batching would have been wrong, and the sequential version would have been the correct, if slower, design.
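The worked example can be simulated with `asyncio`, using sleeping coroutines as stand-ins for the auditors. `audit_service` is a hypothetical stand-in, not a Claude Code call, and the simulated seconds are scaled down so the demo finishes quickly. `asyncio.gather` plays the role of one coordinator turn that contains all three spawns.

```python
import asyncio
import time

SCALE = 0.01  # 1 simulated second = 10 ms of real sleep, so the demo is quick


async def audit_service(name: str, seconds: float = 8.0) -> str:
    """Hypothetical stand-in for one auditor subagent; it only sleeps."""
    await asyncio.sleep(seconds * SCALE)
    return f"{name}: audit summary"


async def one_per_turn(services: list[str]) -> list[str]:
    # Each spawn waits for the previous summary before the next is issued.
    return [await audit_service(s) for s in services]


async def single_turn(services: list[str]) -> list[str]:
    # All spawns are issued together and awaited once; gather keeps input order.
    return list(await asyncio.gather(*(audit_service(s) for s in services)))


def timed(run, services: list[str]) -> tuple[list[str], float]:
    """Run one strategy and report its wall-clock time in simulated seconds."""
    start = time.perf_counter()
    summaries = asyncio.run(run(services))
    return summaries, (time.perf_counter() - start) / SCALE


services = ["service-A", "service-B", "service-C"]
_, sequential = timed(one_per_turn, services)
_, parallel = timed(single_turn, services)
print(f"one per turn: ~{sequential:.0f}s, single turn: ~{parallel:.0f}s")
```

On an otherwise idle machine, the first figure should land near twenty-four simulated seconds and the second near eight, plus a little scheduling noise. Both strategies return the same three summaries in the same order.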

Common misreadings to avoid

Misconception

Subagents automatically run in parallel, so I do not need to do anything special.

What's actually true

Concurrency happens only when the coordinator emits multiple Task calls in a single turn. Spawning one subagent per turn serialises them. You have to batch the independent spawns into one response to get parallel execution.

Misconception

Running subagents in parallel is always faster, so I should parallelise everything.

What's actually true

Parallelism is only safe for independent tasks. Dependent tasks must run sequentially so each sees the previous result. Subagents editing the same file should not run concurrently, because their writes can collide, unless each one works in its own isolated worktree.

Where it fits

Parallel spawning is one of three levers task statement 1.3 hands you for shaping a multi-agent run, alongside how you phrase each worker with goal-based prompts and how you brief it with structured context passing. Used together, they let you design a coordinator that is both fast and correct: independent work fanned out in one turn, each worker given a clear goal and a self-contained briefing.

A final framing helps the idea stick. Latency in a sequential pipeline is additive: every worker you bolt on lengthens the wait by its full duration, because each one starts only after the last has finished. Parallelism converts that addition into a maximum: the wait becomes the single longest worker, no matter how many independent workers you add alongside it. That is a profound shape change for a multi-agent system, and it is why recognising independence is worth the effort. The architect who can look at a slow design, separate the genuinely independent work from the truly dependent chain, and fan out only the former, turns a pipeline that scales linearly with the number of tasks into one that scales with the slowest single task. Holding both halves of that judgement, the speed-up where it is safe and the restraint where it is not, is what this knowledge point ultimately asks you to demonstrate.

Keep one last practical habit in mind when you reach for the technique. Independence is a property to verify, not assume, so before you batch a set of spawns into one turn, state out loud what each worker reads and what each worker writes. If no worker reads another worker's output and no two workers write the same target, the fan-out is safe and the latency win is yours to take. The moment that check fails for even one pair, peel those two tasks out of the parallel batch and sequence them, while still fanning out everything that remains independent. That mixed shape, a parallel core with a sequenced tail, is often the fastest design that is still correct, and recognising it is the mark of an architect who reads concurrency as a property of the work rather than a switch to flip.
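The read-and-write check, and the mixed shape it produces, can be sketched as follows. `split_batch` is a hypothetical helper. It conservatively peels out both members of any conflicting pair and keeps everything else in the parallel core for a single turn. The tail stays in listed order for sequential spawning.

```python
def split_batch(tasks: dict[str, tuple[set[str], set[str]]]) -> tuple[list[str], list[str]]:
    """`tasks` maps name -> (reads, writes). Any task that reads another's output
    or shares a write target with it is peeled out of the parallel batch.
    Returns (parallel_core, sequenced_tail); the tail keeps the listed order,
    so list producers before the tasks that consume their output."""
    names = list(tasks)
    peeled: set[str] = set()
    for i, a in enumerate(names):
        a_reads, a_writes = tasks[a]
        for b in names[i + 1:]:
            b_reads, b_writes = tasks[b]
            if a_writes & b_reads or b_writes & a_reads or a_writes & b_writes:
                peeled.update({a, b})
    core = [n for n in names if n not in peeled]
    tail = [n for n in names if n in peeled]
    return core, tail


core, tail = split_batch({
    "audit-a": ({"services/a"}, set()),
    "audit-b": ({"services/b"}, set()),
    "gen-spec": (set(), {"spec.json"}),
    "use-spec": ({"spec.json"}, set()),
})
print(core)  # ['audit-a', 'audit-b']
print(tail)  # ['gen-spec', 'use-spec']
```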

Check your understanding

A coordinator runs three independent code-analysis subagents, each taking roughly ten seconds, by spawning them one per turn for a total of about thirty seconds. The tasks share no data and read different files. What is the most effective change?
