Claude Haiku vs Sonnet vs Opus: Pick the Right Model
A practical guide to Claude Haiku vs Sonnet vs Opus for engineers building agents: capability tiers, cost trade-offs, and routing logic for production Model Context Protocol (MCP) workflows.
By Solomon Udoh · AI Architect & Certification Lead

Choosing between Claude Haiku, Sonnet, and Opus is not a question of prestige; it is a FinOps decision that compounds across every agentic loop you run. Get it wrong and you will either overspend on capability you do not need or under-provision tasks where accuracy failures cost more than the token savings. This guide gives engineers a framework for routing intelligently from day one.
What are the three Claude model tiers and what does each do well?
Anthropic ships Claude models in three named tiers that recur across generations, from the Claude 3 family through the 4.5-series model IDs used later in this guide: Haiku (fastest, cheapest), Sonnet (balanced), and Opus (most capable, most expensive). Each tier targets a distinct cost-capability band, and Anthropic updates the models within each tier over time, so the specific version you call matters less than understanding the tier's role in your architecture.
| Tier | Primary strength | Typical use case |
|---|---|---|
| Haiku | Low latency, low cost | Classification, routing, short extraction |
| Sonnet | Balanced reasoning and cost | Code generation, summarisation, tool use |
| Opus | Highest reasoning depth | Complex multi-step planning, nuanced judgment |
Models within each tier share a design philosophy: Haiku optimises for throughput, Sonnet for versatility, Opus for depth.
The practical implication: a well-designed agent system rarely uses a single tier. It routes tasks to the cheapest model that can handle them reliably.
How do token costs differ across Haiku, Sonnet, and Opus?
Anthropic publishes per-token pricing on its pricing page. The ratios between tiers matter more than the absolute numbers, because those ratios determine the ROI of intelligent routing.
As a rule of thumb, Opus input tokens cost roughly 15x as much as Haiku input tokens, and Sonnet sits at roughly 3x Haiku. The exact ratios vary by model generation, so check the pricing page for the versions you actually call. Output tokens are priced higher than input tokens in every tier. The upshot: a single Opus call that Haiku could have handled costs roughly an order of magnitude more than it needed to.
For agentic loops specifically, the absolute cost gap widens. Agentic loops consume 4 to 15 times more tokens than single-turn chat interactions because each iteration re-sends tool results, conversation history, and system prompts. At 10x token amplification and the ratios above, a loop routed to Opus costs roughly 150x a single-turn Haiku call (15 × 10), and 5x what the same loop would cost on Sonnet.
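To make the arithmetic concrete, here is a minimal sketch that multiplies a tier's price ratio by a loop's token amplification. The ratios are the rule-of-thumb values quoted above, not published prices, and `relative_cost` is a hypothetical helper; substitute figures from the current pricing page before relying on it.

```python
# Rule-of-thumb input-price ratios relative to Haiku, taken from the text above.
# These are illustrative assumptions, not published prices.
TIER_RATIO = {"haiku": 1.0, "sonnet": 3.0, "opus": 15.0}


def relative_cost(tier: str, amplification: float = 1.0) -> float:
    """Cost of a workload relative to one single-turn Haiku call.

    amplification is how many times more tokens the agentic loop consumes
    than a single-turn interaction (the text cites 4 to 15).
    """
    return TIER_RATIO[tier] * amplification


# Compare the three tiers for a loop with 10x token amplification.
for tier in TIER_RATIO:
    print(f"{tier}: {relative_cost(tier, amplification=10):.0f}x")
```

Under these assumptions the same loop costs 10x, 30x, and 150x a single-turn Haiku call on Haiku, Sonnet, and Opus respectively, which is why the tier decision matters most inside loops.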
The Claude Certified Architect - Foundations (CCA-F) exam weights Domain 1 (Agentic Architecture and Orchestration) at 27%, the largest single domain. Cost-aware routing is a core competency tested there.
What tasks belong to each model tier in a production agent?
When should you use Haiku?
Use Haiku for any task where the input is short, the output is structured, and correctness can be validated programmatically. Good candidates:
- Intent classification ("is this a billing query or a technical query?")
- Slot extraction from a user utterance
- Routing decisions in a hub-and-spoke orchestrator
- Generating short, templated responses where a schema enforces correctness
Because Haiku is fast, it also works well as a pre-filter before an expensive Opus call. If Haiku can answer with high confidence, you never pay for Opus.
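The pre-filter idea can be sketched as a cascade: try the cheapest tier first and escalate only when its answer fails a programmatic check. This is one possible shape, not a prescribed pattern. `cascade`, `call_model`, and `is_valid` are hypothetical names, and the validity check stands in for "high confidence", which the API does not report directly.

```python
from typing import Callable, Optional

# Tiers ordered cheapest first. These are labels, not API model IDs.
TIER_ORDER = ["haiku", "sonnet", "opus"]


def cascade(
    prompt: str,
    call_model: Callable[[str, str], str],
    is_valid: Callable[[str], bool],
    tiers: Optional[list[str]] = None,
) -> tuple[str, str]:
    """Try each tier in order; return (tier, answer) for the first valid answer.

    call_model(tier, prompt) is a caller-supplied wrapper around the real API
    call. is_valid(answer) is a programmatic check such as schema validation.
    If no tier passes the check, the last tier's answer is returned.
    """
    tiers = tiers or TIER_ORDER
    answer = ""
    for tier in tiers:
        answer = call_model(tier, prompt)
        if is_valid(answer):
            return tier, answer
    return tiers[-1], answer
```

Because the model call is injected, the escalation logic can be unit-tested with a stub before any tokens are spent.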
When should you use Sonnet?
Sonnet is the workhorse tier for most production workloads. It handles multi-step reasoning, code generation, and tool-use chains without the latency or cost of Opus. In MCP integrations, Sonnet is usually the right default for tool-calling agents because it balances schema adherence with reasoning depth.
Sonnet is also the sensible choice for summarisation pipelines, document Q&A, and most prompt engineering tasks where you need nuanced output but not frontier-level reasoning.
When should you use Opus?
Opus earns its cost premium on tasks where:
- The reasoning chain is long and interdependent (planning a multi-week project, auditing a complex codebase).
- Errors are expensive to recover from (financial decisions, legal document review).
- You need the model to catch subtle contradictions or edge cases that Sonnet misses.
The CCA-F exam consistently rewards deterministic, proportionate solutions. Routing every task to Opus because it is "safer" is not proportionate; it is a FinOps anti-pattern. Reserve Opus for the tasks where its incremental accuracy gain justifies the cost.
How do you build a cost-aware routing layer in practice?
A routing layer is a lightweight orchestrator that inspects each task and assigns it to the cheapest model tier likely to handle it reliably. Here is a minimal pattern that uses a Haiku call as the classifier:

    import json

    import anthropic

    client = anthropic.Anthropic()

    ROUTING_SYSTEM = """You are a task router. Classify the incoming task as one of:
    - haiku: short extraction, classification, or templated response
    - sonnet: multi-step reasoning, code, tool use, summarisation
    - opus: complex planning, high-stakes judgment, long reasoning chains
    Respond with a JSON object: {"tier": "<haiku|sonnet|opus>", "reason": "<one sentence>"}"""

    MODEL_MAP = {
        "haiku": "claude-haiku-4-5",
        "sonnet": "claude-sonnet-4-5",
        "opus": "claude-opus-4-5",
    }

    DEFAULT_TIER = "sonnet"  # fallback when the router's reply cannot be used


    def route_task(task_description: str) -> dict:
        response = client.messages.create(
            model="claude-haiku-4-5",  # Use Haiku to route; never pay Opus to decide
            max_tokens=64,
            system=ROUTING_SYSTEM,
            messages=[{"role": "user", "content": task_description}],
        )
        try:
            routing = json.loads(response.content[0].text)
        except json.JSONDecodeError:
            routing = {}
        # Guard against malformed JSON or an unexpected tier name
        if not isinstance(routing, dict) or routing.get("tier") not in MODEL_MAP:
            routing = {"tier": DEFAULT_TIER, "reason": "router output unusable"}
        return routing


    def execute_task(task_description: str, task_payload: str) -> str:
        routing = route_task(task_description)
        model = MODEL_MAP[routing["tier"]]
        response = client.messages.create(
            model=model,
            max_tokens=1024,
            messages=[{"role": "user", "content": task_payload}],
        )
        return response.content[0].text

Key point: the router itself runs on Haiku. You never pay Opus to decide whether to use Opus.
For context management in long-running agents, pair this router with a summarisation step so that the context passed to Opus is as compact as possible. Opus's accuracy premium stops paying for itself if you are sending it 50,000 tokens of stale history.
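One way to sketch that summarisation step is a function that keeps the most recent turns verbatim and collapses everything older into a single summary message. `compact_history` and its `summarize` callback are hypothetical; in practice the callback would itself be a cheap model call, and the cut point would need to avoid separating a tool-use block from its result.

```python
from typing import Callable

Message = dict[str, str]  # {"role": "user" | "assistant", "content": "..."}


def compact_history(
    messages: list[Message],
    summarize: Callable[[list[Message]], str],
    keep_last: int = 4,
) -> list[Message]:
    """Replace all but the most recent turns with one summary message.

    summarize(older_messages) is caller-supplied, for example a Haiku call.
    Histories that already fit within keep_last are returned unchanged.
    """
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    if len(messages) <= keep_last:
        return list(messages)
    older, recent = messages[:-keep_last], messages[-keep_last:]
    summary = summarize(older)
    header = {"role": "user", "content": f"Summary of earlier conversation: {summary}"}
    return [header] + recent
```

The default of four retained turns is an arbitrary starting point; tune it against how often the expensive model needs details from further back.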
How does model selection interact with MCP tool design?
Tool-calling amplifies cost differences between tiers because each tool call adds a round-trip: the model emits a tool-use block, your server executes the tool, and the result re-enters the context. In a sequential five-tool chain, you pay for five tool-calling turns plus a final answer turn, and every one of those turns re-reads the growing context.
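A rough model of that growth, assuming every model call re-sends the full context and each round-trip adds a fixed number of tokens. `chain_input_tokens` is a hypothetical helper and ignores prompt caching, which would lower the billed total.

```python
def chain_input_tokens(base_tokens: int, tokens_per_round_trip: int, tool_calls: int) -> int:
    """Total input tokens sent across a sequential tool chain.

    Assumes a chain of N tool calls needs N + 1 model calls: one per tool
    call plus a final call that produces the answer. Each round-trip
    (tool-use block plus tool result) grows the context by a fixed amount.
    """
    total = 0
    context = base_tokens
    for _ in range(tool_calls + 1):
        total += context                   # this call re-reads the whole context
        context += tokens_per_round_trip   # the tool result joins the history
    return total
```

With a 2,000-token starting context and 500 tokens added per round-trip, `chain_input_tokens(2000, 500, 5)` gives 19,500 input tokens, nearly ten times the 2,000 a single call without tools would send.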
The implication for tool design: narrow, well-described tools reduce the number of round-trips required. A tool that does one thing precisely lets Sonnet succeed where a vague, multi-purpose tool might require Opus to reason through ambiguity.
Concretely:

    {
      "name": "get_invoice_status",
      "description": "Returns the payment status of a single invoice by invoice_id. Returns one of: paid, pending, overdue, void. Do NOT use for bulk invoice queries.",
      "input_schema": {
        "type": "object",
        "properties": {
          "invoice_id": {
            "type": "string",
            "description": "The unique invoice identifier, e.g. INV-20240312-001"
          }
        },
        "required": ["invoice_id"]
      }
    }

A description this precise lets Sonnet select the right tool on the first attempt. Ambiguous descriptions force the model to hedge, sometimes calling the wrong tool and burning an extra round-trip at full context cost.
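On the server side, the same precision applies to what the tool returns. The sketch below handles a tool-use block for `get_invoice_status` and builds the matching `tool_result` block, enforcing the four statuses the description promises. The `INVOICES` dictionary is a hypothetical stand-in for a real billing system, and the block is shown as a plain dict for simplicity (the Python SDK returns objects with the same `id`, `name`, and `input` fields).

```python
import json

ALLOWED_STATUSES = {"paid", "pending", "overdue", "void"}

# Hypothetical in-memory store standing in for a real billing system.
INVOICES = {"INV-20240312-001": "paid"}


def handle_tool_use(block: dict) -> dict:
    """Execute a get_invoice_status tool-use block and build its tool_result."""
    result = {"type": "tool_result", "tool_use_id": block["id"]}
    if block["name"] != "get_invoice_status":
        return {**result, "content": f"Unknown tool {block['name']!r}", "is_error": True}
    invoice_id = block["input"].get("invoice_id")
    status = INVOICES.get(invoice_id)
    if status not in ALLOWED_STATUSES:
        # Report the failure to the model instead of inventing a status.
        return {**result, "content": f"No status found for invoice {invoice_id!r}", "is_error": True}
    return {**result, "content": json.dumps({"invoice_id": invoice_id, "status": status})}
```

Returning an explicit error result gives the model something to reason about on the next turn, which tends to be cheaper than letting it guess and retry.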
What does the CCA-F exam test about model selection?
The CCA-F exam covers model selection implicitly across several domains. Domain 4 (Prompt Engineering and Structured Output, 20%) tests whether you can design prompts that work reliably at a given capability tier. Domain 5 (Context Management and Reliability, 15%) tests whether you understand how context length and model tier interact.
Exam scenarios typically present a production constraint (latency budget, cost ceiling, accuracy requirement) and ask you to identify the appropriate model tier and architecture. The exam rewards proportionate solutions: if a task can be solved reliably with Haiku, choosing Opus is wrong even if Opus would also work.
More broadly, the exam consistently rewards three habits: choosing deterministic solutions over probabilistic ones when stakes are high, applying proportionate fixes, and tracing problems to their root cause.
Our concept library maps 174 atomic concepts to the five exam domains, including model selection trade-offs, routing patterns, and context management strategies. AI Skill Certs is independent of Anthropic and not endorsed by Anthropic.
How should you instrument cost observability in an agent?
Routing to the right tier is step one. Knowing whether your routing is working is step two. Build cost observability into your orchestrator from the start:

    from dataclasses import dataclass

    import anthropic


    @dataclass
    class AgentCallRecord:
        task_id: str
        model: str
        input_tokens: int
        output_tokens: int
        tier: str
        routed_by: str = "haiku-router"


    def tracked_call(
        client: anthropic.Anthropic,
        task_id: str,
        model: str,
        tier: str,
        system: str,
        user_message: str,
        max_tokens: int = 1024,
    ) -> tuple[str, AgentCallRecord]:
        """Call the model and return its text alongside a token-usage record."""
        response = client.messages.create(
            model=model,
            max_tokens=max_tokens,
            system=system,
            messages=[{"role": "user", "content": user_message}],
        )
        # response.usage reports the input and output tokens for this call
        record = AgentCallRecord(
            task_id=task_id,
            model=model,
            input_tokens=response.usage.input_tokens,
            output_tokens=response.usage.output_tokens,
            tier=tier,
        )
        return response.content[0].text, record

Aggregate AgentCallRecord objects per workflow run. If your Opus spend exceeds a threshold you set in advance, that is a signal your routing logic is misclassifying tasks, not a signal to raise the budget.
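A minimal sketch of that aggregation, using token share as a proxy for spend (converting to currency would require the per-tier prices you are actually charged). `CallUsage` is a hypothetical stand-in carrying only the fields needed here, and the 20% default threshold is an arbitrary placeholder.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CallUsage:
    """Minimal stand-in for AgentCallRecord: only the fields used below."""
    tier: str
    input_tokens: int
    output_tokens: int


def token_share_by_tier(records: list[CallUsage]) -> dict[str, float]:
    """Fraction of total tokens (input plus output) consumed by each tier."""
    totals: dict[str, int] = defaultdict(int)
    for record in records:
        totals[record.tier] += record.input_tokens + record.output_tokens
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {tier: count / grand_total for tier, count in totals.items()}


def opus_over_budget(records: list[CallUsage], max_share: float = 0.2) -> bool:
    """True when Opus exceeds the token share you set in advance."""
    return token_share_by_tier(records).get("opus", 0.0) > max_share
```

Running `opus_over_budget` at the end of each workflow run turns the threshold into an alert you can act on, instead of a number discovered on the invoice.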
The agentic architecture domain of the CCA-F covers orchestrator design patterns in depth, including how coordinators should handle cost signals and when to escalate versus retry.
What is the right default model for a new agent project?
Start with Sonnet as your default. It handles the majority of production workloads reliably, and its cost is low enough that early-stage token waste is not catastrophic. As you instrument your agent and gather real task distributions, you will find a subset of tasks that Haiku handles correctly (move those down) and a subset where Sonnet fails in ways that matter (move those up to Opus).
This iterative approach is more reliable than guessing upfront. It also gives you empirical data to justify model spend to stakeholders, which matters when you are scaling from pilot to production.
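The move-down and move-up decisions can be expressed as a small rule over measured success rates. This is a sketch under assumptions: `recommend_tier` is a hypothetical helper, the success rates come from your own evaluation of one task type on each tier, and the 95% bar is a placeholder you would set per task.

```python
TIER_ORDER = ["haiku", "sonnet", "opus"]  # cheapest first


def recommend_tier(success_rates: dict[str, float], min_success: float = 0.95) -> str:
    """Return the cheapest tier whose measured success rate meets the bar.

    success_rates maps a tier name to the fraction of evaluated tasks of one
    task type that passed your own correctness check on that tier. Tiers with
    no measurement are skipped. Falls back to the most capable tier.
    """
    for tier in TIER_ORDER:
        if success_rates.get(tier, 0.0) >= min_success:
            return tier
    return TIER_ORDER[-1]
```

For example, a task type measured at 0.80 on Haiku and 0.97 on Sonnet stays on Sonnet, while one measured at 0.99 on Haiku moves down.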
As of 3 June 2026, over 10,000 individuals have earned the CCA-F certification, and the exam tests exactly this kind of production-grade, cost-proportionate thinking. If you are preparing for the exam, the model selection trade-offs covered here map directly to Domain 1, Domain 4, and Domain 5 scenario questions.
About the author
AI Architect & Certification Lead
Solomon Udoh is an AI Architect who designs and ships production agent systems on the Claude API and Claude Code. He built AI Skill Certs' adaptive engine and authored its 174-concept knowledge graph, mapping every Claude Certified Architect - Foundations objective to hands-on, exam-aligned practice.
- Designs production multi-agent systems on the Claude API and Agent SDK
- Author of the AI Skill Certs knowledge graph (174 mapped exam concepts)
- Builds with MCP, Claude Code, structured outputs, and agentic loops daily
- Reviews every concept page against the official Anthropic exam guide