Anthropic Certification: Master All 5 Domains to Pass
The Anthropic certification (CCA-F) spans 5 weighted domains. Learn which to prioritise, what scenarios appear, and how to reach the 720 passing score.
By Solomon Udoh · AI Architect & Certification Lead

The Anthropic certification that practitioners are chasing in 2026 is the Claude Certified Architect, Foundations exam (CCA-F), launched 12 March 2026 at $99 per attempt. It delivers 60 scenario-based questions scored on a 100-to-1000 scale, with 720 as the passing mark. As of 3 June 2026, more than 10,000 individuals have already cleared it. This guide maps every domain, weights the study time you should allocate, and flags the anti-patterns the exam consistently penalises.
What are the five CCA-F domains and how are they weighted?
The exam blueprint publishes five domains with explicit percentage weights. Those weights should drive your study calendar directly: spend roughly the same proportion of your prep time on each domain as the exam spends questions on it.
| Domain | Topic | Weight |
|---|---|---|
| 1 | Agentic Architecture & Orchestration | 27% |
| 2 | Tool Design & MCP Integration | 18% |
| 3 | Claude Code Configuration & Workflows | 20% |
| 4 | Prompt Engineering & Structured Output | 20% |
| 5 | Context Management & Reliability | 15% |
Domain 1 is the single heaviest domain at 27%, which means roughly one in four questions will test agentic loop design, multi-agent coordination, and orchestration patterns. Domains 3 and 4 are tied at 20% each, making them collectively the largest block. Domain 5 is the lightest at 15%, but its reliability scenarios carry outsized consequence in production contexts, so do not skip it.
Our Claude Certification Concepts library maps all 174 atomic concepts to these five domains and their 30 task statements, which is a useful cross-reference as you work through each section below.
How does Domain 1 (Agentic Architecture) show up on the exam?
Domain 1 accounts for 27% of the exam and is the area where most candidates lose points. The questions are almost always scenario-based: you are given a broken or underspecified agentic loop and asked to diagnose the failure or choose the correct fix.
The most common failure mode the exam tests is checking for task completion by inspecting model-generated text rather than using structured signals. A loop that reads if "task complete" in response_text is fragile; a loop that inspects stop_reason and parses a typed result object is robust. The exam rewards the latter consistently.
# Fragile: text-based termination checkif "task complete" in response.content[0].text.lower():break# Robust: structured stop_reason inspectionif response.stop_reason == "end_turn" and result.status == "complete":break
Key concepts to master for this domain include Agentic Loop Anti-Patterns, Hub-and-Spoke Architecture, and Parallel Subagent Spawning. The exam also tests when a coordinator should select subagents dynamically versus following a pre-configured route, a distinction covered in Model-Driven vs Pre-Configured Decision Making.
Multi-agent questions frequently involve a coordinator that must pass structured context to subagents without losing attribution. Structured Context Passing and Diagnosing Attribution Loss in Synthesis are both testable task statements in this domain.
The exam consistently rewards deterministic solutions over probabilistic ones when stakes are high, proportionate fixes, and root-cause tracing.
How does Domain 2 (Tool Design & MCP Integration) show up on the exam?
At 18%, Domain 2 is the third-largest domain. Its questions cluster around three themes: writing tool descriptions that route correctly, handling structured errors from MCP servers, and deciding how many tools to assign per agent.
Tool description quality is the lever the exam tests most. A vague description causes the model to misroute calls; a precise, action-oriented description with explicit scope constraints routes correctly. The fix is almost always low-effort and high-leverage: rewrite the description rather than restructure the architecture.
{"name": "search_customer_orders","description": "Search orders for a specific customer by customer_id. Use ONLY for order lookup. Do NOT use for product catalogue queries or inventory checks.","input_schema": {"type": "object","properties": {"customer_id": { "type": "string" },"date_range_days": { "type": "integer", "default": 30 }},"required": ["customer_id"]}}
The MCP isError Flag Pattern is a specific testable concept: MCP servers signal tool-level errors by setting isError: true in the result rather than throwing an exception. Candidates who conflate these two error channels consistently choose wrong answers on error-propagation questions.
Tool overload is another recurring scenario. When an agent has access to too many tools, selection quality degrades. The exam tests whether you can identify this as the root cause and apply Tool Splitting for Specificity or constrained tool_choice configuration as the fix.
How does Domain 3 (Claude Code Configuration) show up on the exam?
Domain 3 carries 20% of the exam and focuses on the three-level CLAUDE.md configuration hierarchy, custom skills, plan mode, and CI/CD integration patterns. Many candidates underestimate this domain because it feels operational rather than architectural. The exam disagrees.
The three configuration levels are: user-level (global defaults), project-level (CLAUDE.md at the repository root), and directory-level (CLAUDE.md files in subdirectories). Each level can override the one above it for path-specific rules. Questions test whether you know which level to modify for a given scope requirement and what the version-control implications are of each choice.
# Project-level CLAUDE.md at repo root/project-root/CLAUDE.md# Directory-level override for a specific service/project-root/services/payments/CLAUDE.md# User-level config (not version-controlled)~/.claude/CLAUDE.md
Plan mode questions ask when to require explicit user approval before execution. The exam pattern is: high-irreversibility actions (file deletion, deployment, schema migration) warrant plan mode; low-risk read operations do not. The exam penalises both over-use (slowing safe workflows) and under-use (executing destructive actions without review).
CI/CD questions typically involve the -p flag for non-interactive execution and structured JSON output piped to downstream steps. Expect at least one scenario where a candidate must choose between interactive and non-interactive modes for a given pipeline stage.
How does Domain 4 (Prompt Engineering & Structured Output) show up on the exam?
Domain 4 is tied with Domain 3 at 20% and covers explicit criteria design, few-shot prompting, JSON schema construction, and validation-retry loops. The exam is particularly focused on when each technique is necessary rather than merely helpful.
Few-shot examples are the highest-leverage technique for ambiguous or edge-case inputs. The exam tests whether you can identify scenarios where zero-shot instructions are insufficient and a small number of well-chosen examples would resolve the ambiguity. Critically, the examples must be representative of the failure cases, not just the happy path.
System: Classify customer sentiment as POSITIVE, NEGATIVE, or NEUTRAL.Return JSON: {"sentiment": "...", "confidence": 0.0-1.0}Examples:User: "The product arrived on time."Assistant: {"sentiment": "POSITIVE", "confidence": 0.92}User: "It works, I guess."Assistant: {"sentiment": "NEUTRAL", "confidence": 0.71}User: "Third time contacting support for the same issue."Assistant: {"sentiment": "NEGATIVE", "confidence": 0.88}
Validation-retry loops are a structured output reliability pattern the exam tests directly. When a model returns malformed JSON or a value outside the allowed enum, the correct response is to feed the error back with the original prompt and retry, not to silently accept the output or escalate immediately.
Multi-pass review architecture appears in Domain 4 as well: a single review pass has known limitations (self-review bias, attention dilution in long contexts), and the exam tests whether candidates know when to deploy independent review instances or sequential passes with different criteria.
Our Prompt Engineering & Structured Output concept section covers the full task statement list for this domain.
How does Domain 5 (Context Management & Reliability) show up on the exam?
Domain 5 is the lightest domain at 15%, but its questions are among the most practically consequential. They test stale context detection, session management decisions, summary injection, and structured handoff to human agents.
The stale context problem arises in long-running sessions: information injected early in a conversation degrades in influence as the context window fills. The exam tests whether candidates can identify this as the root cause of reliability failures and apply the correct fix (summary injection into a fresh session rather than continuing to extend the existing one).
# Inject a structured summary when starting a fresh sessionsystem_prompt = f"""You are continuing a long-running analysis task.## Confirmed findings so far{json.dumps(prior_findings, indent=2)}## Remaining scope{remaining_scope}Continue from this state. Do not re-investigate confirmed findings."""
Session management questions ask candidates to choose between resuming an existing session, forking it for divergent exploration, or starting fresh with a summary. The decision rule is: resume when context is still valid and relevant; fork when you need to explore an alternative without losing the main thread; start fresh when the context has degraded or the task scope has shifted significantly.
Structured handoff to human agents is a reliability pattern that appears when the model encounters a situation outside its authorised scope. The exam tests whether the handoff includes sufficient structured context for the human to act without re-reading the entire conversation history.
When in doubt, prefer the minimal footprint: request only necessary permissions, avoid storing sensitive information beyond immediate needs, prefer reversible over irreversible actions.
What scenario types appear most frequently across all domains?
The exam uses six recurring scenario archetypes. Recognising the archetype quickly lets you apply the right diagnostic framework rather than reasoning from scratch.
| Scenario Type | Primary Domain(s) | Key Diagnostic |
|---|---|---|
| Support agent with escalation | 1, 5 | Structured handoff vs. silent failure |
| Code generation and review | 3, 4 | Multi-pass review, plan mode gates |
| Multi-agent research pipeline | 1, 2 | Attribution preservation, coordinator routing |
| CI/CD automation | 3 | Non-interactive mode, structured output |
| Structured data extraction | 4 | Schema design, validation-retry loop |
| MCP tool integration | 2 | Error signalling, description quality |
Across all six archetypes, the exam applies a consistent scoring philosophy: deterministic, programmatic enforcement beats prompt-only rules when stakes are high; proportionate fixes beat architectural overhauls for isolated failures; root-cause tracing beats symptom suppression.
How should you allocate study time across the five domains?
A straightforward approach is to mirror the domain weights directly. For a 40-hour study plan, that maps as follows:
| Domain | Weight | Hours |
|---|---|---|
| Agentic Architecture & Orchestration | 27% | 10.8 |
| Claude Code Configuration & Workflows | 20% | 8.0 |
| Prompt Engineering & Structured Output | 20% | 8.0 |
| Tool Design & MCP Integration | 18% | 7.2 |
| Context Management & Reliability | 15% | 6.0 |
Within each domain, prioritise scenario-based practice over passive reading. The exam's 60 questions are all scenario-based with one correct answer and three plausible distractors. The distractors are designed to be attractive to candidates who know the concept but have not applied it to a realistic situation.
AI Skill Certs (independent of Anthropic) offers 60-question practice exams scored on the same 100-to-1000 scale with 720 as the passing bar. The adaptive engine uses Bayesian Knowledge Tracing with a 0.90 mastery threshold, which means it continues surfacing a concept until you demonstrate reliable recall, not just a single correct answer. Archie, the platform's Socratic tutor, guides you through the reasoning behind each distractor rather than simply confirming the correct choice.
The Agentic Architecture & Orchestration and Tool Design & MCP Integration concept sections are good starting points given their combined 45% share of the exam.
Frequently asked questions
How much does the Anthropic CCA-F certification exam cost?
What is the passing score for the Claude Certified Architect Foundations exam?
How many questions are on the CCA-F exam and what format are they?
Which domain is the hardest on the Claude Certified Architect exam?
Is AI Skill Certs affiliated with or approved by Anthropic?
When was the Claude Certified Architect Foundations exam launched?
People also ask
What is the Anthropic certification exam?
How hard is the Anthropic Claude certification?
How many people have passed the Anthropic certification?
What topics does the Anthropic Claude certification cover?
Are there more Anthropic certifications planned after CCA-F?
About the author
AI Architect & Certification Lead
Solomon Udoh is an AI Architect who designs and ships production agent systems on the Claude API and Claude Code. He built AI Skill Certs' adaptive engine and authored its 174-concept knowledge graph, mapping every Claude Certified Architect - Foundations objective to hands-on, exam-aligned practice.
- Designs production multi-agent systems on the Claude API and Agent SDK
- Author of the AI Skill Certs knowledge graph (174 mapped exam concepts)
- Builds with MCP, Claude Code, structured outputs, and agentic loops daily
- Reviews every concept page against the official Anthropic exam guide
You might also like
Ready to put it into practice?
Study every exam concept with an adaptive tutor.