Claude Certification: Exam Format, Domains, and How to Prep
Everything you need to know about the Claude certification (CCA-F): 60 scenario MCQs, a 720/1000 pass mark, five weighted domains, and a structured study plan.
By Solomon Udoh · AI Architect & Certification Lead

The Claude certification most candidates are searching for is the Claude Certified Architect, Foundations exam (CCA-F), launched by Anthropic on 12 March 2026. It costs $99 per attempt, runs to 60 scenario-based multiple-choice questions, and requires a scaled score of 720 out of 1000 to pass. This guide walks through the format, the five weighted domains, the anti-patterns that trip up prepared candidates, and a practical study sequence.
What is the CCA-F exam format?
The exam delivers 60 scenario-based multiple-choice questions, each with one correct answer and three plausible distractors. You sit it either online-proctored or at a test centre. Anthropic scores responses on a 100-to-1000 scale; the passing threshold is 720. Because Anthropic does not publish the raw-to-scaled conversion formula, you cannot reliably reverse-engineer an exact question count as the pass mark. On a linear reading, 720/1000 corresponds to roughly 41 to 42 correct answers, but treat that as orientation, not a target.
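To see where the 41-to-42 figure comes from, here is a minimal sketch that assumes a purely linear mapping from raw score onto the 100-to-1000 scale. The linearity is an assumption for illustration only (Anthropic does not publish the real conversion), and the function name is hypothetical.

```python
import math

SCALE_MIN, SCALE_MAX = 100, 1000  # published scale bounds
PASS_MARK = 720                   # published passing threshold
TOTAL_QUESTIONS = 60


def linear_raw_estimate(scaled: float, total: int = TOTAL_QUESTIONS) -> float:
    """Raw-score equivalent of a scaled score under an ASSUMED linear mapping."""
    fraction = (scaled - SCALE_MIN) / (SCALE_MAX - SCALE_MIN)
    return fraction * total


estimate = linear_raw_estimate(PASS_MARK)
low, high = math.floor(estimate), math.ceil(estimate)
print(f"Linear estimate: {estimate:.1f} of {TOTAL_QUESTIONS} questions")
print(f"Bracketing whole-question counts: {low} to {high}")
```

Because the scale starts at 100 rather than 0, the estimate uses (720 - 100) / 900 rather than 720 / 1000. Either way, it is orientation only.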
The scenario style is deliberate. Every question describes a realistic production situation and asks you to choose the most appropriate architectural or engineering decision. Distractors are written to look plausible; they typically represent real techniques applied in the wrong context, which is why rote memorisation of definitions fails here.
How are the five domains weighted?
The exam blueprint divides 60 questions across five domains. Understanding the weighting tells you where to invest study time.
| Domain | Topic | Weight |
|---|---|---|
| 1 | Agentic Architecture & Orchestration | 27% |
| 2 | Tool Design & MCP Integration | 18% |
| 3 | Claude Code Configuration & Workflows | 20% |
| 4 | Prompt Engineering & Structured Output | 20% |
| 5 | Context Management & Reliability | 15% |
Domain 1 alone accounts for more than a quarter of the exam. Domains 3 and 4 are tied at 20% each. Together, those three domains represent 67% of the total score, so a candidate who masters them and merely passes the remaining two is in a strong position.
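As a rough orientation, the weights can be turned into approximate question counts. This sketch assumes questions are distributed in exact proportion to the published weights, which the blueprint does not guarantee; the helper name is hypothetical.

```python
TOTAL_QUESTIONS = 60

# Weights copied from the blueprint table above (percent of the exam).
DOMAIN_WEIGHTS = {
    "D1 Agentic Architecture & Orchestration": 27,
    "D2 Tool Design & MCP Integration": 18,
    "D3 Claude Code Configuration & Workflows": 20,
    "D4 Prompt Engineering & Structured Output": 20,
    "D5 Context Management & Reliability": 15,
}


def approx_questions(weight_pct: float, total: int = TOTAL_QUESTIONS) -> float:
    """Proportional share of questions for a domain (an estimate, not a quota)."""
    return weight_pct / 100 * total


for domain, weight in DOMAIN_WEIGHTS.items():
    print(f"{domain}: {weight}% is about {approx_questions(weight):.1f} questions")

# Combined share of the three heaviest domains (D1, D3, D4).
top_three_share = sum(sorted(DOMAIN_WEIGHTS.values(), reverse=True)[:3])
print(f"Three heaviest domains: {top_three_share}% of the exam")
```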
What does Domain 1 (Agentic Architecture) actually test?
Domain 1 (27%) is the heaviest domain and the one where candidates most often lose marks to anti-patterns. The core topics are:
- Agentic loop mechanics -- how the Messages API request-response cycle drives an agent forward, how stop_reason signals determine whether to continue or terminate, and how tool results are appended to the conversation. See our concept on Agentic Loop Anti-Patterns for the failure modes that appear as distractors.
- Multi-agent coordination -- hub-and-spoke architecture, coordinator responsibilities, dynamic subagent selection, and how to pass structured context between agents without losing attribution.
- Task decomposition -- fixed sequential pipelines versus dynamic adaptive decomposition, the attention dilution problem in long contexts, and how to choose the right strategy per task type.
- Session state management -- when to resume a session, when to fork it, and when to start fresh. The stale context problem is a recurring exam scenario.
- Hooks -- PostToolUse hooks for data normalisation, tool-call interception, and the decision framework for choosing hooks over prompt-based enforcement.
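The loop mechanics in the first bullet can be sketched without touching the network. The version below is a simplified illustration, not a reference implementation: `run_agent_loop`, `fake_model`, and the `get_stock` tool are hypothetical names, and responses are plain dictionaries shaped like Messages API responses (a `stop_reason` plus a list of content blocks).

```python
from typing import Any, Callable

Message = dict[str, Any]


def run_agent_loop(
    create_message: Callable[[list[Message]], Message],
    tool_handlers: dict[str, Callable[..., str]],
    user_prompt: str,
    max_turns: int = 10,
) -> Message:
    """Drive a tool-using agent until the model stops requesting tools."""
    messages: list[Message] = [{"role": "user", "content": user_prompt}]
    for _ in range(max_turns):
        response = create_message(messages)
        # Record the assistant turn before acting on it.
        messages.append({"role": "assistant", "content": response["content"]})

        if response["stop_reason"] != "tool_use":
            return response  # e.g. "end_turn"; the caller still checks the result

        # Run each requested tool and send all results back in one user turn.
        results = []
        for block in response["content"]:
            if block["type"] != "tool_use":
                continue
            handler = tool_handlers[block["name"]]  # KeyError on an unknown tool
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": handler(**block["input"]),
            })
        messages.append({"role": "user", "content": results})
    raise RuntimeError(f"No final answer after {max_turns} turns")


def fake_model(messages: list[Message]) -> Message:
    """Stand-in for a real API call: request one tool, then finish."""
    if len(messages) == 1:
        return {"stop_reason": "tool_use", "content": [
            {"type": "tool_use", "id": "toolu_1", "name": "get_stock", "input": {"sku": "A1"}},
        ]}
    return {"stop_reason": "end_turn", "content": [{"type": "text", "text": "A1: 7 in stock"}]}


final = run_agent_loop(fake_model, {"get_stock": lambda sku: "7"}, "How many A1 are in stock?")
print(final["content"][0]["text"])
```

The `max_turns` cap is a design choice in this sketch: it keeps a misbehaving model from looping forever, which is one of the failure modes the exam uses as a distractor.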
The exam consistently rewards deterministic solutions over probabilistic ones when stakes are high. If a scenario involves irreversible actions or compliance requirements, the correct answer almost always involves a programmatic guard, not a prompt instruction.
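A programmatic guard of the kind described above might look like the following minimal sketch. The tool names, the `ApprovalRequired` exception, and the `approved_by` parameter are all hypothetical; the point is only that the check runs in code before the tool executes, so no prompt wording can bypass it.

```python
from typing import Optional

# Hypothetical names for tools whose effects cannot be undone.
IRREVERSIBLE_TOOLS = {"delete_customer", "issue_refund"}


class ApprovalRequired(Exception):
    """Raised when an irreversible tool call lacks explicit human approval."""


def guard_tool_call(tool_name: str, approved_by: Optional[str] = None) -> None:
    """Block irreversible tools in code, regardless of what the prompt says."""
    if tool_name in IRREVERSIBLE_TOOLS and not approved_by:
        raise ApprovalRequired(f"{tool_name} needs human approval before it runs")


guard_tool_call("get_stock")                             # read-only: passes
guard_tool_call("issue_refund", approved_by="ops-lead")  # approved: passes
try:
    guard_tool_call("issue_refund")                      # unapproved: blocked
except ApprovalRequired as exc:
    print(f"Blocked: {exc}")
```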
What does Domain 2 (Tool Design & MCP Integration) test?
Domain 2 (18%) focuses on how Claude selects and uses tools, and how MCP servers are configured correctly in production.
Key areas:
- Tool descriptions as the selection mechanism. Claude routes to tools based on their descriptions, not their names. A vague description causes misrouting; the fix is almost always a description rewrite, not a system-prompt addition. Our concept on writing effective tool descriptions covers the exact pattern the exam tests.
- Structured error responses. The MCP isError flag pattern, the four error categories, and the difference between an access failure and a valid empty result are all testable. Returning an empty array when a query genuinely finds nothing is correct; returning an error is not.
- Tool distribution across agents. In multi-agent systems, tools should be scoped to the agent that needs them. The tool overload problem -- giving one agent too many tools -- degrades routing accuracy.
- MCP scoping hierarchy and environment variable expansion. Configuration mistakes at the wrong scope level are a common distractor.
```json
{
  "mcpServers": {
    "inventory": {
      "command": "npx",
      "args": ["-y", "@acme/inventory-mcp"],
      "env": {
        "INVENTORY_API_KEY": "${INVENTORY_API_KEY}"
      }
    }
  }
}
```
The snippet above shows correct environment variable expansion in an MCP config block. A distractor version might hardcode the key or omit the env field entirely.
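The distinction between an access failure and a valid empty result, listed among the key areas above, can be sketched as follows. This is a hedged illustration using plain dictionaries shaped like MCP tool results (a `content` list plus an `isError` flag); `search_inventory` and the injected `fetch` function are hypothetical.

```python
import json
from typing import Any, Callable


def search_inventory(
    query: str,
    fetch: Callable[[str], list[dict[str, Any]]],
) -> dict[str, Any]:
    """Wrap a hypothetical inventory lookup in an MCP-style tool result."""
    try:
        rows = fetch(query)
    except ConnectionError as exc:
        # Access failure: the lookup never completed, so flag it as an error.
        return {
            "content": [{"type": "text", "text": f"Inventory service unreachable: {exc}"}],
            "isError": True,
        }
    # Valid result, including the empty case: zero matches is still a success.
    return {"content": [{"type": "text", "text": json.dumps(rows)}], "isError": False}


def offline(query: str) -> list[dict[str, Any]]:
    raise ConnectionError("connection refused")


empty_result = search_inventory("left-handed widget", lambda q: [])
failed_result = search_inventory("left-handed widget", offline)
print(empty_result)
print(failed_result)
```

Keeping the two cases distinct lets the calling agent tell "nothing matched" apart from "the lookup did not happen", which call for different next steps.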
What does Domain 3 (Claude Code Configuration) test?
Domain 3 (20%) covers the three-level configuration hierarchy (project, user, enterprise), CLAUDE.md file placement and import syntax, custom skills and commands, and CI/CD integration patterns.
Exam scenarios in this domain tend to ask:
- Which configuration level should a rule live at, given its intended scope?
- When should you use plan mode versus direct execution?
- How do path-scoped rules reduce token overhead without sacrificing coverage?
- What are the version control implications of committing CLAUDE.md files?
A common distractor is placing a project-wide rule at the user level (or vice versa), which either over-restricts individual developers or fails to enforce the rule for the whole team.
What does Domain 4 (Prompt Engineering & Structured Output) test?
Domain 4 (20%) is where candidates who have only used Claude conversationally tend to underperform. The exam tests engineering-grade prompt design, not chat prompting.
The highest-leverage topics, per our Prompt Engineering & Structured Output concept library:
| Topic | Why it matters on the exam |
|---|---|
| Explicit categorical criteria | Vague rubrics produce inconsistent scores; explicit criteria are testable |
| Few-shot examples | The highest-leverage technique for ambiguous edge cases |
| JSON schema design | Prevents hallucinated fields; schema constraints are testable |
| Validation-retry loops | When to retry with error feedback versus escalate |
| Multi-pass review | Independent review instances outperform self-review |
The exam rewards knowing when to apply each technique, not just that it exists. A scenario asking how to improve extraction quality from unstructured documents will have "add a JSON schema" and "add few-shot examples" as two separate options; the correct choice depends on whether the failure mode is structural or semantic.
```python
# Validation-retry loop skeleton
import json

import anthropic

client = anthropic.Anthropic()


def extract_with_retry(text: str, schema: dict, max_retries: int = 2) -> dict:
    messages = [{"role": "user", "content": f"Extract per schema:\n\n{text}"}]
    for attempt in range(max_retries + 1):
        response = client.messages.create(
            model="claude-opus-4-5",
            max_tokens=1024,
            system=f"Return valid JSON matching this schema: {json.dumps(schema)}",
            messages=messages,
        )
        raw = response.content[0].text
        try:
            return json.loads(raw)
        except json.JSONDecodeError as exc:
            if attempt == max_retries:
                raise  # retries exhausted: surface the error to the caller
            # Feed the failed output and the parse error back for the next attempt.
            messages += [
                {"role": "assistant", "content": raw},
                {"role": "user", "content": f"Invalid JSON: {exc}. Correct and retry."},
            ]
```
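The skeleton above retries only when the output is not parseable JSON. A structural failure can also be valid JSON with the wrong fields, so a retry loop usually needs a second check. The sketch below is a hand-rolled validator over a flat field spec, not a JSON Schema implementation; `schema_errors` and the invoice fields are hypothetical.

```python
from typing import Any


def schema_errors(data: dict[str, Any], required: dict[str, type]) -> list[str]:
    """List missing, mistyped, and unexpected fields against a flat field spec."""
    errors = []
    for field, expected in required.items():
        if field not in data:
            errors.append(f"missing field '{field}'")
        elif not isinstance(data[field], expected):
            actual = type(data[field]).__name__
            errors.append(f"field '{field}' should be {expected.__name__}, got {actual}")
    # Fields the spec never asked for are a common sign of hallucinated output.
    for field in sorted(data.keys() - required.keys()):
        errors.append(f"unexpected field '{field}'")
    return errors


spec = {"invoice_id": str, "total": float}
problems = schema_errors({"invoice_id": "INV-7", "total": "12.50", "notes": "rush"}, spec)
feedback = "Fix these issues and retry: " + "; ".join(problems)
print(feedback)
```

The resulting `feedback` string can be sent back as the next user message, in the same way the skeleton feeds back the JSON parse error.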
What does Domain 5 (Context Management & Reliability) test?
Domain 5 (15%) is the smallest domain but contains some of the subtlest questions. The core tension is between keeping enough context for coherent reasoning and avoiding the degradation that comes from an overloaded window.
Key concepts:
- Summarisation risks. Progressive summarisation can silently drop provenance. The exam tests whether you know when to summarise, when to inject a structured summary, and when to start a fresh session with a summary injection.
- Context degradation in extended sessions. The lost-in-the-middle effect means that facts placed in the middle of a long context window are retrieved less reliably than facts at the edges.
- Escalation decisions. Three valid triggers exist for escalating to a human: ambiguity that cannot be resolved from available context, a required action that exceeds the agent's authorisation scope, and detection of a situation the system was not designed to handle. Two unreliable triggers -- low confidence scores and long elapsed time -- appear as distractors.
When in doubt, don't. It's better to err on the side of doing less and confirming with users when uncertain about intended scope in order to preserve human oversight and avoid making hard-to-fix mistakes.
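The escalation triggers listed above can be expressed as a small decision function. This is a sketch under stated assumptions: the `AgentState` fields and reason strings are hypothetical, and confidence and elapsed time are carried in the state only to show that they are deliberately ignored.

```python
from dataclasses import dataclass


@dataclass
class AgentState:
    unresolved_ambiguity: bool = False      # cannot be settled from available context
    action_exceeds_authority: bool = False  # required action is outside the agent's scope
    out_of_design_scope: bool = False       # situation the system was not built to handle
    confidence: float = 1.0                 # unreliable trigger: ignored below
    elapsed_seconds: float = 0.0            # unreliable trigger: ignored below


def escalation_reasons(state: AgentState) -> list[str]:
    """Return the valid reasons to hand off to a human; empty means carry on."""
    reasons = []
    if state.unresolved_ambiguity:
        reasons.append("ambiguity not resolvable from context")
    if state.action_exceeds_authority:
        reasons.append("action exceeds authorisation scope")
    if state.out_of_design_scope:
        reasons.append("situation outside system design")
    return reasons


slow_and_unsure = AgentState(confidence=0.2, elapsed_seconds=900.0)
needs_approval = AgentState(action_exceeds_authority=True)
print(escalation_reasons(slow_and_unsure))
print(escalation_reasons(needs_approval))
```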
What anti-patterns appear as distractors?
This is the question that separates candidates who have studied the concepts from those who have only read the documentation. Anthropic writes distractors to represent real techniques misapplied. The most common anti-pattern families:
| Anti-pattern | Why it looks correct | Why it is wrong |
|---|---|---|
| Prompt-based enforcement for compliance rules | Prompts are flexible and fast to write | Prompts can be overridden; compliance needs programmatic guards |
| Retrying indefinitely on tool errors | Persistence seems robust | Infinite retries mask root causes and can exhaust rate limits |
| Giving all tools to one coordinator agent | Centralisation feels clean | Tool overload degrades routing accuracy |
| Summarising aggressively to save tokens | Token efficiency is a real goal | Aggressive summarisation loses provenance and attribution |
| Using stop_reason: end_turn as a success signal | The loop did stop | end_turn without a result check can mask silent failures |
Recognising these patterns under time pressure is a skill. Our Agentic Loop Anti-Patterns concept page works through each one with scenario examples.
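As a counterpoint to the indefinite-retry anti-pattern in the table, here is a minimal sketch of a bounded retry with exponential backoff. The function name, the choice of `TimeoutError` as the retryable error, and the default limits are assumptions for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_bounded_retry(
    call: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry a flaky tool call a fixed number of times, then surface the error."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TimeoutError:
            if attempt == max_attempts:
                raise  # give up so the caller can report or escalate
            sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
    raise ValueError("max_attempts must be at least 1")


attempts = {"count": 0}


def flaky_tool() -> str:
    """Hypothetical tool that times out twice before succeeding."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("upstream timed out")
    return "ok"


delays: list[float] = []
# Passing delays.append records each backoff delay instead of actually sleeping.
result = call_with_bounded_retry(flaky_tool, sleep=delays.append)
print(result, delays)
```

The cap matters more than the exact backoff curve: once attempts run out, the error reaches the caller instead of being hidden behind further retries.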
How should you structure your study plan?
Given the domain weights, a rational allocation of study time across a four-week preparation period looks like this:
| Week | Focus | Domains |
|---|---|---|
| 1 | Agentic loop mechanics, multi-agent coordination, decomposition | D1 (27%) |
| 2 | Claude Code configuration hierarchy, CI/CD patterns; Prompt engineering, structured output | D3 (20%), D4 (20%) |
| 3 | Tool design, MCP integration, error handling | D2 (18%) |
| 4 | Context management, escalation logic; full practice exams | D5 (15%), all |
Our concept library at /concepts maps 174 atomic concepts to all five domains and 30 task statements. Each concept is linked to the task statements it covers, so you can verify coverage rather than guess. The adaptive engine uses Bayesian Knowledge Tracing with a 0.90 mastery threshold, which means it will keep surfacing a concept until your response pattern demonstrates reliable recall, not just one lucky correct answer.
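For readers unfamiliar with Bayesian Knowledge Tracing, the standard update rule can be sketched in a few lines. Only the 0.90 threshold comes from the text above; the prior and the learn, slip, and guess probabilities are assumed values chosen for illustration, not the platform's actual parameters.

```python
from dataclasses import dataclass

MASTERY_THRESHOLD = 0.90  # threshold quoted in the text


@dataclass(frozen=True)
class BktParams:
    p_learn: float = 0.15  # ASSUMED chance of learning between attempts
    p_slip: float = 0.10   # ASSUMED chance of a wrong answer despite knowing
    p_guess: float = 0.20  # ASSUMED chance of a right answer without knowing


def bkt_update(p_known: float, correct: bool, params: BktParams = BktParams()) -> float:
    """One standard BKT step: Bayesian posterior on the answer, then a learning step."""
    if correct:
        evidence = p_known * (1 - params.p_slip)
        total = evidence + (1 - p_known) * params.p_guess
    else:
        evidence = p_known * params.p_slip
        total = evidence + (1 - p_known) * (1 - params.p_guess)
    posterior = evidence / total
    return posterior + (1 - posterior) * params.p_learn


prior = 0.30  # ASSUMED starting estimate for a new concept
after_one = bkt_update(prior, correct=True)
after_three = after_one
for _ in range(2):
    after_three = bkt_update(after_three, correct=True)

print(f"After one correct answer: mastered = {after_one >= MASTERY_THRESHOLD}")
print(f"After three correct answers: mastered = {after_three >= MASTERY_THRESHOLD}")
```

Under these assumed parameters a single correct answer is not enough to cross the threshold, while a run of correct answers is, which matches the behaviour described above.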
Practice exams on AI Skill Certs are 60 questions, scored on the same 100-to-1000 scale with 720 as the passing bar, so you get a calibrated signal before the real attempt. AI Skill Certs is an independent prep platform; we are not affiliated with or endorsed by Anthropic.
Where does the CCA-F sit in Anthropic's certification roadmap?
As of 3 June 2026, more than 10,000 individuals hold the CCA-F certification, and over 40,000 firms have applied to the Claude Partner Network, the $100M programme within which the certification sits. Anthropic has announced further architect, developer, and seller certifications planned for later in 2026, per Anthropic's Partner Network announcements. The CCA-F is explicitly positioned as the foundations tier, meaning the concepts it tests will underpin the harder specialist exams when they arrive.
Investing in Domain 1 depth now -- particularly subagent context isolation, coordinator responsibilities, and structured context passing -- is likely to pay off again in the architect-level exams.
About the author
AI Architect & Certification Lead
Solomon Udoh is an AI Architect who designs and ships production agent systems on the Claude API and Claude Code. He built AI Skill Certs' adaptive engine and authored its 174-concept knowledge graph, mapping every Claude Certified Architect - Foundations objective to hands-on, exam-aligned practice.
- Designs production multi-agent systems on the Claude API and Agent SDK
- Author of the AI Skill Certs knowledge graph (174 mapped exam concepts)
- Builds with MCP, Claude Code, structured outputs, and agentic loops daily
- Reviews every concept page against the official Anthropic exam guide