Exam guide·9 min read·15 June 2026

Claude Certification: Exam Format, Domains, and How to Prep

Everything you need to know about the Claude certification (CCA-F): 60 scenario MCQs, a 720/1000 pass mark, five weighted domains, and a structured study plan.

By Solomon Udoh · AI Architect & Certification Lead

The Claude certification most candidates are searching for is the Claude Certified Architect, Foundations exam (CCA-F), launched by Anthropic on 12 March 2026. It costs $99 per attempt, runs to 60 scenario-based multiple-choice questions, and requires a scaled score of 720 out of 1000 to pass. This guide walks through the format, the five weighted domains, the anti-patterns that trip up prepared candidates, and a practical study sequence.

What is the CCA-F exam format?

The exam delivers 60 scenario-based multiple-choice questions, each with one correct answer and three plausible distractors. You sit it either online-proctored or at a test centre. Anthropic scores responses on a 100-to-1000 scale; the passing threshold is 720. Because Anthropic does not publish the raw-to-scaled conversion formula, you cannot reliably reverse-engineer an exact question count as the pass mark. On a linear reading of that scale, 720 sits about 69% of the way from 100 to 1000, which corresponds to roughly 41 to 42 correct answers out of 60, but treat that as orientation, not a target.
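
The 41-to-42 figure comes from treating the 100-to-1000 scale as a straight line. That linearity is an assumption made purely for orientation, since the real conversion is unpublished. A minimal sketch of the arithmetic, with an illustrative function name and defaults:

```python
def linear_raw_estimate(scaled: float, questions: int = 60,
                        scale_min: float = 100, scale_max: float = 1000) -> float:
    """Estimate raw correct answers for a scaled score, ASSUMING a linear scale.

    Anthropic does not publish the real conversion, so this is orientation only.
    """
    fraction = (scaled - scale_min) / (scale_max - scale_min)
    return fraction * questions


estimate = linear_raw_estimate(720)
print(f"{estimate:.1f} of 60")  # about 41.3, hence "roughly 41 to 42"
```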

The scenario style is deliberate. Every question describes a realistic production situation and asks you to choose the most appropriate architectural or engineering decision. Distractors are written to look plausible; they typically represent real techniques applied in the wrong context, which is why rote memorisation of definitions fails here.

Anthropic does not publish the raw-to-scaled conversion, so never state an exact question count as the pass mark.

Anthropic, CCA-F Exam Guide

How are the five domains weighted?

The exam blueprint divides 60 questions across five domains. Understanding the weighting tells you where to invest study time.

Domain	Topic	Weight
1	Agentic Architecture & Orchestration	27%
2	Tool Design & MCP Integration	18%
3	Claude Code Configuration & Workflows	20%
4	Prompt Engineering & Structured Output	20%
5	Context Management & Reliability	15%

Domain 1 alone accounts for more than a quarter of the exam. Domains 3 and 4 are tied at 20% each. Together, those three domains represent 67% of the total score, so a candidate who masters them and merely passes the remaining two is in a strong position.
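
To turn the weights into a rough sense of scale, the sketch below spreads 60 questions proportionally across the domains. The proportional mapping is an assumption; the guide gives percentages, not per-domain question counts, so read the output as approximate.

```python
# Domain weights from the blueprint table above, in whole percent.
WEIGHTS = {
    "D1 Agentic Architecture & Orchestration": 27,
    "D2 Tool Design & MCP Integration": 18,
    "D3 Claude Code Configuration & Workflows": 20,
    "D4 Prompt Engineering & Structured Output": 20,
    "D5 Context Management & Reliability": 15,
}
TOTAL_QUESTIONS = 60


def approx_questions(weights: dict, total: int) -> dict:
    """Proportional share of questions per domain (ASSUMED, not an official count)."""
    return {name: pct * total / 100 for name, pct in weights.items()}


shares = approx_questions(WEIGHTS, TOTAL_QUESTIONS)
for name, share in shares.items():
    print(f"{name}: ~{share:.1f} questions")
```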

What does Domain 1 (Agentic Architecture) actually test?

Domain 1 (27%) is the heaviest domain and the one where candidates most often lose marks to anti-patterns. The core topics are:

Agentic loop mechanics -- how the Messages API request-response cycle drives an agent forward, how stop_reason signals determine whether to continue or terminate, and how tool results are appended to the conversation. See our concept on Agentic Loop Anti-Patterns for the failure modes that appear as distractors.
Multi-agent coordination -- hub-and-spoke architecture, coordinator responsibilities, dynamic subagent selection, and how to pass structured context between agents without losing attribution.
Task decomposition -- fixed sequential pipelines versus dynamic adaptive decomposition, the attention dilution problem in long contexts, and how to choose the right strategy per task type.
Session state management -- when to resume a session, when to fork it, and when to start fresh. The stale context problem is a recurring exam scenario.
Hooks -- PostToolUse hooks for data normalisation, tool-call interception, and the decision framework for choosing hooks over prompt-based enforcement.
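
The loop mechanics in the first bullet can be sketched without touching the network. In the sketch below, `create_message` is a hypothetical wrapper around a Messages API call, assumed to return a plain dict with `stop_reason` and `content`. The block shapes (`tool_use`, `tool_result`, `tool_use_id`) mirror the Messages API, while the function and parameter names are illustrative.

```python
from typing import Any, Callable

Message = dict[str, Any]


def run_agent_loop(
    create_message: Callable[[list[Message]], Message],
    tools: dict[str, Callable[..., Any]],
    user_prompt: str,
    max_turns: int = 10,
) -> Message:
    """Drive a tool-using agent until the model stops asking for tools."""
    messages: list[Message] = [{"role": "user", "content": user_prompt}]
    for _ in range(max_turns):
        response = create_message(messages)
        # Always record the assistant turn before deciding what to do next.
        messages.append({"role": "assistant", "content": response["content"]})

        if response["stop_reason"] != "tool_use":
            # end_turn, max_tokens, and so on: stop looping. The caller still
            # has to check that the final content is a usable result.
            return response

        # Run every requested tool and send the results back in one user turn,
        # each linked to its request by tool_use_id.
        results = []
        for block in response["content"]:
            if block["type"] == "tool_use":
                output = tools[block["name"]](**block["input"])
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block["id"],
                    "content": str(output),
                })
        messages.append({"role": "user", "content": results})
    raise RuntimeError(f"No final answer after {max_turns} turns")
```

The `max_turns` cap is a design choice rather than an API requirement: it keeps a misbehaving loop from running unbounded.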

The exam consistently rewards deterministic solutions over probabilistic ones when stakes are high. If a scenario involves irreversible actions or compliance requirements, the correct answer almost always involves a programmatic guard, not a prompt instruction.
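
As an illustration of what a programmatic guard can look like, the sketch below checks a tool call in code before it runs. The tool names and the `approved_by` parameter are hypothetical, and this is a generic pattern, not the Claude Code hook interface.

```python
from typing import Optional

# Hypothetical tool names, for illustration only.
IRREVERSIBLE_TOOLS = {"delete_account", "issue_refund"}


class ApprovalRequired(Exception):
    """Raised when an irreversible tool call lacks recorded approval."""


def guard_tool_call(tool_name: str, approved_by: Optional[str]) -> None:
    """Deterministic check that runs in code before the tool executes.

    Unlike a prompt instruction, the model cannot talk its way past it.
    """
    if tool_name in IRREVERSIBLE_TOOLS and not approved_by:
        raise ApprovalRequired(f"{tool_name} needs human approval before it runs")
```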

What does Domain 2 (Tool Design & MCP Integration) test?

Domain 2 (18%) focuses on how Claude selects and uses tools, and how MCP servers are configured correctly in production.

Key areas:

Tool descriptions as the selection mechanism. Claude routes to tools based on their descriptions, not their names. A vague description causes misrouting; the fix is almost always a description rewrite, not a system-prompt addition. Our concept on writing effective tool descriptions covers the exact pattern the exam tests.
Structured error responses. The MCP isError flag pattern, the four error categories, and the difference between an access failure and a valid empty result are all testable. Returning an empty array when a query genuinely finds nothing is correct; returning an error is not.
Tool distribution across agents. In multi-agent systems, tools should be scoped to the agent that needs them. The tool overload problem -- giving one agent too many tools -- degrades routing accuracy.
MCP scoping hierarchy and environment variable expansion. Configuration mistakes at the wrong scope level are a common distractor.

json

{
  "mcpServers": {
    "inventory": {
      "command": "npx",
      "args": ["-y", "@acme/inventory-mcp"],
      "env": {
        "INVENTORY_API_KEY": "${INVENTORY_API_KEY}"
      }
    }
  }
}

The snippet above shows correct environment variable expansion in an MCP config block. A distractor version might hardcode the key or omit the env field entirely.
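
The distinction between an access failure and a valid empty result, described in the structured error responses item above, can be sketched as follows. The `content` and `isError` fields follow the MCP tool result shape; the handler, its `backend` parameter, and the error message are hypothetical.

```python
import json
from typing import Any, Optional


def tool_result(payload: Any) -> dict:
    """Successful result, including the valid 'nothing found' case."""
    return {"content": [{"type": "text", "text": json.dumps(payload)}], "isError": False}


def tool_error(message: str) -> dict:
    """Failed call: the tool could not do its job, so set isError."""
    return {"content": [{"type": "text", "text": message}], "isError": True}


def search_inventory(sku: str, backend: Optional[dict]) -> dict:
    """Hypothetical handler; `backend` is None when the data source is unreachable."""
    if backend is None:
        return tool_error("Inventory backend unreachable; no query was run.")
    # A query that ran and matched nothing is a valid, empty answer.
    return tool_result(backend.get(sku, []))
```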

What does Domain 3 (Claude Code Configuration) test?

Domain 3 (20%) covers the three-level configuration hierarchy (project, user, enterprise), CLAUDE.md file placement and import syntax, custom skills and commands, and CI/CD integration patterns.

Exam scenarios in this domain tend to ask:

Which configuration level should a rule live at, given its intended scope?
When should you use plan mode versus direct execution?
How do path-scoped rules reduce token overhead without sacrificing coverage?
What are the version control implications of committing CLAUDE.md files?

A common distractor is placing a project-wide rule at the user level (or vice versa), which either over-restricts individual developers or fails to enforce the rule for the whole team.

What does Domain 4 (Prompt Engineering & Structured Output) test?

Domain 4 (20%) is where candidates who have only used Claude conversationally tend to underperform. The exam tests engineering-grade prompt design, not chat prompting.

The highest-leverage topics, per our Prompt Engineering & Structured Output concept library:

Topic	Why it matters on the exam
Explicit categorical criteria	Vague rubrics produce inconsistent scores; explicit criteria are testable
Few-shot examples	The highest-leverage technique for ambiguous edge cases
JSON schema design	Prevents hallucinated fields; schema constraints are testable
Validation-retry loops	When to retry with error feedback versus escalate
Multi-pass review	Independent review instances outperform self-review

The exam rewards knowing when to apply each technique, not just that it exists. A scenario asking how to improve extraction quality from unstructured documents will have "add a JSON schema" and "add few-shot examples" as two separate options; the correct choice depends on whether the failure mode is structural or semantic.

python

# Validation-retry loop skeleton: re-prompt with the parse error, up to max_retries times.
import json

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def extract_with_retry(text: str, schema: dict, max_retries: int = 2) -> dict:
    messages = [{"role": "user", "content": f"Extract per schema:\n\n{text}"}]
    for attempt in range(max_retries + 1):
        response = client.messages.create(
            model="claude-opus-4-5",
            max_tokens=1024,
            system=f"Return valid JSON matching this schema: {json.dumps(schema)}",
            messages=messages,
        )
        raw = response.content[0].text
        try:
            return json.loads(raw)
        except json.JSONDecodeError as exc:
            if attempt == max_retries:
                raise  # retries exhausted: surface the failure so the caller can escalate
            # Feed the failed output and the specific error back so the model can correct it.
            messages += [
                {"role": "assistant", "content": raw},
                {"role": "user", "content": f"Invalid JSON: {exc}. Correct and retry."},
            ]
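
The skeleton above retries only when the output fails to parse. A structural failure can also be output that parses but has missing or undeclared fields. The following stdlib-only sketch checks for both, under the assumption that the schema is a JSON Schema object with top-level `properties` and `required`; the function name is illustrative.

```python
def structural_problems(data: dict, schema: dict) -> list:
    """List missing required keys and keys the schema does not declare.

    Handles only top-level `required` and `properties`; a full validator
    (for example the third-party `jsonschema` package) covers far more.
    """
    declared = set(schema.get("properties", {}))
    problems = [f"missing required field: {key}"
                for key in schema.get("required", []) if key not in data]
    problems += [f"unexpected field: {key}"
                 for key in sorted(data) if key not in declared]
    return problems
```

A non-empty list could be fed back into the retry message in the same way as the parse error.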

What does Domain 5 (Context Management & Reliability) test?

Domain 5 (15%) is the smallest domain but contains some of the subtlest questions. The core tension is between keeping enough context for coherent reasoning and avoiding the degradation that comes from an overloaded window.

Key concepts:

Summarisation risks. Progressive summarisation can silently drop provenance. The exam tests whether you know when to summarise, when to inject a structured summary, and when to start a fresh session with a summary injection.
Context degradation in extended sessions. The lost-in-the-middle effect means that facts placed in the middle of a long context window are retrieved less reliably than facts at the edges.
Escalation decisions. Three valid triggers exist for escalating to a human: ambiguity that cannot be resolved from available context, a required action that exceeds the agent's authorisation scope, and detection of a situation the system was not designed to handle. Two unreliable triggers -- low confidence scores and long elapsed time -- appear as distractors.
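
The escalation rule in the last item can be written as a small decision function. This is a sketch under the guide's own framing: the three boolean fields stand for the three valid triggers, while the confidence and elapsed-time fields are present but deliberately ignored. All names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    unresolvable_ambiguity: bool = False  # cannot be settled from available context
    exceeds_authorisation: bool = False   # required action is outside the agent's scope
    outside_design_scope: bool = False    # the system was not built for this case
    confidence: float = 1.0               # recorded but deliberately ignored
    elapsed_seconds: float = 0.0          # recorded but deliberately ignored


def should_escalate(s: Situation) -> bool:
    """Escalate only on the three triggers the guide treats as valid."""
    return s.unresolvable_ambiguity or s.exceeds_authorisation or s.outside_design_scope
```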

When in doubt, don't. It's better to err on the side of doing less and confirming with users when uncertain about intended scope in order to preserve human oversight and avoid making hard-to-fix mistakes.

Anthropic, Claude Documentation (Agentic Use)

What anti-patterns appear as distractors?

This is the question that separates candidates who have studied the concepts from those who have only read the documentation. Anthropic writes distractors to represent real techniques misapplied. The most common anti-pattern families:

Anti-pattern	Why it looks correct	Why it is wrong
Prompt-based enforcement for compliance rules	Prompts are flexible and fast to write	Prompts can be overridden; compliance needs programmatic guards
Retrying indefinitely on tool errors	Persistence seems robust	Infinite retries mask root causes and can exhaust rate limits
Giving all tools to one coordinator agent	Centralisation feels clean	Tool overload degrades routing accuracy
Summarising aggressively to save tokens	Token efficiency is a real goal	Aggressive summarisation loses provenance and attribution
Using `stop_reason: end_turn` as a success signal	The loop did stop	`end_turn` without a result check can mask silent failures

Recognising these patterns under time pressure is a skill. Our Agentic Loop Anti-Patterns concept page works through each one with scenario examples.
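
As a counterpoint to the indefinite-retry row in the table, here is a minimal sketch of a bounded retry with exponential backoff. The attempt limit, the delays, and the injectable `sleep` parameter are illustrative choices, not values taken from the exam guide.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_bounded_retry(
    call: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry a flaky call a fixed number of times, then surface the last error."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == max_attempts:
                raise  # stop masking the root cause; let the caller escalate
            sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the function testable without real delays.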

How should you structure your study plan?

Given the domain weights, a rational allocation of study time across a four-week preparation period looks like this:

Week	Focus	Domains
1	Agentic loop mechanics, multi-agent coordination, decomposition	D1 (27%)
2	Claude Code configuration hierarchy, CI/CD patterns; Prompt engineering, structured output	D3 (20%), D4 (20%)
3	Tool design, MCP integration, error handling	D2 (18%)
4	Context management, escalation logic; full practice exams	D5 (15%), all

Our concept library at /concepts maps 174 atomic concepts to all five domains and 30 task statements. Each concept is linked to the task statements it covers, so you can verify coverage rather than guess. The adaptive engine uses Bayesian Knowledge Tracing with a 0.90 mastery threshold, which means it will keep surfacing a concept until your response pattern demonstrates reliable recall, not just one lucky correct answer.
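
For readers unfamiliar with Bayesian Knowledge Tracing, the sketch below shows one standard update step and how it interacts with a 0.90 threshold. The guess, slip, and learn rates and the 0.3 prior are illustrative assumptions, not the platform's actual parameters.

```python
def bkt_update(p_known: float, correct: bool,
               p_guess: float = 0.2, p_slip: float = 0.1, p_learn: float = 0.1) -> float:
    """One standard Bayesian Knowledge Tracing step (ASSUMED parameter values)."""
    if correct:
        evidence = p_known * (1 - p_slip)
        posterior = evidence / (evidence + (1 - p_known) * p_guess)
    else:
        evidence = p_known * p_slip
        posterior = evidence / (evidence + (1 - p_known) * (1 - p_guess))
    # Allow for learning that happens during the practice opportunity itself.
    return posterior + (1 - posterior) * p_learn


MASTERY_THRESHOLD = 0.90  # threshold quoted above

p = 0.3  # assumed prior probability that the concept is already known
for answer in (True, True, True):
    p = bkt_update(p, answer)
    print(f"p_known={p:.3f} mastered={p >= MASTERY_THRESHOLD}")
```

With these assumed values, a single correct answer from a 0.3 prior leaves the estimate below 0.90, because a correct response could still be a guess.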

Practice exams on AI Skill Certs are 60 questions, scored on the same 100-to-1000 scale with 720 as the passing bar, so you get a calibrated signal before the real attempt. AI Skill Certs is an independent prep platform; we are not affiliated with or endorsed by Anthropic.

Where does the CCA-F sit in Anthropic's certification roadmap?

As of 3 June 2026, more than 10,000 individuals hold the CCA-F certification, and over 40,000 firms have applied to the Claude Partner Network, the $100M programme within which the certification sits. Anthropic has announced further architect, developer, and seller certifications planned for later in 2026, per Anthropic's Partner Network announcements. The CCA-F is explicitly positioned as the foundations tier, meaning the concepts it tests will underpin the harder specialist exams when they arrive.

Investing in Domain 1 depth now -- particularly subagent context isolation, coordinator responsibilities, and structured context passing -- is likely to compound into the architect-level exams as well.

Frequently asked questions

How much does the Claude certification exam cost?

The CCA-F exam costs $99 per attempt. Tiered Anthropic partners receive a discounted first attempt through the Claude Partner Network. There is no published bundle pricing for multiple attempts.

How many questions do you need to get right to pass the CCA-F?

The passing score is 720 on a 100-to-1000 scale. Anthropic does not publish the raw-to-scaled conversion formula, so there is no official correct-question count for passing. On a linear reading of that scale, 720 sits about 69% of the way from 100 to 1000, which suggests roughly 41 to 42 correct answers out of 60, but this is orientation only.

Is the Claude certification exam available online or only at a test centre?

Both options are available. The CCA-F is delivered either online-proctored, which you sit from your own machine under webcam supervision, or at a physical test centre. Anthropic does not restrict candidates to one mode.

Which domain has the most questions on the CCA-F exam?

Domain 1, Agentic Architecture and Orchestration, is weighted at 27%, making it the largest single domain. It covers agentic loops, multi-agent coordination, task decomposition, hooks, and session state management.

Are there more Claude certifications coming after the CCA-F?

Yes. Anthropic has announced further architect, developer, and seller certifications planned for later in 2026. The CCA-F is the foundations tier of what will become a broader certification programme within the Claude Partner Network.

What is the best way to study for the scenario-based questions on the CCA-F?

Scenario questions test applied judgement, not recall. The most effective preparation combines concept-level study mapped to the 30 task statements, deliberate practice with scenario questions scored on the real 100-to-1000 scale, and explicit review of anti-patterns that appear as plausible distractors.