- In short
- Explicit categorical criteria are prompt instructions that name the exact categories to act on and the exact categories to ignore, instead of vague confidence language such as "be conservative" or "only report high-confidence findings". They give the model an unambiguous decision boundary, which is what actually raises precision.
What explicit categorical criteria are
Explicit categorical criteria are prompt instructions that swap a subjective confidence adjective for a concrete, named list of what to report and what to leave alone. Instead of asking the model to use judgement about how sure it feels, you hand it a boundary it can check each finding against. The Claude Certified Architect exam treats this as the first lever of Task 4.1, because almost every precision problem an architect meets in production starts with a prompt that described a vibe rather than a rule.
The distinction sounds small and is anything but. A prompt that says "only surface important problems" asks Claude to privately define important, apply that definition, and stay consistent across thousands of inputs. A prompt built on explicit categorical criteria removes all three of those burdens at once: the categories are stated, the application is mechanical, and consistency follows because the rule does not drift between calls.
- Explicit categorical criteria
- Prompt instructions that enumerate the named categories to act on and the named categories to ignore, replacing confidence language like "be conservative" with a concrete decision boundary the model can apply the same way every time.
Why vague confidence language fails
Telling a model to "be conservative" feels like control, but it transfers the hardest decision back to the model and hides it from you. Conservative against what threshold? Conservative for a security audit, where a missed bug ships a vulnerability, or conservative for a style linter, where noise is the only real cost? The word cannot answer those questions, so the model picks an interpretation, and it may pick a slightly different one next time.
Newer Claude models follow instructions more faithfully, which paradoxically makes vague phrasing more dangerous, not less. If you write "only report high-confidence findings," the model may investigate thoroughly and then withhold genuine bugs it judged to sit below an invented bar. You wanted fewer false alarms and instead bought silent false negatives. The instruction did something; it just did not do the thing you could predict or measure.
There is also a measurement problem. A confidence adjective is not gradeable. You cannot write a test that checks whether the model was appropriately conservative, because there is no ground-truth definition of the word. Categories, by contrast, are testable: you can label whether a given finding belongs to a reportable category and check the model against that label. Task 4.1 rewards architects who reach for the version of an instruction that an evaluation can actually score.
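The testability claim above can be made concrete with a small scoring sketch. The category names, the sample findings, and the `agreement_rate` helper are all hypothetical and exist only for illustration; the point is that a category label gives an evaluation something to compare the model's decision against, which an adjective does not.

```python
from dataclasses import dataclass

# Hypothetical reportable categories, chosen for illustration only.
REPORTABLE = {"logic_bug", "security", "error_handling"}


@dataclass(frozen=True)
class LabeledFinding:
    text: str
    category: str         # category a human grader assigned to the finding
    model_reported: bool  # whether the model surfaced the finding


def agreement_rate(findings: list[LabeledFinding]) -> float:
    """Fraction of findings where the model's report/skip decision matches the label."""
    if not findings:
        return 1.0
    hits = sum(
        1
        for f in findings
        if f.model_reported == (f.category in REPORTABLE)
    )
    return hits / len(findings)


sample = [
    LabeledFinding("off-by-one in loop bound", "logic_bug", True),
    LabeledFinding("variable could be renamed", "naming", True),   # should have been skipped
    LabeledFinding("unsanitised SQL input", "security", True),
    LabeledFinding("import order", "style", False),
]
print(agreement_rate(sample))  # 0.75
```

No equivalent check can be written for "appropriately conservative", because there is no label to compare against.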
Why naming categories works mechanically
Anthropic's own prompt-engineering guidance frames the underlying principle plainly: Claude responds well to clear, explicit instructions, and you should treat the model as a brilliant but brand-new employee who lacks context on your norms and workflows. A new hire told to "use good judgement about what to flag" will surface a different set of things than you would, not from incompetence but from missing context. A new hire handed a written list of what the team reports and what it leaves alone matches your intent on day one. Named categories are that written list.
The same guidance offers a test you can run on any review prompt: show it to a colleague who has minimal context on the task and ask them to follow it, because if they would be confused, Claude will be too. A confidence adjective like "be conservative" fails that test, since the colleague is left guessing where your bar sits. A categorical block passes it, because a person reading "report logic bugs and security vulnerabilities; treat style and naming as out of scope" knows exactly what to do without inventing a private definition first.
There is a model-behaviour reason this matters more now than it once did. Recent Claude models are trained for precise instruction following, so they take your words literally and act on them. That faithfulness rewards instructions with a concrete referent and punishes ones without: a named category gives the model a target it can map each candidate finding onto, much as a classifier maps an input to a label. A confidence adjective gives it nothing to map onto, so it manufactures a threshold, and that invented threshold is what drifts between calls.
How a categorical criteria block is built
A criteria block has two halves, and the second half matters as much as the first. You state the categories to report, and you state the categories to skip. Listing only what to report leaves the long tail of borderline observations undefined, and the model will fill that gap with noise. Naming the skip categories closes the gap explicitly.
- Report: behaviour bugs where the code contradicts its stated intent, security vulnerabilities, broken logic, and incorrect error handling.
- Skip: pure style and formatting, naming preferences, local conventions the team already accepts, and anything a linter or formatter already enforces.
- Anchor the boundary: for the cases that sit between the two lists, give a one-line rule, such as "flag a naming issue only when the wrong name causes a real correctness or security problem."
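The three parts listed above can be assembled into prompt text programmatically, which keeps the lists editable in one place. This is a minimal sketch: the `build_criteria_block` helper and the exact wording of the headings it emits are assumptions, not a prescribed format.

```python
# Lists copied from the bullets above; the helper and its output wording are hypothetical.
REPORT = [
    "behaviour bugs where the code contradicts its stated intent",
    "security vulnerabilities",
    "broken logic",
    "incorrect error handling",
]
SKIP = [
    "pure style and formatting",
    "naming preferences",
    "local conventions the team already accepts",
    "anything a linter or formatter already enforces",
]
BOUNDARY_RULE = (
    "Flag a naming issue only when the wrong name causes "
    "a real correctness or security problem."
)


def build_criteria_block(report: list[str], skip: list[str], boundary_rule: str) -> str:
    """Render the report list, the skip list, and the boundary rule as prompt text."""
    lines = ["Report only findings in these categories:"]
    lines += [f"- {item}" for item in report]
    lines.append("Do not report findings in these categories:")
    lines += [f"- {item}" for item in skip]
    lines.append(f"For borderline cases: {boundary_rule}")
    return "\n".join(lines)


criteria_block = build_criteria_block(REPORT, SKIP, BOUNDARY_RULE)
print(criteria_block)
```

Keeping the lists as data rather than as prose means a disputed finding can be settled by editing one list entry and regenerating the prompt.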
Where the boundary gets hard
Two failure modes appear once a team starts writing categories. The first is overlap: a single finding can match both a report category and a skip category, such as a misleadingly named variable that also conceals a real defect. The boundary rule resolves this by ranking intent over surface, so the finding is reported for the bug it hides, not suppressed for the naming it happens to involve. The second is over-narrowing. Shrink the report list far enough and you stop emitting noise but also stop catching genuine issues, trading false positives for silent false negatives. That tension between catching everything and staying quiet is a knowledge point of its own, yet the categorical block is where an architect first feels it, because every category you decline to list is a class of finding you have chosen not to surface. Anthropic's managed Code Review exposes exactly this control through a REVIEW.md file, which lets a team name the paths and finding categories Claude should post nothing on, and set a higher bar for areas that warrant some scrutiny but not full attention.
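The overlap rule described above, where a report category outranks a skip category, can be sketched as a small precedence check. The tag names and the `decide` function are hypothetical, and this illustrates the ranking only; it is not a description of how any Anthropic product resolves overlaps.

```python
# Hypothetical category tags, chosen for illustration only.
REPORT_CATEGORIES = {"logic_bug", "security", "error_handling"}
SKIP_CATEGORIES = {"style", "naming", "formatting"}


def decide(tags: set[str]) -> str:
    """Resolve a finding that may match several categories.

    A report category wins over a skip category, so a misleading name that
    hides a real defect is surfaced for the defect. A finding matching
    neither list is returned as 'unclassified', which signals a gap in the
    lists rather than a silent default.
    """
    if tags & REPORT_CATEGORIES:
        return "report"
    if tags & SKIP_CATEGORIES:
        return "skip"
    return "unclassified"


print(decide({"naming", "logic_bug"}))  # report
print(decide({"naming"}))               # skip
print(decide({"performance"}))          # unclassified
```

The "unclassified" outcome is where over-narrowing becomes visible: each finding that lands there belongs to a class the lists have not yet taken a position on.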
Pair each category with a definition, not just a name
A named category is a strong start, but Anthropic's content-moderation guidance goes one step further: it pairs every category with a short written definition, not just a label. The pattern supplies a dictionary where each key is a category and each value spells out what that category covers, and both the names and the definitions go into the prompt. The model then maps each input against a stated boundary rather than against whatever the bare word happens to connote to it. Naming a category "security vulnerability" still leaves room for interpretation; adding "code that exposes data, allows injection, or bypasses authentication" closes most of that gap, and closing it is what drives the false-positive rate down.
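The name-plus-definition pattern described above can be sketched as a plain dictionary rendered into prompt text. The two definitions are taken from this guide; the `render_definitions` helper and its "name: definition" layout are assumptions for illustration.

```python
# Keys are category names, values are the written definitions.
# The rendering format is a hypothetical choice, not a required one.
CATEGORY_DEFINITIONS = {
    "security vulnerability": (
        "code that exposes data, allows injection, or bypasses authentication"
    ),
    "logic bug": (
        "code that does not do what its surrounding comments and tests say it should"
    ),
}


def render_definitions(definitions: dict[str, str]) -> str:
    """Render each category and its definition as one line of prompt text."""
    return "\n".join(
        f"{name}: {definition}" for name, definition in definitions.items()
    )


print(render_definitions(CATEGORY_DEFINITIONS))
```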
The guidance shapes the output side too. A moderation prompt built this way returns a structured verdict, such as a boolean for whether any category was violated, the list of categories that matched, and an explanation included only when there is a violation. That shape stops the model from narrating on clean inputs, which is its own kind of noise, and it hands downstream automation a field it can route on. Anthropic's structured-outputs feature can enforce the shape directly, constraining the response to a JSON schema so the verdict and the categories always match the contract you defined. The lesson generalises to any review system: write the categories in words the prompt actually contains, and ask for an output whose structure mirrors those categories, so both the decision and its justification stay anchored to the criteria you set.
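The verdict shape described above can also be checked on the receiving side with the standard library. This is a local validation sketch only: the field names `violation`, `categories`, and `explanation`, and the allowed category names, are assumptions, and the code does not call or represent Anthropic's structured-outputs feature.

```python
import json

# Hypothetical category names and field names, chosen for illustration only.
ALLOWED_CATEGORIES = {"logic bug", "security vulnerability"}


def parse_verdict(raw: str) -> dict:
    """Parse a JSON verdict and reject shapes that break the contract."""
    verdict = json.loads(raw)
    if not isinstance(verdict, dict):
        raise ValueError("verdict must be a JSON object")
    violation = verdict.get("violation")
    categories = verdict.get("categories")
    if not isinstance(violation, bool):
        raise ValueError("'violation' must be a boolean")
    if not isinstance(categories, list) or not all(
        isinstance(c, str) for c in categories
    ):
        raise ValueError("'categories' must be a list of strings")
    unknown = set(categories) - ALLOWED_CATEGORIES
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    if violation != bool(categories):
        raise ValueError("violation flag and category list disagree")
    if not violation and "explanation" in verdict:
        # Clean inputs should produce no narration.
        raise ValueError("clean inputs must not carry an explanation")
    return verdict


clean = parse_verdict('{"violation": false, "categories": []}')
print(clean["violation"])  # False
```

Rejecting an explanation on a clean input enforces the "no narration" property in code rather than relying on the prompt alone.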
What this looks like in a real review prompt
Worked example
A team wires Claude into pull-request review and complains it comments on everything, drowning real bugs in nitpicks.
The original prompt read: "Review this diff and report anything that looks important. Be conservative and avoid noise." Developers got long comment threads arguing about variable names, import order, and personal style, while a genuine off-by-one error scrolled past unread. The word "conservative" had quietly come to mean "thorough" to the model.
The fix replaced the adjective with categories. The new prompt told Claude to report only four things: logic bugs where the code does not do what its surrounding comments and tests say it should, security vulnerabilities, broken or missing error handling, and changes that break a documented public contract. It then named what to skip outright: formatting, naming, import ordering, and any rule the project linter already enforces. Finally it added one boundary rule for the middle ground, so a naming choice was reportable only when the misleading name could cause a real defect.
The behaviour changed immediately and, more importantly, predictably. Comment volume dropped by roughly two-thirds, and the comments that remained were defensible because each one mapped to a stated category. When a developer disagreed with a finding, the argument was now concrete: either the finding fit a report category or it did not, and the prompt could be adjusted by editing the list rather than by re-tuning a feeling. That auditability is the quiet superpower of categorical criteria.
It is worth naming what the rewrite did not require: no new model, no temperature change, no extra tooling. The whole gain came from converting a feeling into a list that the model and the team could both read. That is among the cheapest high-leverage edits in Domain 4, which is one reason the task statement opens with it rather than with a heavier technique.
Why precision is what actually moves
Precision is the share of the model's findings that are genuinely worth a human's attention. Vague confidence language tries to raise precision by asking the model to hold back, but holding back is indiscriminate: it suppresses real bugs alongside noise. Explicit categories raise precision differently, by shrinking the definition of what counts as a finding in the first place. You are not asking the model to be quieter; you are telling it that a whole class of observations is simply not in scope.
That reframing is why this knowledge point sits upstream of so much of Domain 4. Once an architect can express a requirement as named categories, every downstream technique gets sharper: severity becomes a property of a category, false-positive problems can be traced to a specific category, and few-shot examples have a category to demonstrate. Anthropic's own managed Code Review takes the same stance by default, focusing on correctness bugs that would break production rather than on formatting preferences, and letting teams widen or narrow that scope explicitly.
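The precision definition above, and the claim that false-positive problems can be traced to a specific category, can be sketched in a few lines. The sample data and the `precision_by_category` helper are hypothetical; each pair records a finding the model reported and whether a human judged it worth attention.

```python
from collections import defaultdict


def precision_by_category(findings: list[tuple[str, bool]]) -> dict[str, float]:
    """Share of reported findings per category that were worth a human's attention."""
    totals: dict[str, int] = defaultdict(int)
    useful: dict[str, int] = defaultdict(int)
    for category, was_useful in findings:
        totals[category] += 1
        if was_useful:
            useful[category] += 1
    return {category: useful[category] / totals[category] for category in totals}


# Hypothetical review results: (category, judged useful by a human).
reported = [
    ("logic bug", True),
    ("logic bug", True),
    ("security vulnerability", True),
    ("error handling", True),
    ("error handling", False),
]

overall = sum(was_useful for _, was_useful in reported) / len(reported)
print(overall)                          # 0.8
print(precision_by_category(reported))
```

In this made-up data the overall figure hides the fact that every false positive comes from one category, which is the category whose definition would need tightening.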
The same lever beyond code review
Code review is the most common home for this skill, but the lever is general, and the exam draws its scenarios from other places too. Picture a support team using Claude to sort inbound tickets into billing, bug report, feature request and spam, with one instruction: route each ticket to the right queue and escalate anything urgent. Urgent against what definition? The model escalates inconsistently, and the on-call queue fills with tickets that never needed a human. Rewriting the instruction as categories fixes it the same way it fixes a review bot: define precisely what counts as urgent, such as a reported outage, a payment failure or a security report, and state plainly that pricing questions, how-to requests and feature ideas are never urgent. The escalation rate steadies because the model now classifies against a list instead of an adjective.
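The ticket-triage rewrite above can be sketched the same way as a review prompt. The category lists come from the example in the text; the `escalation_instruction` and `should_escalate` helpers, and the idea of checking the model's label in code, are assumptions for illustration.

```python
# Category lists taken from the example above; helper names are hypothetical.
URGENT = ("reported outage", "payment failure", "security report")
NEVER_URGENT = ("pricing question", "how-to request", "feature idea")


def escalation_instruction() -> str:
    """Replace 'escalate anything urgent' with named categories."""
    return (
        "Escalate a ticket only if it is one of: "
        + ", ".join(URGENT)
        + ". Never escalate: "
        + ", ".join(NEVER_URGENT)
        + "."
    )


def should_escalate(label: str) -> bool:
    """Check a model-assigned label against the two lists."""
    if label in URGENT:
        return True
    if label in NEVER_URGENT:
        return False
    # A label outside both lists means the criteria have a gap to fix.
    raise ValueError(f"label {label!r} is not covered by the criteria")


print(escalation_instruction())
print(should_escalate("payment failure"))   # True
print(should_escalate("pricing question"))  # False
```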
The pattern transfers because the problem was never about code. Any task where the model decides whether something belongs in a bucket improves when you name the buckets and their edges. A document-extraction prompt that asks for the important dates returns a different set on every run; one that asks for the contract start date, renewal date and termination date, and tells the model to ignore dates that appear inside examples, returns the same set every time. Recognising that explicit categorical criteria are a classification tool rather than a code-review trick is what lets an architect carry the skill into whatever scenario the exam puts in front of them.
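The extraction example above follows the same shape: name the fields to return and name what to ignore. In this sketch the field identifiers, the ISO date format, and both helper functions are assumptions introduced for illustration.

```python
from datetime import date
from typing import Optional

# Hypothetical field identifiers for the three dates named above.
DATE_FIELDS = ("contract_start_date", "renewal_date", "termination_date")


def extraction_instruction() -> str:
    """Ask for named dates instead of 'the important dates'."""
    return (
        "Extract exactly these dates as YYYY-MM-DD, or null if absent: "
        + ", ".join(DATE_FIELDS)
        + ". Ignore dates that appear inside examples."
    )


def validate_extraction(result: dict) -> dict[str, Optional[date]]:
    """Accept only the named fields, each an ISO date string or None."""
    if set(result) != set(DATE_FIELDS):
        raise ValueError(f"expected exactly the fields {DATE_FIELDS}")
    return {
        field: (date.fromisoformat(value) if value is not None else None)
        for field, value in result.items()
    }


parsed = validate_extraction(
    {
        "contract_start_date": "2024-01-01",
        "renewal_date": "2025-01-01",
        "termination_date": None,
    }
)
print(parsed["renewal_date"].year)  # 2025
```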
Common misreadings to avoid
Misconception
Telling Claude to 'be conservative' or 'only report high-confidence findings' is a strong, safe instruction.
What's actually true
Misconception
Listing what to report is enough; the model will infer what to ignore.
What's actually true
Misconception
Anthropic says to tell Claude what to do rather than what not to do, so a skip list contradicts the guidance and should be dropped.
What's actually true
How it shows up on the exam
Domain 4 carries twenty percent of the exam, and Task 4.1 leads it. Questions in this area rarely ask you to define a term. They describe a review system that floods users with low-value findings, or a prompt full of words like "careful", "important", or "conservative", and ask what change would most improve precision. The credited answer almost always replaces the subjective language with explicit categorical criteria, because that is the move that turns an unmeasurable instruction into a concrete, testable boundary. Recognise the vague-adjective smell in the stem and you will recognise the right answer.
Task 4.1 pitches this knowledge point at Bloom's understand level, so the exam wants more than the definition and less than a full system design. You should be able to read a prompt, explain why its confidence adjective produces inconsistent behaviour, and identify the categorical rewrite that fixes it. Scenarios drawn from review bots, ticket triage and extraction pipelines all reduce to the same recognition: a vague adjective is the smell, and named report and skip categories are the cure. Hold that mapping and the harder apply- and analyse-level questions later in the task statement have somewhere solid to stand.
A CI bot powered by Claude reviews every pull request with the instruction: 'Be conservative and only report high-confidence issues.' Developers complain it both misses real bugs and still posts nitpicks about naming. Which prompt change most directly improves precision?
A team uses Claude to classify support tickets and tells it to 'escalate anything urgent'. The on-call queue keeps filling with non-urgent tickets. Which change best reflects the explicit categorical criteria principle?
People also ask
Why are explicit criteria better than telling Claude to be conservative?
How do you reduce false positives in an AI code review prompt?
What does explicit categorical criteria mean in prompt engineering?