In short
Category-specific prompt iteration is the practice of isolating one finding category that produces too many false positives, temporarily disabling it, refining just that category's instructions with concrete examples, and re-enabling it once its false-positive rate is acceptable. Iterating one category at a time keeps the rest of the reviewer stable while you fix the problem in isolation.
What category-specific prompt iteration is
Category-specific prompt iteration is a disciplined way to repair a misbehaving automated reviewer without destabilising the parts that already work. A review prompt usually covers several finding categories at once: security, logic, error handling, performance, style. When false positives spike, they rarely come evenly from every category; one or two are usually responsible for most of the noise. The technique is to isolate the guilty category and work on it alone, rather than rewriting everything and hoping the noise goes away.
The contrast that defines the skill is between broad, simultaneous editing and narrow, sequential editing. If you change the security instructions, the performance examples, and the style boundary all in one pass, the next run gives you a single blended result that you cannot attribute to any one change. Category-specific prompt iteration insists on the opposite: pick the worst category, freeze the rest, and iterate until that one category clears the bar. The narrowness is the method, not a limitation of it.
Category-specific prompt iteration: isolating one finding category that produces excessive false positives, temporarily disabling it, refining only that category's instructions and examples, and re-enabling it once its false-positive rate is acceptable, while the other categories run unchanged.
Why isolating one category beats fixing everything at once
The strongest reason to iterate one category at a time is attribution. Prompt engineering is empirical: you change something, you measure, you keep what helped. That loop only works if each iteration has a single attributable cause. Touch five categories in one edit and you forfeit the signal, because an improvement in security findings can be masked by a regression in performance findings, and you will never untangle which edit did what. Anthropic's guidance on defining success criteria assumes exactly this kind of clean, measurable loop, where you iterate against a target and read the result honestly.
There is also a trust argument, and it is the one the exam cares about most. The categories that already perform well are an asset you should not put at risk. Rewriting the entire prompt to chase one noisy category gambles your working categories on a change that was never about them. By disabling only the offender and leaving the rest live, you keep delivering trusted findings throughout the repair, which is the difference between a reviewer that stays in service and one that gets switched off mid-fix.
The loop: disable, refine with examples, measure, re-enable
The mechanics are a short cycle. First, identify the category producing the false positives, usually obvious from a sample of recent comments. Second, disable it so it stops emitting noise immediately; this is a legitimate, reversible move that protects precision while you work. Third, refine that category in isolation, and the most effective refinement is concrete examples: a true positive it should flag and a false positive it should not. Anthropic's guidance on being clear and direct is the engine here, since a paired example teaches the boundary that prose alone cannot.
The fourth step is measurement against a small labelled sample so you know when to stop, and the fifth is re-enabling the category once its false-positive rate is acceptable. The loop is intentionally boring and repeatable, and that is its strength: each pass has one cause, one measurement, and one decision.
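The cycle above can be sketched in a few lines. This is a minimal illustration, assuming the review prompt is assembled from per-category sections; the `Category` class, `build_review_prompt` function, and all instruction wording are hypothetical names invented here, not part of any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Category:
    """One finding category in the review prompt (hypothetical structure)."""
    name: str
    instructions: str
    enabled: bool = True


def build_review_prompt(categories: list[Category]) -> str:
    """Assemble the prompt from enabled categories only."""
    sections = [
        f"## {c.name}\n{c.instructions}" for c in categories if c.enabled
    ]
    return "Review the diff for the following categories:\n\n" + "\n\n".join(sections)


categories = [
    Category("security", "Flag injection risks and secrets committed to the repo."),
    Category("performance", "Flag performance bottlenecks."),
    Category("style", "Flag deviations from the project's style guide."),
]

# Step 2: disable the noisy category; every other section is left untouched.
performance = next(c for c in categories if c.name == "performance")
performance.enabled = False
prompt_during_repair = build_review_prompt(categories)

# Step 3: refine only that category's instructions (wording is illustrative).
performance.instructions = (
    "Flag a loop only when its cost grows with input size on a hot path. "
    "Do not flag loops over small, constant-size collections."
)

# Step 5: re-enable once the measured false-positive rate clears the bar.
performance.enabled = True
prompt_after_repair = build_review_prompt(categories)
```

Keeping each category as its own section is what makes the disable step cheap and reversible: the sections for the trusted categories are byte-for-byte identical before, during, and after the repair.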
Choosing concrete examples over more rules
When a category misfires, the instinct is to bolt on another qualifying sentence: "do not flag X unless Y, except when Z." Stacked qualifiers tend to conflict and make the category harder to reason about, not easier. The more durable fix is a pair of examples that bracket the boundary, because the model generalises from the contrast between them far better than it parses a chain of exceptions. This is the same example-first principle that powers severity calibration with code examples, now aimed at the false-positive boundary of a single category.
Keeping the iteration example-driven also makes the measurement honest. A clear true-positive and false-positive pair doubles as two test cases, so the same examples you add to the prompt become part of the labelled sample you grade against. The category is fixed when it flags the true positive and stays silent on the false one across a handful of similar cases, and that simple, observable criterion is what lets you re-enable with confidence.
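One way to make the "examples double as test cases" idea concrete is to store each example once and use it both to render the prompt and to grade the reviewer. The sketch below assumes that approach; `LabelledExample`, `render_examples`, `grade`, and the two snippets are hypothetical, and `reviewer` stands in for whatever wraps the real model call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelledExample:
    """A code snippet plus the verdict the category should reach on it."""
    snippet: str
    should_flag: bool
    reason: str


# Hypothetical pair bracketing the performance boundary.
EXAMPLES = [
    LabelledExample(
        snippet="for user in users:\n    orders = db.query_orders(user.id)",
        should_flag=True,
        reason="One database query per user; cost grows with the input.",
    ),
    LabelledExample(
        snippet="for day in WEEKDAYS:\n    for slot in SLOTS_PER_DAY:\n        grid[day][slot] = 0",
        should_flag=False,
        reason="Nested loop over small, constant-size lists.",
    ),
]


def render_examples(examples):
    """Format the pair for inclusion in the category's prompt section."""
    blocks = []
    for ex in examples:
        verdict = "FLAG" if ex.should_flag else "DO NOT FLAG"
        blocks.append(f"{verdict}: {ex.reason}\n{ex.snippet}")
    return "\n\n".join(blocks)


def grade(examples, reviewer):
    """Return the examples where the reviewer's verdict disagrees with the label.

    `reviewer` is any callable that takes a snippet and returns True when the
    category would flag it; in practice it would wrap a model call.
    """
    return [ex for ex in examples if reviewer(ex.snippet) != ex.should_flag]
```

An empty list from `grade` means the category flagged the true positive and stayed silent on the false one, which is the observable criterion described above.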
A worked example: one category poisoning the well
A team's Claude reviewer covers five categories. Four are well trusted, but the performance category flags nearly every loop as a bottleneck, and the noise is starting to discredit the whole tool.
The team's first reflex was to rewrite the entire review prompt, tightening every category at once. They paused when they realised they would not be able to tell whether the rewrite helped performance findings or quietly degraded the security findings their engineers relied on. So they switched to category-specific prompt iteration instead.
They disabled the performance category outright. Immediately the reviewer went back to producing clean, trusted output from its other four categories, and engagement recovered overnight. Then they worked on performance in isolation, adding two examples: a genuine bottleneck it should flag, and a nested loop over a tiny constant-size list it should ignore. They ran the category against a dozen recent diffs, watched its false-positive rate fall from roughly seventy percent to under ten, and only then re-enabled it.
The decisive choice was sequencing. By refusing to fix all five categories simultaneously, the team kept a working reviewer in service the entire time, got a clean signal on the one category that mattered, and avoided risking the four that were already fine. That sequencing discipline (isolate, disable, refine with examples, re-enable) is the whole of this knowledge point.
Common misreadings to avoid
This is an apply-level skill, so the exam checks whether you would run the loop correctly under pressure. The two misreadings below are the usual stumbles.
Misconception
When false positives spike, you should rewrite the whole review prompt in one pass to fix everything together.
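What's actually true
Rewriting everything at once blends every change into a single result you cannot attribute, and it puts the categories that already work at risk. Isolate the worst category, freeze the rest, and iterate on that one alone.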
Misconception
Disabling a noisy category is giving up; a good engineer fixes it while it stays live.
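What's actually true
Disabling the category is a legitimate, reversible move that stops the noise immediately and protects precision while you work. The other categories keep delivering trusted findings, and the category returns once its false-positive rate is acceptable.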
Where this sits in the knowledge graph
Category-specific prompt iteration depends on two upstream skills. It needs severity calibration with code examples, because you cannot tell whether a category improved if its findings are not graded consistently, and it is driven by the false positive trust problem, which is the very thing the loop exists to solve. The example-first refinement it relies on is a direct application of explicit categorical criteria.
It is the natural follow-on to the precision versus recall trade-off in review prompts: once you have decided that protecting precision is worth temporarily dropping recall in one category, this loop is how you actually repair that category and bring it back. Read together, the two knowledge points cover both the judgement and the mechanics of keeping a review system trustworthy.
Choosing which category to fix first
When several categories misbehave at once, the order you tackle them in matters. The right first target is usually the one generating the largest volume of false positives, because that is the category doing the most damage to trust per day it stays live. Sort a sample of recent comments by category, count how many were noise, and the worst offender announces itself. Fixing it first delivers the biggest immediate recovery in accuracy and buys you the goodwill to keep iterating on the rest.
Volume is not the only consideration, though. A category that fires rarely but is wrong in a spectacular, credibility-destroying way can deserve priority even if its raw count is low, because a single absurd comment can colour how developers see the entire tool. The triage judgement (biggest noise contributor first, with an eye for the uniquely embarrassing) is part of running this loop well rather than mechanically following a counter.
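The counting half of that triage is simple enough to sketch. The example below assumes each sampled comment has been hand-labelled as a `(category, was_false_positive)` pair; the sample data and the `false_positives_by_category` helper are hypothetical.

```python
from collections import Counter

# Hypothetical labelled sample: (category, was_false_positive) per comment.
sample = [
    ("performance", True), ("performance", True), ("performance", True),
    ("performance", False), ("style", True), ("style", False),
    ("security", False), ("security", False), ("logic", False),
]


def false_positives_by_category(labelled):
    """Count false positives per category, worst offender first."""
    counts = Counter(cat for cat, is_fp in labelled if is_fp)
    return counts.most_common()


ranking = false_positives_by_category(sample)
worst_category, worst_count = ranking[0]
```

The ranking only captures volume. Spotting the rare but credibility-destroying comment still takes a human reading the sample, so the count is a starting point for the decision rather than the decision itself.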
Knowing when a category is fixed enough to re-enable
A loop needs an exit condition, and guessing is not one. Before re-enabling a category, decide in advance what acceptable looks like, expressed as a false-positive rate on your sample: perhaps no more than one false positive in every ten findings for this category. Run the refined prompt against the sample, check the number against the bar, and only re-enable when it clears. Setting the bar before you measure stops you from rationalising a still-noisy category back into the live review because you have grown tired of iterating on it.
It is equally important not to over-tune. Once a category clears its bar, stop; squeezing out the last few false positives often costs real coverage, dragging you back into the very trade-off the rest of the system has already settled. The goal of the loop is a category that is trustworthy enough to rejoin its peers, not one polished to a perfection that sacrifices the issues it was built to catch.
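A bar set in advance can be written down as a single comparison. This is a minimal sketch, assuming each finding in the sample has been labelled true or false positive; the function names and the 0.10 default simply mirror the "one in ten" illustration above and are not a recommended threshold.

```python
def false_positive_rate(verdicts):
    """Share of a category's findings labelled as false positives.

    `verdicts` is a list of booleans, True meaning the finding was noise.
    """
    if not verdicts:
        raise ValueError("no findings to grade; the sample is too small to decide")
    return sum(verdicts) / len(verdicts)


def clears_bar(verdicts, max_fp_rate=0.10):
    """Decide re-enablement against a bar fixed before measuring."""
    return false_positive_rate(verdicts) <= max_fp_rate


# Hypothetical labelled runs for one category.
before = [True] * 7 + [False] * 3   # 7 of 10 findings were noise
after = [True] * 1 + [False] * 9    # 1 of 10 findings was noise
```

Here `clears_bar(before)` is false and `clears_bar(after)` is true. Passing `max_fp_rate` as an explicit argument, decided before the run, is the point: the threshold is an input to the measurement, not something adjusted after seeing the number. Once the function returns true, the loop ends; it does not ask for a lower rate.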
Version each iteration so changes stay attributable
The discipline of fixing one category at a time only pays off if you can see, later, exactly what you changed and undo it cleanly. A small but high-leverage habit is to version every edit to a category's prompt the way you version code: keep the prior wording, record what changed and why, and tag the run that produced each false-positive measurement. When a refinement that looked good on the sample turns out worse on live diffs, a version history lets you roll straight back to the last trusted wording instead of reconstructing it from memory. Losing track of which prompt produced which result is the quiet way an otherwise sound iteration loop stops being measurable.
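A version history for one category needs very little machinery. The sketch below assumes an in-memory record; `PromptVersion`, `CategoryHistory`, and the sample wording and rates are hypothetical, and a real setup might just as well keep the prompt text in source control alongside the measurements.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class PromptVersion:
    """One wording of a category's instructions and how it measured."""
    version: int
    instructions: str
    change_note: str
    fp_rate: Optional[float] = None  # None until the version has been measured


@dataclass
class CategoryHistory:
    name: str
    versions: list = field(default_factory=list)

    def record(self, instructions, change_note, fp_rate=None):
        """Append a new version; prior wordings are never overwritten."""
        v = PromptVersion(len(self.versions) + 1, instructions, change_note, fp_rate)
        self.versions.append(v)
        return v

    def last_trusted(self, max_fp_rate):
        """Most recent version whose measured rate cleared the bar, or None."""
        for v in reversed(self.versions):
            if v.fp_rate is not None and v.fp_rate <= max_fp_rate:
                return v
        return None


history = CategoryHistory("performance")
history.record("Flag performance bottlenecks.", "original wording", fp_rate=0.70)
history.record(
    "Flag loops whose cost grows with input size.",
    "narrowed to input-dependent cost",
    fp_rate=0.08,
)
history.record(
    "Flag loops whose cost grows with input size, except in test code.",
    "added test-code exception; measured worse on live diffs",
    fp_rate=0.25,
)
rollback_target = history.last_trusted(max_fp_rate=0.10)
```

In this illustration the third edit regressed, so `last_trusted` returns version 2 and the rollback is a lookup rather than a reconstruction from memory.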
Versioning also guards against the loop's subtlest failure: overfitting the prompt to its own test sample. If you keep stacking qualifiers until the category scores perfectly on the dozen diffs you grade against, you may be encoding those specific cases rather than the general boundary, and the category will regress the moment it meets a diff the sample never contained. Anthropic's prompting guidance frames the antidote as iterating against a concrete bar and validating gains on held-out cases rather than on the examples you tuned with. In practice that means refreshing the sample with new diffs as you go, and treating a refinement as real only when it holds on cases it was never shown.
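Holding cases back can be as simple as a reproducible split of the labelled diffs. The sketch below is one assumed way to do it; `split_sample`, `holds_on_held_out`, the 30 percent fraction, and the 0.10 bar are all illustrative choices rather than prescribed values.

```python
import random


def split_sample(diff_ids, held_out_fraction=0.3, seed=0):
    """Split labelled diffs into a tuning set and a held-out set.

    The fixed seed makes the split reproducible, so later runs are
    compared on the same held-out cases.
    """
    ids = sorted(diff_ids)
    random.Random(seed).shuffle(ids)
    cut = max(1, round(len(ids) * held_out_fraction))
    return ids[cut:], ids[:cut]


def holds_on_held_out(tuning_fp_rate, held_out_fp_rate, max_fp_rate=0.10):
    """Treat a refinement as real only if the held-out set also clears the bar."""
    return tuning_fp_rate <= max_fp_rate and held_out_fp_rate <= max_fp_rate


tuning_ids, held_out_ids = split_sample(range(12))
```

Examples added to the prompt should come only from `tuning_ids`. A category that scores perfectly on the tuning set but misses the bar on `held_out_ids` is showing the overfitting described above, and refreshing the sample means re-running the split with newly labelled diffs added.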
Why the loop protects the reviewer's reputation
The quiet payoff of working one category at a time is reputational. A review tool earns trust slowly and loses it fast, and a single noisy release can undo months of credibility. By disabling the offender and leaving the trusted categories running, you ensure developers never experience the tool as broken even while you are mid-repair. They keep getting clean, useful findings, the fix lands invisibly, and the category returns without fanfare. Protecting that continuity of trust is as much a part of the skill as the prompt edits themselves.
How it shows up on the exam
Within the continuous-integration review scenario, this knowledge point appears as a tooling-and-process question: a reviewer is mostly good but has one or two noisy categories, and you are asked for the soundest way to fix it. The answer that earns credit isolates the problem: disable the offending category, iterate on its prompt with examples, and re-enable it when the false-positive rate drops. The wrong answers rewrite the whole prompt, switch models, or simply tolerate the noise. These distractors are designed to tempt the candidate who treats the prompt as one indivisible blob. If you can explain why one-category-at-a-time gives a clean signal and protects the working categories, you have the answer.
A Claude code reviewer covers five categories. Four are trusted, but the performance category produces a flood of false positives that is discrediting the whole tool. What is the best approach?