- In short
- A Claude Code independent review instance is a separate session that reviews code without carrying the reasoning context of the session that generated it. Because a generating session is anchored to the decisions it already made, it is biased toward confirming them; a fresh, context-free reviewer evaluates the code on its merits and surfaces more of the subtle issues.
What a Claude Code independent review instance is
A Claude Code independent review instance is a reviewer that starts cold. It is given the code or the diff and asked to evaluate it, but it carries none of the conversation, intentions, or trade-offs that produced that code. This sounds like a small detail. It is in fact the whole mechanism, because the quality of a review depends far more on what the reviewer does not know than on what it does. A reviewer that already believes the design is sound, because it is the one that argued for it, will read the code looking for confirmation. A reviewer with no such history reads the code looking for problems.
The contrast that defines this knowledge point is between two ways of asking for a review. In the first, you generate a module and then, in the same session, ask "now review this for bugs". In the second, you open a separate instance, hand it only the resulting code, and ask the same question. The model is identical. The prompt is nearly identical. Yet the second consistently surfaces more, and more subtle, issues. Understanding why is the point of this concept.
- Independent review instance
- A review performed by a separate Claude Code session that has no access to the reasoning context of the session that generated the code, so it evaluates the code on its own merits rather than defending prior decisions.
The self-review limitation
When a session writes code, it accumulates a chain of reasoning: why this structure, why that edge case was skipped, why a certain shortcut is acceptable. That context does not vanish when you ask the same session to review. It colours the review. The session has already concluded that its choices were right, so when it re-reads its own work it tends to confirm rather than challenge. Anthropic names this self-review limitation directly: the same session that generated the code retains its reasoning context, which makes it less likely to question its own decisions.
This is not a quirk of language models so much as a property of any reviewer who is also the author. A human engineer reviewing their own pull request skims past the assumptions they have already internalised; the bug hides exactly where they are sure there is none. The independent instance breaks that loop by removing the shared context. It cannot defend a decision it never made, so it has to assess the decision on the evidence in front of it. That is why an independent review is structurally better positioned to catch the issues a self-review waves through.
Independence is about context, not capability
A frequent misreading is that you need a stronger or different model to get a good review. You do not. The independent instance can be the very same model that wrote the code. What changes is the context window, not the weights. Strip away the generation history and the same model becomes a more honest critic, because it no longer has a position to protect. This is liberating from a design standpoint: you do not need a second vendor, a bigger model, or any special review model. You need a clean session.
It also clarifies what "independent" must mean in practice. Resuming the generation session, or pasting the original design discussion into the review prompt, quietly reintroduces the bias you were trying to escape. True independence means the reviewer sees the artefact and the standards to judge it by, and nothing about how it came to be. The cleaner that separation, the more value the review adds.
How Anthropic productises the principle
The managed Claude Code review feature shows the principle taken to its conclusion. Rather than a single reviewer, it runs a fleet of specialised agents in parallel over the diff, in the context of the full codebase, each looking for a different class of issue: logic errors, security vulnerabilities, broken edge cases, or subtle regressions. A separate verification step then checks each candidate finding against the actual code behaviour to filter out false positives before anything is posted. The findings are deduplicated, ranked by severity, and posted as inline comments on the exact lines where the issues live.
Two design choices echo this knowledge point. First, the review runs independently of whoever authored the change; it is not the author defending their work. Second, the verification pass exists precisely because an independent system is willing to question its own candidate findings, which is the inverse of the self-review bias. You can also reach the same outcome in your own CI: have one step generate or change code, then have a distinct step (a fresh claude -p invocation or a separate GitHub Actions job) perform the review with no shared session.
Severity, verification, and the shape of an honest review
The managed reviewer does not just find issues; it grades and checks them, and both steps echo the independence principle. Every finding is tagged by severity: Important for a bug that should be fixed before merging, Nit for a minor issue worth fixing but not blocking, and Pre-existing for a bug that already lived in the codebase rather than one this change introduced. That last category is revealing. Reporting a Pre-existing bug only makes sense for a reviewer with no stake in the diff: an author defending their own change has little incentive to surface a bug they did not write, while an independent system reports it plainly because it has no narrative to protect and no authorship to flatter.
The verification step is the sharper echo. After the parallel agents propose candidate findings, a separate pass checks each candidate against the actual behaviour of the code to filter out false positives. Read that as a system willing to question its own first answers, the exact inverse of the self-review bias, where a session treats its first conclusion as settled. Independence at the generation boundary (the reviewer never saw how the code was written) and skepticism at the finding boundary (every candidate must survive a check) are two expressions of the same discipline: a conclusion is earned against the evidence in front of you, not inherited from reasoning you happen to remember.
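The flow described above (propose candidates, verify each one, deduplicate, rank) can be sketched in a few lines. This is an illustrative model only, not the managed service's implementation: the `Finding` type, the `triage` function, the duplicate key, and the `verify` callback are all hypothetical names, and the ordering that places Pre-existing after Nit is an assumption made for the example.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, Iterable


class Severity(IntEnum):
    # Lower value sorts first. Ranking Pre-existing last is an assumption.
    IMPORTANT = 0
    NIT = 1
    PRE_EXISTING = 2


@dataclass(frozen=True)
class Finding:
    path: str
    line: int
    severity: Severity
    message: str


def triage(
    candidates: Iterable[Finding],
    verify: Callable[[Finding], bool],
) -> list[Finding]:
    """Keep verified candidates, drop duplicates, rank by severity."""
    seen: set[tuple[str, int, str]] = set()
    kept: list[Finding] = []
    for finding in candidates:
        if not verify(finding):
            continue  # candidate did not survive the check against the code
        key = (finding.path, finding.line, finding.message)
        if key in seen:
            continue  # another agent already reported the same issue
        seen.add(key)
        kept.append(finding)
    return sorted(kept, key=lambda f: (f.severity, f.path, f.line))


candidates = [
    Finding("refunds.py", 42, Severity.NIT, "unused variable"),
    Finding("refunds.py", 17, Severity.IMPORTANT, "partial refund can over-credit"),
    Finding("refunds.py", 17, Severity.IMPORTANT, "partial refund can over-credit"),
    Finding("auth.py", 5, Severity.IMPORTANT, "token never expires"),
]
# Hypothetical verifier: pretend only the refunds.py findings were confirmed.
report = triage(candidates, verify=lambda f: f.path == "refunds.py")
```

The point of the shape is that `verify` runs on every candidate before anything is reported, so a first answer is never treated as settled.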
A design consequence follows from how the result is reported. The managed review posts inline comments and populates a check run, but it tags findings by severity and neither approves nor blocks the pull request; the check completes with a neutral conclusion so existing workflows stay intact. Independence here means the reviewer informs the humans rather than gating them; the decision to merge stays with the team, and the reviewer's job is to make that decision better informed. Each finding even ships with a collapsible reasoning section, so a developer can inspect why the issue was flagged and how it was verified rather than taking the verdict on trust.
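A self-built reviewer could mirror the same inform-not-gate behaviour by always reporting a neutral conclusion. The sketch below only builds the payload; the field names are modelled on GitHub's check-run REST payload and should be confirmed against the current API reference, while the check name, the commit SHA, and the `neutral_check_run` helper are placeholders invented for the example.

```python
import json


def neutral_check_run(head_sha: str, finding_count: int) -> dict:
    """Build a completed check-run payload that informs without gating."""
    return {
        "name": "independent-review",  # hypothetical check name
        "head_sha": head_sha,
        "status": "completed",
        # Always neutral: findings are advice, the merge decision stays human.
        "conclusion": "neutral",
        "output": {
            "title": f"{finding_count} finding(s)",
            "summary": "Review findings are posted as inline comments.",
        },
    }


payload = neutral_check_run("0123abc", finding_count=3)  # placeholder SHA
print(json.dumps(payload, indent=2))
```

Because the conclusion does not depend on `finding_count`, even a review full of Important findings leaves branch protection and existing required checks untouched.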
Steering the independent reviewer with REVIEW.md
Independence does not mean the reviewer judges to a generic standard you cannot influence. Anthropic exposes two repository files that shape what every review agent looks for, and knowing their precedence is an exam-relevant detail. A CLAUDE.md at the repository root supplies the shared project context the reviewer reads like any other Claude Code task: your conventions, your architecture, what the code is meant to do. A REVIEW.md, also at the repository root, is the review-specific override. Its contents are placed into the system prompt of every agent in the review pipeline and take precedence over the default review guidance, which makes it the highest-priority instruction the reviewer follows.
That precedence has a consequence teams often miss. For review behaviour specifically, REVIEW.md outranks CLAUDE.md. A newly introduced violation of a CLAUDE.md rule is surfaced as a nit-level finding rather than a blocking one, because the default review emphasis is production correctness (the bugs that would break in production), not formatting or missing coverage. A team that expects its CLAUDE.md conventions to be enforced strictly in review is surprised when those violations arrive as quiet nits. The lever for stronger enforcement is REVIEW.md, not CLAUDE.md.
Inside REVIEW.md you calibrate the independent reviewer to your repository: which classes of finding count as Important versus Nit, which paths or categories should produce no findings at all, the mandatory checks every PR must pass, and a verification bar such as requiring a file:line citation as evidence before a finding is posted. None of this compromises independence. The reviewer still never saw how the code was written, but the honest review it produces is now measured against your standards rather than a generic default.
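In the managed service a verification bar like this is written as instructions in REVIEW.md. In a self-built pipeline the same bar could also be enforced mechanically after the review, as in the sketch below. The regular expression and the `meets_citation_bar` helper are assumptions made for illustration; real citation formats may need a stricter or looser pattern.

```python
import re

# Matches citations such as "src/refunds.py:42" (path, colon, line number).
CITATION = re.compile(r"[\w./-]+\.\w+:\d+")


def meets_citation_bar(finding_text: str) -> bool:
    """Return True when the finding cites at least one file:line location."""
    return CITATION.search(finding_text) is not None


findings = [
    "Partial refunds can over-credit, see src/refunds.py:42",
    "The refund logic feels fragile",
]
# Only findings that point at concrete evidence are allowed through.
postable = [text for text in findings if meets_citation_bar(text)]
```

The second finding is dropped not because it is wrong but because it offers no location a developer could check.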
Rolling your own independent reviewer
You do not need the managed service to apply the principle, and the exam expects you to see how to build it. The pattern is a separation of steps: one job or claude -p invocation generates or changes the code, and a distinct invocation (a fresh session, ideally a separate CI job) reviews the resulting diff with no shared conversation. Set the reviewer's standards with an appended system prompt rather than by pasting the original design discussion, because that discussion is exactly the context whose absence makes the review honest. Return the findings as structured output so a later step can act on them.
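A minimal sketch of the review step, assuming the `claude` CLI is installed and that its `-p`, `--append-system-prompt`, and `--output-format json` options behave as described in the CLI reference. The standards text, the function names, and the assumption that the JSON envelope carries the answer under a `result` key are illustrative and should be verified against the installed version.

```python
import json
import subprocess

# Hypothetical review standards; in practice these might come from a file.
REVIEW_STANDARDS = (
    "You are reviewing a diff you did not write. Report correctness bugs "
    "only, and cite a file:line for each finding."
)


def build_review_command(standards: str) -> list[str]:
    """Assemble a fresh, non-interactive Claude Code invocation.

    No --continue or --resume flag is passed, so the reviewer starts a new
    session with no access to the generation conversation.
    """
    return [
        "claude",
        "-p", "Review the diff on stdin for correctness bugs.",
        "--append-system-prompt", standards,
        "--output-format", "json",
    ]


def review_diff(diff: str, standards: str = REVIEW_STANDARDS) -> str:
    """Run the reviewer on the diff alone and return its answer text."""
    completed = subprocess.run(
        build_review_command(standards),
        input=diff,  # the artefact is the only thing the reviewer receives
        capture_output=True,
        text=True,
        check=True,
    )
    # Assumes the JSON envelope exposes the answer under "result".
    return json.loads(completed.stdout)["result"]
```

A CI job could pass the output of `git diff` to `review_diff` and hand the returned text to a later step. What matters is what the command omits: nothing in it points back at the session that wrote the code.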
The architecture is deliberately plain, and that plainness is the lesson. Independence is achieved by what you withhold from the reviewer, not by what you add, so the cheapest correct design is simply to not share the generation session. There is no second model to license, no special review endpoint to call, and no elaborate orchestration to maintain. There is a clean session and a clear standard. An architect who reaches for a bigger model when an empty review appears has diagnosed the wrong layer; the fix lives in the context boundary, which costs nothing to draw.
How this knowledge point is tested
This is an understand-level concept, so the exam checks whether you grasp the reason, not just the rule. A typical Scenario 5 item describes an engineer who generates code and reviews it in the same session, then is surprised the review found almost nothing. The distractors offer mechanistic explanations that miss the point: the wrong model, a missing flag, an exhausted context window. The correct answer identifies the self-review limitation and prescribes an independent review instance. If you can articulate that independence comes from withholding the generation context rather than from a better model, you can answer any phrasing of this question, and you are ready for the incremental-review concept that builds directly on it.
Worked example
An engineer has Claude generate a payment module, then asks the same session to review it, and the review comes back nearly empty.
The workflow looks efficient: one session writes a new payment-capture module, and immediately afterwards the engineer types "review this module for bugs and edge cases". The review returns two cosmetic notes and declares the logic sound. A week later a rounding error in partial refunds reaches production.
Trace what the session was actually doing. While generating, it had already reasoned that the refund path was simple and the currency handling was fine. Those conclusions were still in its context when it "reviewed", so it re-read the code through the lens of decisions it had already endorsed. It was not lying; it was confirming, which is exactly the self-review limitation.
Now run it independently. A fresh claude -p step is handed only the diff and the project's standards, with no memory of the generation conversation, and asked to find correctness bugs. With nothing to defend, it examines the refund arithmetic directly and flags that partial refunds round in a way that can over-credit a customer. Same model, same code, but because the reviewer had no stake in the original design, it questioned the part the author was sure about. The lesson is not that the first session was incapable; it is that authorship and review should not share a context.
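The example does not show the module's code, so the following is only one hypothetical way such a rounding bug could look. Each partial refund is rounded on its own, and two half refunds of an odd-cent charge add up to more than was captured. The function names and amounts are invented for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def naive_partial_refund(captured: Decimal, fraction: Decimal) -> Decimal:
    """Round each partial refund on its own, ignoring earlier refunds."""
    return (captured * fraction).quantize(CENT, rounding=ROUND_HALF_UP)


def capped_partial_refund(
    captured: Decimal, fraction: Decimal, already_refunded: Decimal
) -> Decimal:
    """Same rounding, but never refund more than the remaining balance."""
    amount = naive_partial_refund(captured, fraction)
    return min(amount, captured - already_refunded)


captured = Decimal("10.05")
half = Decimal("0.5")

# Two naive half refunds: 5.025 rounds up to 5.03 each time, total 10.06.
naive_total = naive_partial_refund(captured, half) * 2

# Capping the second refund at the remaining balance restores the invariant.
first = capped_partial_refund(captured, half, Decimal("0"))
second = capped_partial_refund(captured, half, first)
capped_total = first + second
```

An author who has already decided that "the refund path is simple" is unlikely to try an odd-cent amount; a reviewer with no such belief has no reason not to.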
Misconceptions worth pinning down
These traps are the ones the exam dresses up as reasonable-sounding fixes.
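Misconception
The same session can review its own code thoroughly as long as you explicitly tell it to be critical.
What's actually true
An instruction to be critical does not remove the reasoning context the session retained while generating. It still re-reads the code through decisions it has already endorsed, so it tends to confirm rather than challenge. A separate session with no generation history is what removes the bias.
Misconception
To get an independent review you need a different, more capable model than the one that wrote the code.
What's actually true
Independence is about context, not capability. The reviewer can be the very same model that wrote the code; what has to change is the context window. A clean session is enough.
Misconception
An automated independent review should block the pull request when it finds problems, the way a failing test does.
What's actually true
The managed review informs rather than gates. It posts inline comments tagged by severity and neither approves nor blocks the pull request; its check run completes with a neutral conclusion, and the decision to merge stays with the team.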
An engineer has Claude Code generate a new payment module in one session, then in that same session asks it to review the module for bugs. The review returns almost no findings. The exam expects you to identify the design flaw. Which statement is correct?