Claude Code Independent Review Instance

In short: A Claude Code independent review instance is a separate session that reviews code without carrying the reasoning context of the session that generated it. Because a generating session is anchored to the decisions it already made, it is biased toward confirming them; a fresh, context-free reviewer evaluates the code on its merits and surfaces more of the subtle issues.

What a Claude Code independent review instance is

A Claude Code independent review instance is a reviewer that starts cold. It is given the code or the diff and asked to evaluate it, but it carries none of the conversation, intentions, or trade-offs that produced that code. This sounds like a small detail. It is in fact the whole mechanism, because the quality of a review depends far more on what the reviewer does not know than on what it does. A reviewer that already believes the design is sound, because it is the one that argued for it, will read the code looking for confirmation. A reviewer with no such history reads the code looking for problems.

The contrast that defines this knowledge point is between two ways of asking for a review. In the first, you generate a module and then, in the same session, ask "now review this for bugs". In the second, you open a separate instance, hand it only the resulting code, and ask the same question. The model is identical. The prompt is nearly identical. Yet the second consistently surfaces more, and more subtle, issues. Understanding why is the point of this concept.

Independent review instance: A review performed by a separate Claude Code session that has no access to the reasoning context of the session that generated the code, so it evaluates the code on its own merits rather than defending prior decisions.

The self-review limitation

When a session writes code, it accumulates a chain of reasoning: why this structure, why that edge case was skipped, why a certain shortcut is acceptable. That context does not vanish when you ask the same session to review. It colours the review. The session has already concluded that its choices were right, so when it re-reads its own work it tends to confirm rather than challenge. Anthropic names this self-review limitation directly: the same session that generated the code retains its reasoning context, which makes it less likely to question its own decisions.

This is not a quirk of language models so much as a property of any reviewer who is also the author. A human engineer reviewing their own pull request skims past the assumptions they have already internalised; the bug hides exactly where they are sure there is none. The independent instance breaks that loop by removing the shared context. It cannot defend a decision it never made, so it has to assess the decision on the evidence in front of it. That is why an independent review is structurally better positioned to catch the issues a self-review waves through.

Self-review versus an independent review instance

Loading diagram...

The author session defends its decisions; the independent instance evaluates the code without that bias.

Independence is about context, not capability

A frequent misreading is that you need a stronger or different model to get a good review. You do not. The independent instance can be the very same model that wrote the code. What changes is the context window, not the weights. Strip away the generation history and the same model becomes a more honest critic, because it no longer has a position to protect. This is liberating from a design standpoint: you do not need a second vendor, a bigger model, or any special review model. You need a clean session.

It also clarifies what "independent" must mean in practice. Resuming the generation session, or pasting the original design discussion into the review prompt, quietly reintroduces the bias you were trying to escape. True independence means the reviewer sees the artefact and the standards to judge it by, and nothing about how it came to be. The cleaner that separation, the more value the review adds.

same model

no stronger model required

fresh context

the actual source of independence

more issues

what independence buys you

How Anthropic productises the principle

The managed Claude Code review feature shows the principle taken to its conclusion. Rather than a single reviewer, it runs a fleet of specialised agents in parallel, each looking for a different class of issue, logic errors, security vulnerabilities, broken edge cases, subtle regressions, over the diff in the context of the full codebase. A separate verification step then checks each candidate finding against the actual code behaviour to filter out false positives before anything is posted. The findings are deduplicated, ranked by severity, and posted as inline comments on the exact lines where the issues live.

Two design choices echo this knowledge point. First, the review runs independently of whoever authored the change; it is not the author defending their work. Second, the verification pass exists precisely because an independent system is willing to question its own candidate findings, the inverse of the self-review bias. You can also reach the same outcome in your own CI: have one step generate or change code, then a distinct step, a fresh claude -p invocation or a separate GitHub Actions job, perform the review with no shared session.

Severity, verification, and the shape of an honest review

The managed reviewer does not just find issues; it grades and checks them, and both steps echo the independence principle. Every finding is tagged by severity. Important for a bug that should be fixed before merging, Nit for a minor issue worth fixing but not blocking, and Pre-existing for a bug that already lived in the codebase rather than one this change introduced. That last category is revealing. Reporting a Pre-existing bug only makes sense for a reviewer with no stake in the diff: an author defending their own change has little incentive to surface a bug they did not write, while an independent system reports it plainly because it has no narrative to protect and no authorship to flatter.

The verification step is the sharper echo. After the parallel agents propose candidate findings, a separate pass checks each candidate against the actual behaviour of the code to filter out false positives. Read that as a system willing to question its own first answers, the exact inverse of the self-review bias, where a session treats its first conclusion as settled. Independence at the generation boundary (the reviewer never saw how the code was written) and skepticism at the finding boundary (every candidate must survive a check) are two expressions of the same discipline: a conclusion is earned against the evidence in front of you, not inherited from reasoning you happen to remember.

A design consequence follows from how the result is reported. The managed review posts inline comments and populates a check run, but it tags findings by severity and neither approves nor blocks the pull request, the check completes with a neutral conclusion so existing workflows stay intact. Independence here means the reviewer informs the humans rather than gating them; the decision to merge stays with the team, and the reviewer's job is to make that decision better informed. Each finding even ships with a collapsible reasoning section, so a developer can inspect why the issue was flagged and how it was verified rather than taking the verdict on trust.

Steering the independent reviewer with REVIEW.md

Independence does not mean the reviewer judges to a generic standard you cannot influence. Anthropic exposes two repository files that shape what every review agent looks for, and knowing their precedence is an exam-relevant detail. A CLAUDE.md at the repository root supplies the shared project context the reviewer reads like any other Claude Code task: your conventions, your architecture, what the code is meant to do. A REVIEW.md, also at the repository root, is the review-specific override. Its contents are placed into the system prompt of every agent in the review pipeline and take precedence over the default review guidance, which makes it the highest-priority instruction the reviewer follows.

That precedence has a consequence teams often miss. For review behaviour specifically, REVIEW.md outranks CLAUDE.md. A newly introduced violation of a CLAUDE.md rule is surfaced as a nit-level finding rather than a blocking one, because the default review emphasis is production correctness, the bugs that would break in production, not formatting or missing coverage. A team that expects its CLAUDE.md conventions to be enforced strictly in review is surprised when those violations arrive as quiet nits. The lever for stronger enforcement is REVIEW.md, not CLAUDE.md.

Inside REVIEW.md you calibrate the independent reviewer to your repository: which classes of finding count as Important versus Nit, which paths or categories should produce no findings at all, the mandatory checks every PR must pass, and a verification bar such as requiring a file:line citation as evidence before a finding is posted. None of this compromises independence. The reviewer still never saw how the code was written, but the honest review it produces is now measured against your standards rather than a generic default.

Rolling your own independent reviewer

You do not need the managed service to apply the principle, and the exam expects you to see how to build it. The pattern is a separation of steps: one job or claude -p invocation generates or changes the code, and a distinct invocation, a fresh session, ideally a separate CI job, reviews the resulting diff with no shared conversation. Set the reviewer's standards with an appended system prompt rather than by pasting the original design discussion, because that discussion is exactly the context whose absence makes the review honest. Return the findings as structured output so a later step can act on them.

The architecture is deliberately plain, and that plainness is the lesson. Independence is achieved by what you withhold from the reviewer, not by what you add, so the cheapest correct design is simply to not share the generation session. There is no second model to license, no special review endpoint to call, and no elaborate orchestration to maintain. There is a clean session and a clear standard. An architect who reaches for a bigger model when an empty review appears has diagnosed the wrong layer; the fix lives in the context boundary, which costs nothing to draw.

How this knowledge point is tested

This is an understand-level concept, so the exam checks whether you grasp the reason, not just the rule. A typical Scenario 5 item describes an engineer who generates code and reviews it in the same session, then is surprised the review found almost nothing. The distractors offer mechanistic explanations that miss the point, the wrong model, a missing flag, an exhausted context window. The correct answer identifies the self-review limitation and prescribes an independent review instance. If you can articulate that independence comes from withholding the generation context rather than from a better model, you can answer any phrasing of this question, and you are ready for the incremental-review concept that builds directly on it.

Worked example

An engineer has Claude generate a payment module, then asks the same session to review it, and the review comes back nearly empty.

The workflow looks efficient: one session writes a new payment-capture module, and immediately afterwards the engineer types "review this module for bugs and edge cases". The review returns two cosmetic notes and declares the logic sound. A week later a rounding error in partial refunds reaches production.

Trace what the session was actually doing. While generating, it had already reasoned that the refund path was simple and the currency handling was fine. Those conclusions were still in its context when it "reviewed", so it re-read the code through the lens of decisions it had already endorsed. It was not lying; it was confirming, which is exactly the self-review limitation.

Now run it independently. A fresh claude -p step is handed only the diff and the project's standards, with no memory of the generation conversation, and asked to find correctness bugs. With nothing to defend, it examines the refund arithmetic directly and flags that partial refunds round in a way that can over-credit a customer. Same model, same code, but because the reviewer had no stake in the original design, it questioned the part the author was sure about. The lesson is not that the first session was incapable; it is that authorship and review should not share a context.

Misconceptions worth pinning down

These traps are the ones the exam dresses up as reasonable-sounding fixes.

Misconception

The same session can review its own code thoroughly as long as you explicitly tell it to be critical.

What's actually true

Instructions help at the margin, but the session still carries the reasoning that justified its choices, so it remains biased toward confirming them. The reliable fix is structural: review in an independent instance that never saw the generation context.

Misconception

To get an independent review you need a different, more capable model than the one that wrote the code.

What's actually true

Independence comes from a fresh context, not a stronger model. The same model in a clean session, given only the code and the standards, reviews more honestly because it has no prior decisions to defend.

Misconception

An automated independent review should block the pull request when it finds problems, the way a failing test does.

What's actually true

Anthropic's managed Code Review tags findings by severity and posts them as inline comments, but completes its check run with a neutral conclusion that neither approves nor blocks the PR, so existing review workflows stay intact. Independence makes the review more honest; it does not turn the reviewer into a merge gate. If you want to block on findings, your own CI reads the severity counts from the check run and decides.

Check your understanding

An engineer has Claude Code generate a new payment module in one session, then in that same session asks it to review the module for bugs. The review returns almost no findings. The exam expects you to identify the design flaw. Which statement is correct?

Watch and learn

Official Anthropic Academy lessons first, then hand-picked walkthroughs. Videos load only when you press play.

No videos curated for this concept yet

We are still curating the best official and community videos for this topic.

References & primary sources

Adaptive study

Master this concept with Archie

Practice it inside an adaptive study session. Archie, your Socratic AI tutor, tracks your mastery with Bayesian Knowledge Tracing and schedules the perfect next review.

Start studying

Claude Code Independent Review Instance Explained

What a Claude Code independent review instance is

The self-review limitation

Independence is about context, not capability

How Anthropic productises the principle

Severity, verification, and the shape of an honest review

Steering the independent reviewer with REVIEW.md

Rolling your own independent reviewer

How this knowledge point is tested

Misconceptions worth pinning down

People also ask

Watch and learn

References & primary sources

Master this concept with Archie