Compliance Hooks for High-Stakes Actions

In short: A compliance hook is a PreToolUse gate that verifies a mandatory precondition, such as a completed sanctions or approval check, before a high-stakes tool call may run. It exists because enhanced prompts are probabilistic and can never guarantee the 100% enforcement that compliance demands.

What a compliance hook guarantees

A compliance hook is a specific application of pre-execution interception to actions that carry legal or regulatory weight. Rather than checking an arbitrary threshold, it verifies that a mandatory precondition has been satisfied before letting a sensitive tool call proceed. Has the sanctions screening run and passed? Has the required approver signed off? Has the customer been verified? If the answer is not a clear yes, the hook denies the call, and because that denial lives in code, it holds on every single attempt without exception.

The reason this earns its own knowledge point is the stakes. The official guidance describes hooks as giving deterministic control that ensures certain actions always happen rather than relying on the model to choose them, and compliance is the textbook case where always is the only acceptable standard. A control that works ninety-nine times out of a hundred is not a compliance control; it is a liability with good intentions. Evaluating an agent design at this level means insisting on that distinction.

Compliance hook: A PreToolUse hook that permits a high-stakes, regulated action only when a required precondition, such as a passed sanctions or KYC check, or a recorded approval, is verified. It enforces the rule deterministically so the action cannot run without the precondition being met.

Why enhanced prompts are the wrong answer

The defining exam trap for this knowledge point is choosing enhanced prompts over hooks for compliance-critical operations. It is seductive because a carefully written prompt, full of warnings and examples about always running the compliance check first, performs beautifully in testing. But a prompt is processed inside the model's reasoning, and that reasoning can be diluted by a long conversation, redirected by an unusual request, or talked around by a determined user. The failure is rare, but compliance is judged precisely on the rare case.

The honest framing is statistical. An enhanced prompt cannot guarantee 100% compliance; it can only reduce the failure rate. For a regulated transfer, even a small residual rate, multiplied across thousands of transactions, makes at least one failure close to inevitable, and each failure is a reportable breach. A hook removes the model from the decision for that one rule, so the model-driven failure rate for the checked condition drops to zero, provided the hook itself is correct and cannot be bypassed. This is the hooks versus prompts framework at its sharpest, where the cost of a single failure is not inconvenience but legal exposure.
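To put a rough number on that inevitability, here is a minimal sketch. It assumes every transfer fails independently with the same small probability; the 0.1% rate and the function name are illustrative assumptions, not measured figures.

```python
def prob_at_least_one_breach(failure_rate: float, transactions: int) -> float:
    """Chance that at least one of `transactions` independent attempts skips the check."""
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    if transactions < 0:
        raise ValueError("transactions must be non-negative")
    # P(at least one failure) = 1 - P(no failure on any attempt)
    return 1.0 - (1.0 - failure_rate) ** transactions


# Hypothetical prompt-only control that skips the check once in a thousand calls.
prompt_only = prob_at_least_one_breach(0.001, 2000)

# A rule enforced in code has no model-driven skip rate for the checked condition.
hooked = prob_at_least_one_breach(0.0, 2000)

print(f"{prompt_only:.2f}")  # about 0.86
print(hooked)                # 0.0
```

Under these assumptions, a control that looks 99.9% reliable is more likely than not to have produced a breach by the two-thousandth transfer.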

100%: the only acceptable compliance rate

Legal risk: what one failure can trigger

deny: the verdict when a check is missing

Designing the gate

A robust compliance gate has three properties, and evaluating a design means checking for all three. First, it must fail closed: if the hook cannot confirm the precondition, it denies the action rather than assuming the best. A gate that allows the call when it is unsure is no gate at all. Second, the precondition must be verifiable from data the hook can actually inspect, such as a recorded screening result, not something the model merely claims to have done. Third, the denial must be informative, so the model can take a compliant alternative path, such as triggering the missing check or escalating to a human.
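The three properties can be sketched in a few lines of plain Python. This is an illustration of the decision logic only, not SDK code: the Verdict type, the gate function, the status strings, and the in-memory record store are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Verdict:
    decision: str  # "allow" or "deny"
    reason: str    # shown to the model so it can pick a compliant next step


def gate(lookup: Callable[[str], Optional[str]], customer_id: str) -> Verdict:
    """Allow only on a recorded 'passed' result; every other outcome denies."""
    try:
        # Property 2: read the recorded result, never the model's own claim.
        status = lookup(customer_id)
    except Exception as exc:
        # Property 1: fail closed when the precondition cannot be confirmed.
        return Verdict("deny", f"Screening lookup failed ({type(exc).__name__}); escalate to compliance.")
    if status == "passed":
        return Verdict("allow", "Recorded sanctions screening passed.")
    if status is None:
        # Property 3: the denial names the missing step.
        return Verdict("deny", "No screening on record; run sanctions screening first.")
    return Verdict("deny", f"Screening status is '{status}'; escalate to a compliance officer.")


# Hypothetical in-memory store standing in for the system of record.
RECORDS = {"cust-1": "passed", "cust-2": "failed"}


def broken_lookup(customer_id: str) -> Optional[str]:
    raise TimeoutError("screening service unavailable")


print(gate(RECORDS.get, "cust-1").decision)    # allow
print(gate(RECORDS.get, "cust-3").decision)    # deny (nothing recorded)
print(gate(broken_lookup, "cust-1").decision)  # deny (lookup failed)
```

Note that allow appears on exactly one branch. Every path the author did not anticipate falls through to deny, which is what failing closed means in practice.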

This is the same discipline as prerequisite gate design, narrowed to regulated actions. The gate does not try to be clever; it tries to be certain. By keeping the precondition explicit and machine-checkable, you make the control auditable, which matters as much as the enforcement itself: a regulator wants evidence that the check could not be skipped, and a deterministic hook provides exactly that record.

A compliance gate on a regulated transfer

The transfer can only execute once the hook verifies the mandatory checks; unknown states fail closed.

Auditability is part of the requirement

Enforcement is only half of what a regulated action demands; the other half is proof. A compliance regime does not just require that the wrong thing cannot happen; it requires evidence that it could not have happened, and that evidence has to survive an auditor's scrutiny months later. A deterministic hook is well suited to producing it, because every decision flows through one explicit checkpoint that can log what was checked, what result it found, and whether it allowed or denied the call.

A prompt-based control cannot offer the same assurance. Even if you log the model's narration, you are recording what the model said it did, not a mechanical record that the gate ran and made a binding decision. When you evaluate a design at this level, treat the audit trail as a first-class requirement alongside the block itself: the gate should leave a durable, tamper-evident record keyed to the transaction. Designs that enforce but cannot prove are weaker than they look, and recognising that gap is part of judging the architecture rather than just admiring its intent.
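One common way to make such a record tamper-evident is a hash chain, where each entry stores the hash of the entry before it. The sketch below is an assumption about how a gate might log its decisions, not something the SDK provides; the field names and helper functions are hypothetical, and timestamps are left out to keep the example deterministic.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder hash for the first entry in the chain


def _digest(body: Dict) -> str:
    # Canonical JSON so the same body always hashes to the same value.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log: List[Dict], transaction_id: str, checked: str, result: str, decision: str) -> Dict:
    """Record one gate decision, chained to the previous entry's hash."""
    body = {
        "transaction_id": transaction_id,
        "checked": checked,
        "result": result,
        "decision": decision,
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    entry = {**body, "hash": _digest(body)}
    log.append(entry)
    return entry


def verify_chain(log: List[Dict]) -> bool:
    """Return False if any entry was altered, removed, or reordered."""
    prev_hash = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True


log: List[Dict] = []
append_entry(log, "tx-1001", "sanctions", "missing", "deny")
append_entry(log, "tx-1002", "sanctions", "passed", "allow")
intact = verify_chain(log)

log[0]["decision"] = "allow"  # simulate someone rewriting history
tampered = verify_chain(log)

print(intact, tampered)  # True False
```

A chain held only in memory proves little by itself. In a real deployment the latest hash would need to be anchored somewhere the agent and its operators cannot rewrite, such as append-only storage.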

When escalation beats outright denial

A subtle part of evaluating a compliance design is choosing the right verdict when a precondition is missing. Denial is not the only option; the hook can escalate instead. If a transfer arrives without a completed screening, an outright deny is correct when the screening simply must precede the action, but an ask verdict, which routes the decision to a compliance officer, is often the better fit when a human is permitted to authorise an exception under documented conditions. The choice depends on whether the rule admits any human override at all.

What never changes is that the model is not the one granting the exception. Whether the hook denies or escalates, the authority to proceed sits with a deterministic rule or a named human, never with the agent reasoning its way to an allowance. Evaluating a design means checking that its handling of the missing-precondition case is both safe and workable: safe because nothing slips through, and workable because legitimate edge cases have a sanctioned path forward rather than a dead end. A gate that only ever denies can be as much of a problem as one that leaks, if it blocks transactions the business is entitled to complete.
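The choice between the two verdicts can be sketched as a small pure function. The override_permitted flag is a hypothetical input that stands for written policy and must come from configuration, never from the model. The dictionary keys mirror the permissionDecisionReason field this section mentions, but treat the exact output shape as an assumption to check against the SDK documentation.

```python
from typing import Dict, Optional


def missing_precondition_verdict(screening_status: Optional[str], override_permitted: bool) -> Dict[str, str]:
    """Pick allow, ask, or deny from the recorded status and the policy flag."""
    if screening_status == "passed":
        return {
            "permissionDecision": "allow",
            "permissionDecisionReason": "Recorded screening passed.",
        }
    if screening_status is None and override_permitted:
        # Policy allows a named human to authorise an exception: escalate.
        return {
            "permissionDecision": "ask",
            "permissionDecisionReason": "No screening on record; a compliance officer must decide.",
        }
    # A failed or unknown status, or a rule with no human override: hard stop.
    return {
        "permissionDecision": "deny",
        "permissionDecisionReason": "Screening not passed and no override is permitted.",
    }


print(missing_precondition_verdict("passed", False)["permissionDecision"])  # allow
print(missing_precondition_verdict(None, True)["permissionDecision"])       # ask
print(missing_precondition_verdict(None, False)["permissionDecision"])      # deny
print(missing_precondition_verdict("failed", True)["permissionDecision"])   # deny
```

In this sketch a failed screening is never escalated, only a missing one. Whether a failed result may ever be overridden is a policy question the code should encode explicitly rather than leave to a default.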

Scoping the gate correctly

A compliance gate is only trustworthy if it fires on exactly the actions it is meant to govern, no more and no less, so scoping is part of the design rather than a detail. The matcher is where this is decided. The Agent SDK hooks documentation warns that an empty matcher matches every tool, so a gate written without a precise matcher can silently intercept the whole agent, adding latency and decisions to calls it has no business judging. The discipline is to scope the matcher exactly to the regulated tool, then confirm in testing that ordinary tools pass through untouched.
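The difference a precise matcher makes is easy to see in a simplified model. The hook_fires function below is a hypothetical stand-in that treats the matcher as a full-match regular expression on the tool name and an empty matcher as match-all. The SDK's real matching rules may differ in detail, so confirm them in the documentation; get_balance and Read are example tool names only.

```python
import re


def hook_fires(matcher: str, tool_name: str) -> bool:
    """Simplified model: an empty matcher fires on every tool, otherwise full regex match."""
    if matcher == "":
        return True
    return re.fullmatch(matcher, tool_name) is not None


TOOLS = ["international_transfer", "get_balance", "Read"]

scoped = [t for t in TOOLS if hook_fires("international_transfer", t)]
unscoped = [t for t in TOOLS if hook_fires("", t)]

print(scoped)    # ['international_transfer']
print(unscoped)  # ['international_transfer', 'get_balance', 'Read']
```

A test like this, run against the real tool list, is a cheap way to confirm that ordinary tools pass through untouched.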

Subagents raise a second scoping question. When an agent spawns subagents, a hook can fire inside those nested sessions as well as the top-level one, which may not be what a compliance control intends. The docs expose a subagent indicator in the hook input so you can detect that case and scope enforcement to the top-level agent session when recursive gating would be wrong. When a gate behaves unexpectedly, the permissionDecisionReason it returns is the first thing to inspect, because it records why a given call was allowed or denied and turns a mysterious block into a traceable decision. Getting scope and observability right is what separates a gate that genuinely protects one regulated action from one that quietly polices everything.

Evaluating a proposed design

Because this knowledge point is pitched at the evaluate level, the skill being tested is judgement, not recall. You are handed a design and asked whether it is sound, or which of several designs best protects a regulated action. The winning analysis always asks the same things: does the control hold every time, is it enforced outside the model, does it fail closed, and is the precondition verifiable. A design that answers yes to all four is defensible. One that leans on prompt wording, however elaborate, fails the first two and is rejected.

A subtle distractor at this level argues that a hook adds latency or rigidity that makes a prompt preferable for user experience. For a compliance rule, that trade is not yours to make: the cost of a breach dwarfs a few milliseconds, and the rigidity is the entire point. Recognising that the usual flexibility argument does not apply to regulated actions is part of evaluating the design correctly.

Worked example

A banking agent can move money with an international_transfer tool. Regulation requires that every cross-border transfer is screened against sanctions lists and that the originating customer has completed KYC before the funds move.

A first team proposes a strong system prompt: a detailed instruction that the agent must always confirm sanctions screening and KYC status before calling international_transfer, with worked examples of refusals. In a hundred test transfers it behaves perfectly. Evaluated against the four criteria, though, it fails the first two immediately: the rule is enforced inside the model, so it cannot hold every time. On transfer number two thousand, buried in a long multi-step session, the model proceeds without re-confirming, and an unscreened transfer leaves the bank. That is a reportable breach.

A second team adds a PreToolUse compliance hook on international_transfer. The hook reads the customer and counterparty identifiers from the proposed arguments, looks up the recorded sanctions and KYC results, and only returns allow when both are present and passed. Anything else, including a missing or stale result, returns deny with a reason, so the agent either triggers the screening or escalates to a compliance officer. The unscreened transfer is now impossible, and every decision is logged for audit.
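The second team's hook might look roughly like the sketch below. It runs without the SDK installed, and several parts are assumptions to verify against the Agent SDK hooks documentation: the three-argument async callback shape, the hookSpecificOutput dictionary, and the convention that an empty dictionary means no decision. The argument names customer_id and counterparty_id, the 24-hour freshness window, and the in-memory CHECKS store are hypothetical.

```python
import asyncio
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=24)  # assumed freshness window; real policy sets this
_now = datetime.now(timezone.utc)

# Hypothetical system of record: (subject id, check name) -> (status, checked_at)
CHECKS = {
    ("cust-1", "kyc"): ("passed", _now - timedelta(hours=1)),
    ("cust-1", "sanctions"): ("passed", _now - timedelta(hours=1)),
    ("cp-9", "sanctions"): ("passed", _now - timedelta(hours=1)),
    ("cp-old", "sanctions"): ("passed", _now - timedelta(days=30)),
}


def _problem(subject, check):
    """Return a description of what is wrong, or None if the check is satisfied."""
    if not subject:
        return f"missing identifier for {check} check"
    record = CHECKS.get((subject, check))
    if record is None:
        return f"no {check} result recorded for {subject}"
    status, checked_at = record
    if status != "passed":
        return f"{check} status for {subject} is {status}"
    if datetime.now(timezone.utc) - checked_at > MAX_AGE:
        return f"{check} result for {subject} is stale"
    return None


def _output(decision, reason):
    return {"hookSpecificOutput": {
        "hookEventName": "PreToolUse",
        "permissionDecision": decision,
        "permissionDecisionReason": reason,
    }}


async def compliance_hook(input_data, tool_use_id, context):
    if input_data.get("tool_name") != "international_transfer":
        return {}  # not the regulated tool; make no decision
    args = input_data.get("tool_input") or {}
    customer, counterparty = args.get("customer_id"), args.get("counterparty_id")
    required = [(customer, "kyc"), (customer, "sanctions"), (counterparty, "sanctions")]
    problems = [p for p in (_problem(s, c) for s, c in required) if p]
    if problems:
        # Fail closed, and tell the model which compliant path remains.
        return _output("deny", "; ".join(problems) + ". Run the missing check or escalate to a compliance officer.")
    return _output("allow", "KYC and sanctions checks are recorded, passed, and current.")


def decide(tool_input):
    event = {"tool_name": "international_transfer", "tool_input": tool_input}
    return asyncio.run(compliance_hook(event, None, None))["hookSpecificOutput"]["permissionDecision"]


print(decide({"customer_id": "cust-1", "counterparty_id": "cp-9"}))    # allow
print(decide({"customer_id": "cust-1", "counterparty_id": "cp-old"}))  # deny (stale)
print(decide({"customer_id": "cust-1"}))                               # deny (no counterparty)
```

The hook never asks the model whether screening happened. It reads the identifiers from the proposed arguments and consults the record itself, which is what makes the precondition verifiable.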

Evaluated side by side, the verdict is not close. Both designs read well, but only the hook satisfies the criteria that matter for a regulated action: deterministic, model-independent, fail-closed, and auditable. The prompt is a useful complement for tone and explanation, but it cannot be the control. Choosing the hook, and being able to say precisely why the prompt is insufficient, is the answer the exam is looking for.

It is worth noticing how the two designs fail differently under pressure, because that is what an evaluation should surface. The prompt design degrades silently: nothing announces the moment it lets an unscreened transfer through, and the breach is discovered only afterwards, if at all. The hook design degrades loudly and safely: when a screening result is missing it stops the transfer and says so, turning a would-be incident into a routine, visible escalation. A control that fails safe and visibly is categorically better than one that fails silently, and weighing failure modes rather than happy-path behaviour is the essence of evaluating a high-stakes design.

Common misreadings to avoid

Misconception: A really well-engineered prompt, with strong warnings and examples, can serve as the compliance control for a regulated transfer.

What's actually true: A prompt is probabilistic and cannot guarantee 100% enforcement, so it can never be the control for a regulated action. It may complement a hook for explanation, but the binding check must be a deterministic PreToolUse gate that fails closed.

Misconception: A compliance hook is undesirable because the extra check adds latency and makes the agent less flexible.

What's actually true: For a regulated action that trade is the wrong way round. The cost of a single breach far exceeds a small latency, and the inflexibility is the intended guarantee. Flexibility belongs to soft rules, not compliance-critical ones.

How it shows up on the exam

Compliance scenarios are a favourite of Domain 1, especially within the customer support and financial contexts, because they force a clear choice between deterministic and probabilistic control. The question typically contrasts an enhanced-prompt design with a hook-based gate and asks which is appropriate, or asks you to critique a design that protects a regulated action with prompt wording alone. The credited answer is always the hook, justified by the impossibility of a 100% guarantee from a prompt.

When you meet one of these items, anchor on the word must. If the action must not happen without a verified precondition, no amount of prompt engineering qualifies, and you select the deterministic gate. Carrying that conviction into hook pipeline architecture lets you place the compliance gate correctly among other hooks, ahead of any step that would act on an unverified operation.

Check your understanding

A fintech agent issues cross-border payments. The compliance team needs an absolute guarantee that no payment executes before sanctions screening passes. An architect must choose the control. Which design is defensible, and why?
