AI Skill Certs
Prompt Engineering & Structured Output·Task 4.2·Bloom: apply·Difficulty 3/5·9 min read·Updated 2026-06-07

Few Shot Examples for Edge Cases and Ambiguous Decisions

Apply few-shot prompting to improve output consistency and quality

SUBy Solomon UdohReviewed by Solomon UdohAI-assisted · human-reviewed
In short
Targeting few-shot examples at ambiguous edge cases means demonstrating the borderline inputs where reasonable people might disagree, rather than the obvious ones. By showing the chosen action and the reasoning at exactly those boundary points, you teach Claude where the decision line falls, which is the part of the task it actually gets wrong.

Why few shot examples for edge cases matter

Few shot examples for edge cases is the practice of deliberately choosing your demonstrations from the ambiguous, borderline region of a task, the inputs where a thoughtful human reviewer might pause or where two experts could reasonably land on different answers. It is a focused refinement of example construction: not just how to write a good example, but precisely which inputs deserve to become examples in the first place.

The instinct most people have is to demonstrate the task with clean, representative cases. That feels safe and illustrative, but it largely wastes the demonstration. A model that already classifies the obvious cases correctly learns nothing new from being shown them. The leverage lives at the edges, in the cases the model currently fumbles, because that is where a worked demonstration can actually move the decision.

Edge-case few-shot prompting
Selecting few-shot examples from the borderline inputs where the correct action is genuinely ambiguous, and pairing each with its chosen action and reasoning, so Claude learns the decision boundary rather than rehearsing cases it already handles.

Obvious cases teach almost nothing

Think about what an example does. It updates the model's behaviour at and around the input it shows. If that input is one the model already handles confidently, the update is near zero, you have confirmed a decision that was never in doubt. Spend two of your scarce example slots on obvious cases and you have effectively spent nothing, while the ambiguous inputs that actually drive your error rate remain undemonstrated.

This is the quiet failure the exam likes to probe. A prompt that adds few-shot examples but draws them all from clear-cut cases looks diligent and still underperforms on the hard inputs, because the examples never touched the part of the input space where the model was uncertain. The fix is not more examples; it is examples placed where the uncertainty is.

Where the decision boundary actually lives

The mental model worth carrying is a line through your input space. On one side the correct action is A, on the other it is B, and most inputs sit comfortably far from the line. The model already gets those. The trouble is the band of inputs hugging the line, the genuinely close calls, and your examples should cluster there.

Place examples at the boundary, not in the easy interior
Loading diagram...
Examples drawn from the ambiguous band reshape the boundary; examples from the easy interior barely move it.

When you demonstrate two inputs that sit close together but split to different decisions, you do something an obvious example cannot: you show the model the contrast that defines the line. A single borderline example fixes one point; a pair of near-neighbours on opposite sides of the call reveals the criterion separating them. That contrast is what generalises into a crisp boundary on the next close call the model has never seen.

Showing the road not taken

Edge-case examples lean even harder on reasoning than ordinary ones, because at the boundary the alternative was genuinely tempting. A good borderline example names it: this input could plausibly have warranted action B, and here is the specific feature that tips it to A instead. Without that, the model sees a decision at the boundary but no account of why the close call broke the way it did, and it cannot reconstruct the criterion.

So the construction is: pick an input where the wrong choice was attractive, state the chosen action, and articulate the deciding feature that resolved the ambiguity. You are not just labelling a hard case; you are teaching the tiebreaker. This is why edge-case work is an apply-level skill built on top of constructing effective few-shot examples: the example anatomy is the same, but the input selection and the emphasis on the deciding feature are specialised for ambiguity.

Canonical examples beat a wall of edge-case rules

A tempting alternative to edge-case examples is to write the boundary out as prose: a long list of if-then clauses that tries to name every exception in advance. Anthropic explicitly warns against this. Its context-engineering guidance notes that teams often try to encode every possible edge case as an explicit rule, and that this is not the recommended approach; a small set of diverse, canonical examples that represent the behaviour you want steers more reliably than an exhaustive rulebook, with the official prompting guidance putting the working number at roughly three to five relevant, diverse examples. The reason mirrors everything above: a curated demonstration shows the resolved judgement, whereas a rule still has to be interpreted, and when rules pile up they interact and spawn contradictions the model must adjudicate.

The same guidance is specific about how to structure the reasoning inside those examples. Anthropic recommends wrapping each demonstration in <example> tags, and the whole set in <examples> tags, so Claude reads them as demonstrations rather than instructions, and separating the deciding rationale from the final decision with <thinking> and <answer> tags. When extended thinking is enabled, examples that show reasoning in <thinking> blocks teach Claude a reasoning style it will generalise; when it is off, the same tags keep the borderline rationale from leaking into the final output. For ambiguous edge cases, where the deciding feature is the whole lesson, that separation is what lets the model absorb why a close call broke one way without copying the explanation verbatim into its answer.

One related pitfall is worth naming. A blanket directive such as "if in doubt, flag it" tends to over-trigger, pushing the model to act on cases it should have left alone. A single canonical example that shows a genuinely doubtful input being left alone teaches the boundary far more precisely than any catch-all instruction, which is the edge-case principle applied to the prompt's own wording rather than to its data.

How this is tested

Domain 4 questions on this knowledge point tend to describe a team that added few-shot examples and saw little improvement on the tricky inputs that motivated the change. You are asked to diagnose why. The answer the exam rewards is that the examples demonstrated obvious cases rather than the ambiguous ones, so they failed to inform the borderline judgements. The corrective is to re-select the examples from the genuinely close calls and to show the reasoning that resolves each.

A frequent distractor suggests adding many more examples or a stronger instruction. Both miss the point: the problem is placement, not quantity or wording. Before you even reach this refinement, you confirm few-shot is the right tool at all, which is the job of few-shot scenario evaluation; once you know it is, edge-case targeting is how you get the most out of it.

Worked example

A content-moderation assistant flags policy violations. It handles clear violations and clearly-safe posts well, but is inconsistent on sarcasm and reclaimed slurs. The team added few-shot examples of obvious violations and saw no improvement.

The team's first attempt demonstrated a handful of unambiguous cases: an overt threat (flag), a friendly greeting (allow). The assistant already handled both perfectly, so the examples changed nothing about its behaviour on the inputs that were actually failing.

You re-select the examples from the ambiguous band. Example one: a post using a slur in a clearly self-referential, reclaimed sense. Chosen action: allow. Reasoning: although the term is on the watch list, the speaker is referring to their own group and the surrounding context is supportive, which distinguishes reclaimed use from a slur directed at others. Example two: a superficially similar post using the same term aimed at another user as an insult. Chosen action: flag. Reasoning: the identical term is here directed at a target with hostile intent, which is the feature that flips the decision.

These two near-neighbours sit on opposite sides of the line and differ on exactly the deciding feature, target and intent. A third example handles sarcasm: a post that literally praises a banned ideology in a tone the surrounding thread makes plainly mocking. Chosen action: allow, with reasoning about the contextual cues that mark sarcasm.

Now the assistant has been shown the boundary, not the interior. Its accuracy on novel sarcastic and reclaimed-language posts rises, because it learned the criteria that resolve the close calls rather than re-confirming the easy ones. The number of examples barely changed; their placement changed everything.

Near-neighbours: the most efficient example you can give

Of all edge-case examples, the most informative is a matched pair of near-neighbours, two inputs that look almost identical yet break to different decisions. The reason is that a single example fixes a point but leaves the slope of the boundary unknown; the model sees that this input maps to action A but cannot tell which feature made the difference. A near-neighbour on the other side of the call isolates that feature by holding everything else roughly constant. The contrast between the pair does the teaching that neither example could do alone.

In practice you hunt for these pairs in your error logs. Find two inputs your prompt currently treats the same that ought to be treated differently, or two it splits that a human would treat alike, and you have located the boundary precisely. Promoting that pair to examples, with the deciding feature spelled out in each, gives Claude the sharpest possible signal about where the line truly runs. One well-chosen pair routinely outperforms four unrelated borderline examples, because the others each illuminate a single point while the pair illuminates the rule that connects them.

When the boundary itself is contested

Sometimes the ambiguity is not the model's confusion but a genuine lack of consensus: reasonable reviewers really do disagree about the right call. Edge-case examples are still the right move, but their job quietly changes. Instead of revealing a boundary that already exists in the task, they now define one, encoding the policy your team has chosen so the model applies it consistently even where human opinion would split. Here the reasoning in each example doubles as documentation of why your organisation draws the line where it does.

This is an underappreciated benefit of edge-case prompting. By forcing yourself to articulate the deciding feature for each contested case, you surface and settle policy questions that vague instructions had previously left to chance. The examples become both a teaching signal for Claude and a durable record of deliberate choices for the humans who review its output, which is exactly the kind of provenance the reliability side of the exam rewards.

Misconceptions that cost marks

Misconception

To make few-shot examples reliable, demonstrate plenty of clear, representative cases so the model has a solid foundation.

What's actually true

Clear cases are ones the model already handles, so demonstrating them teaches little. The examples that change behaviour are drawn from the ambiguous edge cases where the model is currently inconsistent.

Misconception

If edge-case examples are not helping, the answer is to add many more examples.

What's actually true

The issue is usually placement, not quantity. A few examples positioned at the decision boundary, each showing why the close call broke the way it did, outperform a large set drawn from the easy interior of the task.

Edge cases shift as the model and the task evolve

The boundary you teach today is not fixed forever. As your task drifts, as new kinds of input arrive, or as you move to a more capable model, the location of the hard cases moves with it. Inputs that were genuinely ambiguous for last quarter's prompt may now be handled effortlessly, while a fresh band of confusing cases opens up somewhere you were not watching. That makes edge-case selection a recurring activity rather than a one-time setup. The healthiest pipelines treat their example set as living: they periodically mine new error logs, retire demonstrations the model has outgrown, and promote the latest close calls into examples. Spending example slots on yesterday's edge cases is simply a slower version of spending them on obvious ones, since both demonstrate behaviour the current model already has in hand. Keeping the examples aimed at where the model is uncertain right now is what sustains the gains over time instead of letting them quietly decay.

The takeaway

Spend your example budget where the model is uncertain. Choose borderline inputs, ideally near-neighbours that split to different decisions, and make the deciding feature explicit in the reasoning. Done well, a small handful of edge-case demonstrations sharpens Claude's decision boundary far more than a long list of obvious ones, which is the whole reason this refinement exists. It is the natural complement to format-spanning examples for extraction quality, where breadth of structure matters and here depth at the boundary matters.

the boundary
where examples actually change behaviour
near-neighbours
pairs that split reveal the deciding feature
placement
matters more than the number of examples
Check your understanding

A triage assistant routes support tickets. It is accurate on clearly-urgent and clearly-routine tickets but inconsistent on the borderline ones. The team added few-shot examples of obvious urgent and obvious routine tickets, with no improvement on the borderline cases. What is the best fix?

People also ask

Should few-shot examples cover obvious cases or edge cases?
Edge cases. The model already handles obvious inputs, so demonstrating them teaches little. Aiming examples at the ambiguous, borderline inputs is where the demonstrations actually change behaviour.
How do you teach Claude a decision boundary?
Show inputs that sit close to the line on both sides, each paired with the chosen action and the deciding feature that resolved it. Seeing similar inputs split into different decisions teaches the criterion that separates them.
Why do my few-shot examples not help with ambiguous inputs?
Usually because they demonstrate clear-cut cases the model never struggled with. If the examples do not sit at the boundary, they cannot inform the borderline judgements where the model actually goes wrong.

Watch and learn

Official Anthropic Academy lessons first, then hand-picked walkthroughs. Videos load only when you press play.

Anthropic

Prompting 101 | Code w/ Claude

Why watch: Anthropic engineers build a real insurance-claims prompt and show how adding diverse, edge-case examples with the reasoning behind each labelled decision teaches Claude correct decision boundaries on ambiguous inputs.

More videos for this concept

References & primary sources

Adaptive study

Master this concept with Archie

Practice it inside an adaptive study session. Archie, your Socratic AI tutor, tracks your mastery with Bayesian Knowledge Tracing and schedules the perfect next review.

Start studying