Scratchpad Files for Findings

In short: A scratchpad file is an external file where an agent writes key findings during an investigation and reads them back on later turns. Because the file lives outside the context window, the findings persist even as the window fills or degrades, turning fragile conversational memory into durable, re-readable state.

What scratchpad files solve and why

Scratchpad files are the first and simplest remedy for the context degradation you learned to recognise in the prerequisite knowledge point. The problem there was that an extended session loses grip on specific findings as the window fills with verbose output. A scratchpad fixes this by refusing to keep the important facts in the window at all. Instead, the agent writes each key discovery to a file the moment it finds it, and on later turns it reads that file back rather than relying on memory that may have already degraded.

The principle is just-in-time context. Rather than loading and holding everything upfront, the agent keeps the active window focused on the current step and pulls durable facts back from storage only when they are needed. Anthropic frames external files as memory that lives "outside of the context window," with notes that "get pulled back into the context window at later times." That phrasing is the heart of this apply-level skill: a scratchpad is working memory you control, not working memory the model has to hold.

Just-in-time context: A strategy where an agent stores what it learns in external memory and retrieves only the relevant pieces on demand, rather than carrying all information in the active context window. Scratchpad files are the concrete implementation of this idea.

How the write-then-reference loop works

The mechanics are deliberately mundane, which is what makes them reliable. As the agent investigates, it records findings in a structured note: the class it examined, the behaviour it confirmed, the edge case it spotted, the open question it still needs to resolve. When a later turn needs one of those facts, the agent reads the file and works from the exact recorded text instead of from a possibly-degraded recollection. The window stays lean because the bulk of accumulated knowledge sits on disk, and the facts stay accurate because a written line does not blur the way attention over a long window does.

Anthropic productises this as the memory tool. When enabled, Claude can create, view, str_replace, insert, and delete files in a dedicated /memories directory, and the tool ships with a system instruction to check that directory before starting work. The documentation captures the discipline in a single line: the agent should "record status / progress / thoughts" as it goes, because "your context window might be reset at any moment, so you risk losing any progress that is not recorded in your memory directory." That sentence is the entire argument for scratchpads.

The write-then-reference scratchpad loop

Loading diagram...

Findings live on disk, so a filling or degrading window cannot erase them; the agent re-reads the file rather than trusting fading memory.

What belongs in a scratchpad and what does not

A good scratchpad is curated, not a dump. The aim is to externalise the high-value, hard-won facts that would be expensive to rediscover: confirmed behaviours, file and symbol locations, decisions made, and unresolved questions. It is not the place to paste entire files or raw tool output, because that simply recreates the bulk problem on disk and makes the note slow to re-read. Anthropic's guidance on memory files reinforces this: keep the content "up-to-date, coherent and organized," rename or delete entries that are no longer relevant, and avoid creating new files unless necessary.

This is also where the scratchpad connects to the wider Domain 5 material. The same instinct that drives the persistent case facts block in customer-support work, pull the load-bearing facts into a block that is never summarised, drives the scratchpad in codebase work. Both treat critical facts as too important to leave in summarisable, degradable context. The difference is medium: a case-facts block rides inside the prompt, while a scratchpad lives in a file the agent reads on demand.

Worked example

An agent is mapping how a payments codebase handles idempotency across 60 endpoints, and the engineer wants the findings to survive a two-hour session.

The agent starts by creating a scratchpad, findings/idempotency.md, before it reads any code. As it works through the endpoints it appends one terse line per discovery: "/charges uses an Idempotency-Key header stored in Redis for 24h," "/refunds reuses the charge key, possible collision," "/payouts has no idempotency guard, gap." Each line is a fact it would otherwise have to re-derive, now safely on disk.

Two hours later the window is full of the directory listings and file reads that mapping 60 endpoints required. Asked to summarise the idempotency gaps, the agent does not strain to recall them from a degraded window. It reads findings/idempotency.md and reports the /payouts gap and the /refunds collision verbatim, because they were written down when they were fresh. The verbose exploration that filled the window is irrelevant to the answer, the durable note carried the signal through.

Contrast the failure mode the exam likes to test: an agent that investigates all 60 endpoints purely in conversation, never writing anything down, and then produces a vague summary that misses the /payouts gap because that finding rotted out of context an hour ago. The fix is not a bigger model; it is the discipline of writing findings to a file as they are discovered.

There is a second payoff that the scenario makes obvious only in hindsight: the scratchpad is portable. Because the findings live in findings/idempotency.md rather than in one session's window, a fresh session the next morning, a teammate reviewing the audit, or a subagent assigned a follow-up can all read the same file and inherit the work instantly. A finding trapped in a context window helps exactly one session, once; a finding written to a file helps every reader that comes after. That portability is why externalising memory is not merely a degradation defence but a multiplier on the value of the investigation itself.

Designing a scratchpad that stays useful

A scratchpad earns its keep only if it is fast to re-read, which means its design matters as much as its existence. The most useful layout is a small set of headed sections rather than one long stream: confirmed facts, open questions, decisions, and a short running log. Headed structure is not cosmetic. It exploits the same positional attention effect behind the lost-in-the-middle effect, when the agent pulls the file back in, the section it needs is easy to locate and sits where the model reads reliably, rather than buried mid-stream where it might be skimmed.

Entries should be terse and status-bearing. "Confirmed: /payouts has no idempotency guard" is worth more than a paragraph of narration, because it states the fact and its confidence in a line the agent can trust on re-read. As work progresses, the agent should update the file rather than only appending, resolving open questions and pruning entries that no longer matter. Anthropic's memory-tool guidance is explicit that letting files grow unbounded is a failure mode: it advises capping how much a read returns and keeping content organised so the note stays cheap to consult. A scratchpad that has decayed into an unstructured dump has quietly recreated the very bulk problem it was meant to escape. A simple convention keeps it disciplined: one fact per line, a clear status word at the front of each entry, and a quick prune whenever a section grows past what a single glance can absorb.

Scratchpads, conversation memory, and CLAUDE.md

It helps to place the scratchpad in a three-tier model of agent memory. The most volatile tier is the conversation itself: the live exchange, which degrades as it fills and vanishes when the session ends. The middle tier is the working scratchpad: durable for the length of an investigation, written and re-read as findings accumulate, prunable when the task is done. The most stable tier is durable project memory such as a CLAUDE.md file, which holds slow-changing knowledge, architecture, conventions, commands, and loads automatically every session.

Choosing the right tier is part of the apply-level skill. Stable project facts belong in CLAUDE.md so every future session inherits them; transient discoveries from this particular investigation belong in a scratchpad; the live back-and-forth stays in the conversation. Putting investigation findings in CLAUDE.md pollutes the permanent record with throwaway detail, while leaving stable conventions only in a scratchpad means every new session must rediscover them. Matching the lifetime of the fact to the lifetime of the store is what keeps each tier lean and trustworthy, and it is a distinction the exam expects an architect to make rather than treating all written memory as interchangeable.

Shared and personal project memory: CLAUDE.md and CLAUDE.local.md

The durable tier introduced above has a concrete, official shape in Claude Code. A CLAUDE.md file at the project root is persistent, project-scoped context that Claude reads at the start of every session; Anthropic's best-practices guidance recommends using it for the build and test commands, code-style rules, and workflow conventions a session should always know. Because it loads automatically and is checked into the repository, it is shared team knowledge, the right home for facts that every future session and every teammate should inherit.

For notes that are yours rather than the team's, Claude Code also reads a CLAUDE.local.md at the project root. Anthropic suggests adding it to .gitignore so personal, project-specific instructions stay off the shared record. The split sharpens the apply-level judgement this knowledge point trains: a private shortcut or a personal reminder belongs in CLAUDE.local.md, while a convention the whole team relies on belongs in the shared CLAUDE.md. Storing personal instructions in the shared file pollutes everyone's context with one person's preferences.

Two further details round out the mechanism. Claude Code supports a hierarchy of CLAUDE.md files, loaded from broader to more specific locations, so a repository-level file and a subdirectory file combine rather than conflict. And because compaction is lossy, you can use CLAUDE.md to tell Claude what to preserve when context is compacted, steering the summary toward the project facts you cannot afford to lose. None of this replaces the working scratchpad; it is the stable layer beneath it, and matching each fact to the right layer is what keeps every tier lean.

How this is tested on the exam

At the apply level, Task 5.4 questions describe an agent facing a long or complex investigation and ask which technique preserves findings most reliably. When the core need is to keep specific discoveries accurate across a session that will outlast the window, scratchpad files are the answer. The exam rewards recognising that the durability comes from the storage being external: a file does not degrade, so re-reading it is strictly more reliable than re-deriving the fact from conversation.

The distractors usually offer plausible-but-weaker alternatives: holding everything in context (which degrades), or summarising the conversation (which compresses and can drop the very numbers and identifiers you needed). Scratchpads beat both for raw fact preservation because they neither rely on the window nor lossily compress. Where the right tool is instead isolation of noisy work, that is subagent delegation; where it is reducing an already-full window, that is summary injection and /compact. Knowing which need maps to which tool is exactly the judgement Task 5.4 assesses.

Misconception

Writing findings to a scratchpad is just summarising the conversation in a different place.

What's actually true

Summarisation compresses and can lose specific values; a scratchpad records exact facts as they are discovered, before any degradation, and keeps them verbatim. The point is preservation without lossy compression, not a relocated summary.

Misconception

A scratchpad should contain the full output of every file the agent reads so nothing is lost.

What's actually true

No. Dumping raw output recreates the bulk problem on disk and makes the note slow and noisy to re-read. Anthropic advises keeping memory files coherent and organised; record curated, high-value facts and prune what is no longer relevant.

Check your understanding

An agent must map idempotency handling across 60 payment endpoints in a session expected to run two hours, well beyond what the window can keep sharp. The team needs the specific per-endpoint findings to remain exact at the end. Which approach best preserves them?

Watch and learn

Official Anthropic Academy lessons first, then hand-picked walkthroughs. Videos load only when you press play.

No videos curated for this concept yet

We are still curating the best official and community videos for this topic.

References & primary sources

Adaptive study

Master this concept with Archie

Practice it inside an adaptive study session. Archie, your Socratic AI tutor, tracks your mastery with Bayesian Knowledge Tracing and schedules the perfect next review.

Start studying

Scratchpad Files for Persistent Findings in Claude Code