2026-04-28

When 1M context actually beats agent retrieval

Long context isn't a substitute for retrieval. It's a substitute for some retrievals.

#context-window #retrieval #claude #agents

The temptation

Claude 4.7 has a 1M-token context window. The temptation is to dump everything in.

Don't. Or at least, don't always.

Where long context wins:

Single-document analysis: a 200K-word PDF, a meeting transcript, a full repo subdirectory. Retrieval over a single coherent document adds latency without improving quality.
Cross-references inside the same artifact: "find every place this function is called in this codebase." The model sees every call site when you give it the whole codebase.
One-shot synthesis: "summarize these 20 emails into one digest." Dropping them all in is much faster than embedding and retrieving.

Where retrieval wins:

Decision memory: "what did we decide about X six months ago," queried across 200 wiki entries. The wiki is 5MB, and you don't want to load 5MB into every prompt.
Pricing checks: "is this proposed pricing consistent with our canonical pricing?" Load only the canonical pricing, not the entire business-ops repo.
Cross-project: "did we solve this in another repo?" Agents scoped to specific repos beat dumping everything in.
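To make "load only what the question needs" concrete, here is a minimal sketch of scoped selection under a token budget. It is an illustration, not a description of any real pipeline: `select_entries`, the keyword-overlap scoring (a stand-in for embedding search), the 15K budget, and the 4-characters-per-token estimate are all assumptions.

```python
import re

def words(text: str) -> set[str]:
    """Lowercased alphanumeric words, used for a crude relevance score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def select_entries(query: str, entries: dict[str, str],
                   budget_tokens: int = 15_000,
                   chars_per_token: int = 4) -> list[str]:
    """Return names of the most relevant entries that fit in the budget.

    Relevance is plain keyword overlap (hypothetical stand-in for a
    real retriever). Token cost is estimated from character count.
    """
    q = words(query)
    # Highest overlap first; sorted() is stable, so ties keep input order.
    ranked = sorted(entries.items(),
                    key=lambda kv: len(q & words(kv[1])),
                    reverse=True)
    picked, used = [], 0
    for name, text in ranked:
        if not q & words(text):
            break  # everything after this has zero overlap too
        cost = len(text) // chars_per_token + 1
        if used + cost > budget_tokens:
            continue  # too big for the remaining budget, try smaller ones
        picked.append(name)
        used += cost
    return picked

wiki = {
    "pricing": "Canonical pricing: the pro tier costs 49 per seat",
    "offsite": "Notes from the team offsite in Lisbon",
    "hiring": "We decided to hire two engineers",
}
print(select_entries("is this pricing consistent with canonical pricing", wiki))
```

Only the pricing entry reaches the prompt; the rest of the wiki stays on disk until a question actually needs it.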

For Struvo decision agents:

Total tokens per decision: about 15-30K instead of 1M+. Latency: 3-5 seconds instead of 30. Cost: about 1/30th.
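As a rough check on those figures, the sketch below computes the input-token ratio. It assumes cost scales linearly with input tokens and ignores output tokens, caching, and any tiered pricing, so treat it as back-of-envelope arithmetic only.

```python
FULL_CONTEXT_TOKENS = 1_000_000      # "1M+" from the figures above
SCOPED_TOKENS = (15_000, 30_000)     # "about 15-30K" per decision

def input_ratio(scoped: int, full: int = FULL_CONTEXT_TOKENS) -> float:
    """How many times more input tokens the full-context call sends."""
    return full / scoped

for scoped in SCOPED_TOKENS:
    print(f"{scoped:>6} tokens: about 1/{input_ratio(scoped):.0f} of the full-context input")
```

Under that linear assumption the scoped call uses between roughly 1/67 and 1/33 of the input, so "about 1/30th" sits at the conservative end of the range.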

The 1M window is still useful for the rare case where the agent needs to expand scope. But loading 1M tokens by default is wasteful.

The decision is per-call:
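One way to sketch that per-call choice, using the criteria from this post. The `Call` fields, the `choose_strategy` name, the 80% headroom, and the 4-characters-per-token estimate are illustrative assumptions, not a real API.

```python
from dataclasses import dataclass

CONTEXT_LIMIT_TOKENS = 1_000_000  # window size discussed in this post
CHARS_PER_TOKEN = 4               # rough heuristic (assumption)

@dataclass
class Call:
    corpus_chars: int      # size of everything the task could touch
    single_artifact: bool  # one coherent document, thread, or codebase
    one_shot: bool         # asked once, not repeated across many prompts

def estimate_tokens(chars: int) -> int:
    return chars // CHARS_PER_TOKEN

def choose_strategy(call: Call, headroom: float = 0.8) -> str:
    """Return "long_context" or "retrieval" for a single call."""
    fits = estimate_tokens(call.corpus_chars) <= CONTEXT_LIMIT_TOKENS * headroom
    # Long context only pays off when the material fits AND the task is
    # either about one coherent artifact or a one-off synthesis.
    if fits and (call.single_artifact or call.one_shot):
        return "long_context"
    return "retrieval"

# 20 emails, summarized once: small and one-shot.
print(choose_strategy(Call(60_000, single_artifact=False, one_shot=True)))
# A 5MB wiki queried repeatedly: too large and not one-shot.
print(choose_strategy(Call(5_000_000, single_artifact=False, one_shot=False)))
```

The first call prints `long_context` and the second prints `retrieval`. The point of the sketch is that the check runs on every call instead of being fixed once for the whole agent.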

The mistake is using long context as a default. It's a tool, not a habit.