2026-04-28
When 1M context actually beats agent retrieval
Long context isn't a substitute for retrieval. It's a substitute for some retrievals.
#context-window #retrieval #claude #agents
The temptation
Claude 4.7 has a 1M-token context window. The temptation is to dump everything in.
Don't. Or at least, don't always.
When 1M context wins
- Single-document analysis: a 200K-word PDF, a meeting transcript, a full repo subdirectory. Retrieval over a single coherent doc adds latency with no gain in quality.
- Cross-references inside the same artifact: "find every place this function is called in this codebase". The model SEES every call site when you give it the whole codebase.
- One-shot synthesis: "summarize these 20 emails into one digest" — way faster to drop them all in than to embed and retrieve.
When agent retrieval still wins
- Decision memory: querying "what did we decide about X six months ago" across 200 wiki entries. The wiki is 5MB. You don't want to load 5MB into every prompt.
- Pricing checks: "is this proposed pricing consistent with our canonical pricing?" Load only the canonical pricing, not the entire business-ops repo.
- Cross-project: "did we solve this in another repo" — agents that scope to specific repos beat dumping everything.
The hybrid I run
For Struvo decision agents:
- Each agent has a scoped retriever — wiki only, code only, pricing only
- The retrieval pulls 5-10 relevant entries
- THOSE get loaded into the model context with the question
- The model answers from the loaded context
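The four steps above can be sketched roughly as follows. This is a minimal illustration, not the actual Struvo code: `Entry`, `score`, `retrieve`, and `build_prompt` are hypothetical names, and the keyword-overlap score is a stand-in for whatever embedding search a real retriever would use. The model call itself is left out.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    scope: str  # hypothetical scope label, e.g. "wiki", "code", "pricing"
    title: str
    text: str


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(question: str, entry: Entry) -> int:
    """Toy relevance: count of words shared by the question and the entry."""
    return len(tokens(question) & tokens(entry.title + " " + entry.text))


def retrieve(question: str, entries: list[Entry], scope: str, k: int = 10) -> list[Entry]:
    """Return up to k entries from one scope, best match first."""
    scored = [(score(question, e), e) for e in entries if e.scope == scope]
    scored = [(s, e) for s, e in scored if s > 0]  # drop entries with no overlap
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [e for _, e in scored[:k]]


def build_prompt(question: str, retrieved: list[Entry]) -> str:
    """Load only the retrieved entries into the context, then ask the question."""
    context = "\n\n".join(f"{e.title}\n{e.text}" for e in retrieved)
    return f"Answer using only the context below.\n\n{context}\n\nQuestion: {question}"
```

The point of the `scope` filter is that a wiki agent never sees pricing or code entries, so the context stays small regardless of how large the other corpora grow.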
Total tokens per decision: about 15-30K instead of 1M+. Latency: 3-5 seconds instead of 30. Cost: about 1/30th of a full-context call.
The 1M window is still useful for the rare case where the agent needs to expand scope. But "always load 1M" is a wasteful default.
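As a sanity check on the token figures above, here is the ratio worked out. It assumes cost scales roughly linearly with input tokens, which ignores output tokens, caching, and any tiered long-context pricing; `input_token_ratio` is a hypothetical helper.

```python
FULL_CONTEXT_TOKENS = 1_000_000  # the "load everything" case


def input_token_ratio(scoped_tokens: int, full_tokens: int = FULL_CONTEXT_TOKENS) -> float:
    """How many times fewer input tokens the scoped call uses."""
    return full_tokens / scoped_tokens


for scoped in (15_000, 30_000):
    print(f"{scoped} tokens: {input_token_ratio(scoped):.0f}x fewer input tokens")
```

Under that assumption the scoped call uses roughly 33x to 67x fewer input tokens, so "about 1/30th" sits at the conservative end of the range.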
What you can steal
The decision is per-call:
- Single coherent artifact, one question? Long context.
- Many artifacts, one question? Retrieve, then context.
- Same artifacts, many questions? Embed once, retrieve forever.
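A minimal sketch of that per-call decision as a function. The names (`Strategy`, `choose_strategy`) and parameters are hypothetical, and two things here are assumptions rather than part of the rules above: the third rule is read as "the same corpus gets queried repeatedly", and a single artifact too large for the window falls back to retrieval.

```python
from enum import Enum


class Strategy(Enum):
    LONG_CONTEXT = "long context"
    RETRIEVE_THEN_CONTEXT = "retrieve, then context"
    EMBED_ONCE = "embed once, retrieve forever"


def choose_strategy(
    artifact_count: int,
    recurring: bool,
    est_tokens: int,
    context_limit: int = 1_000_000,
) -> Strategy:
    """Pick a strategy for one call.

    artifact_count: how many separate documents the question spans
    recurring:      True if this corpus will be queried again and again
    est_tokens:     rough token estimate for loading everything
    """
    if artifact_count > 1 and recurring:
        return Strategy.EMBED_ONCE
    if artifact_count == 1 and est_tokens <= context_limit:
        return Strategy.LONG_CONTEXT
    # Many artifacts for a one-off question, or a single artifact that
    # does not fit in the window (assumption): retrieve first.
    return Strategy.RETRIEVE_THEN_CONTEXT
```

For example, `choose_strategy(1, False, 200_000)` picks long context, while `choose_strategy(200, True, 1_300_000)` picks the embed-once path.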
The mistake is using long context as a default. It's a tool, not a habit.