I built Karpathy's memory paradigm in a weekend
Append-only frontmatter wiki. LLM-as-retrieval, not cosine. Different paradigm from vector search — they ship together.
The leak that started this
There's a corpus on YouTube called Onchain AI Garage. I mined 86 transcripts from it last week. One of the findings stopped me:
Anthropic's internal memory uses LLM-as-retrieval over frontmatter — not cosine similarity over embeddings.
Different paradigm from Qdrant. Different paradigm from every RAG tutorial. The retrieval LLM scans frontmatter (via INDEX.md), picks which files to open, then reads only those. The frontmatter is the index. Make it dense.
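A minimal sketch of that two-stage flow, in Python rather than the Bash the real scripts use. It assumes INDEX.md holds one `## <filename>` heading per entry followed by that entry's frontmatter, which is an illustrative layout and not something the source specifies. `keyword_pick` is a hypothetical offline stand-in for the retrieval LLM so the example runs without a model; in practice the picker would be a model call.

```python
from pathlib import Path
from typing import Callable


def keyword_pick(index_text: str, question: str) -> list[str]:
    """Offline stand-in for the retrieval LLM: rank entries by word overlap."""
    words = {w.lower() for w in question.split()}
    scores: dict[str, int] = {}
    current = None
    for line in index_text.splitlines():
        if line.startswith("## "):
            # Assumed layout: a heading names the entry file.
            current = line[3:].strip()
            scores[current] = 0
        elif current is not None:
            scores[current] += len(words & {w.lower() for w in line.split()})
    hits = [name for name in scores if scores[name] > 0]
    return sorted(hits, key=lambda name: (-scores[name], name))


def retrieve(wiki_dir: Path, question: str,
             pick: Callable[[str, str], list[str]],
             limit: int = 3) -> dict[str, str]:
    """Stage 1: the picker sees only INDEX.md. Stage 2: open only its choices."""
    index_text = (wiki_dir / "INDEX.md").read_text(encoding="utf-8")
    opened: dict[str, str] = {}
    for name in pick(index_text, question)[:limit]:
        path = wiki_dir / name
        # Skip names that do not exist or that point outside the wiki directory.
        if path.is_file() and path.resolve().parent == wiki_dir.resolve():
            opened[name] = path.read_text(encoding="utf-8")
    return opened
```

The point of the shape is that entry bodies are never read until the picker has chosen them, so the cost of a query scales with the size of the index plus a handful of files.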
I've been running Qdrant for semantic search over field reports. That's still the right tool for "find similar long-form content." But for agent decision memory — "what did chief-of-staff decide last week and why?" — frontmatter retrieval is faster, cheaper, and easier to audit.
The system, in 7 steps
Built it in a weekend. 7 phases:
- Directory + frontmatter schema + README + 2 seed entries
- Append-only writer (`_system/write-entry.sh` + `wiki-write` skill)
- Index builder + monthly archiver (90-day superseded rolls into `archive/YYYY-MM.tar.zst`)
- Query script + `wiki-query` skill (default model: `claude --model haiku`)
- Nightly consolidator: chunked per-domain, SKIP detection, failures append to `synthesis-failures.md`
- Chief-of-staff briefing reads INDEX.md stats + queries last 7 days
- 7-night soak
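The append-only writer is the step everything else leans on. A rough Python sketch of the idea follows; the real writer is a shell script whose contents aren't shown here. It assumes one Markdown file per entry, named after the entry id, with the frontmatter between `---` delimiters. Both conventions are assumptions made for illustration.

```python
from pathlib import Path


def write_entry(wiki_dir: Path, entry_id: str, frontmatter: str, body: str) -> Path:
    """Create a new entry file; refuse to touch an existing one."""
    path = wiki_dir / f"{entry_id}.md"
    text = f"---\n{frontmatter.strip()}\n---\n\n{body.strip()}\n"
    # Mode "x" is exclusive creation: it raises FileExistsError if the file
    # already exists, so a past entry can never be silently overwritten.
    with open(path, "x", encoding="utf-8") as fh:
        fh.write(text)
    return path
```

Changing a past decision then means writing a new entry whose `supersedes` field points at the old id, rather than editing the old file.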
Total disk usage after the first month: under 5 MB, nowhere near the 5 GB ceiling I set.
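The monthly archiver in the list above rolls superseded entries older than 90 days into `archive/YYYY-MM.tar.zst`. Here is a sketch of just the selection step, assuming age is measured from the entry's own `date` field (the source doesn't say whether it is that or the date of supersession) and that entries are already parsed into dicts. The compression itself is left out.

```python
from collections import defaultdict
from datetime import date, timedelta


def archive_plan(entries: list[dict], today: date,
                 max_age_days: int = 90) -> dict[str, list[str]]:
    """Map archive file names to the ids of entries that should move there."""
    # An entry is superseded if any other entry lists its id under `supersedes`.
    superseded = {old for e in entries for old in e.get("supersedes", [])}
    cutoff = today - timedelta(days=max_age_days)
    plan: dict[str, list[str]] = defaultdict(list)
    for e in entries:
        if e["id"] in superseded and e["date"] <= cutoff:
            plan[f"archive/{e['date']:%Y-%m}.tar.zst"].append(e["id"])
    return {name: sorted(ids) for name, ids in sorted(plan.items())}
```

Active entries are never selected, however old they are; only entries that something newer has replaced are eligible.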
What broke on day 1
Two bugs, both stupid, both classic:
- macOS cron skips fires when the Mac is asleep. Switched all three jobs to launchd. launchd handles wake-from-sleep — missed fires run when the Mac wakes. Cron just silently doesn't.
- `#!/usr/bin/env bash` resolves to `/bin/bash` 3.2 under cron. That breaks `declare -A` (associative arrays). Fixed: `#!/opt/homebrew/bin/bash` for the Bash 5 binary. Also `set -u` → `set -o pipefail` because empty associative-array length checks tripped the strict mode.
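For anyone making the same cron-to-launchd move, a job definition can be generated with Python's standard `plistlib`. This is a sketch under assumptions: the label and script path in the usage note are placeholders, not the author's real ones. `Label`, `ProgramArguments`, and `StartCalendarInterval` are standard launchd keys, and naming the Homebrew Bash explicitly as the first program argument sidesteps the shebang lookup altogether.

```python
import plistlib


def launchd_job(label: str, program_args: list[str], hour: int, minute: int) -> bytes:
    """Return an XML plist for a launchd job that runs once a day."""
    job = {
        "Label": label,
        # Naming the interpreter explicitly avoids depending on PATH or shebangs.
        "ProgramArguments": list(program_args),
        # A calendar-scheduled job that was missed while the Mac slept
        # fires when the machine wakes.
        "StartCalendarInterval": {"Hour": hour, "Minute": minute},
    }
    return plistlib.dumps(job)
```

Calling `launchd_job("com.example.llm-wiki.consolidate", ["/opt/homebrew/bin/bash", "/path/to/llm-wiki/_system/consolidate.sh"], 3, 30)` yields bytes that can be saved as a `.plist` file under `~/Library/LaunchAgents/`.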
Both bugs took longer to find than they took to fix. That's the law of agent infra.
Frontmatter that earns its slot
The schema I landed on has a few load-bearing fields:
id: 2026-05-02-agent-org-v23-compaction
type: decision # decision | synthesis | incident | reflection
date: 2026-05-02
actors: [lucas, claude]
tags: [agentic-org, compaction]
summary: >
  Compacted active agent fleet from 17 to 10...
domain: meta
status: active
supersedes: [2026-04-23-agent-org-v2]
alternatives_considered:
  - id: keep-all-17-active
    rejected_because: Pre-revenue agents distract routing on every dispatch
alternatives_considered is the field that makes this a decision archive instead of a journal. Every entry now has the path not taken, with a reason. That's what makes the wiki queryable as "why did we go this way" rather than "what did we do."
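One way to keep that field honest is a check at write time. The sketch below assumes the frontmatter has already been parsed into a dict. The required fields and allowed types come from the schema above; the rule that a `decision` entry must carry at least one rejected alternative is an added assumption, not something the schema is known to enforce.

```python
REQUIRED = ("id", "type", "date", "actors", "tags", "summary", "domain", "status")
TYPES = {"decision", "synthesis", "incident", "reflection"}


def validate(entry: dict) -> list[str]:
    """Return a list of problems; an empty list means the entry passes."""
    problems = [f"missing field: {key}" for key in REQUIRED if key not in entry]
    if "type" in entry and entry["type"] not in TYPES:
        problems.append(f"unknown type: {entry['type']}")
    alternatives = entry.get("alternatives_considered") or []
    for i, alt in enumerate(alternatives):
        # An alternative with no stated reason cannot answer "why not this?".
        if not alt.get("rejected_because"):
            problems.append(f"alternative {i} has no rejected_because")
    # Assumed rule: a decision must record at least one path not taken.
    if entry.get("type") == "decision" and not alternatives:
        problems.append("decision without alternatives_considered")
    return problems
```

Run against the example entry above, this returns an empty list. Strip the `alternatives_considered` block and it reports the entry as a decision with no recorded alternative.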
When NOT to write to it
I had to fight the urge to write everything to it. The rule I landed on:
- Real decision was made → write
- Multiple inputs distilled into findings → write
- Something broke or surprised → write (typed `incident`)
- Pattern observed across multiple events → write (typed `reflection`)
NOT:
- Raw tool-call traces (those go to artifacts/)
- In-progress plans (those live in project folders)
- Conversation context that won't be referenced again (let it die)
The rule keeps the wiki dense. Every entry pulls weight.
What you can steal
If you want this:
- Pick a directory. Mine is `~/Developer/lucface/business-ops/llm-wiki/`.
- Define a frontmatter schema. Steal mine: id, type, date, actors, tags, summary, domain, status. Add `alternatives_considered` if you want the "why" to be queryable.
- Append-only. Never edit past entries; supersede them with a new one and link back.
- Build a scheduled job that regenerates INDEX.md, listing every entry with its frontmatter. On macOS use launchd rather than cron, since cron skips runs while the Mac is asleep.
- Query via `claude --model haiku -p "Read INDEX.md, then open the 3 most relevant entries for this question: ..."`
That's it. Total cost to run mine: a few cents a day in inference. The compounding value is the part you can't price — every decision becomes searchable, and every agent gets smarter the longer the wiki runs.
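To make the index and query steps from the list above concrete, here is a small Python sketch. It assumes entries are `.md` files whose frontmatter sits between `---` lines, and it writes INDEX.md as one `## <filename>` heading per entry followed by that frontmatter. Both choices are illustrative, not the author's actual format. The query helper only assembles the command quoted in the last bullet; it does not run it.

```python
from pathlib import Path


def read_frontmatter(path: Path) -> str:
    """Return the text between the opening and closing '---' lines, or ''."""
    lines = path.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return ""
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return "\n".join(lines[1:i])
    return ""  # no closing delimiter: treat as having no frontmatter


def build_index(wiki_dir: Path) -> Path:
    """Regenerate INDEX.md from the frontmatter of every entry file."""
    sections = []
    for path in sorted(wiki_dir.glob("*.md")):
        if path.name in {"INDEX.md", "README.md"}:
            continue
        frontmatter = read_frontmatter(path)
        if frontmatter:  # files without frontmatter are not entries
            sections.append(f"## {path.name}\n{frontmatter}\n")
    index = wiki_dir / "INDEX.md"
    tmp = wiki_dir / "INDEX.md.tmp"
    # Write to a temp file, then swap, so readers never see a half-built index.
    tmp.write_text("\n".join(sections), encoding="utf-8")
    tmp.replace(index)
    return index


def query_command(question: str) -> list[str]:
    """Assemble the CLI call from the last bullet as an argument list."""
    prompt = ("Read INDEX.md, then open the 3 most relevant entries "
              f"for this question: {question}")
    return ["claude", "--model", "haiku", "-p", prompt]
```

The argument list can be handed to `subprocess.run(query_command("..."), cwd=wiki_dir)` so the model resolves INDEX.md relative to the wiki directory.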