2026-05-04

I built Karpathy's memory paradigm in a weekend

Append-only frontmatter wiki. LLM-as-retrieval, not cosine. It's a different paradigm from vector search, and the two run side by side.

#llm-wiki #memory #agents #karpathy #decision-memory

The leak that started this

There's a corpus on YouTube called Onchain AI Garage. I mined 86 transcripts from it last week. One of the findings stopped me:

Anthropic's internal memory uses LLM-as-retrieval over frontmatter — not cosine similarity over embeddings.

Different paradigm from Qdrant. Different paradigm from every RAG tutorial. The retrieval LLM scans frontmatter (via INDEX.md), picks which files to open, then reads only those. The frontmatter is the index. Make it dense.
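Here's a minimal sketch of that two-stage flow, in Python for illustration rather than the shell the real scripts use. It assumes a flat directory of `<id>.md` files and an INDEX.md where each entry is a `## <id>` heading followed by its frontmatter; both are assumptions, not something the source confirms. The `choose` callable stands in for the LLM, and `keyword_chooser` is a hypothetical placeholder so the sketch runs without a model.

```python
from pathlib import Path
from typing import Callable


def retrieve(wiki_dir: Path, question: str,
             choose: Callable[[str, str], list[str]],
             limit: int = 3) -> dict[str, str]:
    """Two-stage retrieval: scan the index, then open only the chosen files."""
    index_text = (wiki_dir / "INDEX.md").read_text(encoding="utf-8")
    # Stage 1: the chooser (an LLM in practice) sees only the index, never the bodies.
    chosen_ids = choose(index_text, question)[:limit]
    # Stage 2: read just the selected entries.
    opened = {}
    for entry_id in chosen_ids:
        # Skip ids that look like paths, so a bad answer cannot escape wiki_dir.
        if "/" in entry_id or entry_id.startswith("."):
            continue
        path = wiki_dir / f"{entry_id}.md"
        if path.is_file():
            opened[entry_id] = path.read_text(encoding="utf-8")
    return opened


def keyword_chooser(index_text: str, question: str) -> list[str]:
    """Hypothetical stand-in for the LLM: rank index sections by shared words."""
    words = {w.lower() for w in question.split() if len(w) > 3}
    scored = []
    # Assumed index layout: one "## <id>" heading per entry, frontmatter below it.
    for section in ("\n" + index_text).split("\n## ")[1:]:
        entry_id = section.partition("\n")[0].strip()
        score = sum(1 for w in words if w in section.lower())
        if score:
            scored.append((score, entry_id))
    return [entry_id for _, entry_id in sorted(scored, reverse=True)]
```

In practice `choose` would wrap the model call and parse ids out of its reply; that wrapper is left out here. The point of the shape is that the model's context holds the index plus a handful of files, never the whole wiki.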

I've been running Qdrant for semantic search over field reports. That's still the right tool for "find similar long-form content." But for agent decision memory — "what did chief-of-staff decide last week and why?" — frontmatter retrieval is faster, cheaper, and easier to audit.

The system, in 7 steps

Built it in a weekend. 7 phases:

Directory + frontmatter schema + README + 2 seed entries
Append-only writer (_system/write-entry.sh + wiki-write skill)
Index builder + monthly archiver (superseded entries older than 90 days roll into archive/YYYY-MM.tar.zst)
Query script + wiki-query skill (default model: claude --model haiku)
Nightly consolidator — chunked per-domain, SKIP detection, failures append to synthesis-failures.md
Chief-of-staff briefing reads INDEX.md stats + queries last 7 days
7-night soak
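The archiver rule from phase 3 is the easiest piece to get subtly wrong, so here's a sketch of just the selection step, in Python for illustration. It assumes the 90 days are counted from the entry's `date` field and that retired entries carry `status: superseded`; the post only shows `status: active`, so both are assumptions. It plans which ids go into which monthly archive and does no compression itself.

```python
from collections import defaultdict
from datetime import date, timedelta


def plan_archive(entries: list[dict], today: date,
                 max_age_days: int = 90) -> dict[str, list[str]]:
    """Group old superseded entries by the monthly archive they belong in."""
    cutoff = today - timedelta(days=max_age_days)
    plan = defaultdict(list)
    for entry in entries:
        entry_date = date.fromisoformat(entry["date"])
        # Assumption: only status == "superseded" is eligible; active entries stay put.
        if entry["status"] == "superseded" and entry_date < cutoff:
            plan[f"archive/{entry_date:%Y-%m}.tar.zst"].append(entry["id"])
    return {name: sorted(ids) for name, ids in sorted(plan.items())}
```

Keeping selection separate from the tar step means the plan can be printed and checked before anything moves.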

Total disk usage after the first month: under 5 MB. That's nowhere near the 5 GB ceiling I set.

What broke on day 1

Two bugs, both stupid, both classic:

macOS cron skips fires when the Mac is asleep. I switched all three jobs to launchd, which handles wake-from-sleep: a fire missed during sleep runs when the Mac wakes. Cron just silently skips it.

Under cron, #!/usr/bin/env bash resolves to /bin/bash, which on macOS is Bash 3.2. That breaks declare -A, because associative arrays need Bash 4 or later. The fix: point the shebang at the Homebrew Bash 5 binary with #!/opt/homebrew/bin/bash. I also replaced set -u with set -o pipefail, because length checks on empty associative arrays tripped set -u.

Both bugs took longer to find than they took to fix. That's the law of agent infra.

Frontmatter that earns its slot

The schema I landed on has a few load-bearing fields:

id: 2026-05-02-agent-org-v23-compaction
type: decision        # decision | synthesis | incident | reflection
date: 2026-05-02
actors: [lucas, claude]
tags: [agentic-org, compaction]
summary: >
  Compacted active agent fleet from 17 to 10...
domain: meta
status: active
supersedes: [2026-04-23-agent-org-v2]
alternatives_considered:
  - id: keep-all-17-active
    rejected_because: Pre-revenue agents distract routing on every dispatch

alternatives_considered is the field that makes this a decision archive instead of a journal. Every entry now has the path not taken, with a reason. That's what makes the wiki queryable as "why did we go this way" rather than "what did we do."
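A schema only stays load-bearing if something checks it at write time. Here's a small validation sketch in Python, operating on already-parsed frontmatter as a dict. The required-field list and type values come from the schema above; requiring `alternatives_considered` on every decision is my reading of the rule, not a documented constraint, and the function name is hypothetical.

```python
REQUIRED = ("id", "type", "date", "actors", "tags", "summary", "domain", "status")
ENTRY_TYPES = {"decision", "synthesis", "incident", "reflection"}


def validate_entry(meta: dict) -> list[str]:
    """Return a list of problems; an empty list means the entry passes."""
    problems = [f"missing field: {name}" for name in REQUIRED if not meta.get(name)]
    entry_type = meta.get("type")
    if entry_type and entry_type not in ENTRY_TYPES:
        problems.append(f"unknown type: {entry_type}")
    # Assumption: a decision must record at least one rejected alternative.
    if entry_type == "decision":
        alternatives = meta.get("alternatives_considered") or []
        if not alternatives:
            problems.append("decision has no alternatives_considered")
        for i, alt in enumerate(alternatives):
            for key in ("id", "rejected_because"):
                if not alt.get(key):
                    problems.append(f"alternatives_considered[{i}] missing {key}")
    return problems
```

Returning every problem at once, rather than raising on the first, suits a nightly job that logs failures and moves on.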

When NOT to write to it

I had to fight the urge to write everything to it. The rule I landed on:

Real decision was made → write
Multiple inputs distilled into findings → write
Something broke or surprised → write (type: incident)
Pattern observed across multiple events → write (type: reflection)

NOT:

Raw tool-call traces (those go to artifacts/)
In-progress plans (those live in project folders)
Conversation context that won't be referenced again (let it die)

The rule keeps the wiki dense. Every entry pulls weight.

What you can steal

If you want this:

Pick a directory. Mine is ~/Developer/lucface/business-ops/llm-wiki/.
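Define a frontmatter schema. Steal mine: id, type, date, actors, tags, summary, domain, status. Add alternatives_considered if you want the reasoning behind each decision to be queryable.
Append-only. Never edit past entries. Supersede them with a new entry whose supersedes field links back to the old one.
Build a scheduled job that regenerates INDEX.md, listing every entry with its frontmatter. On macOS, use launchd rather than cron, for the sleep reason above.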
Query via claude --model haiku -p "Read INDEX.md, then open the 3 most relevant entries for this question: ..."
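The append-only and index steps above can be sketched in a few lines. This is Python for illustration, not the actual write-entry.sh, and it assumes a flat directory of `<id>.md` files whose frontmatter sits between two `---` lines. That delimiter convention and the `## <id>` index layout are my assumptions; the function names are hypothetical.

```python
from pathlib import Path


def write_entry(wiki_dir: Path, entry_id: str, text: str) -> Path:
    """Create a new entry; refuse to overwrite an existing one."""
    path = wiki_dir / f"{entry_id}.md"
    # Mode "x" is exclusive creation: open() raises FileExistsError if the file exists.
    with open(path, "x", encoding="utf-8") as f:
        f.write(text)
    return path


def frontmatter_block(text: str) -> str:
    """Return the raw text between the opening and closing '---' lines, or ''."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ""
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return "\n".join(lines[1:i])
    return ""  # no closing delimiter: treat as having no frontmatter


def build_index(wiki_dir: Path) -> str:
    """Regenerate INDEX.md with every entry's frontmatter and none of the bodies."""
    sections = []
    for path in sorted(wiki_dir.glob("*.md")):
        if path.name in ("INDEX.md", "README.md"):
            continue
        block = frontmatter_block(path.read_text(encoding="utf-8"))
        if block:  # files without frontmatter are skipped
            sections.append(f"## {path.stem}\n{block}\n")
    index = "# INDEX\n\n" + "\n".join(sections)
    (wiki_dir / "INDEX.md").write_text(index, encoding="utf-8")
    return index
```

Exclusive creation enforces the append-only rule at the filesystem level: a second write to the same id fails loudly instead of silently replacing history. INDEX.md is the one file that does get rewritten, which is fine because it is derived, not a record.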

That's it. Total cost to run mine: a few cents a day in inference. The compounding value is the part you can't price — every decision becomes searchable, and every agent gets smarter the longer the wiki runs.