8 principles I steal from Karpathy on every build
Build from scratch. Radical minimalism. Verifiability unlocks automation. The program.md is the most consequential file.
Why Karpathy
I watch a lot of AI talks. Most of them are pitches. Karpathy's are not. He shows the work, names the tradeoffs, and tells you when his old advice was wrong. That's the bar.
Here are 8 principles I lifted from his recent talks and applied to the agent fleet I run.
1. Build from scratch first
Don't import LangChain. Don't import LlamaIndex. Build the agent loop yourself first, in 200 lines. You'll understand what's actually happening. THEN reach for libraries when you need to skip the parts you've already understood.
Applied: I wrote the chief-of-staff briefing aggregator from scratch. 180 lines of bash. I know exactly what it does and exactly where it would break.
2. Radical minimalism
Every dependency is a future debugging session. Every framework is a future migration. Default to less.
Applied: This whole hub is Next.js + Drizzle + a hand-rolled Markdown renderer. No MDX-React. No remark/rehype pipeline. The renderer is 130 lines and I own all of them.
3. Verifiability unlocks automation
You can't safely automate what you can't verify. Build the eval harness FIRST. Build the diff-against-golden FIRST. Then build the thing.
Applied: Every Struvo agent has an eval dataset. Before we ship a model swap, we run the new model against the golden set and compare scores. No "vibes-based" deployments.
4. The program.md is the most consequential file
Karpathy's claim: in agent systems, the markdown spec that defines the agent's job is the highest-leverage file in the repo. More than any model choice. More than any framework.
Applied: Every agent in my fleet has a 100-300 line markdown spec at ~/.claude/agents/<name>.md. Those files are versioned, reviewed, and the only place I edit when I want the agent to behave differently. Model upgrades don't change them.
5. The eval IS the artifact
You don't ship an agent. You ship an agent + its eval set. The eval set is the contract.
Applied: For Struvo we maintain eval-datasets/ per pipeline. Transcription accuracy. Entity extraction. Report quality. Without the eval, the agent is a guess.
6. Two-phase memory
Append-only logs + nightly consolidation worker. The system "dreams" while you sleep — distills, consolidates, files away.
Applied: That's literally the LLM Wiki I built. Log entries land hot, consolidation cron at 3 AM rolls them up, archive at month-end.
7. Local + cloud, not local OR cloud
Local models for the cheap, frequent decisions. Frontier models for the rare, hard ones. The sweet spot is hybrid.
Applied: ~90% of agent decisions go through local Gemma on the PC. The remaining 10% — synthesis, judgment, planning — go to Claude Opus or Sonnet. Total cost is a fraction of running cloud-only.
8. Show the failure cases
Every agent post should have a "what broke" section. Without it, you're selling something. With it, you're teaching.
Applied: Look at any of my posts. Every one has a "what broke" section. That's not stylistic — it's an enforcement of this principle.
What you can steal
The meta-move is: don't follow Karpathy's specific advice. Follow his pattern of stating principles, applying them, and owning the failures.
If you can write a list like this for the system you're building — what you believe, why you believe it, where you've been wrong — your agent fleet will compound faster than mine.