Fabian G. Williams aka Fabs

Fabian G. Williams

Principal Product Manager, Microsoft

Your Brain Forgets Most of Your Life. So Does Your AI Agent.

A TED talk on memory by neuroscientist Lisa Genova made me realize that the same cognitive failures humans experience — attention gaps, context loss, prospective memory failures — show up in AI agents I build for real organizations. Here is what I learned and what I built to fix it.

Fabian Williams

11-Minute Read

Split illustration showing a human brain with fading neural pathways alongside an AI circuit board with data streams — bridging biological and digital memory

Neuroscientist Lisa Genova explains why forgetting is normal — your brain filters out most of your day, loses context when you change rooms, and is terrible at remembering future intentions. I watched her TED talk and realized I had already encountered every one of these failures in AI agents I build for real organizations. Here is how the parallels work, what breaks in production, and what I built to fix it.

I Built a Knowledge Base That Writes Itself. Here Is What Andrej Karpathy Got Right.

Andrej Karpathy posted about using LLMs to build personal knowledge bases. I took his workflow, wired it into my Obsidian vault with Claude Code, and within an hour had 21 cross-linked wiki articles compiled from YouTube transcripts. Here is how it works and why it matters.

Fabian Williams

8-Minute Read

Obsidian graph view showing 21 cross-linked wiki articles compiled by Claude Code from raw transcripts

Andrej Karpathy tweeted about using LLMs to build personal knowledge bases — raw sources in, compiled wiki out, all in Obsidian. I implemented his entire workflow in one session using Claude Code skills. Four YouTube transcripts became 21 cross-linked wiki articles. The system now compiles new sources, health-checks its own consistency, and searches itself. It took an afternoon. It will compound forever.

How Do You Trust an Autonomous AI Agent? Evals Are the Answer.

I run an autonomous AI agent at home that handles 16 daily cron jobs. It says 'done,' but did it actually do anything? I built an eval framework to find out. Here's what broke, what I learned, and why agent evals are fundamentally different from LLM evals.

Fabian Williams

10-Minute Read

OpenClaw Eval Dashboard showing mixed results across 9 dimensions — the honest picture after adding freshness, failure rate, and delivery gap scoring

I run an autonomous AI agent on a Mac Mini in my house. She handles 16 daily cron jobs — finances, email triage, outreach campaigns, device monitoring, morning briefings. The agent says “done.” But did it actually do anything? I built a 9-dimension eval rubric to find out. Along the way I discovered that my evals were broken, my agent was better than I thought, and the most important metric isn’t pass/fail — it’s whether a failure is your fault or the agent’s fault.
