Fabian G. Williams aka Fabs

Principal Product Manager, Microsoft

PM Life as Agents Take on More

I watched Nate B Jones break down what happens when companies lay off managers. Then I walked outside and had a sidewalk conversation that stress-tested the entire framework. Three people, three roles, three different relationships with AI.

Fabian Williams

9-Minute Read

Three pillars of management unbundling — Information Routing (AI-Ready), Sensemaking (AI-Augmented), Accountability (Human-Only)

I had a sidewalk conversation with two neighbors that turned into a real-time debate about AI replacing jobs. Then I watched a Nate B Jones video that gave me the exact framework to explain why all three of them were right, and also wrong. One neighbor is a project manager who already uses AI daily. One is a business analyst who coaches companies. The third, the husband of one of my neighbors and a skeptic, is convinced AI cannot do creative work. All three are right from their own positions, and all of us are also wrong, depending on which…

Your Brain Forgets Most of Your Life. So Does Your AI Agent.

A TED talk on memory by neuroscientist Lisa Genova made me realize that the same cognitive failures humans experience — attention gaps, context loss, prospective memory failures — show up in AI agents I build for real organizations. Here is what I learned and what I built to fix it.

Fabian Williams

11-Minute Read

Split illustration showing a human brain with fading neural pathways alongside an AI circuit board with data streams — bridging biological and digital memory

Neuroscientist Lisa Genova explains why forgetting is normal — your brain filters out most of your day, loses context when you change rooms, and is terrible at remembering future intentions. I watched her TED talk and realized I had already encountered every one of these failures in AI agents I build for real organizations. Here is how the parallels work, what breaks in production, and what I built to fix it.

How Do You Trust an Autonomous AI Agent? Evals Are the Answer.

I run an autonomous AI agent at home — 16 cron jobs daily. It says 'done' but did it actually do anything? I built an eval framework to find out. Here's what broke, what I learned, and why agent evals are fundamentally different from LLM evals.

Fabian Williams

10-Minute Read

OpenClaw Eval Dashboard showing mixed results across 9 dimensions — the honest picture after adding freshness, failure rate, and delivery gap scoring

I run an autonomous AI agent on a Mac Mini in my house. She handles 16 daily cron jobs — finances, email triage, outreach campaigns, device monitoring, morning briefings. The agent says “done.” But did it actually do anything? I built a 9-dimension eval rubric to find out. Along the way I discovered that my evals were broken, my agent was better than I thought, and the most important metric isn’t pass/fail — it’s whether a failure is your fault or the agent’s fault.
