Fabian G. Williams aka Fabs

Fabian G. Williams

Principal Product Manager, Microsoft

How Do You Trust an Autonomous AI Agent? Evals Are the Answer.

I run an autonomous AI agent at home that handles 16 cron jobs daily. It says 'done', but did it actually do anything? I built an eval framework to find out. Here's what broke, what I learned, and why agent evals are fundamentally different from LLM evals.

Fabian Williams

10-Minute Read

OpenClaw Eval Dashboard showing mixed results across 9 dimensions — the honest picture after adding freshness, failure rate, and delivery gap scoring

I run an autonomous AI agent on a Mac Mini in my house. She handles 16 daily cron jobs — finances, email triage, outreach campaigns, device monitoring, morning briefings. The agent says “done.” But did it actually do anything? I built a 9-dimension eval rubric to find out. Along the way I discovered that my evals were broken, my agent was better than I thought, and the most important metric isn’t pass/fail — it’s whether a failure is your fault or the agent’s fault.

Your Next Hire Should Be an AI — Here's How a Nonprofit Did It in Two Weeks

How MACONA went from a one-person operation to a team of two — without adding headcount. An autonomous AI executive assistant managing email, social media, newsletters, and donor outreach 24/7 on dedicated hardware.

Fabian Williams

6-Minute Read

OpenClaw Gateway Dashboard showing healthy status, 12 active sessions, and cron jobs enabled

We deployed an autonomous AI executive assistant for a nonprofit in under two weeks. She runs eight scheduled programs daily — morning briefings, social media, donor research, newsletter drafts, content scouting, and end-of-day digests — all without being asked. The CEO went from drowning in operational work to just making decisions. The same pattern works for any small organization: medical practices, restaurants, law firms, conferences, mom-and-pop shops.

53 Downloads, 114 Countries, Zero Marketing Budget: My First Month on the App Store

I built two iOS apps to solve my own problems. In the first month, people in 114 countries found them through App Store search alone. Here's what the data looks like, what it taught me, and why I think everyone should ship something.

Fabian Williams

7-Minute Read

Global reach chart showing 114 countries discovered the apps through App Store search

I built two iOS apps in about ten days total. Neither was planned as a product — both started as solutions to my own problems. In the first month on the App Store, people in 114 countries found them through organic search. No marketing budget. No ads. No influencer deals. Here is what the numbers look like, what they taught me, and why I think everyone with an idea should just build the thing.
