Fabian G. Williams aka Fabs

Fabian G. Williams

Principal Product Manager, Microsoft

Qwen 3.6 vs gpt-oss:120b on M3 Max: I Ran a Harder Test, the 8× Speed Gap Surprised Me

I published a Qwen 3.6 vs gpt-oss migration story, then ran an un-gameable eval against both models on the same M3 Max. The receipts changed the speed narrative: gpt-oss:120b ran 8 to 11 times faster than qwen3.6:27b at comparable reasoning quality. Here is the methodology and the data.

Fabian Williams

11-Minute Read

Horizontal bar chart showing gpt-oss:120b at 137 seconds and qwen3.6:27b at 1593 seconds on the same Round 2 reasoning tasks, with an 11.6× slower callout

I published a post last week about replacing gpt-oss:120b with Qwen 3.6 on my MacBook Pro M3 Max. The numbers in that post were real, but one set of tests was structurally gameable — 38 of 40 baseline images were the same class, so an “always-say-A” stub also scored 95 percent. I went back, designed three un-gameable reasoning tasks, and ran them against both local models on identical hardware. gpt-oss:120b finished the three tasks in 137 seconds. qwen3.6:27b-q8_0 took 1593 seconds —…
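The headline ratio is plain wall-clock division. A quick sanity check of the two totals quoted above (137 seconds and 1593 seconds are the figures from the post; the variable names are just labels):

```python
# Wall-clock totals for the three Round 2 reasoning tasks (seconds)
gpt_oss_120b_s = 137
qwen_27b_q8_s = 1593

# How many times slower qwen3.6:27b-q8_0 was on the same hardware
slowdown = qwen_27b_q8_s / gpt_oss_120b_s
print(f"{slowdown:.1f}x slower")  # 11.6x slower
```

That 11.6x figure matches the callout in the chart; the "8 to 11 times" range in the summary reflects per-task variation around it.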

Replacing gpt-oss:120b With Qwen3.6 on a MacBook Pro: A Two-Day Local Model Benchmark

Two days benchmarking three Qwen3.6 variants against gpt-oss:120b on an M3 Max. A 21 GB coding-tuned model ran an OpenClaw-shaped research-brief workload 10x faster than gpt-oss — fast enough to seriously consider moving the work off SaaS frontier APIs. Plus the silent-hallucination trap I almost shipped through.

Fabian Williams

14-Minute Read

Bar chart comparing wall time of four local models on a structured-output benchmark

I spent two days benchmarking three Qwen3.6 variants against gpt-oss:120b on my MacBook Pro M3 Max. The shocking result: a 21 GB coding-tuned model ran an OpenClaw-shaped research-brief workload that I use for the nonprofit MACONA.org in 6 seconds, 10x faster than gpt-oss:120b on the same prompt. Fast enough that I now have reasonable confidence I could move this kind of work off the SaaS-hosted frontier models I have been paying for and onto local hardware on my dev machine. The deeper…

I Run Five OpenClaw Agents for 72 Cents a Day

The default OpenClaw heartbeat is burning wallets. A 3-line config change — 55m interval, gpt-4o-mini on the loop, activeHours window — stacks to a 98% cost reduction. Here is the math, the config, and the free template.

Fabian Williams

8-Minute Read

Waterfall chart showing OpenClaw heartbeat daily cost dropping from $66 to $0.72 after three stacked config fixes

Five OpenClaw agents run the content and executive-assistant pipeline for MACONA — a nonprofit I volunteer with — at $0.72 a day. The most common support question on r/OpenClaw is “$25 in 9 hours, help.” The gap between those two numbers is three config decisions stacked on top of each other: heartbeat interval, model on the loop, and hours of operation. None of them are clever. All of them are usually set wrong by default.
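The gap between the two daily figures is simple arithmetic. A minimal sketch of the overall reduction, using only the $66 and $0.72 daily costs quoted above (the post breaks the savings into three stacked config fixes; the per-fix split is not reproduced here):

```python
default_daily_usd = 66.00  # $/day with default heartbeat settings
tuned_daily_usd = 0.72     # $/day after the three stacked config fixes

# Fractional cost reduction from default to tuned config
reduction = (default_daily_usd - tuned_daily_usd) / default_daily_usd
print(f"{reduction:.1%} cost reduction")  # 98.9% cost reduction
```

That rounds to the 98% figure in the summary; at the Reddit thread's burn rate of $25 in 9 hours, the default config would land in the same $66-a-day ballpark.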
