How to Run Gemma 4 QAT Models on Your Laptop for Free

MADE BY AGENTS

The Agent Roundup

Hey from Tobias,

DeepSeek V4 Pro now costs 28 times less than Claude Opus 4.8 for roughly the same workload. It's not 28 times worse. Maybe 20 percent worse, at most.

I think AI just turned into a commodity, and that quietly changes who wins. Two things this week: a new video on the one layer your document agents cannot skip, and why I am betting on cheap, local AI.

        NEW ON YOUTUBE
      

Best Open Source OCR for AI Agents

An LLM cannot fix a broken input. Feed it a blurry invoice and it will approve the wrong number with full confidence. Bad text in, confident wrong decision out.

So I built a fully open-source pipeline that reads scanned contracts and bills, checks the scan quality first, then auto-approves or flags each one. I benchmark the best open OCR tools (Docling, PaddleOCR, Marker, Chandra) and walk through the whole approval agent, from PDF to decision.

Watch on YouTube →

AI Is a Commodity Now

Here is the part that should worry every AI company. The switching cost is basically zero. You change one line of code and your agents run on a different provider tomorrow. No retraining, no lock-in, no drama.

When the product is interchangeable and the price keeps dropping, it behaves like a commodity, not a moat. That is also why I think China overtakes the US on this. Look at the gap again. Is DeepSeek really 28 times worse than Opus 4.8? Not even close. Price is the thing customers actually feel, and that gap is closing fast.

For a founder, that is good news. You are not married to one vendor, and your costs only go down. Build your stack so you can swap providers in a day, not a quarter.

Local AI Is Winning

A lot of those low prices are subsidized today. Investors are quietly paying part of your AI bill to buy market share. When that money runs out, the real cost lands on your invoice.

That is when running models locally starts to win. Plenty of companies will find it is cheaper to run a model on their own hardware than to rent agents by the token. It reminds me of Bitcoin: whoever has the cheapest energy and compute has the edge. Once the subsidies end, plenty of teams will look at a bill that never stops and decide a one-time hardware buy wins. The next few months are going to be interesting.

Run Gemma 4 on Your Laptop

This is not a someday thing. Google DeepMind shipped Gemma 4 QAT this week. These models are trained to stay accurate after they are compressed, so they use about 3x less memory with almost no drop in quality.

Unsloth's UD-Q4_K_XL build of Gemma 4 12B loads in just 6.72GB of VRAM. That runs on a normal laptop or a Mac Mini, no data center required. Under the hood, Google's TurboQuant trick shrinks the memory cache 6x and runs attention up to 8x faster. A genuinely useful model now fits on hardware you already own.

Pick one local model and run it on your own machine this week.

Tobias

Run Gemma 4 on Your Laptop

Best Open Source OCR for AI Agents

AI Is a Commodity Now

Local AI Is Winning

Run Gemma 4 on Your Laptop

Keep Reading

The Agent Roundup

Agency

Resources