Two brains: why I run most of my AI on a laptop, not the cloud
Most people think using AI means sending every task to the most powerful model money can buy.
It doesn't.
The frontier models are extraordinary at judgement — weighing nuance, writing, reasoning across messy context. But most of what a business actually runs on AI isn't judgement. It's grunt work. Classifying a comment as positive or negative. Pulling five fields out of an invoice. Scoring a list of items against a rule. Tagging, extracting, sorting.
That work doesn't need a genius. It needs a reliable pair of hands. And a reliable pair of hands can run on the laptop already sitting on your desk — for free.
The system: two brains, one router
I split every AI workload into two brains.
The local brain is a small open model running on my own machine (Ollama, a quantised model, no internet round-trip). It handles the high-volume, low-judgement tasks: classification, extraction, scoring, first-pass tagging. It costs nothing per run, it never rate-limits me, and the data never leaves the building.
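To make the local brain concrete, here is a minimal sketch of a sentiment classifier that talks to Ollama over its local HTTP endpoint. It is an illustration, not the exact setup described in this post: the model name `llama3.2`, the prompt wording, and the helper names `build_payload` and `classify_comment` are all assumptions, and it presumes Ollama is running on its default port with a model already pulled.

```python
import json
import urllib.request

# Ollama's default local endpoint; nothing here leaves the machine.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Assumption: swap in whichever small model you have already pulled.
MODEL = "llama3.2"


def build_payload(comment: str, model: str = MODEL) -> dict:
    """Build a non-streaming request body asking for a one-word sentiment."""
    prompt = (
        "Classify the sentiment of this comment as exactly one word, "
        "positive or negative.\n\nComment: " + comment
    )
    return {"model": model, "prompt": prompt, "stream": False}


def classify_comment(comment: str) -> str:
    """Send the prompt to the local model and return its reply, lower-cased."""
    body = json.dumps(build_payload(comment)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=60) as reply:
        answer = json.loads(reply.read())["response"]
    return answer.strip().lower()
```

With Ollama running, `classify_comment("Delivery was late again")` would return whatever the model replies, so in practice you would still check that the reply is one of the two expected labels before trusting it.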
The cloud brain is the frontier model. It handles the small share of work that genuinely needs judgement: the final narrative, the tricky edge case, the thing a client will actually read.
In between sits a router — a few lines of logic that decide which brain a task belongs to. Bulk and mechanical goes local. Nuanced and customer-facing goes to the cloud.
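The router really can be a few lines. The sketch below is one possible version, not the one running in my pipelines: the `Task` fields and the `MECHANICAL_TASKS` set are hypothetical names chosen for illustration, and the rule is simply "mechanical and not customer-facing goes local, everything else goes to the cloud".

```python
from dataclasses import dataclass

# Assumption: illustrative task names, taken from the examples in this post.
MECHANICAL_TASKS = {"classification", "extraction", "scoring", "tagging"}


@dataclass
class Task:
    kind: str              # e.g. "classification" or "narrative"
    customer_facing: bool  # will a client actually read the output?


def route(task: Task) -> str:
    """Return "local" for bulk mechanical work and "cloud" for everything else."""
    if task.kind in MECHANICAL_TASKS and not task.customer_facing:
        return "local"
    return "cloud"


print(route(Task(kind="scoring", customer_facing=False)))   # local
print(route(Task(kind="narrative", customer_facing=True)))  # cloud
```

Defaulting to the cloud for anything the router does not recognise is a deliberate choice in this sketch: an unknown task costs a little more, but it never gets a weaker answer by accident.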
That's the whole pattern. It isn't clever. It's just deciding, deliberately, not to pay frontier prices for forklift work.
The numbers
Here's the part that makes operators sit up.
On one of my own pipelines, the scoring and classification step ran entirely on a cloud model. The bill landed around $50 a month and crept up every time volume grew. I moved that exact step to a local model. Same job, same quality for that task.
The new cost: roughly $5 a month — basically the electricity.
| | Before | After |
|---|---|---|
| Where the bulk work runs | Cloud (frontier model) | Local (open model) |
| Monthly cost | ~$50, rising with volume | ~$5, flat |
| Data leaving the machine | Every request | None |
| Rate limits | Yes | No |
A ~90% cut, and the saving gets bigger as volume grows, because the local brain doesn't charge per request. The cloud brain still earns its keep; it just handles the small share of work where it's worth the money.
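The arithmetic behind that claim is short enough to check by hand. The sketch below uses the ~$50 and ~$5 figures from my pipeline, then projects higher volumes under two simplifying assumptions that are mine, not measurements: the cloud bill scales linearly with volume, and the local cost stays flat.

```python
CLOUD_MONTHLY = 50.0  # from the pipeline above, at current volume
LOCAL_MONTHLY = 5.0   # from the pipeline above, roughly the electricity


def saving_percent(cloud_cost: float, local_cost: float) -> float:
    """Percentage of the cloud bill saved by running the step locally."""
    return 100 * (cloud_cost - local_cost) / cloud_cost


# Assumption: cloud cost grows linearly with volume, local cost stays flat.
projection = {}
for volume_multiplier in (1, 2, 4):
    cloud = CLOUD_MONTHLY * volume_multiplier
    projection[volume_multiplier] = saving_percent(cloud, LOCAL_MONTHLY)
    print(f"{volume_multiplier}x volume: cloud ${cloud:.0f}, "
          f"local ${LOCAL_MONTHLY:.0f}, saving {projection[volume_multiplier]:.1f}%")
# 1x volume: cloud $50, local $5, saving 90.0%
# 2x volume: cloud $100, local $5, saving 95.0%
# 4x volume: cloud $200, local $5, saving 97.5%
```

In reality the electricity bill will rise a little with volume too, so treat the higher rows as an upper bound on the saving rather than a forecast.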
Why this matters
Three reasons, in order of how often they bite.
Cost that scales the wrong way. Per-request pricing is fine at low volume and brutal at high volume. If your AI bill grows in lockstep with your business, you've built a tax on your own growth. Moving the bulk work local breaks that link.
Privacy. Some data shouldn't leave your machine — customer details, financials, anything sensitive. A local model means the bulk processing of that data never touches someone else's server. For some businesses that's a nice-to-have. For others it's the difference between being allowed to use AI at all.
Control. No rate limits, no surprise pricing change, no outage on someone else's status page taking your pipeline down at 6am.
Here's the thing — none of this is anti-cloud. The frontier model is still the best tool in the box for work that needs a brain. The mistake is using it for the work that needs a pair of hands.
What's next
If you're running anything on AI at volume, do one audit this week: list every place you call a paid model, and mark each one judgement or grunt work. The grunt-work rows are your savings. Most of them can move to a local model with no drop in quality, because the task was never hard — it was just frequent.
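If a spreadsheet feels like too much ceremony, the audit fits in a short script. This is a sketch only: the `ModelCall` structure is a hypothetical name, and the three rows and their costs are made-up placeholders to show the shape of the list, not real figures.

```python
from dataclasses import dataclass


@dataclass
class ModelCall:
    name: str            # where in your system the paid model is called
    monthly_cost: float  # what that call site costs per month
    kind: str            # "judgement" or "grunt"


def grunt_work(calls: list[ModelCall]) -> tuple[list[ModelCall], float]:
    """Return the grunt-work rows and their combined monthly cost."""
    rows = [call for call in calls if call.kind == "grunt"]
    return rows, sum(call.monthly_cost for call in rows)


# Hypothetical audit rows; replace with your own call sites and bills.
audit = [
    ModelCall("comment sentiment", 30.0, "grunt"),
    ModelCall("invoice field extraction", 15.0, "grunt"),
    ModelCall("client report narrative", 20.0, "judgement"),
]

rows, monthly_total = grunt_work(audit)
for call in rows:
    print(f"move local: {call.name} (${call.monthly_cost:.0f}/month)")
print(f"grunt-work spend: ${monthly_total:.0f}/month")
```

The total it prints is the spend that is a candidate for moving local, not a guaranteed saving; each row still needs a quick quality check on the local model before you switch it over.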
You don't need a bigger model. You need to stop paying a specialist to do the filing.