A Microscope for a Mind · Kolja Wawrowsky

Not just another local language model, but an instrument you can look inside — one that shows its reasoning, lets you watch every layer think, and holds still while you experiment on it. Built by someone who spent a career making the invisible visible.

Most of the AI you use is doubly out of reach. It runs somewhere else — your words travel to a data center and an answer comes back — and it’s sealed shut, a black box you couldn’t open even if it sat on your desk. You can use it. You can’t watch it work.

I spent most of my working life building the opposite kind of thing. For years I wrote the software inside confocal microscopes, and then I ran a microscopy lab. Those are instruments whose entire purpose is to make the invisible visible: you point them at living tissue and watch it do what it does. A good instrument doesn’t just hand you a picture. It lets you observe, it lets you calibrate, it lets you intervene, and it tells you the truth about what’s really there.

Apertura is that instinct turned toward a new kind of specimen. It’s a complete, modern AI language model — a from-scratch rebuild of Google’s Gemma-4, one of the strongest open models available — that runs entirely on my own Mac and, more to the point, that I can look inside. Not another app that runs a model behind glass. An instrument built so the model can teach, be observed, and be experimented with.

It can teach

You learn a system most deeply by rebuilding it. Every layer, every calculation, in order, until it stops being magic and becomes something you actually understand — where it’s clever, where it’s fragile, what it truly costs to run. The working model is almost the by-product; the understanding is the point.

And the model can teach in a second sense: it can show its work. The newest models don’t just answer — they can reason first, privately, before committing to a reply. I built the instrument so that hidden monologue can be switched on and read. Handed the old riddle about a bat and a ball — the one most people get wrong on pure instinct — it reasons its way up to the trap and deliberately steps around it, out loud, where you can follow every move.
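For anyone who hasn’t met it, the riddle in its usual form says that a bat and a ball cost $1.10 together and the bat costs $1.00 more than the ball; how much is the ball? Instinct says 10 cents. Calling the ball’s price b, the step the reasoning has to take is short:

```latex
\begin{aligned}
b + (b + 1.00) &= 1.10 \\
2b &= 0.10 \\
b &= 0.05
\end{aligned}
```

So the ball costs 5 cents and the bat $1.05. The instinctive answer fails the check: a 10-cent ball means a $1.10 bat and a $1.20 total.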

It can be observed

A language model “thinks” in dozens of layers, each one passing a transformed signal to the next. In an ordinary setup, all of that is sealed machinery. Here I can freeze the model in mid-thought and read out what every layer is doing — the way you’d image a cell at each stage of a process instead of only seeing the end result.
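In code, freezing the model mid-thought amounts to keeping what each layer hands to the next instead of discarding it. Below is a minimal sketch with a toy stack of layers. The layers are hypothetical stand-ins (simple functions on a short list of numbers), not Apertura’s actual code; in a real model each snapshot would be a much larger hidden-state vector for every word.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]

def forward_with_trace(layers: List[Layer], x: Vector) -> Tuple[Vector, List[Vector]]:
    """Run x through the layers in order, keeping a snapshot of every layer's output."""
    trace: List[Vector] = []
    for layer in layers:
        x = layer(x)
        trace.append(list(x))  # copy, so the snapshot can't change later
    return x, trace

# Hypothetical toy layers standing in for real transformer blocks.
def scale(v: Vector) -> Vector:
    return [2.0 * a for a in v]

def shift(v: Vector) -> Vector:
    return [a + 1.0 for a in v]

def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]

if __name__ == "__main__":
    out, trace = forward_with_trace([scale, shift, relu], [-1.0, 0.5])
    for depth, snapshot in enumerate(trace):
        print(f"after layer {depth}: {snapshot}")
    print("final output:", out)
```

Here the trace reads [-2.0, 1.0], then [-1.0, 2.0], then [0.0, 2.0]. The third layer zeroes out a value that, judging by the final output alone, you would never know had been negative.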

One detail matters more than it sounds. I held the rebuild to an exact standard: given the same prompt, my version produces the same words as the original reference, one for one, until the only thing separating them is floating-point rounding so small that even two official versions disagree on it. In one test the two ran in perfect lockstep for ninety words before a single near-tie, a step where two candidate next words scored almost exactly the same, tipped them apart. That fidelity is calibration. It’s how you know that when you see something surprising inside the model, you’re seeing the model, not an artifact of your own instrument.
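The lockstep test can be sketched as two small checks: find the first step where two greedy runs pick different tokens (roughly, words), then ask whether the reference was nearly tied at that step. This is a hypothetical harness, not Apertura’s; the function names, the 1e-3 tolerance and the per-step logits layout are assumptions. The reasoning behind it: a split where the top two candidates sit within rounding distance is expected of any two faithful implementations, while a split with a wide margin points to a real bug.

```python
from typing import Optional, Sequence

def first_divergence(a: Sequence[int], b: Sequence[int]) -> Optional[int]:
    """Return the first index where two token streams differ, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))  # one stream is a strict prefix of the other
    return None

def top2_margin(logits: Sequence[float]) -> float:
    """Gap between the best and second-best score at one decoding step."""
    best, second = sorted(logits, reverse=True)[:2]
    return best - second

def explain_divergence(ref_tokens: Sequence[int], test_tokens: Sequence[int],
                       ref_logits: Sequence[Sequence[float]], tol: float = 1e-3) -> str:
    """Report where two greedy runs split and whether that step was a near-tie.

    ref_logits[i] is assumed to hold the reference scores used to pick token i.
    """
    i = first_divergence(ref_tokens, test_tokens)
    if i is None:
        return "identical"
    if i >= len(ref_logits):
        return f"diverged at step {i}: one run stopped early"
    margin = top2_margin(ref_logits[i])
    kind = "near-tie, rounding can flip it" if margin < tol else "real disagreement"
    return f"diverged at step {i}: top-2 margin {margin:.2g} ({kind})"

if __name__ == "__main__":
    # Made-up 4-word vocabulary; at step 2 tokens 2 and 0 are almost tied.
    logits = [[0.1, 3.0, 0.0, 0.0], [0.0, 0.5, 0.2, 2.0],
              [4.0001, 0.2, 4.0004, 1.0], [1.0, 0.0, 0.0, 0.0]]
    print(explain_divergence([1, 3, 2, 0], [1, 3, 0, 2], logits))
```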

That same discipline is how I caught a smudge on the lens. The model’s vocabulary contains a particular invisible character, and Apple’s built-in text reader was silently deleting it every time the vocabulary loaded — a tiny, reasonable-sounding “cleanup” that was enough to make the wrong word come out. An instrument you can’t trust to be faithful isn’t an instrument; it’s a rumor. Finding it was exactly the work of chasing an artifact out of a microscope image: prove every obvious cause innocent until only the unlikely one is left standing.
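One way to catch that kind of smudge is an audit that compares every vocabulary entry as stored in the file against what the loader actually hands back, and flags entries that collapse into one another. The sketch below is hypothetical: the essay doesn’t name the character, so U+200B (zero-width space) is used purely as a stand-in, and `lossy_cleanup` imitates a well-meaning loader rather than any Apple API.

```python
import json
from typing import Callable, Dict, List, Tuple

# Stand-in invisible character for illustration only (zero-width space, U+200B).
INVISIBLE = "\u200b"

def lossy_cleanup(text: str) -> str:
    """Hypothetical well-meaning loader step that silently drops the invisible character."""
    return text.replace(INVISIBLE, "")

def audit_vocab(vocab: Dict[str, int], load: Callable[[str], str]) -> List[int]:
    """Ids of tokens whose text is altered by `load`; a faithful loader returns []."""
    return sorted(i for tok, i in vocab.items() if load(tok) != tok)

def find_collisions(vocab: Dict[str, int],
                    load: Callable[[str], str]) -> List[Tuple[int, int]]:
    """Pairs of ids that become indistinguishable after `load`, so lookups can hit the wrong one."""
    seen: Dict[str, int] = {}
    clashes: List[Tuple[int, int]] = []
    for tok, i in vocab.items():
        key = load(tok)
        if key in seen:
            clashes.append((seen[key], i))
        else:
            seen[key] = i
    return clashes

if __name__ == "__main__":
    # A tiny made-up vocabulary, round-tripped through JSON as a file would be.
    raw = json.dumps({"hello": 0, INVISIBLE: 1, "he" + INVISIBLE + "llo": 2, "world": 3})
    vocab = json.loads(raw)
    print("altered:", audit_vocab(vocab, lossy_cleanup))
    print("collisions:", find_collisions(vocab, lossy_cleanup))
```

On this made-up vocabulary it reports ids 1 and 2 as altered, and ids 0 and 2 as colliding: after the cleanup, two different tokens spell the same text, which is exactly how the wrong word can come out.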

It can be experimented with

The real reward is that it holds still while you experiment on it. One of the Gemma-4 models is built as a team of specialists (what the field calls a mixture of experts): a hundred and twenty-eight of them, of which only the few needed for each word are woken. So I dismissed half the team, then half of what remained, and again, down to a handful, and watched the answers shift and fray. I’ve coarsened the model’s numerical precision step by step to find where its fluency breaks. I’ve turned its reasoning on and off and compared the two minds side by side.
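The dismissal experiment can be sketched as routing with a shrinking pool of allowed experts: the router scores every expert, but only the top few among those still allowed are consulted, and their outputs are mixed with softmax weights. Everything here is a toy under stated assumptions: experts are scalar functions, the router scores are fixed stand-ins (a real router computes them from each word’s hidden state), four active experts is a hypothetical choice, and keeping the lowest-numbered half at each step is an assumption about which half was dismissed.

```python
import math
from typing import Callable, List, Sequence, Set, Tuple

Expert = Callable[[float], float]

def route(scores: Sequence[float], k: int, allowed: Set[int]) -> List[Tuple[int, float]]:
    """Choose the k highest-scoring experts among `allowed`; softmax their scores into weights."""
    if not allowed:
        raise ValueError("no experts left to route to")
    chosen = sorted(allowed, key=lambda e: scores[e], reverse=True)[:k]
    top = max(scores[e] for e in chosen)  # subtract the max for numerical stability
    exps = [math.exp(scores[e] - top) for e in chosen]
    total = sum(exps)
    return [(e, v / total) for e, v in zip(chosen, exps)]

def moe_layer(x: float, experts: Sequence[Expert], scores: Sequence[float],
              k: int, allowed: Set[int]) -> float:
    """Mix the outputs of the routed experts; dismissed experts are never consulted."""
    return sum(w * experts[e](x) for e, w in route(scores, k, allowed))

def halving_schedule(n: int, stop: int) -> List[Set[int]]:
    """Expert pools of size n, n/2, n/4, ... down to `stop` (keeps the lowest ids: an assumption)."""
    pools: List[Set[int]] = []
    size = n
    while size >= stop:
        pools.append(set(range(size)))
        size //= 2
    return pools

if __name__ == "__main__":
    n, k = 128, 4  # 128 experts as in the essay; k = 4 active is hypothetical
    scores = [math.sin(0.37 * e) for e in range(n)]  # stand-in router scores
    experts = [lambda x, e=e: float(e % 7 - 3) * x for e in range(n)]  # toy experts
    for pool in halving_schedule(n, stop=k):
        y = moe_layer(1.0, experts, scores, k, pool)
        print(f"{len(pool):4d} experts available -> output {y:+.3f}")
```

The pools shrink 128, 64, 32, 16, 8, 4. Restricting the pool rather than deleting experts keeps the model otherwise untouched, so any change in output is attributable to the dismissal alone.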

These are experiments on a living system — fully repeatable, no electrodes, no ethics board, the whole specimen sitting on a desk and perturbable at will. And it isn’t one specimen but a family: one instrument plays the entire Gemma-4 line — a phone-sized model, a thirty-one-billion-parameter giant, the team-of-specialists design, a memory-frugal variant — switched by a single configuration file. A tray of related samples for the same microscope.

Why this matters

We have always understood minds — biological ones — by observing them, perturbing them gently, and watching what changes. The trouble is that brains are precious, fragile, and mostly opaque to us. Here is a different kind of mind: artificial, and not to be mistaken for the real thing — but complete, and completely open. You can watch it reason, freeze it mid-thought, read every layer, take pieces away and see what it loses, and run the same experiment a thousand times exactly.

That’s the project, underneath the engineering. Not just another model that answers questions. An instrument for a mind — one that teaches, that can be observed, that holds still to be experimented with — local-first, inspectable, mine, owing nothing to anyone. It’s the thing I’ve spent a career believing in: that you come to understand what you can finally see.

Apertura is open source. The whole instrument — every layer, every calculation — is on GitHub.

A note on method: much of this was built in close collaboration with an AI coding assistant — using today’s intelligence to understand and re-create the thing itself. The more I sit with that, the more it feels like exactly the right shape for the work.