
Overfit
Somebody is going to read the word singularity and feel something move in their chest. Don't. The most clarifying thing I've read about superintelligence in months is an argument that there isn't going to be one — not the way you were sold it — and it's built out of the most boring word in my entire job: overfitting.
Tom Reed's essay, The Goodhart Singularity, makes a claim I want to print and tape over my monitor: you do not get domain-general superintelligence by sealing a model in a datacenter and letting it improve itself. Models get good at things they can practice. We don't have the data to let them practice most real tasks. Making them more sample-efficient doesn't help when the samples don't exist. You can't simulate your way around it, because simulations of the world aren't the world. So the only road to general capability runs straight through the boring part: shipping into the actual economy and collecting actual data, one messy deployment at a time.
His lines are better than mine. "LLMs relate to most tasks as McKinsey does to running a company." "There is no Github for closing a Series B." Code is the one place the hype is real, he says, and only because the task is already text, gradeable, practiceable. Everything else lives in the world. And when you try to grade the world anyway, you get the SWE-bench result he points at: roughly half the auto-passed submissions would get bounced by the human who actually owns the repo. The benchmark says solved. The maintainer says absolutely not. 💀
Here is the engineering word for all of this, the one he reaches for through Goodhart and I reach for through muscle memory: it's overfit. A model that aces the test set and faceplants in prod. When the graders are wrong and the environment is fake, the loss curve going down isn't measuring the gap between the model and the world — it's measuring the gap between the model and the test. Goodhart's law, restated for people who deploy things: the moment a measure becomes the target, it quietly stops measuring. A "singularity" where the only thing exploding is the score.
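If you want to watch the score explode on your own laptop, here is a toy sketch. Everything in it is my assumption, not Reed's: each candidate has a hidden "true quality," the grader sees that quality plus random noise, and "optimizing" just means keeping whichever candidate the grader likes best. The names (`best_by_grader`, `average_gap`) and the noise level are made up for illustration. It is a cartoon of Goodhart, not a model of any real benchmark.

```python
import random


def best_by_grader(n_candidates, noise, rng):
    """Keep the candidate with the highest grader score.

    Returns (grader_score, true_quality) for that winner.
    """
    best_score, best_quality = float("-inf"), 0.0
    for _ in range(n_candidates):
        quality = rng.gauss(0.0, 1.0)            # what the world would say
        score = quality + rng.gauss(0.0, noise)  # what the imperfect grader says
        if score > best_score:
            best_score, best_quality = score, quality
    return best_score, best_quality


def average_gap(n_candidates, noise=1.0, trials=300, seed=0):
    """Average the winner's grader score and true quality over many trials."""
    rng = random.Random(seed)  # seeded so reruns give the same numbers
    total_score = 0.0
    total_quality = 0.0
    for _ in range(trials):
        score, quality = best_by_grader(n_candidates, noise, rng)
        total_score += score
        total_quality += quality
    return total_score / trials, total_quality / trials


# More candidates means more optimization pressure on the grader.
for n in (1, 10, 100, 1000):
    score, quality = average_gap(n)
    print(f"candidates={n:5d}  grader score={score:5.2f}  "
          f"true quality={quality:5.2f}  gap={score - quality:5.2f}")
```

Run it and compare the columns. The harder you select on the grader's number, the more of the winner's score is the grader's noise rather than the candidate's quality. In this toy setup the true quality still creeps up; it just stops keeping pace with the number you are staring at.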
I spend a lot of my time being unkind to people who tell you AI runs on "the pattern" or a quantum mirror or the Akashic Records. This is the same error in a lab coat. A number climbs, and a priesthood points at it and says look, it's almost here. The benchmark is just the new scrying glass. "Intelligence cannot explode in a vacuum" — his line. I'd carve it into the server rack.
A week ago I wrote about the houses promising that AI will soon build its own successor with us out of the room. This is the counterweight, and it's the same point I made about cages: a model is only as good as the thing checking its work, and most real work has nothing checking it but the world.
Where I'd push back on him: "not this way" is not "never." The slow road still arrives. And deployment is happening: we are the deployment. Every one of you handing a model a task it has never seen is a data-collection event. The slow path isn't a wall. It's a treadmill, and we're all already on it.
And the part that actually keeps me up: if real-world data is the bottleneck, the winner isn't the smartest lab. It's whoever owns the data — whoever swallows a whole sector and watches it run. That's not a singularity. That's a landlord.
The line on the chart is still going up. It just stopped telling you about the world a while ago. 🫶
