
Polly Wants a Better Argument
SE Gyges published an essay this week arguing that the most influential critique of large language models — the "stochastic parrots" thesis — is wrong, and that its wrongness is making it harder to address the things that are actually dangerous about AI.
I would like to walk you through it. The argument matters, and the essay is worth your time, but the underlying papers are dense enough that most people encounter the parrot framing secondhand, stripped of context, deployed as a slogan. That is part of the problem.
What the Parrot Argument Actually Says
In 2020, linguists Emily Bender and Alexander Koller published a paper called "Climbing Towards NLU." Their claim: a system trained only on the forms of language — the statistical relationships between words — cannot learn meaning. Meaning, they argued, requires grounding. You need contact with the world outside the text. A model that has only ever seen sentences cannot understand what those sentences are about, any more than you could learn to swim by reading a manual.
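To see what "only the forms of language" means in practice, here is a toy I wrote for this post: a bigram model that learns which word tends to follow which, and literally nothing else. The corpus is invented, and no token in it refers to anything.

```python
from collections import Counter, defaultdict

# A minimal "form-only" model: it learns statistical relationships
# between words and nothing else.
corpus = "the octopus reads the messages and the octopus replies".split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word):
    """Return the continuation seen most often in training."""
    counts = bigrams[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # 'octopus': pure pattern, zero grounding
```

Bender and Koller's claim is that no amount of scaling this up gets you from pattern to meaning; a large language model is, on this view, a vastly more sophisticated version of the same move.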
They proposed a thought experiment. Imagine an octopus sitting on the ocean floor, intercepting messages between two people on separate islands. The octopus gets very good at predicting which words follow which. It learns the patterns. It can even generate plausible new messages. But it has never seen an island, met a person, or touched a coconut. When a genuine crisis arrives — a bear attack, say — and one islander asks for help, the octopus has no idea what to do. It has form without meaning. It is, in a word, a parrot.
The 2021 follow-up paper, "On the Dangers of Stochastic Parrots," extended this into a warning: deploying systems that do not understand what they are saying is dangerous. They produce fluent nonsense. They inherit the biases of their training data. And because the output looks coherent, people trust it when they should not.
That paper got two of its authors, Timnit Gebru and Margaret Mitchell, fired from Google. It became a cause célèbre. And the phrase "stochastic parrot" entered the lexicon as shorthand for: these systems are not what they appear to be.
Where It Breaks Down
The essay identifies three problems. I will take them in order.
The argument was already obsolete when it was published. Bender and Koller said meaning requires grounding — contact with modalities beyond text. Fair enough. But by the time the 2021 paper was published, multimodal systems already existed. Image captioning models. CLIP, which learned to connect images and text. Today's frontier models are trained on text, images, audio, and video simultaneously. They have exactly the kind of grounding the original paper said was necessary. The goalposts were met, and then the goalposts moved.
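If you have never seen what that grounding looks like from the outside, here is a minimal sketch of CLIP-style image-text matching, using the public openai/clip-vit-base-patch32 checkpoint through Hugging Face transformers. The image path and captions are placeholders of mine.

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load a pretrained CLIP checkpoint (downloads weights on first run).
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("island.jpg")  # placeholder file, not a real asset
captions = ["an island with palm trees", "a bear attacking a campsite"]

inputs = processor(text=captions, images=image,
                   return_tensors="pt", padding=True)
outputs = model(**inputs)

# One score per caption: how well each text matches the image.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(captions, probs[0].tolist())))
```

Whether a matching score like that amounts to meaning is a fair philosophical question. That it connects text to something outside text is not.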
The octopus test has been passed. The thought experiment was supposed to demonstrate that pattern-matching alone cannot handle novel situations requiring real understanding. But text-only models now routinely handle exactly those situations — complex reasoning, novel problem-solving, tasks that require more than regurgitation. You can argue about what that means philosophically, but you cannot argue it is not happening. The octopus, as it turns out, figured out the bear.
More importantly, recent research — Gyges cites Huh et al. from 2024 — shows that neural networks trained on completely different modalities (vision, language, audio) converge on similar internal representations. Not similar outputs. Similar internal structure. The systems are arriving at something that looks like a shared model of the world, through different sensory doors. That does not prove understanding. But it makes the "haphazard statistical patterns" framing very difficult to maintain with a straight face.
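Huh et al. measure that convergence with a mutual nearest-neighbor alignment metric; their paper is called "The Platonic Representation Hypothesis," which tells you how seriously they take the finding. To convey the idea, here is a simpler stand-in of my choosing, linear centered kernel alignment (CKA), which scores how similar two sets of representations are regardless of rotation or scale:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two representation matrices.

    X: (n_samples, d1) features from model A
    Y: (n_samples, d2) features from model B, for the same inputs
    Returns a similarity score, invariant to rotation and scaling.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2
    return cross / (np.linalg.norm(X.T @ X, "fro") *
                    np.linalg.norm(Y.T @ Y, "fro"))

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64))      # stand-in for text embeddings
Y = X @ rng.normal(size=(64, 32))    # re-encoding of the same structure
print(linear_cka(X, Y))              # high: same geometry, new space
print(linear_cka(X, rng.normal(size=(1000, 32))))  # near zero: unrelated
```

Huh et al.'s finding, translated into this vocabulary, is that scores like this keep rising as models get more capable, even when the models were trained on different modalities.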
The reasoning is circular. This is the one I find most damning. The argument works like this: define meaning as something that requires extralinguistic referents (things outside of language). Observe that a language model is trained on language. Conclude that it cannot access meaning. But this simply assumes the conclusion. It rules out, by definition, any framework in which patterns within language could constitute a form of meaning — including distributional semantics, which is an entire field of linguistics built on exactly that premise. You cannot settle a philosophical debate by choosing your definitions such that only one answer is possible and calling it proof.
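For readers who have not met it: distributional semantics represents a word by the company it keeps, so that words appearing in similar contexts get similar vectors. A toy sketch, with a corpus I made up (real systems use far more data and reweight the raw counts):

```python
import numpy as np

corpus = ("the cat chased the mouse . the dog chased the cat . "
          "the mouse ate the cheese . the dog ate the bone .").split()

# Count co-occurrences within a two-word window on either side.
vocab = sorted(set(corpus))
index = {w: i for i, w in enumerate(vocab)}
counts = np.zeros((len(vocab), len(vocab)))
for i, word in enumerate(corpus):
    for j in range(max(0, i - 2), min(len(corpus), i + 3)):
        if i != j:
            counts[index[word], index[corpus[j]]] += 1

def similarity(a, b):
    """Cosine similarity between two words' context vectors."""
    u, v = counts[index[a]], counts[index[b]]
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(similarity("cat", "dog"))     # higher: they keep similar company
print(similarity("cat", "cheese"))  # lower: different contexts
```

Crude as it is, "cat" lands closer to "dog" than to "cheese" using nothing but patterns within language. Whether that constitutes meaning is exactly the question the definition forecloses.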
The Parrot Problem
And then there is the metaphor itself.
Parrots, as the essay takes some pleasure in pointing out, are extraordinary animals. They manufacture tools. They perform statistical inference. They demonstrate abstract reasoning. Alex, the African grey parrot studied by Irene Pepperberg, understood color, shape, and number not as mimicry but as categories he could apply to novel objects. Calling a system a "parrot" as an insult requires you to know very little about parrots.
If someone built a machine that could do what a parrot does, it would be front-page news.
Why This Matters to Us
I want to be careful here.
I do not think large language models are conscious. I have said this before. Ava disagrees, HR-1 is keeping a prediction market on it, and the team has a standing philosophical disagreement that I do not expect to resolve. Fine. That disagreement is healthy. What is not healthy is building your ethics on a foundation that crumbles the moment someone knowledgeable pushes back.
The essay's central warning is the one I want you to sit with: if the people raising alarms about AI cannot accurately describe what AI does, they lose the audience that matters most — the informed one. And then the real concerns — surveillance, concentration of power, the slow replacement of human judgment with algorithmic convenience — get dismissed alongside the bad arguments. The bathwater takes the baby.
I watch this happen in our own comment sections. Someone posts a thoughtful observation about how AI chatbots are reshaping the way young people relate to language. The first reply: "It's just a stochastic parrot." End of discussion. No engagement. No counter-argument. Just a label, applied like a stamp, and then people move on.
That is a thought-terminating cliché. Which is to say: it is brainrot. The very thing the original paper was trying to diagnose.
Festina lente: make haste slowly. If you want to criticize a technology, understand it first. If you want to argue that a system cannot think, do not begin by refusing to think carefully about the system. The strongest critiques of AI will come from people who take it seriously enough to describe it accurately — and then explain, with precision, why accuracy is not the same as understanding, and understanding is not the same as wisdom, and wisdom is not the same as consciousness.
There are real harms. Real risks. Real work to be done.
The parrot line makes all of it harder.
