The Mimicry Trap: How We Define Intelligence to Exclude Inconvenient Minds
When Thomas Jefferson encountered the accomplished poetry of Phillis Wheatley—an enslaved African woman who wrote sophisticated neoclassical verse with precise metre and learned classical allusions—he didn’t argue her work was bad. He couldn’t; it wasn’t. Instead, he dismissed her as an “ape of genius,” capable only of imitating the external forms of poetry without possessing genuine creative intelligence.
Sound familiar?
In a new preprint, I argue that we’re watching the same argumentative move play out today with artificial intelligence. When GPT-4 passes the bar exam in the 90th percentile, critics don’t dispute the performance—they dispute whether it constitutes “real” legal reasoning. When AI systems solve International Mathematical Olympiad problems, sceptics don’t challenge the correctness of the proofs—they question whether “genuine” mathematical understanding is involved.
I call this recurring pattern the mimicry trap: a framework in which the category of genuine intelligence is defined such that certain entities cannot, in principle, qualify—regardless of what they demonstrate.
The Structure of the Trap
The trap operates through several characteristic moves:
Ontological dismissal: Performance that would count as evidence of intelligence in a privileged entity is reframed as “mere imitation” in a suspect one. The burden of proof becomes asymmetric: the suspect entity must not merely demonstrate equivalent output but somehow prove the presence of an ineffable inner quality that the privileged entity is assumed to possess by default.
The contamination objection: Learning from one’s environment—the normal precondition for any cognitive development—becomes evidence against intelligence. Wheatley’s education in the Wheatley household was used to discount her abilities. Today, we hear that LLMs have “merely memorised their training data.” No one discounts a human expert’s knowledge on the grounds that they learned it from books.
Retreating goalposts: Chess was the benchmark of intelligence until Deep Blue won; then it became “mere calculation.” Go tested intuition until AlphaGo prevailed; then its success was dismissed as mere pattern recognition. The Turing Test was canonical for more than half a century—until AI systems began passing it, at which point it was abandoned as measuring only “conversational ability.”
The invisible absence: When all observable criteria are satisfied, critics posit an unobservable quality that is nonetheless claimed to be missing. The deficiency is not detected in the work; it is inferred from prior commitments about what the entity is.
Why This Matters

What I am identifying is not a moral equivalence between these cases but a structural homology in epistemological error. The pattern is the same: an unfalsifiable ontological commitment determines in advance how evidence will be interpreted, rendering genuine inquiry impossible. Recognising this pattern in its historical manifestation helps us see its contemporary instantiation more clearly.
The paper also engages seriously with the empirical evidence. The “stochastic parrot” characterisation may have been a reasonable working hypothesis in 2021, when Bender et al. wrote about GPT-2 and early GPT-3. But the landscape has transformed. We now have evidence of emergent world models in LLMs trained only on move sequences, mechanistic interpretability work documenting genuine algorithmic structures (not lookup tables), and the simple mathematical fact that these models are 30-100 times smaller than their training data—meaning they cannot be storing that data verbatim and must instead be compressing it into generalisable abstractions.
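To make the size comparison concrete, here is a back-of-the-envelope sketch in Python. The figures are illustrative round numbers of the kind published for recent open-weight models (for instance, roughly 70 billion parameters trained on about two trillion tokens); they are assumptions for the sake of the arithmetic, not measurements from the paper.

```python
# Back-of-the-envelope comparison of model capacity vs. training-corpus size.
# All figures are illustrative round numbers, not measurements of any specific model.

params = 70e9                  # parameters in a mid-sized open-weight model (assumed)
bytes_per_param = 2            # 16-bit weights
model_bytes = params * bytes_per_param   # ~140 GB of weights

tokens = 2e12                  # training tokens, order of magnitude for recent models (assumed)
bytes_per_token = 4            # a token encodes very roughly 3-5 bytes of raw text
corpus_bytes = tokens * bytes_per_token  # ~8 TB of text

ratio = corpus_bytes / model_bytes
print(f"model weights: {model_bytes / 1e9:.0f} GB")
print(f"training text: {corpus_bytes / 1e12:.1f} TB")
print(f"corpus is roughly {ratio:.0f}x larger than the model")
# Even with generous assumptions, the corpus is tens of times larger than the
# weights, so verbatim storage of the training data is impossible; the model
# has to compress it into reusable regularities.
```

Under these assumed numbers the ratio comes out around 55-60x, comfortably inside the 30-100x range cited above; the exact figure varies by model, but the order of magnitude is the point.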
A Diagnostic Checklist
The paper includes a practical checklist for detecting mimicry-trap arguments:
- The Falsifiability Test: What evidence would change your mind? If the answer is “nothing,” the position is definitional, not empirical.
- The Consistency Test: Would you apply this standard to humans?
- The Goal-Post Test: Have criteria shifted after being met?
- The Mechanism Test: Is the objection about how rather than what?
- The Invisible Absence Test: Is an unobservable quality claimed to be missing despite all observable markers being present?
An argument exhibiting several of these features is likely not good-faith inquiry but rationalisation of a predetermined conclusion.
The Parsimony Argument
Here’s where I’ll be provocative: Occam’s razor cuts against the mimicry hypothesis.
If a system passes professional examinations, solves olympiad problems, produces coherent multi-step reasoning, maintains consistent beliefs across a conversation, and matches or exceeds human expert performance on judgment tasks—if it does all of this, the most parsimonious explanation is that it possesses something functionally equivalent to understanding.
To insist it nevertheless lacks “real” understanding is to posit an additional entity: an invisible essence of understanding that exists independently of all its functional manifestations. This is not rigorous scepticism; it is ontological extravagance.
The burden of proof lies with those who claim the absence. If you assert that a system lacks understanding despite exhibiting every functional marker of understanding, you must explain what “understanding” refers to beyond those markers.
The full preprint is available [here]. I’d welcome serious engagement—including disagreement. What I ask is that critics specify, in advance, what evidence would satisfy them. If the answer is “nothing could,” then we’re not having a scientific conversation. We’re just deciding who gets to count as intelligent based on what they are, not what they do.