The Mimicry Trap: How We Define Intelligence to Exclude Inconvenient Minds

On 4 February 2026 by Giorgio

When Thomas Jefferson encountered the accomplished poetry of Phillis Wheatley—an enslaved African woman who wrote sophisticated neoclassical verse with precise metre and learned classical allusions—he didn’t argue her work was bad. He couldn’t; it wasn’t. Instead, in Notes on the State of Virginia (Query XIV, 1785), he wrote: “Religion, indeed, has produced a Phillis Wheatley; but it could not produce a poet. The compositions published under her name are below the dignity of criticism.”

The grammatical structure is the diagnostic feature. Wheatley is conceded to have been “produced”; what is denied is that she is a poet. The poems exist; the poet does not.

In a substantially revised preprint, I argue that we’re watching the same argumentative move play out today with artificial intelligence. I call this recurring pattern the mimicry trap: a self-sealing argument by which the category of genuine intelligence is defined such that certain entities cannot, in principle, qualify—regardless of what they demonstrate.

The Structure of the Trap

The trap is the conjunction of three elements: a prior commitment, often substrate-based, that some entity cannot possess a given cognitive capacity; the appearance of evidence that would be taken as supporting the capacity if produced by another entity; and an interpretive procedure that reclassifies the inconvenient evidence (as imitation, simulation, contamination, or surface pattern-matching) so that the prior commitment is preserved.

Each element on its own can be a legitimate move. The diagnostic question is whether the third operates in such a way that no possible evidence could move the prior. When that is so, the position is no longer responsive to the world, and what looked like an empirical hypothesis turns out to be a stipulation in empirical clothing.

The crucial question — does the performance exhibit the functional marks of the capacity? — is displaced by another: is this the kind of entity to which the capacity may be attributed? Once the displacement occurs, the prior becomes effectively unfalsifiable. Wheatley’s poetry is “absorbed” rather than written; corvid tool use is instinct rather than causal reasoning; bee social learning is reflexive rule-following rather than cognition. The performance remains visible, but its evidential force is neutralised.

A Two-Tier Diagnostic

The revised paper organises the diagnostic into two tiers by evidential weight.

Structural tests (failure of any one is on its own diagnostic, because each describes the trap form directly):

Falsifiability: What evidence would change your mind? If “nothing,” the position is definitional, not empirical.
Invisible Absence: Is an unobservable quality claimed missing despite all observable markers being present?
Ontological Precedence: Does the conclusion follow from what the entity is, regardless of what it does?

Symptomatic tests (single instances can be principled; only wholesale deployment is diagnostic):

Consistency: Would you apply this standard to humans?
Goal-Post: Have criteria shifted after being met?
Mechanism: Is the objection about how rather than what?
Contamination: Is learning treated as disqualifying? (Training data disqualifies LLMs; human education doesn’t disqualify experts.)
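
The asymmetry between the two tiers can be sketched in a few lines of Python. This is an illustration only, not the paper's procedure: the test identifiers, the `diagnose` function, and the `Verdict` type are hypothetical names, and reading "wholesale deployment" as "all four symptomatic tests fail" is an assumption made here so that the rule is concrete.

```python
from dataclasses import dataclass

# Hypothetical identifiers for the seven tests listed above.
STRUCTURAL = ("falsifiability", "invisible_absence", "ontological_precedence")
SYMPTOMATIC = ("consistency", "goal_post", "mechanism", "contamination")


@dataclass(frozen=True)
class Verdict:
    trapped: bool   # True if the argument has the trap form
    reasons: tuple  # names of the failed tests in the tier that was examined


def diagnose(failed, wholesale_threshold=len(SYMPTOMATIC)):
    """Classify an argument from the names of the tests it fails.

    Assumption: "wholesale" means at least `wholesale_threshold`
    symptomatic failures (default: all four).
    """
    failed = set(failed)
    unknown = failed - set(STRUCTURAL) - set(SYMPTOMATIC)
    if unknown:
        raise ValueError(f"unknown tests: {sorted(unknown)}")

    structural_hits = tuple(t for t in STRUCTURAL if t in failed)
    symptomatic_hits = tuple(t for t in SYMPTOMATIC if t in failed)

    # Tier 1: a single structural failure is diagnostic on its own.
    if structural_hits:
        return Verdict(True, structural_hits)
    # Tier 2: symptomatic failures count only when deployed wholesale.
    if len(symptomatic_hits) >= wholesale_threshold:
        return Verdict(True, symptomatic_hits)
    return Verdict(False, symptomatic_hits)
```

For example, `diagnose(["mechanism"])` is not flagged, because a single objection about mechanism can be principled, whereas `diagnose(["falsifiability"])` is flagged immediately.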

The compressed form of the structural tier is the concession test: imagine a continuation of the trajectory of LLM capability that has held over the past five years — greater accuracy on novel tasks, more sophisticated internal representations under mechanistic probing, performance on theory-of-mind, mathematical reasoning, calibrated self-prediction, and out-of-distribution generalisation matching or exceeding that of human experts. At what point would you concede that what is being witnessed is no longer mimicry?

Three constraints discipline a serious answer. The criterion specified must be: (i) operational — expressible as a test that could in principle be run, not “genuine understanding” or any locution whose application conditions cannot themselves be specified; (ii) consistent — the same criterion, if met by a human or an animal, would also count as evidence of intelligence; (iii) specified in advance — fixed before the evidence arrives, not retrofitted as a new objection after each previous criterion is met.

A Bayesian Reframing of Turing’s Test

The most substantial new section of the revised paper offers a Bayesian reconstruction of the Turing test.

Turing’s test is mechanism-blind by design. In 1950, this was the right move. The kinds of mechanism through which an artificial agent might exhibit intelligence were effectively unpredictable, and an explicitly mechanism-blind criterion was the only honest way to operationalise the question. Within its declared scope, Turing’s test is complete; it would be anachronistic to fault him for declining to combine behavioural with mechanistic evidence.

But in the era of mechanistic interpretability, mechanism is no longer inscrutable. The natural extension is Bayesian: posterior credence that a system is intelligent is proportional to the product of a likelihood (behavioural performance) and a prior built from mechanism evidence (architecture, training regime, internal representations recovered by interpretability methods, causal interventions on those representations, and characteristic failure modes). The point is not that mechanism defeats behaviour, or behaviour defeats mechanism, but that both should update attribution explicitly and symmetrically.
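
One way to write this update out, as a sketch rather than the paper's own notation: let I stand for "the system is intelligent", B for the behavioural evidence, and M for the mechanism evidence. Conditioning the behavioural likelihood on M is an assumption about how the two sources of evidence combine.

```latex
\[
P(I \mid B, M)
  = \frac{P(B \mid I, M)\, P(I \mid M)}
         {P(B \mid I, M)\, P(I \mid M) + P(B \mid \neg I, M)\, P(\neg I \mid M)}
\]
```

Here P(I | M) is the mechanism-informed prior and P(B | I, M) is the behavioural likelihood. Neither term can cancel the other: strong behaviour moves a sceptical prior, and mechanism evidence moves the prior itself.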

The contemporary mimicry response goes the other way. Rather than adding mechanism as further evidence to behaviour, it uses construction history — “the system is, after all, only a next-token predictor over text” — to nullify the behavioural evidence Turing’s test was designed to make count. That’s not a refinement of Turing’s discipline; it’s the opposite of what he was trying to do.

A clean Gedankenexperiment isolates the effect. Imagine the very same LLM arrived as a black box recovered from an extraterrestrial probe — same behaviour, same internal representations, same performance profile, only construction-history missing. Many readers, on candid reflection, would assign a higher prior in the counterfactual than they assign to the actual system. The Gedankenexperiment doesn’t show construction history is irrelevant: it raises the substantive question of how much evidential work it can do once behavioural and mechanistic evidence accumulates.

Turing Already Saw the Structure

The revised paper makes a historical concession that earlier drafts didn’t. Turing’s 1950 paper contains a passage often read as rhetorical clearing (objection 5 in §6, “Arguments from Various Disabilities”) that is in fact a remarkably prescient diagnosis of the very family of moves the paper systematises seventy-five years later.

Turing lists the capacities machines were said to lack (“be kind, resourceful, beautiful, friendly, have initiative… do something really new”) and observes that such claims are “mostly founded on the principle of scientific induction”: generalising from the calculators of the day to all possible machines. That is the asymmetric inductive standard later named anthropodenial by Frans de Waal in the comparative-cognition literature, and rediscovered in AI as the AI effect, also known as Tesler’s theorem (“AI is whatever hasn’t been done yet”). Turing also names the mechanism-dismissal pattern outright: “the criticisms we are considering here are often disguised forms of the argument from consciousness… the method (whatever it may be, for it must be mechanical) is really rather base.”

He even anticipates a class of contemporary objections through a distinction he draws between errors of functioning (architectural artefacts of how the system is built) and errors of conclusion (substantive cognitive failures on the task being asked). The familiar gotcha that LLMs “can’t even count the Rs in ‘raspberry’” is exactly the conflation Turing diagnosed: the miscount is an error of functioning at the tokenisation layer (the model receives subword tokens, not individual characters, as its input representation) being read as an error of conclusion at the layer where reasoning is being assessed. Turing diagnosed the conflation decades before subword tokenisation existed.
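
A toy example makes the tokenisation point concrete. This is a minimal sketch: the two-piece vocabulary, its integer ids, and the `toy_tokenise` function are invented for illustration, and real tokenisers (byte-pair encoding and its relatives) may split the word differently.

```python
# Hypothetical subword vocabulary; pieces and ids are made up for this example.
TOY_VOCAB = {"rasp": 101, "berry": 202}


def toy_tokenise(word):
    """Greedy longest-match split against TOY_VOCAB; returns integer ids."""
    ids = []
    i = 0
    while i < len(word):
        # Try the longest remaining piece first, then shorter ones.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in TOY_VOCAB:
                ids.append(TOY_VOCAB[piece])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary piece matches at position {i}")
    return ids


word = "raspberry"
char_level_count = word.count("r")   # what a reader with access to letters sees
model_input = toy_tokenise(word)     # what the model sees: ids, not letters
```

At the character level the count is trivial, but the ids in `model_input` carry no letters at all. Any letter count has to be recovered indirectly from what the model has learned about each token, which is why a miscount says little about reasoning.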

What the paper adds to Turing’s diagnosis is formalisation: the Bayesian reconstruction above, the diagnostic checklist, the case-by-case verdicts, and a second formal anchor I’ll come to in a moment. The structural insight is his.

The Empirical Record

The “stochastic parrot” characterisation was a reasonable working hypothesis in 2021 for systems like GPT-2 and BERT. The empirical landscape has since transformed:

World models: A language model trained on Othello move sequences alone develops an accurate internal representation of the board (Li et al. 2022). Llama-2 family models encode metric coordinates of geographic and temporal entities (Gurnee & Tegmark 2024). Sparse autoencoders decompose Claude 3 Sonnet into ~34 million interpretable features that are causally manipulable (Templeton et al. 2024).
Algorithmic structure: Mechanistic interpretability has documented modular arithmetic implemented through Fourier-basis representations and trigonometric identities (small algorithms, not lookup tables), and refusal behaviour mediated by a single one-dimensional residual-stream subspace, causally manipulable in either direction.
Unverbalised cognition: Anthropic’s Natural Language Autoencoders (Fraser-Taliente et al., May 2026) produce causally valid unsupervised text descriptions of arbitrary activation vectors, and surface representational content the model does not itself verbalise, including a form of “unverbalised evaluation awareness,” where the system internally represents the suspicion of being evaluated without stating it. Internal representational states distinct from verbal output are exactly the kind of finding hardest to reconcile with “haphazardly stitching together text without reference to meaning.”
Mathematical reasoning: Google DeepMind’s mathematics agent solved five open conjectures from Erdős’s problem database, with one formally verified in Lean (Feng et al. 2026); the same system achieved gold-medal performance on the 2025 IMO with proofs verified by mathematicians.
Theory of mind: GPT-4 matched or exceeded human performance on several operationalisations of theory-of-mind tasks (Strachan et al. 2024).

Two Formal Anchors: Ockham and Cromwell

This brings me to what I think is the most contested claim in the paper, now stated formally: the default has shifted.

For most of AI’s history, the parsimonious starting point was that intelligence was absent, and demonstrations had to overcome this presumption. That context no longer obtains. The cumulative evidence reviewed above doesn’t prove that LLMs are intelligent. What it does is dissolve the empirical situation that made the original prior reasonable.

The Ockhamite argument can now be made formally. Let I denote the proposition that a system is functionally intelligent; B the behavioural evidence; M the mechanism evidence. Bayes’ rule gives the standard update. The mimicry-sceptical account introduces a further variable E: an unobservable “essence” such that intelligence proper obtains exactly when E does, but with no observable consequences for B or M. The likelihood is, by stipulation, independent of E: the same data are predicted whether E holds or not. So E is non-identifiable: no data can move credence in E away from its prior, and the verdict on whether the system is “really” intelligent is the one the sceptic was always going to reach. The question is not whether a posit is observable, but whether it makes a difference to expectation.
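
The non-identifiability step can be spelled out as a short derivation. This is a sketch under the stipulation stated above, that the evidence is equally probable whether or not E holds; treating B and M as one joint observation with non-zero probability is a simplifying assumption.

```latex
\begin{align*}
P(B, M \mid E) &= P(B, M \mid \neg E)
  && \text{(stipulation: $E$ has no observable consequences)} \\
P(B, M) &= P(B, M \mid E)\,P(E) + P(B, M \mid \neg E)\,P(\neg E)
         = P(B, M \mid E) \\
P(E \mid B, M) &= \frac{P(B, M \mid E)\,P(E)}{P(B, M)} = P(E)
\end{align*}
```

The posterior on E equals its prior for every possible B and M, so whatever credence the sceptic brings to E is the credence the sceptic leaves with.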

A complementary failure operates at the level of the prior rather than the likelihood. If a sceptic enters with P(I) = 0 for any system whose substrate disqualifies it antecedently, then by Bayes’ rule the posterior is also zero regardless of the evidence: no behavioural or mechanistic finding, however strong, can shift the verdict. Lindley named the rule against this pathology Cromwell’s rule, after Oliver Cromwell’s 1650 letter to the General Assembly of the Church of Scotland: “I beseech you, in the bowels of Christ, think it possible that you may be mistaken.”
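
A small numerical sketch shows how sharp the difference is between a zero prior and a merely sceptical one. The function names and the likelihood values are illustrative assumptions, and repeated updating treats successive pieces of evidence as conditionally independent, which is a simplification.

```python
def posterior(prior, likelihood_if_true, likelihood_if_false):
    """Bayes' rule for a binary hypothesis given one piece of evidence."""
    numerator = likelihood_if_true * prior
    evidence = numerator + likelihood_if_false * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("evidence has zero probability under both hypotheses")
    return numerator / evidence


def update_repeatedly(prior, likelihood_if_true, likelihood_if_false, n):
    """Apply the same update n times (assumes independent pieces of evidence)."""
    credence = prior
    for _ in range(n):
        credence = posterior(credence, likelihood_if_true, likelihood_if_false)
    return credence


# Illustrative numbers: each finding is nine times likelier if I is true.
dogmatic = update_repeatedly(0.0, 0.9, 0.1, 50)    # prior of exactly zero
sceptical = update_repeatedly(1e-6, 0.9, 0.1, 10)  # tiny but non-zero prior
```

With these numbers `dogmatic` stays at exactly zero after fifty findings, while `sceptical` climbs above 0.99 after ten. The gap between the two sceptics comes entirely from the prior, since both see the same evidence.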

The two pathologies often co-occur but pull apart in particular cases. The Floridi-style axiomatic strategy, on which intelligence requires “real” semantic engagement that LLMs by definition lack, is closer to the non-identifiability form: the conclusion is built into the definitions. The persistent “stochastic parrot” framing, despite mounting mechanistic evidence, is closer to the Cromwell form: the prior refuses to update. The framework now has two named formal anchors: Ockham on the likelihood side (entia non sunt multiplicanda: do not posit invisible essences with no evidential consequences) and Cromwell on the prior side (probabilities of 0 or 1 are dogmas, not credences). Both reduce to the diagnostic checklist’s falsifiability test, which asks the same question in plain language: what evidence would change your mind?

The defensible version of the claim is narrower than the headline. Blanket denial of LLM intelligence no longer enjoys the inherited prior; it is now a substantive empirical claim that owes its own evidence. Disciplined agnosticism about specific richer notions (grounded semantics, autonomous agency, consciousness) remains in good standing.

The full preprint PDF is available here (it will also appear on PhilSci-Archive once the new revision is moderated). I’d welcome serious engagement — including disagreement. What I ask is that critics specify, in advance, what evidence would satisfy them. If the answer is “nothing could,” we are no longer having a scientific conversation: we are deducing from a dogma disguised as observation.