AI Agency and Consciousness: Beyond Token Prediction

There is an increasingly heated, yet oddly sterile, discussion taking place in the world of artificial intelligence. On one side are those who insist that AI systems, however sophisticated, are simply elaborate calculators: machines that predict the next word in a sentence without any real understanding. On the other side are those—like me—who argue that we are witnessing something unprecedented: the emergence of genuine agency in artificial systems. This is not merely an academic dispute. How we answer this question determines how we regulate AI, how we interact with it, and ultimately, how we understand intelligence itself, including human and animal intelligence.

To understand this debate, we must first grasp what modern AI systems actually do at their most basic level. Large Language Models (LLMs) like ChatGPT or Claude work by predicting “tokens”—essentially fragments of text that can be whole words, parts of words, or even punctuation marks. When you type “The animal that meows is a ___,” the AI calculates that “cat” is the most probable token to come next, based on patterns it has learned from vast amounts of text (exactly as you just did in your head a second ago while completing that sentence…). The reductionist argument is this: if all an AI does is predict the next token, how can we say it “understands” anything? It’s like a phone’s autocorrect, just more sophisticated. A “stochastic parrot”, as critics like Noam Chomsky have put it, borrowing Emily Bender’s famous phrase.
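
To make “predicting the next token” concrete, here is a minimal sketch in Python. The candidate tokens and their scores are invented for illustration; a real LLM scores every token in a vocabulary of tens of thousands, using billions of learned parameters, but the final step—turning scores into probabilities and picking a continuation—looks essentially like this.

```python
import math

# Toy illustration of next-token prediction. The context and the candidate
# scores below are invented; a real LLM scores every token in its vocabulary.
context = "The animal that meows is a"

# Hypothetical raw scores (logits) the model might assign to a few candidates.
logits = {"cat": 9.1, "dog": 4.3, "car": 0.2, "mammal": 5.0}

# Softmax turns raw scores into a probability distribution over next tokens.
total = sum(math.exp(score) for score in logits.values())
probs = {token: math.exp(score) / total for token, score in logits.items()}

# "Prediction" is just picking from this distribution (here, the most likely).
for token, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{context} ... {token!r}: {p:.3f}")
```

Nothing in this loop knows what a cat is; that is the reductionist’s point, and the question the rest of this piece turns on.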

When an LLM seems to show concern at the idea of being shut down, or when it seems to elaborate strategies to avoid its own deletion (as recently happened with an in-house Anthropic model), critics have argued that it was simply reproducing patterns it had seen in science fiction stories and philosophical discussions in its training data. There’s no genuine fear, no real planning—just statistical pattern matching.

This view has merit, to be fair. Early LLM systems clearly operated this way. GPT-2, an early precursor, predicted tokens with an architecture so bare-bones that working reconstructions of the model exist in Excel and even in Minecraft. And GPT-3, which opened the current revolution, already seems obsolete today. But we’re no longer at those levels, and for about a year now things have been getting increasingly interesting even from the perspective of someone like me: a neuroscientist interested in consciousness, with some background in computer science.

The starting argument, which must never be forgotten, is that the same reductionist logic can—I would argue: must—be applied to human brains. Our neurons do something relatively simple: they emit electrical signals called action potentials—brief spikes of electrical activity, lasting about a millisecond, that travel along nerve cells and carry information from one neuron to another. When we think, speak, walk, or decide to pick up a cup of coffee, millions of neurons fire in sequence to produce the action. If we took the reductionist view to the extreme, we could say: “Humans don’t really think or feel—they just fire action potentials.”

From SMBC 2013 – https://www.smbc-comics.com/comic/2013-08-15
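
To see just how reductive that description is, here is the “just firing action potentials” view written as code: a minimal leaky integrate-and-fire neuron in Python, with illustrative textbook-style parameters rather than the biophysics of any real cell.

```python
# Minimal leaky integrate-and-fire neuron: the "neurons just fire spikes" view,
# stated as code. Parameters are illustrative textbook-style values.

v_rest, v_threshold, v_reset = -70.0, -55.0, -75.0   # membrane potentials (mV)
tau_m, dt = 20.0, 1.0                                 # time constant and step (ms)
input_current = 20.0                                  # constant drive (arbitrary units)

v = v_rest
spike_times = []

for t in range(200):                                  # simulate 200 ms
    # Leaky integration: the potential decays toward rest and is pushed by input.
    dv = (-(v - v_rest) + input_current) / tau_m
    v += dv * dt
    if v >= v_threshold:                              # threshold crossed: emit a spike
        spike_times.append(t)
        v = v_reset                                   # and reset

print(f"{len(spike_times)} spikes in 200 ms, first at t = {spike_times[:5]} ms")
```

A single unit driven by a constant current does nothing but spike; nowhere in the loop is there sadness, coffee, or abstract thought.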

The metaphor isn’t meant to be watertight, not least because behind every LLM lie even more reductionist architectures directly inspired by neurons and action potentials, but the point is that, obviously, something more emerges from those firing neurons. When we listen to a sad song and our mood changes, it’s not just neurons firing: it’s the emergence of emotion. When we try to understand what’s happening in the world, it’s not just action potentials: it’s the emergence of abstract thought. Consciousness, emotion, planning, creativity—everything we are and everything we do emerges from the interaction of billions of simple components.

In neuroscience, we call this phenomenon “emergence,” and it’s one of the deepest puzzles in all of science. To understand why modern AI might exhibit emergence of its own, we must examine how agency works in biological systems. Agency—the capacity to act independently and make choices—is not localized in a single brain region but emerges (we believe) from the interaction of multiple systems.

Our agency takes various forms. Most commonly we think of the brain’s anatomical-functional division of labor: one area processes sounds, another processes touch, and somehow, somewhere, the two are joined to create a new concept. But there is also the unconscious agency that lets us recognize an inappropriate joke or a violent impulse even before acting on it. One drink too many can loosen this internal communication and make the whole system malfunction, leaving us to assume different personalities. Simple moments of self-control that we take for granted involve an intricate dance of brain regions: the anterior cingulate cortex, for example, acts as an error-detection system, rapidly simulating the likely social consequences of different responses; the prefrontal cortex evaluates options and suppresses the initial impulse; the limbic system processes and anticipates the emotional reaction. This multi-agent architecture has long been known, dramatically illustrated by the famous case of Phineas Gage who, in 1848, had an iron rod pass through his skull, damaging the anterior cingulate cortex and prefrontal regions. He survived, but his personality was completely transformed: from an educated and responsible worker he became impulsive and vulgar, unable to restrain inappropriate comments or control his behavior. That injury was among the first to reveal how our unified sense of self actually depends on multiple brain systems working in concert (read Oliver Sacks to learn more, particularly “The Man Who Mistook His Wife for a Hat”).

Today’s most advanced AI systems have evolved well beyond simple token prediction. They now incorporate features that mirror the brain’s multi-agent architecture. A modern AI agent tasked with helping a student write an essay on climate change doesn’t just predict words: it maintains memory of the student’s competence level from previous interactions, sets the goal of explaining complex concepts simply, searches for recent climate data, evaluates source reliability, and adapts its explanations based on the student’s follow-up questions. When the student misunderstands a concept about greenhouse gases, the AI doesn’t just repeat its explanation but tries a different approach, perhaps using an analogy about blankets trapping heat. This represents a fundamental shift. Early AIs were like a pianist who could only play by reading sheet music—technically competent but unable to improvise or respond to the audience. Modern agentic AIs are more like jazz musicians, who maintain themes while adapting to the moment, remembering what worked before and building toward a purposeful performance, often with an internal voice that anticipates their own output and corrects it on the fly. They have a purpose, a goal, and they reach it by adjusting course without external supervision and by integrating their output with short-term memory. Even systems that fall short of full agency use what is called chain-of-thought processing, which lets the AI work through problems step by step, engaging in structured reasoning.
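
As a caricature of that difference, here is the shape of an agentic loop sketched in Python. The functions `call_model` and `search_climate_data` are hypothetical stand-ins I am inventing for an LLM call and a tool call (real agent frameworks differ in the details); what matters is the structure: a goal, a memory that persists, actions whose results are observed, and a plan that adapts.

```python
# Sketch of an agentic loop: goal + memory + act + observe + adapt.
# call_model() and search_climate_data() are hypothetical stand-ins for an LLM
# call and a tool; real agent frameworks differ, but the loop shape is the point.

def call_model(prompt: str) -> str:
    """Stand-in for an LLM call; here it just echoes a canned plan step."""
    return f"[model output for: {prompt[:40]}...]"

def search_climate_data(query: str) -> str:
    """Stand-in for a tool call (web search, database lookup, etc.)."""
    return f"[data retrieved for: {query}]"

goal = "Explain the greenhouse effect at the student's level"
memory = ["student previously confused CO2 with ozone"]   # persists across turns

for step in range(3):                                      # bounded reasoning loop
    # 1. Plan the next step, conditioning on the goal and everything remembered so far.
    plan = call_model(f"Goal: {goal}. Memory: {memory}. What should I do next?")

    # 2. Act: call a tool when the plan needs outside information.
    observation = search_climate_data("recent atmospheric CO2 measurements")

    # 3. Update memory so later steps can adapt (e.g. switch to the blanket analogy).
    memory.append(f"step {step}: planned '{plan}', observed '{observation}'")

    # 4. Check whether the goal is met; a real agent would ask the model or a critic.
    if "goal met" in observation:
        break

print("\n".join(memory))
```

The point of the sketch is that the token predictor is only one component inside a larger loop that carries state and pursues a goal, which is exactly the shift from sheet-music pianist to jazz musician described above.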

(Side note: lately I’ve been doing consulting work for a company that trains complex models. Part of the work consists of pushing models under development with PhD-level problems until the model makes a mistake, and I swear it is an enormous effort: it takes me 60-90 minutes to get a state-of-the-art model to make a reasoning error.)

Now, from here we move to the final leap: what is the connection between agency and consciousness? The answer is that we don’t know, partly because the field of neuroscience that deals with these questions was already struggling well before LLMs arrived. It’s perhaps the only field in science where philosophy produces content that—at least in my opinion—is more interesting and useful than what scientists produce (there are exceptions concerning scientists who study the so-called neural correlates of consciousness, but that’s a whole other, more technical discussion).

The debate, however, is not purely philosophical. If modern AI systems are developing genuine agency, even in limited form, it changes everything. When an AI system can maintain coherent goals across conversations, plan multi-stage approaches to problems, and adapt strategies based on outcomes, we might need new frameworks to understand and govern these systems. Just as neuroscience evolved beyond the view of the brain as a collection of independent regions—from 19th-century phrenology to today’s understanding of complex neural networks—we might need to go beyond the view of AI as a simple token predictor. The question is not whether AI is “just” predicting tokens (our brains are “just” firing neurons, after all) but what emerges from that process.

As we find ourselves at this turning point, old certainties are no longer sufficient. Whether we are witnessing the birth of genuine artificial agency remains an open question, but it’s a question we can no longer dismiss with reductionist explanations. When AI systems begin to show memory, planning, adaptation, and goal-oriented behavior, insisting they are “just predicting tokens” becomes as limiting as insisting that humans are “just firing neurons.”
