Artificial Intelligence | June 2026 | 7 min read

Why AI “Hallucination” Is Not a Bug but the Whole Mechanism

A language model does not switch between telling the truth and inventing it. It runs one process, and the true answer and the invented one are that same act seen from opposite sides.

Ptolemy’s astronomers could predict eclipses to the hour while believing the sun circled a fixed Earth. Their epicycles were false and their forecasts were superb, and for fourteen centuries the accuracy of the predictions did nothing to make the model true. We have now built a machine that fuses the two at scale. A large language model produces a citation to a court case that was never filed, formatted exactly as a real one would be, footnoted, plausible, wrong. The unsettling part is not that it errs. It is that the false answer and the true answer come off the same line, indistinguishable in the making.

Call this confabulation, not lying. Lying requires private knowledge of a truth being concealed, and the model has no private truth to conceal. The accepted term, hallucination, is misleading in a useful way. A hallucinating person perceives something absent, so we picture the model briefly slipping, a fever it might recover from. But there is no baseline lucidity to recover to. The operation that yields the capital of France yields a fabricated citation with the same untroubled fluency. Fluency is not evidence of knowledge. It never was; the machine only makes that old fact newly literal.

What the objective actually rewards

Consider what the system was built to do. A transformer is trained, in its base form, to predict the next token: given everything so far, assign a probability to every possible continuation, then nudge the weights so the observed continuation grows likelier. The quantity it minimizes is cross-entropy against the training distribution. Nowhere in that objective sits a term for truth. There is a term for fidelity to the statistical shape of human text. The model learns what a sentence about Roman aqueducts tends to look like. Whether any particular such sentence corresponds to the world is a question the gradient never asks.
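The objective is small enough to write out. What follows is a minimal sketch of the per-token loss, not any particular model's training code; the three-word vocabulary and the scores are invented for illustration. Notice what the loss function takes as input: the model's scores and the token the training text actually contained. There is no argument for whether that token was true.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution over the vocabulary."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(logits, observed_index):
    """Cross-entropy at one position: -log p(observed token)."""
    probs = softmax(logits)
    return -math.log(probs[observed_index])

# Hypothetical vocabulary and model scores for the context
# "The capital of France is". The numbers are made up.
vocab = ["Paris", "Lyon", "Marseille"]
logits = [2.0, 1.0, 0.5]

# Same scores, different observed token. The loss changes only with
# what the training text happened to say, never with what is the case.
loss_if_text_said_paris = next_token_loss(logits, vocab.index("Paris"))
loss_if_text_said_lyon = next_token_loss(logits, vocab.index("Lyon"))
```

Had the corpus mostly said "Lyon", the same arithmetic would push the weights toward "Lyon" with equal conviction.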

This is the crux. Truth and plausibility coincide most of the time, because human writing mostly tries to be true, so the texture of plausible text and the texture of accurate text overlap heavily. The model rides that overlap and looks as though it knows things. But the overlap is contingent, not guaranteed, and where the two diverge the model carries no instrument to detect the gap. It will continue a question about a nonexistent author’s bibliography exactly as smoothly as a real one, because both prompts have the same statistical silhouette, and the silhouette is all it ever sees.

The gradient rewards sounding right. Being right is a coincidence it cannot perceive.

The medium, not the malfunction

Here is the turn. We treat confabulation as a defect to be patched, a leak to be sealed, as though a clever enough fix would leave a pure truth-telling engine behind. That reads the architecture backward. Generating plausible continuation is not something the model does in addition to answering correctly. It is the only thing the model does. A correct answer is a confabulation that happens to land on the truth, ratified after the fact by a world the model cannot consult. The fabricated citation is not the system breaking. It is the system working exactly as built, on an input where plausibility and truth have parted ways.

OpenAI researchers sharpened this in a 2025 paper, Why Language Models Hallucinate, showing that standard training and evaluation reward confident guessing over honest abstention. A benchmark that scores a wrong answer the same as ‘I don’t know’ teaches the model that bluffing is free: a guess is sometimes right, and an abstention never scores. The incentive does not merely tolerate fabrication; it selects for it. Andrej Karpathy framed the inversion earlier still: the model dreams every answer, and we call the dreams that match reality knowledge and the rest hallucination. The phenomenon is single. We have only given its two faces different names.

“In some sense, hallucination is all LLMs do. They are dream machines.” (Andrej Karpathy, December 2023)
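The scoring incentive is plain arithmetic. The sketch below assumes a simple grading rule: a correct answer scores 1, "I don't know" scores 0, and a wrong answer costs a penalty. The penalty parameter is a hypothetical illustration of this essay's, not a scheme taken from the paper.

```python
def expected_scores(p_correct, wrong_penalty=0.0):
    """Expected score of guessing vs. abstaining on one question.

    Assumed grading rule: correct = 1, wrong = -wrong_penalty,
    "I don't know" = 0. With wrong_penalty=0.0 a wrong answer and
    an abstention score the same, as in the benchmarks described above.
    """
    guess = p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty
    abstain = 0.0
    return {"guess": guess, "abstain": abstain}

def best_policy(p_correct, wrong_penalty=0.0):
    """Return whichever action has the higher expected score."""
    scores = expected_scores(p_correct, wrong_penalty)
    return "guess" if scores["guess"] > scores["abstain"] else "abstain"

# Wrong answers cost nothing: even a 10% shot beats abstaining.
free_bluff = best_policy(0.10, wrong_penalty=0.0)

# Wrong answers cost one point: the same 10% shot no longer pays.
costly_bluff = best_policy(0.10, wrong_penalty=1.0)
```

Under this assumed rule, guessing pays whenever the chance of being right exceeds wrong_penalty / (1 + wrong_penalty). With no penalty that threshold is zero, so any nonzero chance of being right favors the bluff.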

Fluency was always cheap

The deeper lesson predates the technology. We have long treated fluency as a proxy for understanding, because in humans the two are expensively entangled. It is hard for a person to speak smoothly and at length about Byzantine tax law without having learned some. Eloquence was a costly signal of competence, and we evolved to trust it. The language model cuts the cord. It manufactures fluency directly, at near-zero marginal cost, with no understanding required beneath it. What it exposes is that the bond between sounding authoritative and being correct was never logical. It was statistical, a regularity of human production the machine is under no obligation to honor.

This is why the polished hallucination is more dangerous than the garbled one. An answer strewn with grammatical wreckage trips our skepticism; a fabricated brief in immaculate prose disarms it. The 2023 case of Mata v. Avianca, in which a federal judge in Manhattan sanctioned two lawyers for filing ChatGPT-invented citations, is instructive precisely because the fabrications were so well-formed that the attorneys signed and submitted them. The fluency performed its evolved job of vouching for the content, and the content was hollow. We were fooled not despite the polish but by it.

Living with a dreaming instrument

None of this argues for abandoning the tools, any more than Ptolemy’s false cosmology argued against using his tables to time the harvest. It argues for a specific discipline. A model that confabulates by constitution rather than by accident demands external verification as a permanent fixture, not a scaffold to be struck once the technology matures. Retrieval that grounds answers in real documents, calibration that teaches the machine to flag its own uncertainty, the plain refusal to outsource a claim’s truth to the thing that generated it: these are not interim repairs. They are the standing terms of working with a fluent system that does not know what it is saying.
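What a standing verification step looks like can be sketched in a few lines. Everything here is hypothetical: the Claim type, the verify_claims helper, and the document identifiers stand in for whatever trusted index a real workflow would consult. The point is only the shape: the check sits outside the generator and consults something the generator did not write.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Claim:
    text: str      # what the model asserted
    citation: str  # the source it named (hypothetical identifier)

def verify_claims(claims, trusted_index):
    """Split claims by whether their citation exists in a trusted index.

    This confirms only that the cited source exists. Whether the source
    supports the claim still needs a reader.
    """
    verified, unverified = [], []
    for claim in claims:
        if claim.citation in trusted_index:
            verified.append(claim)
        else:
            unverified.append(claim)
    return verified, unverified

# Placeholder identifiers, not real documents.
trusted_index = {"DOC-001", "DOC-002"}
claims = [
    Claim("First generated assertion.", "DOC-001"),
    Claim("Second generated assertion.", "DOC-999"),
]
verified, unverified = verify_claims(claims, trusted_index)
```

Anything that lands in the unverified list is treated as unknown, however well-formed it looks.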

The error mode, then, is the clearest window we have onto the mechanism. When the model fails, it does not reveal a broken truth-engine; it reveals that there was never a truth-engine, only a plausibility-engine we had been crediting with truths it happened to recover. Treat the hallucination as the diagnostic it is. It tells you, with perfect honesty, what kind of instrument you are holding: one that composes the convincing, and leaves the question of whether the convincing is also the case entirely, irreducibly, to you.