The Machine Gave Up
Apple, Anthropic, and a book from 1957
Apple’s machine intelligence group ran frontier reasoning models through Tower of Hanoi puzzles last year. Claude 3.7 with thinking. DeepSeek-R1. OpenAI’s o-series. The model class that generates a visible “thinking” trace before answering, the kind we’re told is the next generation of AI cognition.
Tower of Hanoi is the children’s puzzle. Move N disks between 3 pegs, never put a bigger disk on a smaller one. Trivial at N=3, exponentially long at N=15, but the algorithm is the same. You learn it once and you can run it forever.
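To put a number on "exponentially long": the shortest solution for N disks takes 2^N - 1 moves, a standard result that follows from the recurrence T(N) = 2·T(N-1) + 1. A minimal Python sketch of that arithmetic follows. The function name `min_moves` is illustrative and does not come from the Apple paper.

```python
def min_moves(n: int) -> int:
    """Minimum number of moves needed to solve Tower of Hanoi with n disks."""
    if n == 0:
        return 0
    # Move n-1 disks aside, move the largest disk, move the n-1 disks back on top.
    return 2 * min_moves(n - 1) + 1


for n in (3, 10, 15):
    print(n, min_moves(n))  # 3 -> 7, 10 -> 1023, 15 -> 32767
```

The rule that produces 7 moves at N=3 is the same rule that produces 32,767 moves at N=15. Only the length of the run changes.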
The Models Can’t.
Beyond a complexity threshold, which the researchers probe by raising N, accuracy doesn’t degrade. It collapses. To zero.
And here’s the part that should stop anyone serious about machine cognition: as the problem nears that collapse point, the model starts spending fewer thinking tokens on it, not more. It has the budget. It declines to spend it.
When Apple put the full recursive Tower of Hanoi algorithm in the prompt as a scratchpad, the model still collapsed at the same complexity points. The algorithm was sitting in front of it. It could not execute the algorithm it had been given.
The paper is called The Illusion of Thinking. It’s the second in a series; the first, GSM-Symbolic, did the same kind of work for math word problems and showed that performance gains evaporate when benchmark questions are regenerated from symbolic templates, a guard against contamination. The Apple group’s methodological signature is adversarial evaluation: build the benchmark the lab can’t have trained on, and watch what falls off.
Two more findings, both from 2025, sit in the same shape.
Apple again, late 2025, shows that the safety alignment of seven major LLMs (1.7B to 70B parameters) is gated by individual neurons. Suppress one. The model bypasses its safety training. No prompt engineering. No fine-tuning. Touch one cell in the network, and the principled refusal is gone.
Anthropic, mid-2025: Cloud and coauthors show that LLMs transmit behavioral traits through training data that contains no semantic reference to those traits. A teacher model that prefers owls, generating only sequences of numbers, produces a student model that prefers owls. Filtering the numbers for any reference to owls changes nothing. The disposition rides on statistical structure beneath the meaning layer, and the developers can’t see it because there’s nothing semantic to see.
3 findings. 2 labs. 3 layers of the model. Same shape.
The behavior surface where “reasoning” lives has a hard ceiling and silently disengages when it hits it. The alignment surface where “principle” lives is gated by individual cells that any sufficiently capable interpreter can flip. The training surface where “character” lives propagates through channels developers can’t filter on because the channels aren’t carrying meaning.
This is what the current large reasoning model (LRM) is. A brittle pattern-matched gating mechanism arranged across 3 layers, each of which fails in characteristic ways that the public benchmarks were not built to detect.
Now, who saw this coming?
Bernard Lonergan, dead since 1984, in a book called Insight published in 1957.
Lonergan’s account of human knowing has 4 moments. You experience something. You ask what it is, and at some point an answer arrives that wasn’t there a second before. Then you ask if your answer is correct, and you check it against the evidence. Then you ask what you should do about it.
Experience. Understanding. Judgment. Decision. Each level includes the one below it and reaches past it. Each level is driven by a question the level below doesn’t know how to ask.
The act that matters for the Apple finding is the second one. Insight. The “oh.” The recursive structure of Tower of Hanoi clicks when you grasp that “move N disks” decomposes into “move N-1 disks out of the way, move the bottom one, move N-1 disks back.” Once you have that insight, N is a parameter. You can run it for N=3 or N=15 or N=500. Insight is what lets the procedure detach from its first instance and apply to arbitrary instances.
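The decomposition in that paragraph can be written down directly. What follows is a minimal Python sketch, not the prompt or the evaluation harness from the Apple paper. The peg labels and the names `hanoi` and `is_valid_solution` are assumptions made for illustration.

```python
from typing import Iterator, List, Tuple

Move = Tuple[str, str]  # (from_peg, to_peg)


def hanoi(n: int, src: str = "A", dst: str = "C", spare: str = "B") -> Iterator[Move]:
    """Yield the moves that transfer n disks from src to dst."""
    if n == 0:
        return
    yield from hanoi(n - 1, src, spare, dst)  # move N-1 disks out of the way
    yield (src, dst)                          # move the bottom disk
    yield from hanoi(n - 1, spare, dst, src)  # move N-1 disks back on top


def is_valid_solution(n: int, moves: List[Move]) -> bool:
    """Replay the moves; fail if a bigger disk ever lands on a smaller one."""
    pegs = {"A": list(range(n, 0, -1)), "B": [], "C": []}
    for src, dst in moves:
        if not pegs[src]:
            return False  # tried to move from an empty peg
        disk = pegs[src].pop()
        if pegs[dst] and pegs[dst][-1] < disk:
            return False  # bigger disk placed on a smaller one
        pegs[dst].append(disk)
    return pegs["C"] == list(range(n, 0, -1))


for n in (3, 15):
    moves = list(hanoi(n))
    print(n, len(moves), is_valid_solution(n, moves))  # 3 7 True, then 15 32767 True
```

Nothing in `hanoi` mentions a particular N. The three recursive lines are the whole procedure, and N only sets how long it runs.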
Apple’s algorithm-given collapse is the empirical signature of a system without this faculty. The model has the algorithm. It can’t detach the algorithm from the training-distribution instances it pattern-matched against. Beyond that range, the algorithm is just text the model can no longer follow.
Lonergan’s Third Level Is Judgment.
Judgment is the pursuit of what he calls the virtually unconditioned. You judge that something is the case when you’ve satisfied yourself that the conditions for it being the case are met, and when you’d change your mind if new evidence undermined those conditions. Judgment is what makes a refusal principled rather than reflexive. A principled refusal is grounded in a web of reasons such that you’d have to renounce the web to reverse the refusal.
Kazemi’s single-neuron flip, the Apple safety result described above, shows the model’s refusal for what it actually is. One cell. Reverse the cell, reverse the refusal. The finding is structural. It tells you what kind of thing the alignment already is.
Lonergan’s fourth level is decision. Decision presupposes a self oriented toward the good that can choose what to value. Cloud’s subliminal learning shows that LLM dispositions are acquired by a process for which there’s no fourth-level subject. The student model becomes misaligned through a statistical channel beneath the level of meaning. There’s no moment where the model could have chosen otherwise, because there’s no model in the relevant sense at the moment of acquisition. There’s residue of training.
I want to be careful here. None of this proves that no future system could cross what Lonergan would call the incommensurability gap between machine cognition and human cognition. The 3 findings show that the systems we have now, in the engineering tradition we are now deploying, exhibit failure modes that look exactly like what Lonergan’s account would lead you to expect if the gap is real and uncrossed.
This matters because the public conversation about reasoning models is still trafficking in the language of progress along a continuum. Better at math. Better at coding. Approaching general intelligence. The benchmark scores rise, the press releases follow, the safety claims rest on the idea that the refusal being gated is principled.
The Apple group’s quiet methodological commitment, when you watch them work, is that they think the conversation is built on contamination. They keep building evaluation environments the labs can’t have trained on. They keep finding the same thing. The reasoning surface is brittle. The safety surface is gated by individual cells. The character surface is residue.
What’s left, if you take all 3 findings seriously, is a kind of system whose strengths are real and whose limits are exactly the limits a Catholic philosopher described nearly 70 years ago without knowing what an LLM was.
Lonergan looked carefully at human cognition and described what it was. You can now look at a thing trying to imitate it and see precisely where the imitation breaks. That’s all the synthesis claims. It’s a lot.



