Apple’s Bombshell: The “Thinking” AI Revolution Is a Fraud
Silicon Valley loves a good story. First it was the story of the all-knowing search engine. Then the social networks that would “connect the world.” Now, it’s the fairy tale of “thinking” artificial intelligence — the so-called Large Reasoning Models (LRMs) that promise to outsmart us all by talking themselves through problems like little robot philosophers.
But Apple just crashed the party. In a paper blandly titled The Illusion of Thinking, a group of its researchers has done something Silicon Valley rarely does: tell the truth. And the truth is devastating.
According to Apple, the latest “thinking” AIs (OpenAI’s o1 and o3, Anthropic’s Claude in extended-thinking mode, Google’s Gemini Flash Thinking, DeepSeek-R1) are not reasoning at all. They are putting on a show, and a bad one at that. They ramble, hallucinate, collapse under pressure, and even when spoon-fed the right algorithm, they fumble it. In other words: the emperor has no clothes.
The Illusion of Thinking
The new crop of LRMs was supposed to be different. Instead of blurting out answers like a precocious parrot, they “think out loud.” They write long chains of thought, self-reflect, double-check, and only then give you the final answer.
Sounds impressive, right? The demos were slick: solve a math problem, debug some code, plan a complex project. These models seemed to be evolving from autocomplete machines into proto-minds. Investors swooned. Philosophers declared the dawn of reasoning machines. Tech CEOs declared AGI was near.
But Apple decided to test the claim, and not with cherry-picked math problems. They built puzzle environments where they could systematically dial up the difficulty:
Tower of Hanoi (recursive planning)
Checker Jumping (constraint satisfaction)
River Crossing (multi-agent coordination)
Blocks World (sequential planning)
These puzzles are binary in outcome: either you solve them or you don’t. No fuzzy answers. Perfect for stress-testing whether AI can really reason.
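To get a feel for what “dialing up the difficulty” means, take Tower of Hanoi, where the complexity knob is simply the number of disks. A minimal sketch (mine, not Apple’s actual test harness) of how fast the minimum solution length grows:

```python
# Rough sketch, not Apple's code: the "difficulty dial" for Tower of Hanoi.
# The shortest solution for n disks takes 2**n - 1 moves, so each extra disk
# roughly doubles the amount of planning a model must get exactly right.

def min_moves(n_disks: int) -> int:
    """Minimum number of moves to solve Tower of Hanoi with n_disks disks."""
    return 2 ** n_disks - 1

for n in range(3, 13):
    print(f"{n} disks -> at least {min_moves(n)} moves")
```

At 10 disks that is already 1,023 moves, every one of which has to be legal and in the right order. “A bit harder” becomes brutally hard very quickly.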
Collapse in Three Acts
What Apple found is jaw-dropping. Across all puzzles and models, the performance of LRMs follows a grim three-act play:
At low complexity, non-thinking models often beat the “thinking” ones. The supposedly dumb LLMs, when given the same compute, solved simple problems faster and more accurately.
At medium complexity, the “thinking” strategy helps. LRMs actually start to shine here — using long reasoning traces, they do somewhat better than standard LLMs.
At high complexity, both collapse completely. Accuracy drops to zero. And here’s the kicker: instead of trying harder, LRMs actually give up. They reduce the number of “thinking tokens” they spend, even though they still have plenty of budget left.
Think about that. These models are like a student who sees a hard question on the exam, shrugs, and leaves it blank — despite having hours left. That’s not intelligence. That’s defeat.
The Smoking Gun: Failure Despite Knowing the Algorithm
The most damning experiment is almost comical. Apple tested Tower of Hanoi puzzles, where the solution is known: a simple recursive algorithm. They literally handed the models the algorithm in the prompt. No need to figure it out. Just execute the steps.
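For reference, the recursive procedure is textbook material. Here is a minimal Python version (my sketch, not the exact pseudocode Apple put in the prompt):

```python
# Minimal sketch of the classic recursive Tower of Hanoi solution.
# Moves n disks from peg `src` to peg `dst`, using `aux` as scratch space.

def hanoi(n: int, src: str, aux: str, dst: str, moves: list[tuple[str, str]]) -> None:
    if n == 0:
        return
    hanoi(n - 1, src, dst, aux, moves)  # park the top n-1 disks on the spare peg
    moves.append((src, dst))            # move the largest remaining disk into place
    hanoi(n - 1, aux, src, dst, moves)  # stack the n-1 disks back on top of it

moves: list[tuple[str, str]] = []
hanoi(3, "A", "B", "C", moves)
print(moves)  # 7 moves: 2**3 - 1
```

A dozen lines of deterministic recursion. With the procedure spelled out, the models’ only job was to execute it faithfully.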
And still — the models failed. They broke down at the same complexity threshold as when they were solving from scratch.
This is catastrophic for the whole “reasoning” narrative. If an AI can’t even follow explicit step-by-step instructions reliably, what business do we have calling it intelligent?
Overthinking, Underthinking, and Collapse
Apple’s analysis of reasoning traces paints an even uglier picture:
On simple problems, models “overthink.” They find the correct solution early, then keep exploring wrong ones, wasting compute and often confusing themselves.
On medium problems, they stumble through wrong directions until they luckily land on the right path.
On hard problems, they simply collapse — producing only wrong answers, or no valid answers at all.
And this isn’t just inefficiency. It’s structural. The paper describes it as a counterintuitive scaling limit: reasoning effort stops growing with complexity right when it’s needed most.
Translation: this isn’t a bug that can be patched with more GPUs. It’s a wall.
The Emperor’s New Chain-of-Thought
If you’ve been following AI hype, you’ll know “Chain-of-Thought” (CoT) is the buzzword of the last two years. The idea: forcing models to think step by step magically unlocks reasoning ability.
But Apple just nuked that story from orbit. Chain-of-thought doesn’t guarantee reasoning. It guarantees verbosity. It makes models look thoughtful, while hiding the fact that they’re flailing.
This is not intelligence. This is the illusion of thinking.
The Marketing Scam of “Reasoning Models”
Let’s be blunt: LRMs are being sold as a revolution. Companies pitch them as one step away from AGI, justify massive compute costs for “thinking tokens,” and raise billions in funding.
But if Apple’s results are right, all that extra compute is a glorified magic trick. A parlor game where the AI narrates its own nonsense until you give up and call it “reasoning.”
Imagine paying double for a car because it comes with a fancy GPS that narrates the wrong directions in soothing detail, before driving you off a cliff. That’s where we are with LRMs.
Why Apple’s Paper Matters
Apple is not usually the company that rocks the boat in AI research. But here, they have done what OpenAI, Anthropic, and Google won’t: puncture the hype.
The paper doesn’t just question the benchmarks. It shows, with clean, controlled experiments, that “thinking” AIs fail systematically.
And it raises existential questions:
Are we hitting the true limits of scale?
Is “reasoning” just a hallucination we projected onto stochastic parrots?
Have we been conned by our own marketing narrative?
The Coming Backlash
The implications are enormous.
For business: Enterprises that were promised “AI copilots” reasoning like humans may instead be getting unreliable toys that collapse on hard tasks.
For research: The obsession with chain-of-thought, reasoning traces, and self-reflection might be a dead end.
For AI safety: The idea that these models are inching toward general intelligence suddenly looks shaky. If they collapse on puzzle complexity, maybe we’re a lot further from AGI than the doomsayers want you to believe.
For society: Apple just handed skeptics the perfect argument: the AI emperor is naked.
Apple’s Subtle Middle Finger
There’s also something deliciously ironic here. While OpenAI, Anthropic, and Google posture as the pioneers of “reasoning” AI, Apple swoops in and basically says: “Nah. It’s all fake. And here’s the data to prove it.”
This is classic Apple. They may have been late to the large-model hype, but they’re playing the long game. While the others burn billions on cloud compute to sell fake reasoning, Apple is quietly building models that actually work on your iPhone.
This paper isn’t just science. It’s strategy.
Conclusion: Stop Pretending, Start Building
Apple’s Illusion of Thinking is a wake-up call. It says:
Stop pretending chain-of-thought is intelligence.
Stop worshipping reasoning models that collapse under pressure.
Stop selling snake oil in the form of verbose token spam.
If we really want AI that reasons, we need new approaches. Symbolic methods, hybrids, architectures that can actually execute algorithms instead of faking them. Otherwise, we’re just pouring compute into a bottomless pit and calling it progress.
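What would “actually execute algorithms” look like in practice? One hypothetical illustration, not something proposed in Apple’s paper: let the model propose moves, but hand execution to a symbolic verifier that enforces the rules, so illegal steps get rejected instead of narrated. The hard-coded move list below stands in for model output.

```python
# Hypothetical hybrid sketch (not from Apple's paper): a model proposes moves,
# and a symbolic verifier executes them, rejecting anything illegal.
# The `proposed` list stands in for model output.

def apply_move(pegs: list[list[int]], src: int, dst: int) -> bool:
    """Execute a Tower of Hanoi move only if it is legal; return False otherwise."""
    if not pegs[src]:
        return False                                  # nothing to pick up
    if pegs[dst] and pegs[dst][-1] < pegs[src][-1]:
        return False                                  # larger disk onto smaller: illegal
    pegs[dst].append(pegs[src].pop())
    return True

pegs = [[3, 2, 1], [], []]                            # three disks, largest at the bottom
proposed = [(0, 2), (0, 2), (0, 1), (2, 1), (0, 2), (1, 0), (1, 2), (0, 2)]

for src, dst in proposed:
    ok = apply_move(pegs, src, dst)
    print(f"move {src}->{dst}: {'ok' if ok else 'rejected'}")

print("solved" if pegs == [[], [], [3, 2, 1]] else "not solved")
```

The model narrates; the verifier decides. Whether that deserves the word “reasoning” is a separate fight, but at least such a system can’t bluff its way past an illegal move.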
The truth is uncomfortable. But Apple said it anyway: today’s “thinking” AIs don’t think. They simulate. They bluff. They collapse. And we should stop falling for the illusion.