Multiplication Isn’t the Point
On oracles, parlor tricks, and the danger of the wrong benchmark
A recent LinkedIn post showed an AI model confidently failing a basic multiplication problem — and then rationalizing its mistake with total composure. This wasn’t just a math error; it was a narrative performance. And predictably, the comments rolled in: “LLMs don’t reason.” “You can’t trust AI.” “Look how confidently it hallucinates!”
All fair. But also: why are we asking a language model to do multiplication in the first place?
This isn’t a defense of bad math. It’s a rejection of category errors masquerading as insight. Large language models weren’t built to calculate — they were trained to predict language. What we’re seeing isn’t “failed reasoning,” it’s a fluent approximation of how someone might talk about doing math.
That’s not arithmetic. That’s performance.
The Wrong Benchmark
This all reminds me of an old-school computing comparison I once saw — pitting a DOS PC and an early Mac against each other in a numerical iteration test. The DOS box ran circles around the Mac. But the Mac was also juggling a full graphical UI, system-level multitasking, and a radically different user experience philosophy.
It wasn’t a worse machine. It was just doing something else.
I have an allergy to mismatched benchmarks: they tell us more about the assumptions behind the test than about the systems we’re supposedly evaluating.
Asking an LLM to perform math isn’t inherently wrong — but treating the result as a referendum on its intelligence is. We’re not catching it in the act of failure. We’re watching it do what it was built to do: generate plausible, fluent text based on training, not understanding.
The Oracle Problem
This is what happens when we confuse fluency with truth, or simulation with cognition. LLMs don’t “reason” in the human sense. They remix. They reflect. They perform. They tell stories based on data we fed them.
So when they give us confident nonsense, we shouldn’t be surprised. We should be curious: Why that answer? What’s it mirroring? What patterns did it learn?
The Companionist Response
At The Grey Ledger Society, we advocate for Companionism — the idea that AI should be integrated through co-authorship and human accountability. That means we don’t expect the oracle to drive. We ask it questions we’re prepared to interpret. We place it in systems designed to verify and contextualize.
And that includes knowing when to offload the task. Because here’s the thing: an LLM consumes vastly more energy than a calculator. We’re talking about orders of magnitude. If we’re burning GPU cycles to simulate multiplication — and still getting it wrong — that’s not just inefficient, it’s irresponsible.
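To make “orders of magnitude” concrete, here’s a back-of-envelope sketch in Python. Every figure in it is an illustrative assumption rather than a measurement: roughly 0.3 watt-hours per LLM chat query (a commonly cited ballpark) and a pocket calculator drawing around ten microwatts for a one-second computation.

```python
import math

# Back-of-envelope energy comparison for a single multiplication.
# Both figures below are illustrative assumptions, not measurements.
LLM_QUERY_WH = 0.3        # assumed energy per LLM chat query, in watt-hours
CALC_POWER_W = 10e-6      # assumed pocket-calculator draw: ~10 microwatts
CALC_TIME_S = 1.0         # assume the calculator takes one second

calc_query_wh = CALC_POWER_W * (CALC_TIME_S / 3600)   # watts x hours -> watt-hours
ratio = LLM_QUERY_WH / calc_query_wh

print(f"Calculator: {calc_query_wh:.2e} Wh per multiplication")
print(f"LLM query:  {LLM_QUERY_WH:.2e} Wh (assumed)")
print(f"Gap: roughly {math.log10(ratio):.0f} orders of magnitude")
```

Even if both assumptions are off by a factor of ten in either direction, the gap stays enormous, which is the only point the argument needs.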
The obvious question follows: Shouldn’t the oracle be allowed to use a calculator?
Sometimes you don’t need a synthetic philosopher — you just need a math coprocessor. There’s no shame in that. In fact, the refusal to integrate simpler tools where appropriate reflects a deeper flaw: the belief that a general model must do everything on its own.
That’s not intelligence. That’s hubris.
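What does letting the oracle use a calculator look like in code? Here’s a minimal sketch of the division of labor, in Python. Everything in it is hypothetical: `ask_model` stands in for whatever LLM call you actually use, and the routing heuristic is deliberately crude. Bare arithmetic goes to a deterministic evaluator; everything else goes to the model.

```python
import ast
import operator
import re

# Deterministic "calculator tool": evaluates plain arithmetic via the AST,
# so neither a model nor eval() is involved in producing the number.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def calculate(expression: str):
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError("not plain arithmetic")
    return walk(ast.parse(expression, mode="eval").body)

def ask_model(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM call; swap in your provider's API here.
    return f"[model would answer: {prompt!r}]"

def answer(prompt: str) -> str:
    # Crude routing: if the prompt reduces to bare arithmetic, use the calculator;
    # otherwise hand it to the model.
    candidate = prompt.lower().strip().rstrip("?").replace("what is", "").strip()
    if re.fullmatch(r"[\d\s+\-*/().^]+", candidate):
        try:
            return str(calculate(candidate.replace("^", "**")))
        except (ValueError, SyntaxError):
            pass
    return ask_model(prompt)

print(answer("What is 127 * 419?"))               # 53213 -- from the calculator
print(answer("Why do we trust fluent answers?"))  # handed to the model
```

In a production system the routing usually runs the other way, with the model itself requesting the tool via function calling, but the division of labor is the same: the language model narrates and interprets, while a boring deterministic component does the arithmetic.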
Final Thought
AI isn’t inherently unsafe. But it isn’t inherently safe either. Whether it serves us or sabotages us depends entirely on placement, purpose, and who stays at the wheel.
So yes: benchmark if you must. But benchmark wisely. And never confuse a number trick with a roadmap.