Industry
February 27, 2026

Vibes vs. evidence

The gap between what we're spending on AI and what we're measuring from it isn't a mystery. It's a pattern — and it's playing out exactly the way it did with computers, electricity, and the dot-com bubble.

You can see the AI age everywhere but in the productivity statistics.

Robert Solow said the same thing in 1987, about computers. Computing capacity had increased a hundredfold over two decades, IT spending was accelerating across every sector, and measurable productivity growth had been cut in half. The technology was obviously real. The numbers hadn't caught up.

Nearly forty years later, same scene, different actors. The major hyperscalers plan to spend somewhere north of $600 billion on AI infrastructure this year. Goldman Sachs calculates that AI contributed "basically zero" to U.S. GDP growth in 2025. One chief economist invoked Solow directly: "AI is everywhere except in the incoming macroeconomic data."

The gap between what we're spending and what we're measuring isn't a mystery. It's a pattern.

The pattern

Every general-purpose technology follows the same arc. It arrives. Early adopters demonstrate stunning possibilities. Capital floods in. Narratives form around what the technology will do, far ahead of what it has done. Then there's a period where the narrative and the evidence exist in separate realities. Sometimes that period lasts years. Sometimes decades.

Economists call it the productivity J-curve. A new technology enters an economy and measured productivity initially dips. Not because the technology doesn't work, but because the complementary investments required to make it work are massive, slow, and invisible to GDP accounting. Process redesign. New org structures. Training people to actually use the thing. The spending shows up immediately. The returns don't. One research team estimated that, after adjusting for the intangible investments national accounts miss, total factor productivity was nearly 16% higher than official measures suggested. The gains were there. The measurement systems couldn't see them.
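The accounting mechanics are easy to see in a toy model. The sketch below is illustrative only, with hand-picked numbers rather than calibrated ones: a firm spends on intangibles (process redesign, training) for several years before the technology pays off. Conventional measurement expenses that spending immediately and never sees the in-progress asset, so measured productivity dips below baseline before it rises.

```python
# Toy illustration of the productivity J-curve. All figures are invented
# for illustration; this is not a calibrated economic model.

base_output = 100.0
labor = 100.0

measured, adjusted = [], []
intangible_stock = 0.0
for t in range(10):
    intangible_spend = 10.0 if t < 4 else 0.0   # heavy early investment
    intangible_stock += intangible_spend
    # The payoff arrives only after the complementary investment is in place.
    payoff = 0.25 * intangible_stock if t >= 4 else 0.0
    output = base_output + payoff
    # Measured view: intangible spending is a pure cost, the stock is invisible.
    measured.append((output - intangible_spend) / labor)
    # Adjusted view: treat intangible spending as investment in an asset.
    adjusted.append(output / labor)

print([round(m, 2) for m in measured])   # dips below 1.0, then overshoots
print([round(a, 2) for a in adjusted])   # never dips: the "loss" was an artifact
```

The measured series traces the J: below baseline while the investment is underway, above it afterward. The adjusted series, which capitalizes the same spending, shows the dip was a property of the accounting, not the technology.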

Electrification followed this arc over forty years. Factories first replaced central steam engines with central electric motors. Same layout, different power source. Productivity barely moved. The breakthrough came a generation later, when factories were redesigned from the ground up around individual motors at each machine. By the 1920s, manufacturing productivity growth had more than doubled, with electrification accounting for roughly half. But the technology itself had been available for decades.

Computing compressed the same trajectory. In a study of large U.S. firms, the full benefits of IT investment didn't appear for five to seven years, and only when accompanied by organizational changes that had nothing to do with the technology itself. The productivity surge arrived after 1995, when output per hour roughly doubled its prior rate.

The AI productivity gap isn't evidence that the technology doesn't work. It's what the gap years look like from the inside. And the gap years are when the most expensive mistakes get made, because the absence of evidence gets filled by narrative.

The narrative economy

When measurement lags technology, stories fill the vacuum. In 2026, the stories are moving markets.

In February, a researcher published a 7,000-word scenario memo on Substack about a potential global intelligence crisis. Explicitly labeled a scenario, not a prediction. By Monday, IBM shares had fallen 13%, its worst single-day drop since 2000. Several blue-chip stocks fell more than 6%. Axios reported days later that narrative volatility has become a risk category in its own right. Blog posts and research memos are moving billions in market capitalization.

This isn't new, either. Every technological revolution follows a two-phase cycle: an installation phase, where speculative capital floods into new infrastructure and narratives drive valuations far ahead of fundamentals, and a deployment phase, where the technology is broadly adopted and real productivity gains materialize. The turning point is almost always a crash.

The numbers today echo the pattern. AI startups command median valuations of 25 to 30 times revenue. Venture capital has concentrated hard: AI startups now absorb more than 60% of all VC dollars, up from roughly a quarter five years ago. Meanwhile, about $2 trillion in software market capitalization has evaporated over the past year, with a concentrated wipeout in early 2026 taking another $285 billion. The logic: if AI agents can do the work of a hundred employees, you don't need a hundred software seats. But the agents don't reliably do the work yet. The seat reductions haven't materialized at scale. The market is pricing in a future that hasn't arrived.

Who benefits from the gap

The perception-reality gap isn't just a market phenomenon. It's an economy. And it has structural beneficiaries whose incentives align with maintaining the gap rather than closing it.

Nvidia posted $120 billion in net income last fiscal year selling GPUs. TSMC fabricates virtually all advanced AI chips at 45% margins. This is the picks-and-shovels model, as old as the Gold Rush. Most miners went broke. The people selling shovels got rich. It didn't matter whether the miners found gold.

Consulting firms occupy a similar position. The large ones are booking billions in AI engagements annually, and AI-related work now drives a significant share of their revenue. They get paid for strategy and implementation regardless of whether the implementations deliver value.

Venture capital has its own structural dependency on the narrative. Inflated expectations raise valuations. Higher valuations enable larger rounds. Larger rounds generate larger fees. One peer-reviewed paper put it plainly: value is being created "on a bet of transformation" by "creating the illusion that AI futures are inevitable."

The technology is real. The GPUs are real. The models are real. But the economy that has formed around the narrative of AI is larger and faster-moving than the economy formed around the results of AI. The people who profit most from the narrative have the least incentive to ask whether the results have caught up.

The fluency trap

If the gap were just a market phenomenon, it would be expensive but survivable. Markets correct. But the gap has penetrated the organizations making the spending decisions. And there, it reinforces itself.

A survey of 5,000 workers found that over 40% of executives claim AI saves them eight or more hours per week. Two-thirds of non-management staff reported under two hours. Or none. Separately, more than half of CEOs say they've gotten "nothing out of" their AI investments. Nearly nine in ten executives in another large study saw no change in productivity measured as sales per employee. Despite all of this, those same firms predict meaningful improvement over the next few years.

The most striking finding I've come across: a randomized controlled trial with experienced open-source developers. Before the study, developers predicted AI would make them 24% faster. After the study, they still estimated a 20% improvement. The measured result was that they were 19% slower with AI than without. The perception of benefit persisted in direct contradiction of the evidence.
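One way to feel the size of that gap is to run the study's headline numbers through a hypothetical task. The 100-minute baseline below is invented for illustration; only the percentages come from the study as cited above.

```python
# Back-of-the-envelope arithmetic on the developer RCT's headline numbers.
# The baseline task time is hypothetical; the percentages are from the study.

baseline_minutes = 100.0          # hypothetical task time without AI

perceived_speedup = 0.20          # developers' post-study belief: 20% faster
measured_slowdown = 0.19          # observed result: tasks took 19% longer

perceived_time = baseline_minutes / (1 + perceived_speedup)
actual_time = baseline_minutes * (1 + measured_slowdown)

print(round(perceived_time, 1))   # what developers believed the clock said
print(round(actual_time, 1))      # what the clock actually said
```

On these numbers, a developer who believes a task took about 83 minutes actually spent 119. The mental model is off by more than forty percent, and nothing in the experience of doing the work corrects it.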

I think about that study a lot. Because the root cause isn't technical. It's perceptual. We conflate fluency with competence. When something communicates articulately, we attribute understanding to it. Researchers demonstrated decades ago that statements in easy-to-process formats are judged as more likely true, regardless of accuracy. And as far back as the 1960s, people insisted a simple pattern-matching program was "understanding" them, despite knowing it was a machine.

These models are the most fluent communicators most people have ever encountered. They produce articulate, confident prose on any topic at any length in seconds. We are cognitively wired to interpret that fluency as understanding. It isn't.

The micro gap

There's a version of this dynamic that doesn't show up in surveys or GDP calculations. It lives in the daily experience of working with these tools.

I use frontier AI models every day in production software development. The latest generation has a performance envelope that's wider, not tighter. The ceiling is genuinely remarkable. When these models are working well, the output exceeds what I'd expect from any individual contributor. But the floor has dropped. Not in a human way. In an alien way. Illogical approaches to straightforward problems. Reasoning errors that would be unusual from even an unstructured human thinker.

Large language models, including the reasoning-focused variants, don't perform genuine logical reasoning. They replicate reasoning-like text. Adding a single irrelevant clause to a math problem caused performance drops of up to 65% across the top models. These aren't edge cases. They reveal what the system actually is.

The mechanism maps to a known failure mode in machine learning: when you optimize too aggressively against a proxy reward signal, proxy performance keeps climbing while true quality peaks and then declines. Recent work has shown that reinforcement learning doesn't teach models new reasoning patterns. It biases output toward reward-likely paths. The result is higher scores on familiar benchmarks and lower performance on novel problems.
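The divergence between a proxy and the thing it stands in for can be sketched in a few lines. This is a minimal illustration, not a training setup: a one-parameter "policy" with hand-picked reward shapes, where pushing harder always raises the measurable score while real quality peaks and then falls away.

```python
# Minimal sketch of proxy over-optimization (Goodhart's law). The reward
# shapes are hand-picked for illustration, not drawn from any real system.

def proxy_reward(theta):
    # The measurable score keeps rising the harder we push.
    return theta

def true_quality(theta):
    # Real quality improves at first, then degrades past theta = 5.
    return theta - 0.1 * theta ** 2

theta = 0.0
proxies, qualities = [], []
for step in range(100):
    theta += 0.1                  # gradient ascent on the proxy alone
    proxies.append(proxy_reward(theta))
    qualities.append(true_quality(theta))

peak = max(range(len(qualities)), key=qualities.__getitem__)
print(proxies[-1])                # the proxy finishes at its maximum
print(peak, round(qualities[peak], 2), round(qualities[-1], 2))
```

The optimizer never sees a reason to stop: every step improves the number it can observe. The quantity it cannot observe peaked halfway through the run and has been declining ever since. Swap "theta" for benchmark-shaped training and the analogy to familiar-benchmark gains with novel-problem losses is direct.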

This is the macro gap in miniature. Excellent language is not excellent reasoning. The models can articulate first-principles thinking with perfect clarity. They just don't reliably do it.

Make sure it's plugged in

There's a discipline that cuts through vibes at every scale, and it's almost embarrassingly simple.

In recording studios, engineers would spend forty-five minutes under a mixing desk troubleshooting signal routing. Swapping cables. Testing components. Tracing circuits. Before someone would finally ask: is it plugged in? And too many times, it wasn't. The problem was at the foundation, and everyone had skipped past it to reason about complexity.

First-principles debugging. Exhaust the simple checks before diagnosing complex causes. Resolve variables before building theories. Confirm what you know before reasoning about what you don't.

Before concluding AI is transforming your organization, check the measurement. Are you measuring outcomes or activity? Are you asking executives or the people doing the work? Before concluding the technology is broken, check your methodology. Are you comparing to the right baseline? Are you accounting for complementary investments that haven't happened yet?

What happens next

History doesn't offer a prediction, but it offers a shape.

The Solow paradox resolved. Computing did transform productivity. It just took a decade of complementary investment before the gains appeared. Electrification followed the same arc over four decades. The dot-com bubble destroyed trillions in investment capital, but internet usage continued growing through the crash. The technology delivered. The business models built on premature narratives didn't. The infrastructure companies that seemed invincible lost 80 to 90 percent of their value. Their former executives now warn they see the same red flags with AI.

AI is probably real in the way electrification was real and computing was real and the internet was real. The question isn't whether. It's when, and at what cost, and who bears that cost during the gap years. Right now, the cost falls on workers whose productivity claims don't match executive narratives. On software companies being repriced on stories rather than evidence. On organizations making irreversible structural decisions based on vibes rather than measurement.

The discipline that resolves this isn't complicated. It's the same discipline it's always been. Check whether it's plugged in before you tear apart the machine.
