The AI productivity audit

Published: February 18, 2026
Category: Operations
Read time: 15 minutes

The gap between the pitch and the data

A task that took three weeks reduced to 37 minutes. That's the number Axios CTO Dan Cox shared in February 2026, describing what happened when his team turned AI agent teams loose on a real project. Orders of magnitude faster. At Spotify, co-CEO Gustav Soderström told investors that the company's best developers "have not written a single line of code since December." They direct AI, review its output, and approve. The role changed from coder to conductor.

These stories are real. They are also the highlight reel.

A randomized controlled trial by METR gave 16 experienced open-source developers 246 real tasks, randomly assigning half to be completed with AI tools and half without. The developers using AI took 19% longer. Before the experiment, they predicted AI would make them 24% faster. After finishing, they still believed AI had sped them up by 20%. The perception gap is the finding: people consistently overestimate AI's impact on their own productivity.

The ManpowerGroup 2026 Global Talent Barometer surveyed roughly 14,000 workers across 19 countries. Regular AI usage jumped 13% year over year, reaching 45% of workers. Confidence in using the technology fell 18%.

More people using it. Fewer people confident it's helping. That's not a productivity story. It's a warning sign that most teams are moving faster without checking whether faster means better.

How AI creates more work

Researchers Aruna Ranganathan and Xingqi Maggie Ye at UC Berkeley's Haas School of Business spent eight months embedded inside a 200-person US tech company. They conducted roughly 40 in-depth interviews between April and December 2025. Their finding ran counter to the standard AI pitch: AI tools didn't reduce workloads. They intensified them.

Workers moved faster. They took on wider task ranges. They extended their hours voluntarily. Nobody mandated any of this. The AI adoption at the company was optional.

"You had thought that maybe, oh, because you could be more productive with AI, then you save some time, you can work less. But then really, you don't work less. You just work the same amount or even more."

The researchers documented a feedback loop with four stages. AI sped up individual tasks. Faster output raised expectations, both from managers and from the workers themselves. Higher expectations drove more reliance on AI to keep pace. And more reliance expanded the scope of what each person was expected to handle. The loop tightened with each cycle.

Engineers started spending more time reviewing AI-generated code submitted by colleagues. The code came faster, which meant more of it to review. Review quality became the bottleneck that speed created.

Software engineer Siddhant Khare described the same dynamic from the inside: "I shipped more code last quarter than any quarter in my career. I also felt more drained than any quarter in my career." His diagnosis was precise: "AI reduces the cost of production but increases the cost of coordination, review, and decision-making. Those costs fall entirely on the human."

The Faros AI report, covering over 10,000 developers across 1,255 teams, put numbers on it. High-AI teams completed 21% more tasks and merged 98% more pull requests. But PR review time increased 91%. Bugs per developer rose 9%. PR sizes grew 154% larger. At the company level, the correlation between AI adoption and actual performance metrics evaporated.

Apollo chief economist Torsten Slok summed up the macro picture: "AI is everywhere except in the incoming macroeconomic data."

The three traps

The intensification loop feeds three specific failure modes. Most teams hit at least two of them within the first six months of serious AI adoption.

Scope creep. AI makes it possible to do more, so you do more. A marketing team that used to produce four blog posts a month discovers AI can draft twelve. The manager asks for twelve. Nobody asks whether twelve is better than four. Nobody checks whether the audience wants twelve. The output ceiling rose, so the target rose with it. Stack Overflow's 2025 developer survey found that 66% of developers cite "AI solutions that are almost right, but not quite" as their top frustration. Forty-five percent say debugging AI code is more time-consuming than writing it themselves. The scope expanded, but the quality assurance burden expanded faster.

Expectation inflation. Faster output resets the baseline. If a developer used to ship a feature in two weeks and now ships it in three days, the new expectation isn't "we have slack." The new expectation is "ship five features in two weeks." The Faros data shows this clearly: more PRs merged, but more bugs and longer reviews. Speed became the new floor, not a ceiling that freed up capacity.

Review overhead. AI generates output at a pace that no human review process was designed to handle. Every piece of AI-generated work still needs a human to evaluate it. Someone has to decide whether the code is correct, whether the analysis is sound, whether the recommendation makes sense in context. That evaluation work is invisible in most productivity metrics because companies track output volume, not the human time spent verifying it.

The Berkeley researchers put it plainly:

"What looks like higher productivity in the short run can mask silent workload creep and growing cognitive strain."

These traps share a root cause. The AI deployment changed the tools but not the work structure. The meetings stayed the same. The review processes stayed the same. The expectations adjusted upward with no corresponding adjustment to capacity, role definitions, or what "done" actually means.

Measuring what AI actually costs

Most companies cannot answer a basic question: is AI saving us time or shifting where the time goes?

They can't answer it because they measure the wrong thing. Output volume is easy to count. PRs merged, tickets closed, documents processed, emails sent. AI reliably increases all of these. What companies don't measure is the total human time involved in producing that output, including the prompting, reviewing, debugging, reworking, and coordinating that AI-assisted work requires.

Here's how to build the audit.

Pick three workflows. Not your entire operation. Three processes where AI tools are actively in use. Choose ones that matter: a revenue-generating workflow, an internal operations workflow, and a creative or analytical workflow. Variety exposes different failure modes.

Map the full time cost. For each workflow, track every minute of human involvement over two weeks. Not just the task execution. Include the time spent prompting the AI, reviewing its output, correcting its mistakes, re-prompting after failures, coordinating with teammates about AI-generated deliverables, and context-switching between AI-assisted and manual work. The BCG AI at Work 2025 study found that companies see significantly greater time savings when they redesign workflows around AI rather than bolting AI onto existing processes. The audit reveals whether you've redesigned or just bolted.
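One lightweight way to run this tracking is a log of human minutes per category, summed over the audit window. The category names and numbers below are hypothetical, a minimal sketch of the bookkeeping rather than a prescribed tool:

```python
from collections import defaultdict

# Categories of human time that AI-assisted work actually consumes,
# not just task execution.
CATEGORIES = [
    "execution",     # doing the task itself
    "prompting",     # writing and refining prompts
    "reviewing",     # evaluating AI output
    "reworking",     # correcting mistakes, re-prompting after failures
    "coordinating",  # discussing AI-generated deliverables with teammates
]

def total_human_minutes(log_entries):
    """Sum human minutes per category for one workflow over two weeks."""
    totals = defaultdict(int)
    for category, minutes in log_entries:
        totals[category] += minutes
    return dict(totals)

# Hypothetical two-week log for a single workflow.
log = [
    ("execution", 300), ("prompting", 90),
    ("reviewing", 240), ("reworking", 120),
    ("coordinating", 60),
]
totals = total_human_minutes(log)
grand_total = sum(totals.values())            # 810 minutes of human time
ai_overhead = grand_total - totals["execution"]  # 510 minutes beyond execution
```

The point of the breakdown is the overhead line: in this sketch, more human time goes to prompting, reviewing, and coordinating than to the task itself, which is exactly the kind of cost that output-volume metrics never surface.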

Compare to the pre-AI baseline. This is where most teams get uncomfortable. Pull the data from before AI adoption if you have it. If you don't, estimate conservatively. How long did this workflow take six months ago? How long does it take now, with all the human involvement included? For some workflows, the honest answer will be: it takes longer. That's not a reason to abandon AI. It's a reason to redesign the workflow instead of just adding tools to it.

Separate output gains from time gains. Your team may be producing 40% more output. That's worth something. But if they're working 20% more hours to do it, the productivity gain per hour is smaller than the headline suggests. And if burnout is rising, the long-term cost could erase the short-term output gain entirely.
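The arithmetic here is worth making explicit. A sketch of the per-hour calculation, using the hypothetical percentages from the paragraph above:

```python
def per_hour_gain(output_gain, hours_gain):
    """Real productivity change per hour worked, given fractional
    changes in output volume and hours (0.40 means 40% more output)."""
    return (1 + output_gain) / (1 + hours_gain) - 1

# 40% more output on 20% more hours: the headline says 40%,
# but the per-hour gain is closer to 17%.
gain = per_hour_gain(0.40, 0.20)
print(f"{gain:.1%}")  # 16.7%
```

The gap between the headline number and the per-hour number is the size of the intensification effect.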

Track where the saved time actually goes. When AI does save time on a task, what happens to that time? Does it become capacity for higher-value work? Does it become more volume of the same work? Does it disappear into expanded scope that nobody approved? The answer tells you whether AI is a productivity tool or an acceleration treadmill.

This audit isn't a one-time exercise. Run it quarterly. The dynamics shift as AI tools improve, as your team's proficiency changes, and as expectations recalibrate. What worked three months ago may be creating new problems today.

Designing AI practice

The Berkeley researchers coined a term for what the most effective teams built: "AI practice." Not policy. Not guidelines taped to a wall. A living set of norms that define when to use AI, when to stop, and what good AI-assisted work looks like.

The concept has three components. Here's how to make each one operational.

Decision pauses. Before starting a task with AI, the team member asks: is this a task where AI adds value, or am I reaching for it out of habit? This sounds trivial. It isn't. The intensification loop runs on autopilot. AI becomes the default for everything because it feels productive, even when it's not. The pause forces a conscious choice.

Make it concrete. Khare, the software engineer, uses a three-prompt rule: if AI doesn't get to 70% of an acceptable solution in three attempts, he writes it himself. Steve Yegge, cited by Pragmatic Engineer, warns that engineers vibe coding at peak intensity may only sustain about three productive hours a day — but those hours can be enormously more productive than a full day without AI. His advice to leadership: "Do you let them work for three hours a day? The answer is yes, or your company's going to break." These aren't arbitrary limits. They're circuit breakers that prevent the feedback loop from running unchecked.

Your team's version will look different. The principle is the same: define the boundary before the work starts, not after the exhaustion hits.

Sequencing. Not every task in a workflow benefits equally from AI. The practice defines which steps get AI assistance and which stay manual. A developer might use AI for boilerplate code and first-draft documentation while keeping architecture decisions and code review entirely human. A marketing team might use AI for research synthesis while keeping messaging and positioning manual.

The sequencing decision depends on where AI errors are cheap versus expensive. Errors in a first draft are cheap. Errors in a customer-facing analysis are expensive. Errors in a security configuration are catastrophic. Map the error cost for each step, and allocate AI accordingly.
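One way to make that mapping concrete is a table of workflow steps, their error costs, and the AI policy each cost implies. The step names and policy labels below are hypothetical examples, not a standard taxonomy:

```python
# Hypothetical error-cost map for one workflow. The rule: AI drafts
# where errors are cheap; humans own steps where errors are not.
ERROR_COST = {
    "first_draft": "cheap",
    "research_synthesis": "cheap",
    "customer_analysis": "expensive",
    "architecture_decision": "expensive",
    "security_config": "catastrophic",
}

def ai_policy(step):
    """Return how AI may be used for a step, based on its error cost."""
    cost = ERROR_COST[step]
    if cost == "cheap":
        return "ai_drafts_human_reviews"
    if cost == "expensive":
        return "ai_assists_human_owns"
    return "human_only"

print(ai_policy("first_draft"))      # ai_drafts_human_reviews
print(ai_policy("security_config"))  # human_only
```

Writing the map down, even this crudely, forces the sequencing conversation to happen before the work starts rather than after an expensive error.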

Human grounding. Regularly scheduled work that happens without AI. This serves two purposes. It maintains the team's ability to do the work independently, which matters when AI tools break, change, or produce something you need to evaluate from scratch. And it provides a calibration point: when the team does the work manually, they see exactly how much value AI is or isn't adding.

One pattern that works: dedicate one day per sprint to AI-free work on the same tasks the team normally does with AI. Compare the output, the time, and the team's confidence in the results. This isn't a Luddite exercise. It's quality assurance for your AI adoption.

If you haven't assessed which of your workflows are good candidates for AI in the first place, our readiness scorecard provides a scoring framework for that evaluation.

The manager's job

The intensification problem is a management problem. Tools don't redesign work structures. People do. And the manager is the person standing between the feedback loop and the team.

Reset the expectations. When AI makes a task faster, the default response is to raise the target. The manager's job is to ask: should we? Sometimes yes. If the team was previously bottlenecked on a repetitive task and AI cleared it, raising the target makes sense. But raising it without examining the review burden, the error rate, and the coordination cost is how workload creep starts. The target conversation should include total human time, not just output volume.

Protect capacity. AI creates a specific type of cognitive load. The constant evaluation of AI output, the mental switching between directing AI and doing the work, the judgment calls about when AI is right and when it's subtly wrong. This is real work, and it drains the same attention budget as any other demanding task. Managers who fill every minute freed by AI with new assignments aren't capturing productivity. They're converting time savings into burnout debt.

Khare's approach to this was time-boxing: 30-minute AI sessions with a timer. When the timer goes off, step back. Evaluate what you have. Decide whether to continue. The practice prevents the "just one more prompt" spiral that turns a 20-minute task into a two-hour rabbit hole.

Measure what matters. Output volume is a vanity metric for AI adoption. The metrics that tell you whether AI is working: time-to-completion including all human involvement, error rates and rework frequency, team confidence and satisfaction scores, review turnaround time, and the ratio of output gained to hours spent. If output is up 40% and hours are up 30%, your real productivity gain is smaller than the dashboard suggests.

Name the intensity. Work intensification thrives in silence. When nobody talks about it, individuals assume they're the problem. "Everyone else seems fine with this pace." They're not. The Berkeley study found that workers across the company experienced intensification but rarely discussed it. The manager who names it, who says "this pace is by design, and here's where we're drawing the line," gives the team permission to set boundaries instead of white-knuckling through them.

Redesign the work, not just the tools. The BCG study found the dividing line between companies that captured AI value and those that didn't. The difference wasn't which tools they used. It was whether they restructured workflows around AI or simply added AI to existing processes. Adding AI to an unchanged workflow is like putting a jet engine on a bicycle. You'll go faster. You'll also crash.

Redesign means rethinking role boundaries: who does what when AI handles the first pass? It means rethinking review processes: how do you evaluate AI output at scale without burning out your senior people? It means rethinking meeting structures: do you still need the same coordination meetings when AI handles status updates and summaries? The answers vary by team. The question is universal.

For teams thinking about how to sequence these workflow changes across the organization, our AI innovation pipeline provides a framework for prioritizing and ordering initiatives.

The discipline

There is a version of this essay that ends with "slow down." That would be wrong.

The intensity isn't inherently bad. For people with a bias toward action, people who chose startups over bureaucracies and building over planning, intensity is a feature. The speed that AI enables is real. The output gains are real. The founders who are learning AI's capabilities and limitations today are building compounding advantages that will widen every quarter.

The problem was never the tools. The problem is the amorphous mandate. "We need to become more efficient with AI." Those words don't do anything. They create anxiety without direction, activity without purpose, and the exact intensification loop the Berkeley researchers documented. A mandate without concrete approaches is just pressure with no release valve.

Start small. Pick the processes that eat time. Run the audit. Build the practice. Measure total cost, not just output. Treat AI proficiency the way you'd treat any skill: something that develops through deliberate practice, not something that arrives the day you buy the license.

The companies pulling ahead aren't the ones deploying the most AI tools. They're the ones that did the unsexy work of redesigning how their teams operate. They measured honestly. They set boundaries that protected their people. They captured the time AI freed instead of filling it reflexively.

The AI productivity audit isn't a one-time project. It's a discipline. Run it, learn from it, adjust, run it again. The gap between the companies that get this right and the ones that don't will not close. It will compound.

And if you're building the management infrastructure for AI agents alongside human teams, our agent management framework covers the supervision, trust boundaries, and org chart changes that make human-agent teams work.
