70% of Companies Added AI Without Changing a Role

Author: Ptrck Brgr

Five percent. Ten. Maybe fifteen. That's where most enterprises land when you ask about AI-driven productivity gains. Not the demo gains, those are spectacular. The company-wide, show-it-to-the-board numbers.

Martin Harrysson and Natasha Maniar from McKinsey explain in Moving away from Agile: What's Next that the gap between individual AI wins and enterprise-scale impact is almost entirely organizational. Their survey of 300 companies found that top performers were seven times more likely to have AI-native workflows and six times more likely to have restructured roles entirely.

I keep saying the ceiling isn't technical, it's organizational. But hearing McKinsey put numbers on it caught me off guard. Across enterprise AI projects, I've seen the same pattern in business unit after business unit: teams bolting Copilot onto the same two-week sprint, the same standup cadence, the same eight-person squad, then wondering why gains plateau. (And honestly, the 70% stat below hit different.)

Two Pizza to One Pizza

Here's the structural shift that surprised me most. Top-performing teams aren't just adding AI tools. They're shrinking. From the classic two-pizza team of eight to ten down to pods of three to five.

About 70% of the companies that we survey have not changed their roles at all. — Martin Harrysson, McKinsey

Seventy percent. Let that land. Companies hand engineers AI coding tools, expect different outcomes, and haven't redefined a single role. Same job description. Same sprint structure. Same review process. Then leadership asks why productivity looks flat.

At Tier, we ran lean teams by necessity — small embedded groups, each owning a full slice of the product. That constraint forced us into patterns that look a lot like what McKinsey now prescribes as "AI-native." Fewer handoffs. More ownership per person. I didn't think of it as future-proofing at the time, but it tracks.

The Review Bottleneck

Speed up code generation without changing code review, and you've just moved the traffic jam. Agents get fuzzy stories, produce fuzzy code, and the only quality gate is still a human reading diffs.

Pull request count goes up. Review queues balloon. Developers spend more time reviewing AI-generated code than they saved generating it. Net gain? Close to zero. Sometimes negative, because AI-generated code that looks right but isn't creates a specific debugging pain that hand-written code rarely does.

Here's the question I keep coming back to: if review is the bottleneck, do you fix it with better specs upstream, automated quality gates, or smaller units of work? Probably all three. But most teams I've seen reach for none of them.

Spec-Driven, Not Story-Driven

One shift I didn't expect from McKinsey: moving from story-driven to spec-driven development. PMs iterating on specs with agents rather than writing long PRDs that get interpreted loosely.

A PM who can prototype directly in code collapses the feedback loop from weeks to hours. Continuous planning replaces quarterly cycles. Specs replace stories as the unit of work.

I'm not fully convinced this scales to legacy enterprises with decades of Jira workflows baked into their culture. My sample size is limited, but the orgs I've seen try this leap usually stall in the middle. The tooling is ready. The humans aren't.

Where Measurement Breaks

Bottom performers were not even measuring speed and only 10% were measuring productivity. — Natasha Maniar, McKinsey

This one genuinely surprised me. Not measuring productivity? Companies deployed AI tools across engineering orgs and then... didn't check if anything changed?

Activity metrics lie. Output metrics tell truth. I say this all the time. But I assumed most companies at least had some output metrics. Turns out, no. McKinsey's framework pushes measurement from inputs (tool investment, upskilling spend) through outputs (velocity, code quality, developer NPS) to economic outcomes (time to revenue, cost per pod). That full stack of measurement separates the 5x improvers from the 10%-and-stuck crowd.
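To make that three-layer framework concrete, here's a minimal sketch of it as a plain data structure. The field names are my paraphrase of the layers named in the talk, not an official McKinsey schema, and the numbers are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class PodMetrics:
    """One pod's metrics across the three layers (illustrative fields)."""
    # Inputs: what you spend
    tool_spend_usd: float
    upskilling_hours: float
    # Outputs: what the pod produces
    merges_per_week: float        # velocity
    escaped_defects: int          # code quality proxy
    developer_nps: int
    # Economic outcomes: what the business gets
    time_to_revenue_days: float
    cost_per_pod_usd: float

# A hypothetical pod measured across all three layers:
pod = PodMetrics(12_000, 40, 18.5, 2, 45, 60, 180_000)
print(pod.merges_per_week)  # → 18.5
```

The point isn't the schema; it's that bottom performers stop at the first two fields and never instrument the rest.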

And here's the part most teams miss — resiliency metrics. Maniar flagged mean time to resolve priority bugs as a proxy for code resilience. Fast code that breaks fast isn't fast.
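That resiliency proxy is trivial to compute if you're already tracking bug timestamps. A minimal sketch, assuming you can export (opened, resolved) pairs for priority bugs from your tracker:

```python
from datetime import datetime
from statistics import mean

def mttr_hours(bugs):
    """Mean time to resolve, in hours, over priority bugs.

    `bugs` is a list of (opened, resolved) datetime pairs; bugs still
    open (resolved is None) are excluded from the average.
    """
    durations = [
        (resolved - opened).total_seconds() / 3600
        for opened, resolved in bugs
        if resolved is not None
    ]
    return mean(durations) if durations else None

# Hypothetical sample: two resolved P1 bugs, one still open.
bugs = [
    (datetime(2025, 1, 6, 9, 0), datetime(2025, 1, 6, 15, 0)),   # 6h
    (datetime(2025, 1, 7, 10, 0), datetime(2025, 1, 8, 10, 0)),  # 24h
    (datetime(2025, 1, 8, 12, 0), None),                         # open
]
print(mttr_hours(bugs))  # → 15.0
```

Track this alongside velocity: if merges per week climb while MTTR on priority bugs climbs too, you're shipping fast code that breaks fast.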

The Orchestration Shift

Engineers moving from execution to orchestration. I think the framing is directionally right but easy to romanticize. "Orchestrating agents" sounds clean. In practice, it means debugging non-deterministic outputs, writing better prompts than most engineers are trained to write, and understanding architecture deeply enough to know when the agent is wrong.

The real skill shift isn't from "writing code" to "managing agents." It's from narrow specialization to broader judgment. I could be wrong, but I suspect this transition takes most enterprises three to five years, not the twelve months the consultants hope for.

Why This Matters

Agile was built for human constraints. Two-week sprints made sense when humans wrote, reviewed, and tested all the code. Eight-person teams made sense when coordination overhead scaled linearly with headcount.

AI breaks those assumptions. Code generation is cheap. Review is expensive. Coordination overhead now scales with agent outputs, not human outputs. McKinsey's bank case study saw a 51% increase in code merges and 60x increase in agent consumption after restructuring sprints and roles. Those aren't incremental numbers. But they required changing how work gets allocated, how quality gets measured, and who does what.

What Works

Shrink teams. Three to five per pod, full-stack fluency, complete workflow ownership. More pods, same headcount.

Move from stories to specs. PMs iterate on specs with agents before developers touch code. Acceptance criteria get crisp before the sprint starts, not during review.

Measure outcomes, not adoption. Tool usage going up means nothing if delivery speed and code quality don't follow.

Fix review before you speed up generation. Automated quality gates, better specs, smaller merge units. Otherwise you just move the bottleneck.
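What an automated quality gate looks like in its simplest form: a check that rejects a merge when the diff is too large to review well or test coverage regresses. The thresholds and field names here are illustrative assumptions, not a real CI API:

```python
# Minimal merge quality-gate sketch (assumed thresholds, not a standard).
MAX_CHANGED_LINES = 400    # smaller units of work review faster
MIN_COVERAGE_DELTA = 0.0   # coverage must not drop

def gate(change):
    """Return (ok, reasons) for a proposed merge.

    `change` is a dict with 'changed_lines' (int) and 'coverage_delta'
    (fractional change in coverage, e.g. -0.02 for a 2-point drop).
    """
    reasons = []
    if change["changed_lines"] > MAX_CHANGED_LINES:
        reasons.append(f"diff too large: {change['changed_lines']} lines")
    if change["coverage_delta"] < MIN_COVERAGE_DELTA:
        reasons.append(f"coverage dropped by {-change['coverage_delta']:.1%}")
    return (not reasons, reasons)

# A typical AI-generated mega-PR fails on both counts:
ok, why = gate({"changed_lines": 950, "coverage_delta": -0.02})
print(ok)   # → False
print(why)
```

Gates like this shift quality enforcement from a human reading diffs to machine-checkable rules, which is the only way review keeps pace with cheap generation.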

This works best for teams already frustrated with agile ceremonies that feel like theater. If your AI gains have plateaued at 10-15% and nobody can explain why, the operating model is the first place to look.

Full talk: Watch on YouTube