On April 9, 2026, Microsoft Research published the New Future of Work Report. Buried inside is a section titled "myths" — three of them, named directly:
- Counting AI-generated lines of code is a meaningful productivity metric
- Current tools will instantly turn every developer into a "10× engineer"
- Adoption primarily depends on model capability
The company that owns GitHub Copilot just officially called the entire AI coding marketing playbook from the last two years a myth.
The real number
The same report links to a randomized controlled trial published in Management Science (2026):
- 4,867 developers across Microsoft, Accenture, and an anonymous Fortune 100 company
- Random subset got access to an AI coding assistant; the rest didn't
- Outcome measured at the developer level on real production work
The result: 26.08% more completed tasks (standard error: 10.3%).
Not 10x. Not 5x. Twenty-six percent.
A note on timing: the experiments themselves ran in 2022–2023, on the GPT-3.5 generation of Copilot. Peer-reviewed final results came out in February 2026. The April 2026 Microsoft Research report cites this study as the best evidence available — meaning when they list "10x" as a myth, this is the number they're pointing at.
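To get a feel for how much uncertainty sits behind that headline figure, here is a back-of-envelope 95% confidence interval, assuming the reported 10.3 is a standard error in percentage points and using the usual normal approximation (estimate ± 1.96 × SE):

```python
# Back-of-envelope 95% CI for the Microsoft RCT result.
# Assumption: the reported 10.3 is a standard error in percentage points.
point_estimate = 26.08  # % more completed tasks
standard_error = 10.3   # percentage points

z = 1.96  # two-sided 95% critical value, normal approximation
lower = point_estimate - z * standard_error
upper = point_estimate + z * standard_error

print(f"95% CI: [{lower:.1f}%, {upper:.1f}%]")  # roughly [5.9%, 46.3%]
```

In other words, the data is compatible with anything from a modest single-digit gain to a large one, which is exactly why the point estimate should be treated as a planning number, not a guarantee.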
The hidden finding
Read the second sentence of the abstract: "Less experienced developers had higher adoption rates and greater productivity gains."
Translation: juniors got more out of the assistant than seniors. The benefit is concentrated where the existing skill is lowest.
The mechanism is obvious to anyone who's watched both kinds of engineer use these tools. A junior asks the assistant to write a function and ships what comes back. A senior asks the assistant to write a function and then spends as long reviewing it as it would have taken to write from scratch.
Reviewing is more cognitively expensive than writing. The senior pays that cost. The junior either doesn't pay it, or doesn't yet know they're supposed to.
Same tool. Two different jobs. Two different productivity numbers.
Stack the second study
The METR randomized trial (2025) measured experienced open-source developers working on their own repositories with Cursor + Claude 3.5/3.7 Sonnet:
- Predicted before the trial: +24% speed-up
- Felt after the trial: +20% faster
- Measured: −19% (developers were actually slower)
A nearly 40-point gap between how much faster developers felt and how much slower they actually were.
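Spelled out, the gap arithmetic from the METR numbers above:

```python
# Perception-vs-reality gaps in the METR trial, in percentage points.
predicted = 24   # speed-up developers predicted before the trial
felt = 20        # speed-up developers reported feeling afterward
measured = -19   # change actually measured (i.e., 19% slower)

print(felt - measured)       # 39: gap between felt and measured
print(predicted - measured)  # 43: gap between predicted and measured
```

The felt-versus-measured gap is 39 points; against the pre-trial prediction it widens to 43.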
Stack the two studies:
- Microsoft (RCT, 4,867 devs, code completion, GPT-3.5 era): +26%
- METR (RCT, 16 experienced devs, agent tools, modern Sonnet): −19%
The honest answer is a range, from +26% to −19%, depending on the task, the seniority of the user, and how disciplined the prompting and review workflow is.
How to actually use this
1. Stop using "lines of code" as an AI productivity metric. Microsoft Research just officially flagged it as a myth.
2. Budget against ~25%, not 10×. A quarter more throughput is a number you can actually plan around: license renewals, hiring forecasts, quarterly capacity.
3. Expect inverse seniority gains. If your senior devs feel like AI isn't helping them as much as your juniors — they're probably right, and the data confirms it.
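What "budget against ~25%" looks like in practice, as a minimal capacity-planning sketch (the team numbers below are made up for illustration; only the 1.26 multiplier comes from the study):

```python
# Hypothetical capacity planning around a ~26% gain versus a 10x claim.
# All team numbers are invented; 1.26 is the RCT's point estimate.
team_size = 20
tasks_per_dev_per_quarter = 30

baseline = team_size * tasks_per_dev_per_quarter  # 600 tasks/quarter
with_ai_rct = round(baseline * 1.26)              # what the RCT supports
with_ai_hype = baseline * 10                      # what the keynotes implied

print(baseline, with_ai_rct, with_ai_hype)  # 600 756 6000
```

The difference between 756 and 6000 is the difference between a defensible quarterly forecast and a number that will get a planning meeting laughed out of the room.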
If you've been quietly suspecting the hype was off — you now have a citation. Use it the next time someone asks why your AI-assisted output doesn't look like a launch keynote.
The full breakdown is in the video above.
What's your team using as an AI productivity metric right now?