
> The problem is, there are very few if any other studies.

Not at all; the METR study just got a ton of attention. There are tons out there at much larger scales, almost all of them showing significant productivity boosts for various measures of "productivity".

If you stick to the standard of "randomized controlled trials on real-world tasks", here are a few:

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4945566 (4867 developers across 3 large companies including Microsoft, measuring closed PRs)

https://www.bis.org/publ/work1208.pdf (1219 programmers at a Chinese BigTech, measuring LoC)

https://www.youtube.com/watch?v=tbDDYKRFjhk (from Stanford, not an RCT, but the largest scale with actual commits from 100K developers across 600+ companies, and tries to account for reworking AI output. Same guys behind the "ghost engineers" story.)

If you look beyond real-world tasks and consider things like standardized tasks, there are a few more:

https://ieeexplore.ieee.org/abstract/document/11121676 (96 Google engineers, but everyone worked on the same "enterprise-grade" task rather than different tasks.)

https://aaltodoc.aalto.fi/server/api/core/bitstreams/dfab4e9... (25 professional developers across 7 tasks at a Finnish technology consultancy.)

They all find productivity boosts in the 15-30% range -- with a ton of nuance, of course. If you look beyond these at things like open source commits, code reviews, developer surveys, etc., you'll find even more evidence of positive impacts from AI.



Thank you!

> https://www.youtube.com/watch?v=tbDDYKRFjhk (from Stanford, not an RCT, but the largest scale with actual commits from 100K developers across 600+ companies, and tries to account for reworking AI output. Same guys behind the "ghost engineers" story.)

I like this one a lot, though I've only skimmed it. At 11:58 they discuss something that many will find correlates with their personal experience: easy vs. complex tasks in greenfield vs. brownfield codebases.

> They all find productivity boosts in the 15 - 30% range -- with a ton of nuance, of course.

Or 5-30%, with "AI is likely to reduce productivity in high complexity tasks" ;) But yeah, a ton of nuance is needed.


Yeah, that's why I like that one too: they address a number of points that come up in AI-related discussions. For example, they even find negative productivity (-5%) in legacy / non-popular languages, which aligns with what a lot of folks here report.

However, even these levels are surprising to me. One of my common refrains is that harnessing AI effectively has a deceptively steep learning curve, and individuals often need to figure out for themselves what works best for them and their current project. It took me many months, personally.

Yet many of these studies show immediate boosts in productivity, hinting that even novice AI users are seeing significant improvements. Many of the engineers involved didn't even get additional training, so it's likely a lot of them simply used the autocompletion features and never even touched the powerful chat-based features. Furthermore, current workflows, codebases and tools are not suited for this new modality.

As things are figured out and adopted, I expect we'll see even more gains.


Closed PRs, commits, LoC, etc. are useless vanity metrics.

With AI code you have more LoC and NEED more PRs to fix all its slop.

In the end you have increased numbers with a net negative effect.


Most of those studies call this out and try to control for it (edit: "it" here being the usual limitations of LoC and PRs as measures of productivity) where possible. But to your point, no, there is still a strong net positive effect:

> https://www.youtube.com/watch?v=tbDDYKRFjhk (from Stanford, not an RCT, but the largest scale with actual commits from 100K developers across 600+ companies, and *tries to account for reworking AI output*. Same guys behind the "ghost engineers" story.)

Emphasis added. They modeled a way to detect when AI output is being reworked, and still find a 15-20% increase in throughput. Specific timestamp: https://youtu.be/tbDDYKRFjhk?t=590&si=63qBzP6jc7OLtGyk
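
For intuition, here's a toy sketch of what "rework-adjusted" output could look like: credit an added line only if it survives some grace window, so quickly reworked AI output counts as churn rather than throughput. The window, data shape and numbers below are made up; this is just the accounting idea, not their actual pipeline.

    # Toy sketch: only count added lines that survive a grace window.
    # Hypothetical data; illustrates the idea, not the Stanford method.
    from datetime import datetime, timedelta

    REWORK_WINDOW = timedelta(days=14)

    def rework_adjusted_lines(added_lines):
        """added_lines: iterable of (added_at, deleted_at_or_None) per added line."""
        credit = 0
        for added_at, deleted_at in added_lines:
            # Lines deleted/rewritten within the window count as churn, not progress.
            if deleted_at is None or deleted_at - added_at > REWORK_WINDOW:
                credit += 1
        return credit

    now = datetime(2025, 1, 1)
    lines = [(now, None), (now, now + timedelta(days=2)), (now, None)]
    print(rework_adjusted_lines(lines))  # 2 -- the quickly reworked line isn't credited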


Could you avoid uncertainties like this by measuring something like revenue growth before and after AI adoption, given enough data?


Hmm, I'm not an economist, but I have seen other studies that look at things at the firm level, so it should definitely be possible. A quick search on Google and SSRN did turn up some studies, but they seem to focus on productivity rather than revenues; not sure why. It may be because such studies depend on the available data, and a lot of key information is hidden, e.g. the revenues of privately held companies, which constitute a large part of the economy.


True, it probably would be difficult to gather representative data. It also might be hard to separate out broader economic effects, e.g. overall upturns.
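
One way to net out broad upturns would be a difference-in-differences comparison against firms that didn't adopt AI. A toy sketch with made-up numbers (not from any of the studies above):

    # Hypothetical difference-in-differences: compare revenue growth of AI adopters
    # vs. non-adopters, before vs. after adoption, so economy-wide effects cancel.
    def did_estimate(adopters_pre, adopters_post, controls_pre, controls_post):
        """Each argument: mean revenue growth (%) for that group in that period."""
        return (adopters_post - adopters_pre) - (controls_post - controls_pre)

    # Made-up numbers: a general upturn lifts both groups ~3pp; adopters gain 2pp extra.
    print(did_estimate(4.0, 9.0, 5.0, 8.0))  # 2.0 percentage points attributable to AI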



