> The problem is, there are very few if any other studies.
Not at all, the METR study just got a ton of attention. There are tons out there at much larger scales, almost all of them showing significant productivity boosts for various measures of "productivity".
If you stick to the standard of "Randomized controlled trials on real-world tasks" here are a few:
https://www.youtube.com/watch?v=tbDDYKRFjhk (from Stanford, not an RCT, but the largest scale with actual commits from 100K developers across 600+ companies, and tries to account for reworking AI output. Same guys behind the "ghost engineers" story.)
If you look beyond real-world tasks and consider things like standardized tasks, there are a few more:
They all find productivity boosts in the 15 - 30% range -- with a ton of nuance, of course. If you look beyond these at things like open source commits, code reviews, developer surveys etc. you'll find even more evidence of positive impacts from AI.
> https://www.youtube.com/watch?v=tbDDYKRFjhk (from Stanford, not an RCT, but the largest scale with actual commits from 100K developers across 600+ companies, and tries to account for reworking AI output. Same guys behind the "ghost engineers" story.)
I like this one a lot, though I've only skimmed through it. At 11:58 they cover something that many will find matches their personal experience: easy vs. complex tasks in greenfield vs. brownfield projects.
> They all find productivity boosts in the 15 - 30% range -- with a ton of nuance, of course.
Or 5-30% with "AI is likely to reduce productivity in high complexity tasks" ;) But yeah, a ton of nuance is needed.
Yeah that's why I like that one too, they address a number of points that come up in AI-related discussions. E.g. they even find negative productivity (-5%) in legacy / non-popular languages, which aligns with what a lot of folks here report.
However, even these levels are surprising to me. One of my common refrains is that harnessing AI effectively has a deceptively steep learning curve, and individuals often need to figure out for themselves what works best for them and their current project. Took me many months, personally.
Yet many of these studies show immediate boosts in productivity, hinting that even novice AI users are seeing significant improvements. Many of the engineers involved didn't even get additional training, so it's likely a lot of them simply used the autocompletion features and never even touched the powerful chat-based features. Furthermore, current workflows, codebases and tools are not suited for this new modality.
As things are figured out and adopted, I expect we'll see even more gains.
Most of those studies call this out and try to control for it (edit: "it" here being the usual limitations of LoC and PRs as measures of productivity) where possible. But to your point, no, there is still a strong net positive effect:
> https://www.youtube.com/watch?v=tbDDYKRFjhk (from Stanford, not an RCT, but the largest scale with actual commits from 100K developers across 600+ companies, and tries to account for reworking AI output. Same guys behind the "ghost engineers" story.)
Hmm, not an economist, but I have seen other studies that look at things at the firm level, so it should definitely be possible. A quick search on Google and SSRN did turn up some studies, but they seem to focus on productivity rather than revenues, not sure why. Maybe it's because such studies depend on the available data, and a lot of key information may be hidden, e.g. revenues of privately held companies, which constitute a large part of the economy.
Not at all, the METR study just got a ton of attention. There are tons out there at much larger scales, almost all of them showing significant productivity boosts for various measures of "productivity".
If you stick to the standard of "Randomized controlled trials on real-world tasks" here are a few:
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4945566 (4867 developers across 3 large companies including Microsoft, measuring closed PRs)
https://www.bis.org/publ/work1208.pdf (1219 programmers at a Chinese BigTech, measuring LoC)
https://www.youtube.com/watch?v=tbDDYKRFjhk (from Stanford, not an RCT, but the largest scale with actual commits from 100K developers across 600+ companies, and tries to account for reworking AI output. Same guys behind the "ghost engineers" story.)
If you look beyond real-world tasks and consider things like standardized tasks, there are a few more:
https://ieeexplore.ieee.org/abstract/document/11121676 (96 Google engineers, but same "enterprise grade" task rather than different tasks.)
https://aaltodoc.aalto.fi/server/api/core/bitstreams/dfab4e9... (25 professional developers across 7 tasks at a Finnish technology consultancy.)
They all find productivity boosts in the 15 - 30% range -- with a ton of nuance, of course. If you look beyond these at things like open source commits, code reviews, developer surveys etc. you'll find even more evidence of positive impacts from AI.