
TL;DR
- A new study by METR finds that AI coding tools can slow down experienced developers rather than boost productivity.
- In a randomized controlled trial with 16 open source developers, completion time increased by 19% when AI tools like Cursor Pro were used.
- Developers had initially forecast a 24% productivity gain, revealing a 43-percentage-point gap between expectation and outcome.
- Results point to prompt-engineering overhead, response delays, and the difficulty of working in large, complex codebases as key factors.
- The findings contrast with other large-scale studies that report gains in general AI-assisted coding workflows.
A Surprising Result in a Year of AI Hype
In an era when tools like GitHub Copilot, Cursor, and other AI-powered developer assistants are becoming the norm in software engineering workflows, a new study from nonprofit research group METR delivers an unexpected finding: these tools may not enhance productivity for experienced developers working on real-world, complex projects.
The randomized controlled trial, published Thursday, aimed to provide empirical insights on whether leading AI coding tools actually accelerate the development process.
Instead, the study found that developers took 19% longer to complete tasks when using AI tools compared to when coding without them.
“Surprisingly, we find that allowing AI actually increases completion time,” the METR researchers wrote.
“Developers are slower when using AI tooling.”
Key Data Points from the METR Study
| Metric | Value |
| --- | --- |
| Study Participants | 16 experienced open source developers |
| Tasks Completed | 246 real tasks across large codebases |
| Forecasted AI Productivity Gain | 24% time savings |
| Actual Result | 19% increase in task completion time |
| Main AI Tool Used | Cursor Pro |
| Developer Familiarity with Cursor | 56% had prior experience with the tool |
| AI Use Restriction | Roughly half of tasks were "AI-allowed"; the rest were AI-restricted |
| AI Response Time & Prompting | Identified as primary delay factors |
| Study Publisher | METR (Model Evaluation and Threat Research) |
Real-World Conditions: Large Codebases and Complex Tasks
Unlike many lab-based studies that rely on synthetic or simplified coding exercises, METR’s trial focused on real-world developer workflows. The participants were experienced open-source contributors who were asked to complete tasks from live repositories they regularly work on.
Researchers divided their assignments into two groups:
- “AI-allowed” tasks — participants could use AI tools such as Cursor Pro.
- “AI-restricted” tasks — developers had to work without AI assistance.
While participants believed AI would shorten their task completion time by nearly a quarter, the opposite proved true in execution. Tasks took significantly longer when AI assistance was used — a deviation that suggests integration friction, prompt inefficiencies, and cognitive overhead may still burden even state-of-the-art AI tools.
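For a sense of how such an effect is quantified, here is a minimal sketch, not METR's actual analysis code, that compares completion times between AI-allowed and AI-restricted tasks; the task names and timings below are hypothetical.

```python
import math

# (task_id, condition, minutes_to_complete); all values are hypothetical
tasks = [
    ("fix-null-deref", "ai_allowed", 95),
    ("update-ci-config", "ai_allowed", 140),
    ("refactor-parser", "ai_restricted", 80),
    ("add-api-endpoint", "ai_restricted", 110),
]

def geometric_mean(values):
    # Geometric mean is less sensitive to a handful of very long tasks.
    return math.exp(sum(math.log(v) for v in values) / len(values))

ai_times = [t for _, cond, t in tasks if cond == "ai_allowed"]
no_ai_times = [t for _, cond, t in tasks if cond == "ai_restricted"]

ratio = geometric_mean(ai_times) / geometric_mean(no_ai_times)
print(f"AI-allowed tasks took {(ratio - 1) * 100:+.0f}% time relative to AI-restricted tasks")
# A positive percentage (e.g. +19%) means tasks took longer when AI was allowed.
```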
Why AI Tools Slowed Things Down
The METR researchers offered several hypotheses for the productivity drag:
- Prompting Takes Time: Developers had to frequently craft and rephrase prompts, diverting focus from actual coding.
- Latency and Wait Time: AI tools like Cursor introduce response delays, particularly when generating responses for complex problems or navigating extensive codebases. (A rough back-of-envelope illustration of this overhead follows this list.)
- Lack of Tool Familiarity: Only 56% of participants were familiar with Cursor, despite all receiving pre-study training. The remaining developers experienced a learning curve during task execution.
- AI's Limitations in Contextual Comprehension: AI models, while effective on small code samples, continue to struggle in massive, interdependent codebases where context spans hundreds of files and nuanced logic.
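To see how these overheads can add up, the following back-of-envelope calculation illustrates the dynamic; every number in it is assumed for illustration and none comes from the study itself.

```python
# Illustrative model: even if AI drafts a change faster than typing it by hand,
# per-interaction overhead (prompting, waiting, reviewing and fixing output)
# can push total time past the manual baseline. All numbers are assumptions.

manual_minutes_per_change = 10.0   # assumed time to implement a change by hand
ai_generation_minutes = 2.0        # assumed time for the model to draft the change
prompting_minutes = 3.0            # assumed time crafting and rephrasing prompts
waiting_minutes = 1.5              # assumed response latency on a large codebase
review_fix_minutes = 5.0           # assumed time reviewing and correcting the output

ai_total = ai_generation_minutes + prompting_minutes + waiting_minutes + review_fix_minutes
print(f"Manual: {manual_minutes_per_change:.1f} min, AI-assisted: {ai_total:.1f} min")
print(f"Relative change: {(ai_total / manual_minutes_per_change - 1) * 100:+.0f}%")
# With these assumed numbers the AI-assisted path is ~15% slower, even though
# the model generates the code itself five times faster than typing it.
```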
Not a Death Knell for AI Coding Assistants
Despite the findings, the study's authors were careful not to draw overly broad conclusions. They emphasized the study's limited scope and made clear that:
- AI coding tools are still evolving rapidly, and these results may not hold even three months from now.
- The trial does not suggest that AI tools universally fail to speed up developers. Rather, performance depends on task type, tool maturity, developer familiarity, and contextual complexity.
“We wouldn’t expect the same results even a few months from now,” METR noted, citing ongoing improvements in long-horizon reasoning and multi-file comprehension by leading AI models from OpenAI, Anthropic, and Google DeepMind.
A Note on “Vibe Coders”
The study specifically targeted tools referred to in developer communities as “vibe coders” — AI systems that offer suggestions based on general code patterns rather than strictly enforcing rules or structure.
These tools, like Copilot and Cursor, tend to perform better in greenfield projects or prototyping, but may hinder progress in legacy code or systems with high architectural complexity.
METR’s findings are a reminder that real productivity gains depend on effective integration and task context, not just the presence of AI.
Previous Contradictory Findings
Other prominent studies have painted a more optimistic picture. Research from GitHub, MIT, and Stanford HAI found that AI code suggestions:
- Reduced bug frequency in some use cases
- Improved task completion times for junior developers
- Enhanced developer satisfaction
The contrast suggests that AI’s impact is highly dependent on user experience, tooling maturity, and task nature — and not all developers, especially seasoned engineers in production systems, will benefit immediately.