
TL;DR
- A new study by METR finds that AI coding tools can slow down experienced developers rather than boost productivity.
- In a randomized controlled trial with 16 open source developers, completion time increased by 19% when AI tools like Cursor Pro were used.
- Developers had initially forecast a 24% productivity gain, revealing a 43-percentage-point gap between expectation and outcome.
- Results point to prompt-engineering overhead, response delays, and the difficulty of working in large, complex codebases as key factors.
- The findings contrast with other large-scale studies that report gains in general AI-assisted coding workflows.
A Surprising Result in a Year of AI Hype
In an era when tools like GitHub Copilot, Cursor, and other AI-powered developer assistants are becoming the norm in software engineering workflows, a new study from nonprofit research group METR delivers an unexpected finding: these tools may not enhance productivity for experienced developers working on real-world, complex projects.
The randomized controlled trial, published Thursday, aimed to provide empirical insights on whether leading AI coding tools actually accelerate the development process.
Instead, the study found that developers took 19% longer to complete tasks when using AI tools compared to when coding without them.
“Surprisingly, we find that allowing AI actually increases completion time,” the METR researchers wrote.
“Developers are slower when using AI tooling.”
Key Data Points from the METR Study
| Metric | Value |
| --- | --- |
| Study Participants | 16 experienced open source developers |
| Tasks Completed | 246 real tasks across large codebases |
| Forecasted AI Productivity Gain | 24% time savings |
| Actual Result | 19% increase in task completion time |
| Main AI Tool Used | Cursor Pro |
| Developer Familiarity with Cursor | 56% had prior experience with the tool |
| AI Use Restriction | Roughly half of tasks were "AI-allowed"; the rest were AI-restricted |
| AI Response Time & Prompting | Identified as primary delay factors |
| Study Publisher | METR (Model Evaluation and Threat Research) |
Real-World Conditions: Large Codebases and Complex Tasks
Unlike many lab-based studies that rely on synthetic or simplified coding exercises, METR’s trial focused on real-world developer workflows. The participants were experienced open-source contributors who were asked to complete tasks from live repositories they regularly work on.
Researchers divided their assignments into two groups:
- “AI-allowed” tasks — participants could use AI tools such as Cursor Pro.
- “AI-restricted” tasks — developers had to work without AI assistance.
While participants believed AI would shorten their task completion time by nearly a quarter, the opposite proved true in execution. Tasks took significantly longer when AI assistance was used — a deviation that suggests integration friction, prompt inefficiencies, and cognitive overhead may still burden even state-of-the-art AI tools.
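For a sense of how such an effect is quantified, here is a minimal sketch, not METR's actual analysis code, that compares completion times between AI-allowed and AI-restricted tasks; the task names and timings below are hypothetical.

```python
import math

# (task_id, condition, minutes_to_complete); all values are hypothetical
tasks = [
    ("fix-null-deref", "ai_allowed", 95),
    ("update-ci-config", "ai_allowed", 140),
    ("refactor-parser", "ai_restricted", 80),
    ("add-api-endpoint", "ai_restricted", 110),
]

def geometric_mean(values):
    # Geometric mean is less sensitive to a handful of very long tasks.
    return math.exp(sum(math.log(v) for v in values) / len(values))

ai_times = [t for _, cond, t in tasks if cond == "ai_allowed"]
no_ai_times = [t for _, cond, t in tasks if cond == "ai_restricted"]

ratio = geometric_mean(ai_times) / geometric_mean(no_ai_times)
print(f"AI-allowed tasks took {(ratio - 1) * 100:+.0f}% time relative to AI-restricted tasks")
# A positive percentage (e.g. +19%) means tasks took longer when AI was allowed.
```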
Why AI Tools Slowed Things Down
The METR researchers offered several hypotheses for the productivity drag:
- Prompting Takes Time: Developers had to frequently craft and rephrase prompts, diverting focus from actual coding.
- Latency and Wait Time: AI tools like Cursor introduce response delays, particularly when generating responses for complex problems or navigating extensive codebases. (A rough back-of-envelope illustration of this overhead follows this list.)
- Lack of Tool Familiarity: Only 56% of participants were familiar with Cursor, despite all receiving pre-study training. The remaining developers experienced a learning curve during task execution.
- AI's Limitations in Contextual Comprehension: AI models, while effective on small code samples, continue to struggle in massive, interdependent codebases where context spans hundreds of files and nuanced logic.
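To see how these overheads can add up, the following back-of-envelope calculation illustrates the dynamic; every number in it is assumed for illustration and none comes from the study itself.

```python
# Illustrative model: even if AI drafts a change faster than typing it by hand,
# per-interaction overhead (prompting, waiting, reviewing and fixing output)
# can push total time past the manual baseline. All numbers are assumptions.

manual_minutes_per_change = 10.0   # assumed time to implement a change by hand
ai_generation_minutes = 2.0        # assumed time for the model to draft the change
prompting_minutes = 3.0            # assumed time crafting and rephrasing prompts
waiting_minutes = 1.5              # assumed response latency on a large codebase
review_fix_minutes = 5.0           # assumed time reviewing and correcting the output

ai_total = ai_generation_minutes + prompting_minutes + waiting_minutes + review_fix_minutes
print(f"Manual: {manual_minutes_per_change:.1f} min, AI-assisted: {ai_total:.1f} min")
print(f"Relative change: {(ai_total / manual_minutes_per_change - 1) * 100:+.0f}%")
# With these assumed numbers the AI-assisted path is ~15% slower, even though
# the model generates the code itself five times faster than typing it.
```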
Not a Death Knell for AI Coding Assistants
Despite the findings, the study's authors were careful not to draw overly broad conclusions. They emphasized the study's limited scope and made clear that:
- AI coding tools are still evolving rapidly, and these results may not hold even three months from now.
- The trial does not suggest that AI tools universally fail to speed up developers. Rather, performance depends on task type, tool maturity, developer familiarity, and contextual complexity.
“We wouldn’t expect the same results even a few months from now,” METR noted, citing ongoing improvements in long-horizon reasoning and multi-file comprehension by leading AI models from OpenAI, Anthropic, and Google DeepMind.
A Note on “Vibe Coders”
The study specifically targeted tools referred to in developer communities as “vibe coders” — AI systems that offer suggestions based on general code patterns rather than strictly enforcing rules or structure.
These tools, like Copilot and Cursor, tend to perform better in greenfield projects or prototyping, but may hinder progress in legacy code or systems with high architectural complexity.
METR’s findings are a reminder that real productivity gains depend on effective integration and task context, not just the presence of AI.
Previous Contradictory Findings
Other prominent studies have painted a more optimistic picture. Research from GitHub, MIT, and Stanford HAI found that AI code suggestions:
- Reduced bug frequency in some use cases
- Improved task completion times for junior developers
- Enhanced developer satisfaction
The contrast suggests that AI’s impact is highly dependent on user experience, tooling maturity, and task nature — and not all developers, especially seasoned engineers in production systems, will benefit immediately.