
AI coding tools can slow down seasoned developers by 19%

Friday, July 11, 2025, 14:44, by InfoWorld
Experienced developers can take 19% longer to complete tasks when using popular AI assistants like Cursor Pro and Claude, challenging the tech industry’s prevailing narrative about AI coding tools, according to a comprehensive new study.

The research, conducted by Model Evaluation & Threat Research (METR), tracked 16 seasoned open-source developers as they completed 246 real-world coding tasks on mature repositories averaging over one million lines of code.

“We conduct a randomized controlled trial (RCT) to understand how early-2025 AI tools affect the productivity of experienced open-source developers working on their own repositories,” the study said. “Surprisingly, we find that when developers use AI tools, they take 19% longer than without — AI makes them slower.”

The perception gap runs deep

Perhaps most striking is the disconnect between perception and reality. Before starting the study, developers predicted AI tools would reduce their completion time by 24%. Even after experiencing the actual slowdown, participants estimated that AI had improved their productivity by 20%.

“When people report that AI has accelerated their work, they might be wrong,” the researchers added in their analysis of the perception gap.

This misperception extends beyond individual developers, with economics experts predicting AI would improve productivity by 39% and machine learning experts forecasting 38% gains, all dramatically overestimating the actual impact.
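The arithmetic behind these figures is easy to mislay, so here is a minimal Python sketch that converts the reported percentages into per-task times. The two-hour task baseline and all percentages come from the study as reported in this article; treating "X% faster or slower" as a simple time multiplier is an assumption made here for illustration.

```python
# Illustrative arithmetic for the perception gap, using percentages
# reported by the METR study. Converting "X% faster/slower" into a
# time multiplier is an assumption made here for illustration.

baseline_hours = 2.0  # tasks in the study averaged about two hours

predicted_speedup = 0.24   # developers predicted 24% less time
perceived_speedup = 0.20   # afterward, they believed 20% less time
measured_slowdown = 0.19   # tasks actually took 19% more time

predicted_time = baseline_hours * (1 - predicted_speedup)  # 1.52 h
perceived_time = baseline_hours * (1 - perceived_speedup)  # 1.60 h
measured_time  = baseline_hours * (1 + measured_slowdown)  # 2.38 h

# Gap between what developers believed and what was measured:
gap = measured_time - perceived_time
print(f"believed: {perceived_time:.2f} h, measured: {measured_time:.2f} h, "
      f"gap: {gap:.2f} h per task")
```

On these assumptions, a task developers believed took 1.6 hours actually took about 2.4 hours, a gap of nearly 50 minutes per task that went unnoticed.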

Sanchit Vir Gogia, chief analyst and CEO at Greyhound Research, warned that organizations risk “mistaking developer satisfaction for developer productivity,” noting that most AI tools improve the coding experience through reduced cognitive load but don’t always translate to faster output, especially for experienced professionals.

Controlled real-world testing

The study employed randomized controlled trial methodology, rare in AI productivity research. “To directly measure the real-world impact of AI tools on software development, we recruited 16 experienced developers from large open-source repositories (averaging 22k+ stars and 1M+ lines of code) that they’ve contributed to for multiple years,” the researchers explained.

Tasks were randomly assigned to either allow or prohibit AI tool usage, with developers primarily using Cursor Pro with Claude 3.5 and 3.7 Sonnet during the February-June 2025 study period. All participants recorded their screens, providing insight into actual usage patterns; tasks averaged about two hours to complete, the paper noted.
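The underlying design is straightforward: each task is randomly assigned to an AI-allowed or AI-disallowed arm before work begins, and mean completion times are compared across the arms. The sketch below is not METR's code; it is a minimal simulation of that assign-and-compare logic, with the function names and timing numbers invented for the example.

```python
import random
from statistics import mean

# Minimal sketch of the randomized-controlled-trial logic described
# above: tasks are randomly assigned to an AI-allowed or AI-disallowed
# arm, then mean completion times are compared. All data is simulated
# for illustration; this is not METR's code or data.

random.seed(0)

tasks = [f"issue-{i}" for i in range(246)]  # 246 real-world tasks in the study
assignment = {t: random.choice(["ai_allowed", "ai_disallowed"]) for t in tasks}

def complete_task(task: str, arm: str) -> float:
    """Stand-in for a developer completing a task; returns hours taken.
    Simulated so the AI arm runs ~19% slower, matching the headline result."""
    base = random.uniform(1.0, 3.0)  # tasks averaged around two hours
    return base * (1.19 if arm == "ai_allowed" else 1.0)

times = {"ai_allowed": [], "ai_disallowed": []}
for task, arm in assignment.items():
    times[arm].append(complete_task(task, arm))

slowdown = mean(times["ai_allowed"]) / mean(times["ai_disallowed"]) - 1
print(f"observed slowdown with AI: {slowdown:+.1%}")
```

Because assignment is random, differences between the arms can be attributed to the AI condition rather than to task difficulty or developer skill, which is what distinguishes this design from the self-reported surveys that dominate AI productivity research.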

Gogia argued this represents “a vital corrective to the overly simplistic assumption that AI-assisted coding automatically boosts developer productivity,” suggesting enterprises must “elevate the rigour of their evaluation frameworks” and develop “structured test-and-learn models that go beyond vendor-led benchmarks.”

Understanding the productivity paradox

The research identified several interconnected factors contributing to the observed slowdown. Despite instructions to use AI tools only when helpful, some developers reported experimenting beyond what was productive. The study participants averaged five years of experience and 1,500 commits on their repositories, with researchers finding greater slowdowns on tasks where developers had high prior experience.

Most tellingly, developers accepted less than 44% of AI-generated code suggestions, with 75% reporting they read every line of AI output and 56% making major modifications to clean up AI-generated code. Working on large, mature codebases with intricate dependencies and coding standards proved particularly challenging for AI tools lacking deep contextual understanding.

“The 19% slowdown observed among experienced developers is not an indictment of AI as a whole, but a reflection of the real-world friction of integrating probabilistic suggestions into deterministic workflows,” Gogia explained, emphasizing that measurement should include “downstream rework, code churn, and peer review cycles—not just time-to-code.”

Broader industry evidence

The METR findings align with concerning trends identified in Google’s 2024 DevOps Research and Assessment (DORA) report, based on responses from over 39,000 professionals. While 75% of developers reported feeling more productive with AI tools, the data tells a different story: every 25% increase in AI adoption showed a 1.5% dip in delivery speed and a 7.2% drop in system stability. Additionally, 39% of respondents reported having little or no trust in AI-generated code.
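Taken at face value, the DORA figures describe a per-25-point relationship; the small sketch below shows how those effects would extrapolate if the relationship were linear (the linearity is an assumption made here for illustration, since the report states only the per-25% figures).

```python
# Extrapolating the per-25-point DORA effects reported above.
# Assuming a linear relationship is a simplification made here
# for illustration; the report gives only the per-25% figures.

def dora_effects(adoption_increase_pct: float) -> tuple[float, float]:
    """Projected delivery-speed dip and stability drop (in %) for a
    given increase in AI adoption, scaled from the per-25% figures."""
    steps = adoption_increase_pct / 25.0
    return 1.5 * steps, 7.2 * steps

for adoption in (25, 50, 100):
    speed_dip, stability_drop = dora_effects(adoption)
    print(f"+{adoption}% adoption -> -{speed_dip:.1f}% delivery speed, "
          f"-{stability_drop:.1f}% stability")
```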

These results contradict earlier optimistic studies. Research from MIT, Princeton, and the University of Pennsylvania, analyzing data from over 4,800 developers at Microsoft, Accenture, and another Fortune 100 company, found that developers using GitHub Copilot completed 26% more tasks on average. A separate controlled experiment found developers completed coding tasks 55.8% faster with GitHub Copilot. However, these studies typically used simpler, more isolated tasks compared to the complex, real-world scenarios examined in the METR research.

The findings arrive as enterprises pour billions into AI coding tools, with the METR study noting that GitHub reports 41% of new code is now AI-generated. Yet the research reveals a fundamental trust deficit that may be undermining effectiveness.

According to the DORA report, one participant described evaluating AI code as being “like the early days of StackOverflow, [when] you always thought people on StackOverflow are really experienced… And then, you just copy and paste the stuff, and things explode.”

A strategic path forward

Despite the productivity setbacks, 69% of study participants continued using Cursor after the experiment ended, suggesting developers value aspects beyond pure speed. The METR study noted that “the results don’t necessarily spell doom for AI coding tools” as several factors specific to their study setting may not apply broadly.

Gogia recommended enterprises adopt a “portfolio mindset: deploying AI copilots where they augment cognition (documentation, boilerplate, tests), while holding back in areas where expertise and codebase familiarity outweigh automation.” He advocated treating AI tools “not as a universal accelerator but as a contextual co-pilot” that requires governance and measurement.

Related reading:

Why AI-generated code isn’t good enough (and how it will get better)

7 ways to improve your AI coding results

What the AI coding assistants get right, and where they go wrong

The tough task of making AI code production-ready

Sizing up the AI code generators
