
The productivity paradox of AI-assisted coding

Tuesday, September 23, 2025, 11:00, by InfoWorld
AI is dramatically accelerating code generation. With the help of sophisticated coding assistants and other generative AI tools, developers can now write more code, faster than ever before. The promise is one of hyper-productivity, where development cycles shrink and features are shipped at a blistering pace.

But many engineering teams are noticing a trend: even as individual developers produce code faster, overall project delivery timelines are not shortening. This isn’t just a feeling. A recent METR study found that AI coding assistants decreased experienced software developers’ productivity by 19%. “After completing the study, developers estimate that allowing AI reduced completion time by 20%,” the report noted. “Surprisingly, we find that allowing AI actually increases completion time by 19%—AI tooling slowed developers down.”

This growing disconnect reveals a “productivity paradox.” We are seeing immense speed gains in one isolated part of the software development life cycle (SDLC), code generation, which in turn exposes and exacerbates bottlenecks in other parts such as code review, integration, and testing. It’s a classic factory problem: speed up one machine on an assembly line while leaving the others untouched, and you don’t get a faster factory; you get a massive pile-up.

In this article, we’ll explore how engineering teams can diagnose this pile-up, realign their workflows to truly benefit from AI’s speed, and do so without sacrificing code quality or burning out their developers.

Why AI-generated code needs human review

Generative AI tools excel at producing code that is syntactically correct and appears “good enough” on the surface. But these appearances can be dangerously misleading. Without thoughtful, rigorous human review, teams risk shipping code that, while technically functional, is insecure, inefficient, non-compliant, or nearly impossible to maintain.

This reality places immense pressure on code reviewers. AI is increasing the number of pull requests (PRs) and the volume of code within them, yet the number of available reviewers and the hours in a day remain constant. Left unchecked, this imbalance leads either to rushed, superficial reviews that let bugs and vulnerabilities through, or to review cycles that become a bottleneck and leave developers blocked.

Complicating this challenge is the fact that not all developers are using AI in the same way. There are three distinct developer experience (DevX) workflows emerging, and teams will be stretched for quite a while to support all of them:

Legacy DevX (80% human, 20% AI): Often experienced developers who view software development as a craft. They are skeptical of AI’s output and primarily use it as a sophisticated replacement for search queries or to solve minor boilerplate tasks.

Augmented DevX (50% human, 50% AI): Represents the modern power user. These developers fluidly partner with AI for isolated development tasks, troubleshooting, and generating unit tests, using the tools to become more efficient and move faster on well-defined problems.

Autonomous DevX (20% human, 80% AI): Practiced by skilled prompt engineers who offload the majority of the code generation and iteration work to AI agents. Their role shifts from writing code to reviewing, testing, and integrating the AI’s output, acting more as a systems architect and QA specialist.

Each of these workflows requires different tools, processes, and support. A one-size-fits-all approach to tooling or performance management is doomed to fail when your team is split across these different models of working. But no matter what, having a human in the loop is essential. 

Burnout and bottlenecks are a risk

Without systemic adjustments to the SDLC, AI’s increased output creates more downstream work. Developers may feel productive as they generate thousands of lines of code, but the hidden costs quickly pile up with more code to review, more bugs to fix, and more complexity to manage.

An immediate symptom of this problem is that PRs are becoming super-sized. When developers write code themselves, they tend to create smaller, atomic commits that are easy to review. AI, however, can generate massive changes in a single prompt, making it incredibly difficult for a reviewer to understand the full scope and impact. The core issue isn’t just duplicate code; it’s the sheer amount of time and cognitive load required to untangle these enormous changes.

This challenge is further highlighted by the METR study, which confirms that even when developers accept AI-generated code, they devote substantial time to reviewing and editing it to meet their standards:

Even when they accept AI generations, they spend a significant amount of time reviewing and editing AI-generated code to ensure it meets their high standards. 75% report that they read every line of AI-generated code, and 56% of developers report that they often need to make major changes to clean up AI code—when asked, 100% of developers report needing to modify AI-generated code.

The risk extends to quality assurance. Test generation is a fantastic use case for AI, but focusing only on test coverage is a trap. Coverage is easily gamed: AI can generate tests that touch every line of code without actually validating meaningful behavior. It’s far more important to create transparency around test quality. Are you testing that the system not only does what it’s supposed to do, but also handles errors gracefully and doesn’t crash when something unexpected happens?
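
To make the coverage trap concrete, here is a minimal sketch in Python with pytest. The parse_price function and both tests are hypothetical illustrations, not code from the study: the first test inflates coverage without asserting anything, while the second pair checks the expected result and a failure mode.

import pytest

# Hypothetical function under test, for illustration only.
def parse_price(raw: str) -> float:
    """Parse a price string such as '$19.99' into a float."""
    cleaned = raw.strip().lstrip("$")
    if not cleaned:
        raise ValueError("empty price string")
    return float(cleaned)

# Coverage-padding test: exercises the happy path but asserts nothing,
# so it passes even if parse_price returns the wrong value.
def test_parse_price_runs():
    parse_price("$19.99")

# Behavior-focused tests: check the result and a failure mode.
def test_parse_price_returns_expected_value():
    assert parse_price("$19.99") == 19.99

def test_parse_price_rejects_empty_input():
    with pytest.raises(ValueError):
        parse_price("   ")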

The unsustainable pace, coupled with the fracturing of the developer experience, can lead directly to burnout, mounting technical debt, and critical production issues—especially if teams treat AI output as plug-and-play code.

How to make workflows AI-ready

To harness AI productively and escape the paradox, teams must evolve their practices and culture. They must shift the focus from individual developer output to the health of the entire system.

First, leaders must strengthen code review processes and reinforce accountability at the developer and team levels. This requires setting clear standards for what constitutes a “review-ready” PR and empowering reviewers to push back on changes that are too large or that lack context. 
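
One lightweight way to enforce such a standard is an automatic size check before review. The sketch below is a hypothetical Python script; the 400-changed-line threshold and the main base branch are assumptions that a team would tune to its own norms.

# Hypothetical pre-merge size check; threshold and base branch are assumptions.
import subprocess
import sys

MAX_CHANGED_LINES = 400

def changed_lines(base: str = "main") -> int:
    """Count added plus deleted lines relative to the base branch."""
    out = subprocess.run(
        ["git", "diff", "--numstat", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    total = 0
    for line in out.splitlines():
        added, deleted, _path = line.split("\t", 2)
        if added.isdigit() and deleted.isdigit():  # binary files show "-"
            total += int(added) + int(deleted)
    return total

if __name__ == "__main__":
    n = changed_lines()
    if n > MAX_CHANGED_LINES:
        print(f"Change touches {n} lines; split it or add reviewer context.")
        sys.exit(1)
    print(f"Review-ready size: {n} changed lines.")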

Second, automate responsibly. Use static and dynamic analysis tools to assist in testing and quality checks, but always with a human in the loop to interpret the results and make final judgments. 

Finally, align expectations. Leadership must communicate that raw coding speed is a vanity metric. The real goal is sustainable, high-quality throughput, and that requires a balanced approach where quality and sustainability keep pace with generation speed.

Beyond these cultural shifts, two tactical changes can yield immediate benefits: 

Establish common rules and context for prompting, to guide the AI to generate code that aligns with your organization’s best practices. Provide guardrails that prevent the AI from “hallucinating” or using deprecated libraries, making its output far more reliable. This can be achieved by feeding the AI context, such as lists of approved libraries, internal utility functions, and internal API specifications. 
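
As a rough illustration, the Python sketch below assembles that shared context into a reusable system prompt. The company name, module name, library list, and API URL are all placeholders, and the resulting string would be passed to whichever assistant or agent framework the team has standardized on.

# Hypothetical guardrail sketch; the names below are placeholders, not a
# specific vendor's configuration format.
APPROVED_LIBRARIES = ["requests", "sqlalchemy", "pydantic"]
INTERNAL_API_SPEC = "https://example.internal/openapi.json"

def build_system_prompt() -> str:
    """Assemble shared context to prepend to every AI coding prompt."""
    return "\n".join([
        "You are generating code for Acme Corp. Follow these rules:",
        f"- Use only these third-party libraries: {', '.join(APPROVED_LIBRARIES)}.",
        "- Prefer the internal utilities in acme.common over writing new helpers.",
        f"- Call internal services only through the API spec at {INTERNAL_API_SPEC}.",
        "- Do not use deprecated modules or unpinned dependencies.",
        "- Keep each change small enough to review in a single pull request.",
    ])

if __name__ == "__main__":
    # The assembled context is printed here; in practice it would be sent
    # along with every prompt to the coding assistant.
    print(build_system_prompt())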

Add analysis tools earlier in the process; don’t wait for a PR to discover that AI-generated code is insecure. By integrating analysis tools directly into the developer’s IDE, issues can be caught and fixed instantly. This “start left” approach ensures that problems are resolved when they are cheapest to fix, preventing them from becoming a bottleneck in the review stage.
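
A minimal sketch of that idea, assuming the open-source analyzers Ruff and Bandit and a src/ layout (both assumptions, not tools named in the article), is a local gate that runs static analysis before code ever reaches a pull request. The same analyzers typically also surface issues directly in the editor through their IDE integrations.

# "Start left" sketch: run static analysis locally, before commit or PR.
import subprocess
import sys

CHECKS = [
    ["ruff", "check", "src/"],       # style and correctness linting
    ["bandit", "-q", "-r", "src/"],  # common security issues
]

def main() -> int:
    failed = False
    for cmd in CHECKS:
        print("Running:", " ".join(cmd))
        if subprocess.run(cmd).returncode != 0:
            failed = True
    return 1 if failed else 0

if __name__ == "__main__":
    sys.exit(main())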

The conversation around AI in software development must mature beyond “faster code.” The new frontier is building smarter systems. Engineering teams should now focus on creating stable and predictable instruction frameworks that guide AI to produce code according to company standards, use approved and secure resources, and align its output with the organization’s broader architecture.

The productivity paradox isn’t inevitable. It’s a signal that our engineering systems must evolve alongside our tools. Understanding that your team is likely operating across three different developer workflows—legacy, augmented, and autonomous—is one of the first steps toward creating a more resilient and effective SDLC.

By ensuring disciplined human oversight and adopting a systems-thinking mindset, development teams can move beyond the paradox. Then, they can leverage AI not just for speed, but for a true, sustainable leap in productivity.
https://www.infoworld.com/article/4061078/the-productivity-paradox-of-ai-assisted-coding.html
