AI’s trust tax for developers
Monday, December 29, 2025, 10:00, by InfoWorld
Andrej Karpathy is one of the few people in this industry who has earned the right to be listened to without a filter. As a founding member of OpenAI and the former director of AI at Tesla, he sits at the summit of the field and sees its possibilities up close. In a recent post, he shared a view that is equally inspiring and terrifying: “I could be 10X more powerful if I just properly string together what has become available over the last ~year,” Karpathy wrote. “And a failure to claim the boost feels decidedly like [a] skill issue.”
If you aren’t ten times faster today than you were in 2023, Karpathy implies, the problem isn’t the tools. The problem is you. Which seems both right…and very wrong. After all, the raw potential for leverage in the current generation of LLM tools is staggering. But his entire argument hinges on a single adverb that does an awful lot of heavy lifting: “Properly.” In the enterprise, where code lives for decades, not days, that word “properly” is easy to say but very hard to achieve. The reality on the ground, backed by a growing mountain of data, suggests that for most developers, the “skill issue” isn’t a failure to prompt effectively. It’s a failure to verify rigorously. AI speed is free, but trust is incredibly expensive.

A vibes-based productivity trap

In reality, AI speed only seems to be free. Earlier this year, for example, METR (Model Evaluation and Threat Research) ran a randomized controlled trial that gave experienced open source developers tasks to complete. Half used AI tools; half didn’t. The developers using AI were convinced the LLMs had accelerated their development speed by 20%. But reality bites: The AI-assisted group was, on average, 19% slower. That’s a nearly 40-point gap between perception and reality. Ouch.

How does this happen? As I recently wrote, we are increasingly relying on “vibes-based evaluation” (a phrase coined by Simon Willison). The code looks right. It appears instantly. But then you hit the “last mile” problem. The generated code uses a deprecated library. It hallucinates a parameter. It introduces a subtle race condition.

Karpathy can induce serious FOMO with statements like this: “People who aren’t keeping up even over the last 30 days already have a deprecated worldview on this topic.” Well, maybe, but as fast as AI is changing, some things remain stubbornly the same. Like quality control. AI coding assistants are not primarily productivity tools; they are liability generators that you pay for with verification. You can pay the tax up front (rigorous code review, testing, threat modeling), or you can pay it later (incidents, data breaches, and refactoring). But you’re going to pay sooner or later. Right now, too many teams think they’re evading the tax, but they’re not. Not really.

Veracode’s GenAI Code Security Report found that 45% of AI-generated code samples introduced security issues on OWASP’s top 10 list. Think about that. Nearly half the time you accept an AI suggestion without a rigorous audit, you are potentially injecting a critical vulnerability (SQL injection, XSS, broken access control) into your codebase. The report puts it bluntly: “Congrats on the speed, enjoy the breach.” As Microsoft developer advocate Marlene Mhangami puts it, “The bottleneck is still shipping code that you can maintain and feel confident about.” In other words, with AI we’re accumulating vulnerable code at a rate manual security reviews cannot possibly match.

This confirms the “productivity paradox” that SonarSource has been warning about. Their thesis is simple: Faster code generation inevitably leads to faster accumulation of bugs, complexity, and debt, unless you invest aggressively in quality gates. As the SonarSource report argues, we’re building “write-only” codebases: systems so voluminous and complex, generated by non-deterministic agents, that no human can fully understand them. We increasingly trade long-term maintainability for short-term output. It’s the software equivalent of a sugar high.
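What does that “last mile” look like in practice? Here is a minimal, hypothetical sketch (the cache helper and every name in it are invented for illustration) of the kind of code that sails through a vibes-based review: it looks right, it runs instantly in a quick test, and it hides exactly the sort of subtle race condition described above.

```python
import threading

# Hypothetical capped cache -- the kind of helper an assistant
# produces instantly and that "looks right" at a glance.
_cache: dict[str, str] = {}
_MAX_ENTRIES = 1000

def cache_put(key: str, value: str) -> None:
    # Check-then-act race: two threads can both observe
    # len(_cache) < _MAX_ENTRIES and both insert, silently blowing
    # past the cap. A single-threaded test never catches this.
    if len(_cache) < _MAX_ENTRIES:
        _cache[key] = value

# Paying the trust tax up front: a lock makes the check and the
# insert atomic, so the cap actually holds under concurrency.
_lock = threading.Lock()

def cache_put_safe(key: str, value: str) -> None:
    with _lock:
        if len(_cache) < _MAX_ENTRIES:
            _cache[key] = value
```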
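The Veracode numbers are easy to ground, too. A hedged sketch using Python’s standard sqlite3 module (table and data invented for illustration): the first query is the string-formatted pattern that still shows up in generated code, the one a rigorous audit exists to catch; the second is the parameterized fix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

def find_user_unsafe(name: str) -> list:
    # SQL injection (OWASP A03): user input is spliced into the query,
    # so name = "' OR '1'='1" returns every row in the table.
    return conn.execute(
        f"SELECT * FROM users WHERE name = '{name}'"
    ).fetchall()

def find_user_safe(name: str) -> list:
    # Parameterized query: the driver treats name as data, not SQL.
    return conn.execute(
        "SELECT * FROM users WHERE name = ?", (name,)
    ).fetchall()

print(find_user_unsafe("' OR '1'='1"))  # leaks the whole table
print(find_user_safe("' OR '1'='1"))    # returns []
```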
Redefining the skills

So, is Karpathy wrong? No. When he says he can be ten times more powerful, he’s right. It might not be ten times, but the performance gains savvy developers get from AI are real, or at least have the potential to be. Even so, the skill he possesses isn’t just the ability to string together tools. Karpathy has the deep, internalized knowledge of what good software looks like, which allows him to filter the noise. He knows when the AI is likely to be right and when it is likely to be hallucinating. But he’s an outlier on this, which brings us back to that pesky word “properly.” Hence, the real skill issue of 2026 isn’t prompt engineering. It’s verification engineering. If you want to claim the boost Karpathy is talking about, you need to shift your focus from code creation to code critique, as it were:

Verification is the new coding. Your value is no longer defined by lines of code written, but by how effectively you can validate the machine’s output.

“Golden paths” are mandatory. As I’ve written, you cannot allow AI to be a free-for-all. You need golden paths: standardized, secured templates. Don’t ask the LLM to write a database connector; ask it to implement the interface from your secure platform library (a sketch follows this list).

Design the security architecture yourself. You can’t just tell an LLM to “make this secure.” The high-level thinking you embed in your threat modeling is the one thing the AI still can’t do reliably.
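As promised above, here is a minimal sketch of what a golden path can look like, assuming a hypothetical in-house platform library; every name here (SecureConnector and so on) is invented for illustration. The point is the shape of the prompt it enables: “implement this interface,” not “write a database layer.”

```python
from abc import ABC, abstractmethod

class SecureConnector(ABC):
    """Hypothetical golden-path base class owned by the platform team.

    It pins the decisions the assistant must not make: credential
    sourcing, timeouts, and parameterized queries only.
    """

    def __init__(self, dsn: str) -> None:
        # Credentials come from the DSN/vault, never from literals
        # the LLM happens to generate.
        self._dsn = dsn

    @abstractmethod
    def query(self, sql: str, params: tuple) -> list[tuple]:
        """Implementations must take params; string-built SQL is
        rejected in review and by the SAST gate."""

# The prompt becomes narrow and checkable: "implement SecureConnector
# for Postgres" -- not "write me a database connector from scratch."
class PostgresConnector(SecureConnector):
    def query(self, sql: str, params: tuple) -> list[tuple]:
        raise NotImplementedError("left to the (supervised) assistant")
```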
“Properly stringing together” the available tools doesn’t just mean connecting an IDE to a chatbot. It means thinking about AI systematically rather than optimistically. It means wrapping those LLMs in a harness of linting, static application security testing (SAST), dynamic application security testing (DAST), and automated regression testing.
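What might that harness look like as code? A rough, hedged sketch of a merge gate: nothing generated ships unless linting, static analysis, and the regression suite all pass. The tool names (ruff, bandit, pytest) are assumptions, stand-ins for whatever your pipeline already runs; DAST is omitted here because it runs against a deployed environment rather than the source tree.

```python
import subprocess
import sys

# Each gate is a command the CI runner executes; a nonzero exit code
# from any of them blocks the merge. Swap in your own tools.
GATES = [
    ("lint", ["ruff", "check", "."]),    # style and bug patterns
    ("sast", ["bandit", "-r", "src/"]),  # static security analysis
    ("tests", ["pytest", "--quiet"]),    # automated regression suite
]

def main() -> int:
    for name, cmd in GATES:
        print(f"[gate:{name}] {' '.join(cmd)}")
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"[gate:{name}] FAILED -- the code does not ship")
            return result.returncode
    print("All gates passed: speed, with the tax paid up front.")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```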
The developers who will actually be ten times more powerful next year aren’t the ones who trust the AI blindly. They are the ones who treat AI like a brilliant but very junior intern: capable of flashes of genius, but requiring constant supervision to prevent them from deleting the production database. The skill issue is real. But the skill isn’t speed. The skill is control.

https://www.infoworld.com/article/4111829/ais-trust-tax-for-developers.html