
Multi-agent AI workflows: The next evolution of AI coding

Monday, August 11, 2025, 11:00, by InfoWorld
As AI-assisted coding becomes more common, a new pattern is emerging: multi-agent workflows.

A multi-agent workflow refers to using various AI agents in parallel for specific software development life cycle (SDLC) tasks, whether for planning, scaffolding, writing code, testing, debugging, log analysis, or deployment.

“A generalist ‘coding agent’ is not enough,” says Harry Wang, chief growth officer at Sonar, the maker of a code analysis tool. “Just like a human team has specialists, like back-end, security, and testing engineers, agentic systems will require multiple specialized agents.”

Leading a team of agents

“Think of it like a high-performing engineering team,” says Dr. Eran Yahav, co-founder and CTO of Tabnine, a popular AI coding assistant. “One agent writes code, another tests it, a third performs documentation or validation, and a fourth checks for security and compliance.”

In multi-agent workflows, each agent excels in a specialty, mirroring the roles of a human engineering team. “Each agent works on its own thread, while the developer stays in control, guiding and reviewing their work,” says Zach Lloyd, founder and CEO of Warp, provider of a developer environment built for coding with multiple agents.

Beyond just building software, areas like test execution and deployment are ripe for agentic handoffs too, even across vendors. “Coding is just one part of the SDLC,” says Wing To, CTO at Digital.ai, which provides an AI-powered software delivery platform. For To, a robust multi-agent workflow combines all aspects of the SDLC, including continuous delivery.

The new day-to-day

From the developer’s perspective, multi-agent flows reshape their work by distributing tasks across domain-specific agents. “It’s like working with a team of helpful collaborators you can spin up instantly,” says Warp’s Lloyd.

Imagine building a new feature while, simultaneously, one agent summarizes a user log and another handles repetitive code changes. “You can see the status of each agent, jump in to review their output, or give them more direction as needed,” adds Lloyd, noting that his team already works this way.

To understand a developer’s on-the-ground perspective, take a specific example:

A code generation agent suggests a module aligned to your internal design standards.

A code review agent flags violations and suggests improvements.

Before releasing, a testing agent finds edge cases and generates unit tests.

In such a workflow, no changes are made without developer validation, says Tabnine’s Yahav. “Humans stay in the loop.” This way of working alters the human’s role, but it doesn’t diminish their importance.

There are many types of multi-agent workflows. Justin Roeck, deputy CTO at developer productivity company DX, says he’s found success using multiple agents for what he calls “adversarial prompting.” This involves running the same prompt across multiple models, such as Claude, OpenAI, or DeepSeek, and having the agents compare or critique each other’s outputs to surface the best answer.
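Roeck's adversarial pattern can be sketched in a few lines. The "models" below are toy callables standing in for real vendor SDK clients (which is an assumption of this sketch, not his implementation): every model answers the same prompt, then each one critiques the others' answers.

```python
from dataclasses import dataclass
from typing import Callable

# A model is anything that maps a prompt string to an answer string;
# in practice each entry would wrap a vendor SDK call.
ModelFn = Callable[[str], str]

@dataclass
class Candidate:
    model: str
    answer: str

def adversarial_prompt(prompt: str, models: dict[str, ModelFn]) -> list[Candidate]:
    """Run one prompt against every model, then have each model
    critique the other models' answers."""
    candidates = [Candidate(name, fn(prompt)) for name, fn in models.items()]
    critiques = []
    for critic_name, critic in models.items():
        others = "\n\n".join(
            f"[{c.model}]\n{c.answer}" for c in candidates if c.model != critic_name
        )
        critiques.append(
            Candidate(critic_name, critic(f"Critique these answers:\n{others}"))
        )
    return candidates + critiques

# Toy usage with echo "models":
models = {
    "model-a": lambda p: f"A says: {p[:20]}",
    "model-b": lambda p: f"B says: {p[:20]}",
}
results = adversarial_prompt("Write a binary search in Python", models)
print(len(results))  # 2 answers + 2 critiques
```

A final arbitration step, perhaps a third model or the developer picking the answer the critiques favor, would sit on top of this.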

Benefits of using multiple agents

Multi-agent coding workflows promise faster development, improved code quality, and better alignment between AI output and business goals. The overarching result is time savings. “Developers save time by offloading routine tasks and avoiding context switching,” says Warp’s Lloyd. That speeds up shipping.

Even better, these speedy results can be attained while preserving code quality. “Parallelized agent workflows reduce manual toil without sacrificing quality,” says Yahav. “Reviews, testing, and documentation all happen faster.” Arguably, code quality and clarity even improve, through automated conformance to internal policies and having AI agents explain their decisions, adds Yahav.

For Roeck, the benefits of using multiple agents lie in both efficiency and accuracy, largely due to underlying model specialization. For instance, certain agents are better at specific programming languages: GitHub Copilot seems to favor TypeScript, while Mistral is better at Python, Roeck says.

The need for orchestration

As it stands today, multi-agent processes are still quite nascent. “This area is still in its infancy,” says Digital.ai’s To. Developers are incorporating generative AI in their work, but as far as using multiple agents goes, most are just manually arranging them in sequences.

Roeck admits that a lot of manual work goes into the aforementioned adversarial patterns. Updating system prompts and adding security guardrails on a per-agent basis only compound the duplication. As such, orchestrating the handshake between various agents will be important to reach a net positive for productivity. Otherwise, copy-and-pasting prompts and outputs across different chat UIs and IDEs will only make developers less efficient.

“Without orchestration, multi-agent systems become chaos,” says Yahav. “Redundant, inconsistent, or even contradictory.” According to Yahav, these workflows will require a means to unify disconnected plugins within a single architecture and policy-based governance to determine how agents act.

Users will also require a lens into agentic behaviors. “The key is visibility and control,” says Lloyd. “Developers need a way to see what each agent is doing, how far along it is, and agents need to know when and how to ask for help.”

But visibility works both ways. Building a shared knowledge base for agents also will be crucial to aligning on internal standards. This should include things like coding conventions, environment variables, or common troubleshooting steps. “That keeps the agents aligned with how the team works and reduces surprises,” adds Lloyd.

Wang agrees that a knowledge base is crucial to avoiding bad experiences with agents. “Without a foundational source of truth for application requirements, architectural designs, and code standards, an agent can easily go down a rabbit hole, making changes that are locally reasonable but globally disastrous.”

“Agentic AI agent orchestration is essential for effectively leveraging multi-agent workflows,” says To. For him, multi-agent workflows bring challenging requirements, like approving each agent for use, setting repeatable guidelines, and auditing behaviors, that better orchestration can solve.

Guardrails and audit trails

Multi-agent workflows carry risks, especially around unsupervised autonomy. Addressing them will require having fine-grained permissions and setting guardrails around the actions agents can perform.

Without good oversight, uncontrolled AI agents could leak data. “If agents rely on external APIs or cloud inference, prompts may expose intellectual property or regulated data,” says Yahav. Other issues include a lack of auditability for changes, code that bypasses internal standards, and introducing technical debt.

“Teams need tight controls over agent permissions, local execution, transparent logs, and full control over data sharing and AI settings,” says Lloyd. Developers should still scrutinize each line of AI-generated code to retain quality standards, he adds.

To mitigate AI agent security concerns, Yahav recommends using air-gapped or on-prem deployments for regulated environments. He also advises creating audit trails for AI interactions, and applying runtime policy enforcement for active agents.
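An audit trail of the kind Yahav recommends can be as simple as an append-only log of agent actions; making each entry hash the previous one makes tampering evident. The record schema here is a sketch under assumptions, not a standard.

```python
import hashlib
import json
import time

# Append-only audit trail for agent actions; the record fields are
# illustrative, not a defined schema.
AUDIT_LOG: list[dict] = []

def audit(agent: str, action: str, payload: str) -> dict:
    """Append a tamper-evident record: each entry hashes the previous one."""
    prev = AUDIT_LOG[-1]["hash"] if AUDIT_LOG else ""
    record = {"ts": time.time(), "agent": agent, "action": action,
              "payload": payload, "prev": prev}
    record["hash"] = hashlib.sha256(
        (prev + json.dumps(record, sort_keys=True, default=str)).encode()
    ).hexdigest()
    AUDIT_LOG.append(record)
    return record

audit("code-agent", "generate", "module user_export")
audit("review-agent", "flag", "missing input validation")
print(len(AUDIT_LOG), AUDIT_LOG[1]["prev"] == AUDIT_LOG[0]["hash"])  # 2 True
```

In a regulated environment this log would be written to durable storage and paired with the runtime policy checks Yahav describes, so every agent action is both permitted and recorded.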

Still, the possibility for agents to underperform is real. “Multi-agent AI systems today are like a talented but unsupervised team of new recruits,” says Wang. “They can be impressive on isolated tasks, but they lack the cohesion and architectural oversight to build robust, high-quality applications.”

Wang describes a recent experience with a leading vibe coding tool in which he got stuck when an agent couldn’t switch libraries midway. Other complaints about AI agents range from code changes that break the application and scanners that overlook issues to running unauthorized commands and deleting files and databases.

Context window limitations and stateless agent behavior also make cross-agent memory difficult, especially when a developer’s workflow spans multiple stages.

Avoiding multi-agent pitfalls

To avoid the pitfalls, experts offer practical tips for teams implementing multi-agent workflows:

Have a common knowledge base: A human and machine-readable guide to coding conventions, system variables, and internal norms.

Keep humans in the loop: Agents are prone to random behaviors. As with all AI-generated code, humans should review outputs.

Specialize: General agents are insufficient for multi-agent processes—use purpose-built ones.

Start small: Test iteratively. First experiment on specific, familiar tasks that will benefit from automation, and expand from there.

Set metrics: Monitor multi-agent systems in the same way you measure other software activities.

Unified architecture: Consider how to apply permissions, governance, and contextual knowledge across agents.

Roeck points to real-world agent deployments at a major online retailer and a large financial firm. What they’ve done right is tie action to end value. As he recommends, “Look at the value stream as it exists today and ask yourself, could a specific type of agent reduce friction in this area, or scale it? If so, stand up a model, and build it into the workflow.”

Tools are in progress

Dedicated multi-agent coding platforms could get you started quickly, with built-in governance, shared context engines, and human-in-the-loop controls. The problem is that this space is brand new, and agentic tool kits for software development are still emerging and evolving rapidly.

One example is Claude Code, a terminal-based AI coding tool that can be used to create and manage agents. Another is Roo Code, an agentic coding assistant for Visual Studio Code that offers multiple, task-specific modes. These include a general-purpose code mode, an architect mode for design and planning, a debug mode for diagnosis and troubleshooting, and an orchestrator mode that delegates tasks to specialized modes. Warp is a terminal-based development environment that can run multiple agents in parallel and monitor their actions.

At a lower level, packages are emerging to help orchestrate large language models (LLMs). Frameworks like LangChain, LlamaIndex, and Haystack make it easier to build workflows combining LLMs, tools, and data sources, but agents and multi-agent orchestration are relatively new to the mix. Tool kits for creating multi-agent applications, such as AutoGen, have only begun to arrive.

Under the hood, emerging agent meshes and AI protocols like MCP (Model Context Protocol) and A2A (Agent2Agent) will likely play a key role in wiring together agents in software development settings and beyond.

Mo’ agents, mo’ problems?

Using multiple AI agents in tandem opens up impressive possibilities. “AI agents encode the wisdom of senior engineers and apply it universally,” Yahav says.

Looking to the future, Digital.ai’s To anticipates productivity gains with fewer errors and reduced cognitive load, as developers tap various agents for lower-level details. “As this space matures, multi-agent workflows will increase velocity by significantly reducing toil,” he says.

But doing this well will require clear boundaries around product requirements, coding standards, security policies, and more.

In short, AI tools require intention. “An agentic software development life cycle needs the same pillars that a high-performing human team does: a clear mission, a code of conduct, and shared knowledge,” adds Wang.

So, although we’re heading toward a future where developers manage a fleet of agents, early testers should prepare for a lot of trial and error. As Roeck puts it, “Get ready to fail. This isn’t baked yet.”
https://www.infoworld.com/article/4035926/multi-agent-ai-workflows-the-next-evolution-of-ai-coding.h

