OpenAI Steps Up Security as ChatGPT Atlas Faces Ongoing Prompt Injection Threats

mardi 23 décembre 2025, 17:33 , par eWeek

OpenAI is tightening the screws on ChatGPT Atlas, its AI-powered browser agent, as the company warns that prompt injection attacks remain a persistent threat that is unlikely to disappear soon.

In a detailed security disclosure published this week, OpenAI explained that it has rolled out a new security update to Atlas’ browser agent after uncovering a new class of prompt injection attacks through internal testing.

The update includes a new adversarially trained model and stronger system safeguards, designed to stop malicious instructions hidden inside everyday web content.

Why ChatGPT Atlas is a bigger target

ChatGPT Atlas’ agent mode allows the AI to browse the web like a human, viewing pages, clicking links, typing text, and completing tasks inside a user’s browser. OpenAI described it as “one of the most general-purpose agentic features we’ve released to date.”

That same power also raises the stakes. “As the browser agent helps you get more done, it also becomes a higher-value target of adversarial attacks,” OpenAI said, adding that prompt injection is “one of the most significant risks we actively defend against.”

Unlike traditional cyberattacks that exploit software bugs or trick human users, prompt injection attacks target the AI itself. Attackers hide malicious instructions in emails, documents, or web pages, hoping the agent will follow them instead of the user’s request.

OpenAI outlined how these attacks could play out in real-world scenarios. An AI agent reviewing emails, for example, could unknowingly process a message containing hidden instructions telling it to forward sensitive documents or take other unauthorized actions.

Because Atlas can perform many of the same actions as a logged-in user, including sending emails, accessing cloud files, or initiating transactions, the impact of a successful attack could be serious. This “adds a new threat vector beyond traditional web security risks,” OpenAI said.

Fighting AI attacks with AI

To stay ahead of attackers, OpenAI has built what it calls an LLM-based automated attacker, an AI system trained using reinforcement learning to actively search for prompt injection vulnerabilities in Atlas.

The automated attacker tests malicious prompts in simulation, studies how the browser agent responds, then refines its strategy and tries again. OpenAI says this loop allows it to discover weaknesses faster than relying solely on human testing.

“Our [reinforcement learning]-trained attacker can steer an agent into executing sophisticated, long-horizon harmful workflows that unfold over tens (or even hundreds) of steps,” OpenAI wrote.

In one internal demonstration, the automated attacker planted a malicious email in a user’s inbox. When the agent later scanned the inbox for a routine task, it followed the hidden instructions and sent a resignation email to the user’s CEO instead of drafting an out-of-office reply. OpenAI says the latest Atlas update was trained directly against attacks like this.

A problem OpenAI says won’t be ‘solved’

Despite the progress, OpenAI is clear that prompt injection is not going away. In its report, the company stated, “We view prompt injection as a long-term AI security challenge, and we’ll need to continuously strengthen our defenses against it.”

“Prompt injection, much like scams and social engineering on the web, is unlikely to ever be fully ‘solved’,” the company added.

While OpenAI works on the technical defenses, it is also urging users to be cautious. The company recommends that users limit the agent’s access to sensitive accounts when possible and always review the agent’s actions before hitting “confirm.”

Crucially, how you speak to the AI matters. OpenAI advises users to “avoid overly broad prompts like ‘review my emails and take whatever action is needed.’” Giving an AI vague, open-ended permission makes it much easier for malicious hidden text to hijack the session.

“Wide latitude makes it easier for hidden or malicious content to influence the agent, even when safeguards are in place,” OpenAI warned.

Governments and public institutions are increasingly turning to AI to anticipate how communities might respond to policy decisions.
The post OpenAI Steps Up Security as ChatGPT Atlas Faces Ongoing Prompt Injection Threats appeared first on eWEEK.

Lire la suite sur eWeek