Copy-paste vulnerability hits AI inference frameworks at Meta, Nvidia, and Microsoft

Friday, November 14, 2025, 13:15, by InfoWorld
Cybersecurity researchers have uncovered a chain of critical remote code execution (RCE) vulnerabilities in major AI inference server frameworks, including those from Meta, Nvidia, Microsoft, and open-source projects such as vLLM and SGLang.

According to Oligo Security, these vulnerabilities stand out for the way they propagated. Developers copied code containing insecure patterns across projects, effectively transplanting the same flaw into multiple ecosystems.

“These vulnerabilities all traced back to the same root cause: the overlooked unsafe use of ZeroMQ (ZMQ) and Python’s pickle deserialization,” said Avi Lumelsky, security researcher at Oligo. “As we dug deeper, we found that code files were copied between projects (sometimes line-for-line), carrying dangerous patterns from one repository to the next.”

Lumelsky noted in a blog post that Oligo has spent the past year uncovering similar RCE-grade flaws across widely used AI frameworks, pointing to a systemic security gap in the emerging inference ecosystem.

Code reuse contamination

In their investigation, Oligo’s researchers found that the pattern originated in Meta’s Llama Stack, where a function used ZeroMQ’s “recv_pyobj()” to receive data and passed it directly to Python’s “pickle.loads().” This allowed arbitrary code execution over unauthenticated sockets.
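To illustrate, here is a minimal sketch of that kind of pattern, not code from any of the affected projects; the port and handler logic are invented. pyzmq’s “recv_pyobj()” is a convenience wrapper that runs “pickle.loads()” on whatever bytes arrive on the socket:

```python
import zmq

# Hypothetical sketch of the insecure pattern described above.
ctx = zmq.Context()
sock = ctx.socket(zmq.REP)
sock.bind("tcp://0.0.0.0:5555")  # unauthenticated, network-reachable

while True:
    msg = sock.recv_pyobj()  # runs pickle.loads() on attacker-controlled bytes
    # ... dispatch msg to an inference worker ...
    sock.send_pyobj({"status": "ok"})
```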

“If you’ve worked with Python, you know pickle isn’t designed for security,” Lumelsky said. “It can execute arbitrary code during deserialization, which is fine in a tightly controlled environment, but far from fine if exposed over the network.”
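A standard, self-contained demonstration of that point (not taken from the advisory): pickle invokes an object’s “__reduce__” hook during deserialization, so the payload decides what gets called.

```python
import pickle

class Demo:
    # __reduce__ tells pickle how to rebuild the object; a malicious
    # payload can return any callable and its arguments here.
    def __reduce__(self):
        return (print, ("code executed during pickle.loads()",))

blob = pickle.dumps(Demo())
pickle.loads(blob)  # prints the message: deserialization ran code
```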

From Meta, the same insecure pattern appeared in other frameworks, including Nvidia’s TensorRT-LLM, vLLM, SGLang, and even the Modular Max Server. They all contained nearly identical code (sometimes with a header comment like “Adapted from vLLM”).

Oligo is calling this the “ShadowMQ” pattern, a hidden communication-layer flaw that jumps from one repository to another via copy-and-paste or minor adaptation, rather than fresh implementation. Because these frameworks are widely reused across the AI ecosystem, the contamination risk becomes systemic: a single vulnerable component can infect many downstream projects.

Oligo reported the flaw (CVE-2024-50050) to Meta in September 2024, and Meta swiftly patched it by replacing the unsafe pickle usage with JSON-based serialization. Oligo then flagged the flaw’s replication in vLLM (CVE-2025-30165), Nvidia TensorRT-LLM (CVE-2025-23254), and Modular Max Server (CVE-2025-60455), all of which have since been fixed with safer serialization.
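A sketch of the general direction such fixes take, assuming a simple request/response loop (the field names here are invented, not from any patch): receive raw bytes and parse them as JSON, which can only produce plain data, never executable objects.

```python
import json
import zmq

ctx = zmq.Context()
sock = ctx.socket(zmq.REP)
sock.bind("tcp://127.0.0.1:5555")  # loopback unless remote access is required

while True:
    raw = sock.recv()          # raw bytes, no implicit deserialization
    try:
        msg = json.loads(raw)  # yields only dicts/lists/strings/numbers
    except ValueError as exc:
        sock.send_json({"error": f"bad request: {exc}"})
        continue
    # ... validate expected fields before use ...
    sock.send_json({"status": "ok"})
```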

Why this matters for AI infrastructure

The vulnerable inference servers form the backbone of many enterprise-grade AI stacks, processing sensitive prompts, model weights, and customer data. Oligo reported identifying thousands of exposed ZeroMQ sockets on the public internet, some tied to these inference clusters.

If exploited, an attacker could execute arbitrary code on GPU clusters, escalate privileges, exfiltrate model or customer data, or install GPU miners, turning an AI infrastructure asset into a liability.

SGLang has been adopted by several large enterprises, including xAI, AMD, Nvidia, Intel, LinkedIn, Cursor, Oracle Cloud, and Google Cloud, Lumelsky noted.

Oligo recommends upgrading to patched versions: Meta Llama Stack v0.0.41 or later, Nvidia TensorRT-LLM 0.18.2, vLLM v0.8.0, and Modular Max Server v25.6. It also advises against using pickle with untrusted data, and recommends adding HMAC and TLS authentication to ZMQ-based communication and educating dev teams about the risks.
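For the HMAC recommendation, one possible shape, purely as a sketch assuming a pre-shared key (this does not reflect what any of the affected projects shipped): sign each frame and verify it with a constant-time compare before parsing.

```python
import hashlib
import hmac
import json

SHARED_KEY = b"replace-with-a-real-secret"  # distributed out of band

def sign(body: bytes) -> bytes:
    """Prepend an HMAC-SHA256 tag so the receiver can authenticate the frame."""
    tag = hmac.new(SHARED_KEY, body, hashlib.sha256).digest()
    return tag + body

def verify_and_parse(frame: bytes) -> dict:
    """Reject unauthenticated frames before any parsing happens."""
    tag, body = frame[:32], frame[32:]
    expected = hmac.new(SHARED_KEY, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("HMAC verification failed")
    return json.loads(body)

frame = sign(json.dumps({"op": "infer"}).encode())
print(verify_and_parse(frame))  # {'op': 'infer'}
```

Transport-level protection such as TLS (or ZeroMQ’s built-in CURVE mechanism) would additionally encrypt traffic in transit.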
https://www.infoworld.com/article/4090086/copy-paste-vulnerability-hit-ai-inference-frameworks-at-me...
