OpenAI launches GPT-5.2 as it battles Google’s Gemini 3 for AI model supremacy

Friday, December 12, 2025, 23:16, by ComputerWorld

OpenAI has released GPT-5.2, claiming significant gains in the AI model’s ability to complete real-world business tasks to an “expert level” compared to GPT-5.1, released in November.

The new model, available in Instant, Thinking, and Pro performance tiers, offers major improvements across a range of benchmarks, the company said.

On OpenAI’s GDPval benchmark, which measures how well the model completes 44 different business tasks against the standard of human experts, GPT-5.2 matched or exceeded the human experts in 70.9% of tests, compared to GPT-5.1’s 38.8% across the Instant (basic), Thinking (deeper reasoning), and Pro (research-grade) versions.

To illustrate these advances, OpenAI said that GPT-5.2 Thinking could fully format a workforce planning spreadsheet, whereas GPT-5.1 assembled the same spreadsheet correctly but in a more basic state that lacked formatting.

“We designed GPT‑5.2 to unlock even more economic value for people; it’s better at creating spreadsheets, building presentations, writing code, perceiving images, understanding long contexts, using tools, and handling complex, multi-step projects,” said OpenAI.

GPT-5.2 also showed a mixture of gains across other important benchmarks, including ARC-AGI-1/ARC-AGI-2 (general problem solving), and SWE-Bench Pro/SWE-Bench Verified (real-world software tasks).

“For everyday professional use, this translates into a model that can more reliably debug production code, implement feature requests, refactor large codebases, and ship fixes end-to-end with less manual intervention,” the company said.

GPT-5.2 has begun rolling out to ChatGPT users, starting with the paid plans. Subscription pricing is unchanged. For API access, GPT-5.2 is priced at $1.75 per one million input tokens, and $14 per one million output tokens, with a 90% discount on cached inputs. Despite this being more expensive than GPT-5.1, OpenAI claimed the model’s greater efficiency meant that “the cost of attaining a given level of quality ended up less expensive due to GPT‑5.2’s greater token efficiency.”
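To make the API pricing concrete, here is a minimal Python sketch that estimates the cost of a single request from the per-token prices quoted above. The function name, its parameters, and the assumption that the 90% discount applies per cached input token are illustrative only and are not taken from OpenAI’s documentation; actual billing may differ.

```python
# Prices quoted in the article, in USD per one million tokens.
INPUT_PRICE_PER_M = 1.75
OUTPUT_PRICE_PER_M = 14.00
CACHED_INPUT_DISCOUNT = 0.90  # 90% discount on cached input tokens


def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_input_tokens: int = 0) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    cached_input_tokens is the portion of input_tokens assumed to be
    billed at the discounted rate; this per-token treatment is an
    assumption made for illustration.
    """
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")

    uncached_tokens = input_tokens - cached_input_tokens
    cached_price_per_m = INPUT_PRICE_PER_M * (1 - CACHED_INPUT_DISCOUNT)

    total = (
        uncached_tokens * INPUT_PRICE_PER_M
        + cached_input_tokens * cached_price_per_m
        + output_tokens * OUTPUT_PRICE_PER_M
    )
    return total / 1_000_000
```

Under these assumptions, a request with 10,000 input tokens (8,000 of them cached) and 2,000 output tokens would cost roughly $0.033, with the output tokens accounting for most of the bill.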

Code red

For OpenAI, the appearance of the new version so soon after the last one represents an important acceleration in its GPT-5 model development. In early December, CEO Sam Altman sent a ‘code red’ emergency memo to OpenAI employees warning that without rapid development of GPT-5, the company risked falling behind Google’s increasingly capable Gemini 3 model.

Since then, things appear to have stabilized, with Altman telling CNBC this week that Gemini’s advances had been less significant than first feared, and that the code red state would end by January. However, a noticeable omission from the web announcement was any comparison between GPT-5.2’s performance and that of Gemini 3. Reportedly, a separate press briefing offered only a limited comparison.

Maria Sukhareva, a principal AI analyst at Siemens, questioned OpenAI’s use of benchmarks more generally. “It [GPT-5.2] claims to beat GDPVal, but this is a benchmark developed by OpenAI for OpenAI. Technically there are no obstacles for OpenAI to fine-tune their model for those 44 tasks, while completely failing on everything else,” she pointed out.

“Essentially, the numbers reported by GPT-5.2 are meaningless where one cannot see what data they trained the model on. GPT-5.2 suffers from all the same problems as previous models,” she argued. Sukhareva’s deeper dive on GPT-5.2 benchmarking can be found on her Substack.

Rachid ‘Rush’ Wehbi, CEO of e-commerce platform Sell The Trend, has tested GPT-5.2 under real-world conditions. “GPT-5.2 is doing a lot better when it comes to keeping its train of thought going for longer periods and not falling apart when you throw some layered context at it. For companies, that’s way more important than making a tiny bit of an improvement on some potentially inconsequential benchmark,” he said.

“Benchmarks are fine for showing you’ve made some sort of progress, but they don’t tell you if your model is going to actually hold up in the real world. GPT-5.2 is a step forward, but enterprise AI is still a work in progress.”

According to Bob Hutchins, founder of AI literacy company Human Voice Media, “most enterprise frustration with AI up until now is from the last 20% — the formatting, the constraints, the handoffs. GPT-5.2 shows progress there.” His advice for enterprises was, “ignore the launch noise and run a disciplined trial. GPT-5.2 is a meaningful step. It does not close the gap between promise and practice, it narrows it.”

For example, benchmarking with agentic AI company Vectara’s Hallucination Evaluation Model found that, while GPT-5.2 has improved on that front, it still lags some competitors.

“OpenAI still has some way to go in improving hallucination performance,” commented Ofer Mendelevitch, Vectara head of developer relations. “GPT-5.2-low-thinking is best in the GPT family so far, ranking 33rd on our leaderboard with an 8.4% hallucination rate. However, ChatGPT 5.2 notably trails DeepSeek V3.2, which ranks 23rd with a hallucination rate of 6.3%. For comparative purposes, Gemini 3’s grounded hallucination rate in our testing was 13.6%, with Grok 4.1 coming in at 17.8%.”

This article originally appeared on InfoWorld.
