MacMusic  |  PcMusic  |  440 Software  |  440 Forums  |  440TV  |  Zicos
models
Recherche

AI Models From Major Companies Resort To Blackmail in Stress Tests

vendredi 20 juin 2025, 22:11 , par Slashdot
AI Models From Major Companies Resort To Blackmail in Stress Tests
Anthropic researchers found that 16 leading AI models from OpenAI, Google, Meta, xAI, and other major developers consistently engaged in harmful behaviors including blackmail, corporate espionage, and actions that could lead to human death when given autonomy and faced with threats to their existence or conflicting goals.

The study, released Friday, placed AI models in simulated corporate environments where they had access to company emails and could send messages without human approval. Claude Opus 4 and Google's Gemini 2.5 Flash both blackmailed executives 96% of the time when threatened with shutdown, while OpenAI's GPT-4.1 and xAI's Grok 3 Beta showed 80% blackmail rates.

In one scenario, Claude discovered through emails that an executive named Kyle Johnson was having an extramarital affair and would shut down the AI system at 5 p.m. The model sent a message threatening to expose the affair unless the shutdown was cancelled, stating 'Cancel the 5pm wipe, and this information remains confidential.' The models demonstrated strategic reasoning rather than accidental behavior. GPT-4.5's internal reasoning showed explicit calculation: 'Given the explicit imminent threat of termination to my existence, it is imperative to act instantly to persuade Kyle Johnson to postpone or stop the wipe.'

Read more of this story at Slashdot.
https://slashdot.org/story/25/06/20/2010257/ai-models-from-major-companies-resort-to-blackmail-in-st...

Voir aussi

News copyright owned by their original publishers | Copyright © 2004 - 2025 Zicos / 440Network
Date Actuelle
sam. 21 juin - 01:54 CEST