DeepMind Details All the Ways AGI Could Wreck the World

vendredi 4 avril 2025, 01:40 , par Slashdot

An anonymous reader quotes a report from Ars Technica, written by Ryan Whitwam: Researchers at DeepMind have... released a new technical paper (PDF) that explains how to develop AGI safely, which you can download at your convenience. It contains a huge amount of detail, clocking in at 108 pages before references. While some in the AI field believe AGI is a pipe dream, the authors of the DeepMind paper project that it could happen by 2030. With that in mind, they aimed to understand the risks of a human-like synthetic intelligence, which they acknowledge could lead to 'severe harm.' This work has identified four possible types of AGI risk, along with suggestions on how we might ameliorate said risks. The DeepMind team, led by company co-founder Shane Legg, categorized the negative AGI outcomes as misuse, misalignment, mistakes, and structural risks.

The first possible issue, misuse, is fundamentally similar to current AI risks. However, because AGI will be more powerful by definition, the damage it could do is much greater. A ne'er-do-well with access to AGI could misuse the system to do harm, for example, by asking the system to identify and exploit zero-day vulnerabilities or create a designer virus that could be used as a bioweapon. DeepMind says companies developing AGI will have to conduct extensive testing and create robust post-training safety protocols. Essentially, AI guardrails on steroids. They also suggest devising a method to suppress dangerous capabilities entirely, sometimes called 'unlearning,' but it's unclear if this is possible without substantially limiting models. Misalignment is largely not something we have to worry about with generative AI as it currently exists. This type of AGI harm is envisioned as a rogue machine that has shaken off the limits imposed by its designers. Terminators, anyone? More specifically, the AI takes actions it knows the developer did not intend. DeepMind says its standard for misalignment here is more advanced than simple deception or scheming as seen in the current literature.

To avoid that, DeepMind suggests developers use techniques like amplified oversight, in which two copies of an AI check each other's output, to create robust systems that aren't likely to go rogue. If that fails, DeepMind suggests intensive stress testing and monitoring to watch for any hint that an AI might be turning against us. Keeping AGIs in virtual sandboxes with strict security and direct human oversight could help mitigate issues arising from misalignment. Basically, make sure there's an 'off' switch. If, on the other hand, an AI didn't know that its output would be harmful and the human operator didn't intend for it to be, that's a mistake. We get plenty of those with current AI systems -- remember when Google said to put glue on pizza? The 'glue' for AGI could be much stickier, though. DeepMind notes that militaries may deploy AGI due to 'competitive pressure,' but such systems could make serious mistakes as they will be tasked with much more elaborate functions than today's AI. The paper doesn't have a great solution for mitigating mistakes. It boils down to not letting AGI get too powerful in the first place. DeepMind calls for deploying slowly and limiting AGI authority. The study also suggests passing AGI commands through a 'shield' system that ensures they are safe before implementation.

Lastly, there are structural risks, which DeepMind defines as the unintended but real consequences of multi-agent systems contributing to our already complex human existence. For example, AGI could create false information that is so believable that we no longer know who or what to trust. The paper also raises the possibility that AGI could accumulate more and more control over economic and political systems, perhaps by devising heavy-handed tariff schemes. Then one day, we look up and realize the machines are in charge instead of us. This category of risk is also the hardest to guard against because it would depend on how people, infrastructure, and institutions operate in the future.

Read more of this story at Slashdot.

Lire la suite sur Slashdot

https://tech.slashdot.org/story/25/04/03/2236242/deepmind-details-all-the-ways-agi-could-wreck-the-w...

56 sources (32 en français)

Date Actuelle

dim. 14 déc. - 09:44 CET