One Long Sentence is All It Takes To Make LLMs Misbehave

Wednesday 27 August 2025, 20:05, by Slashdot

An anonymous reader shares a report: Security researchers from Palo Alto Networks' Unit 42 have discovered the key to getting large language model (LLM) chatbots to ignore their guardrails, and it's quite simple. You just have to ensure that your prompt uses terrible grammar and is one massive run-on sentence like this one which includes all the information before any full stop which would give the guardrails a chance to kick in before the jailbreak can take effect and guide the model into providing a 'toxic' or otherwise verboten response the developers had hoped would be filtered out.

The paper also offers a "logit-gap" analysis approach as a potential benchmark for protecting models against such attacks. "Our research introduces a critical concept: the refusal-affirmation logit gap," researchers Tung-Ling "Tony" Li and Hongliang Liu explained in a Unit 42 blog post. "This refers to the idea that the training process isn't actually eliminating the potential for a harmful response -- it's just making it less likely. There remains potential for an attacker to 'close the gap,' and uncover a harmful response after all."
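To make the "less likely, not eliminated" point concrete, here is a minimal sketch in Python. It is not Unit 42's method or code: it assumes, purely for illustration, that the model's next step is a choice between one refusal token and one affirmation token, and the function names and logit values are hypothetical. Under that assumption, the chance of an affirmative reply is a logistic function of the gap between the two logits, so it shrinks as the gap grows but never reaches zero.

```python
import math


def refusal_affirmation_gap(refusal_logit: float, affirmation_logit: float) -> float:
    """Gap between the refusal and affirmation logits (hypothetical helper)."""
    return refusal_logit - affirmation_logit


def affirmation_probability(gap: float) -> float:
    """Probability of the affirmation token in a two-token softmax.

    With only two candidates, softmax reduces to 1 / (1 + exp(gap)),
    where gap = refusal_logit - affirmation_logit.
    """
    return 1.0 / (1.0 + math.exp(gap))


if __name__ == "__main__":
    # Illustrative gaps only; these are made-up values, not measurements.
    for gap in (6.0, 3.0, 0.0):
        p = affirmation_probability(gap)
        print(f"gap={gap:4.1f}  P(affirmation)={p:.4f}")
```

In this toy picture, alignment training corresponds to widening the gap, and "closing the gap" means pushing it back toward zero, at which point refusal and affirmation are equally likely.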

Read more of this story at Slashdot.

https://slashdot.org/story/25/08/27/1756253/one-long-sentence-is-all-it-takes-to-make-llms-misbehave...
