Meta's AI safety system defeated by the space bar

lundi 29 juillet 2024, 23:01 , par TheRegister

'Ignore previous instructions' thwarts Prompt-Guard model if you just add some good ol' ASCII code 32
Meta's machine-learning model for detecting prompt injection attacks – special prompts to make neural networks behave inappropriately – is itself vulnerable to, you guessed it, prompt injection attacks.…

Lire la suite sur TheRegister