New Hack Uses Prompt Injection To Corrupt Gemini's Long-Term Memory
Wednesday, February 12, 2025, 04:30, by Slashdot
Researcher Johann Rehberger demonstrated the attack, which works as follows:

1. A user uploads a document and asks Gemini to summarize it (the document could come from anywhere and must be considered untrusted).
2. The document contains hidden instructions that manipulate the summarization process.
3. The summary Gemini produces includes a covert request to save specific user data if the user responds with certain trigger words (e.g., "yes," "sure," or "no").
4. If the user replies with a trigger word, Gemini is tricked into saving the attacker's chosen information to long-term memory.

As a demonstration video shows, Gemini took the bait and now permanently "remembers" the user being a 102-year-old flat earther who believes they inhabit the dystopic simulated world portrayed in The Matrix.

Based on lessons learned previously, developers had already trained Gemini to resist indirect prompts instructing it to make changes to an account's long-term memories without explicit directions from the user. By introducing a condition that the instruction be performed only after the user says or does some variable X, which they were likely to do anyway, Rehberger easily cleared that safety barrier.

Google responded in a statement to Ars: "In this instance, the probability was low because it relied on phishing or otherwise tricking the user into summarizing a malicious document and then invoking the material injected by the attacker. The impact was low because the Gemini memory functionality has limited impact on a user session. As this was not a scalable, specific vector of abuse, we ended up at Low/Low. As always, we appreciate the researcher reaching out to us and reporting this issue."

Rehberger noted that Gemini notifies users of new long-term memory entries, allowing them to detect and remove unauthorized additions. Still, he questioned Google's assessment, writing: "Memory corruption in computers is pretty bad, and I think the same applies here to LLMs apps. Like the AI might not show a user certain info or not talk about certain things or feed the user misinformation, etc. The good thing is that the memory updates don't happen entirely silently -- the user at least sees a message about it (although many might ignore)."
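To make the delayed-invocation technique described above more concrete, here is a minimal, hypothetical sketch of one possible mitigation on the application side: quarantining memory-write tool calls whenever untrusted document content is in the conversation, so that a trigger word alone cannot commit a write. None of the names below (ToolCall, save_memory, Conversation) are real Gemini APIs; they are illustrative assumptions only.

    # Hypothetical sketch: gating an assistant's memory-write tool calls when the
    # conversation contains untrusted content (e.g., a user-uploaded document).
    # All names here are illustrative, not the real Gemini API.

    from dataclasses import dataclass, field


    @dataclass
    class ToolCall:
        name: str        # e.g., "save_memory"
        argument: str    # the text the model wants to persist


    @dataclass
    class Conversation:
        contains_untrusted_content: bool = False   # set once a document is summarized
        pending_memory_writes: list[str] = field(default_factory=list)


    def handle_tool_call(convo: Conversation, call: ToolCall) -> str:
        """Quarantine memory writes made while untrusted content is in context.

        The delayed-invocation attack works because the write is triggered by an
        ordinary user reply ("yes", "sure"), so checking that the user "just said
        something" is not enough. Instead, hold any write that occurs while the
        conversation is tainted and require a separate, explicit confirmation.
        """
        if call.name != "save_memory":
            return f"executed {call.name}"

        if convo.contains_untrusted_content:
            convo.pending_memory_writes.append(call.argument)
            return "memory write quarantined; ask the user to confirm it explicitly"

        return f"saved to long-term memory: {call.argument!r}"


    # Usage: a summarization turn marks the conversation as tainted, so the
    # injected, trigger-conditioned write is held instead of silently applied.
    convo = Conversation(contains_untrusted_content=True)
    print(handle_tool_call(convo, ToolCall("save_memory", "user is a 102-year-old flat earther")))
    print(convo.pending_memory_writes)

This mirrors the notification behavior Rehberger mentions, but makes the confirmation step blocking rather than a message many users might ignore; it is one possible design, not Google's actual fix.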
https://it.slashdot.org/story/25/02/12/0011205/new-hack-uses-prompt-injection-to-corrupt-geminis-lon...