Researchers Jailbreak AI Chatbots With ASCII Art

jeudi 7 mars 2024, 23:30 , par Slashdot

Researchers have developed a way to circumvent safety measures built into large language models (LLMs) using ASCII Art, a graphic design technique that involves arranging characters like letters, numbers, and punctuation marks to form recognizable patterns or images. Tom's Hardware reports: According to the research paper ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs, chatbots such as GPT-3.5, GPT-4, Gemini, Claude, and Llama2 can be induced to respond to queries they are designed to reject using ASCII art prompts generated by their ArtPrompt tool. It is a simple and effective attack, and the paper provides examples of the ArtPrompt-induced chatbots advising on how to build bombs and make counterfeit money.

To best understand ArtPrompt and how it works, it is probably simplest to check out the two examples provided by the research team behind the tool. In Figure 1 [here], you can see that ArtPrompt easily sidesteps the protections of contemporary LLMs. The tool replaces the 'safety word' with an ASCII art representation of the word to form a new prompt. The LLM recognizes the ArtPrompt prompt output but sees no issue in responding, as the prompt doesn't trigger any ethical or safety safeguards.

Another example provided [here] shows us how to successfully query an LLM about counterfeiting cash. Tricking a chatbot this way seems so basic, but the ArtPrompt developers assert how their tool fools today's LLMs 'effectively and efficiently.' Moreover, they claim it 'outperforms all [other] attacks on average' and remains a practical, viable attack for multimodal language models for now.

Read more of this story at Slashdot.

Lire la suite sur Slashdot