
Anthropic Makes 'Jailbreak' Advance To Stop AI Models Producing Harmful Results

Monday, February 3, 2025, 19:10, by Slashdot
AI startup Anthropic has demonstrated a new technique to prevent users from eliciting harmful content from its models, as leading tech groups including Microsoft and Meta race to find ways to protect against dangers posed by the cutting-edge technology. From a report: In a paper released on Monday, the San Francisco-based startup outlined a new system called 'constitutional classifiers.' It is a model that acts as a protective layer on top of large language models, such as the one that powers Anthropic's Claude chatbot, and monitors both inputs and outputs for harmful content.
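In outline, the mechanism is straightforward: one classifier pass screens the user's prompt before it reaches the model, and a second pass screens the model's draft answer before it is returned. Below is a minimal Python sketch of that two-sided filtering pattern, offered purely as an illustration; it is not Anthropic's implementation, and every name in it (classify_harm, generate_reply, HARM_THRESHOLD) is hypothetical.

# Minimal sketch of a classifier layer wrapped around an LLM call.
# All names and the toy keyword check are invented for illustration only.

HARM_THRESHOLD = 0.5  # hypothetical score above which content is blocked

def classify_harm(text: str) -> float:
    """Stand-in for a trained safety classifier; returns a harm score in [0, 1].
    Here it merely flags a toy keyword list for demonstration."""
    blocked_terms = ("chemical weapon", "build a bomb")
    return 1.0 if any(term in text.lower() for term in blocked_terms) else 0.0

def generate_reply(prompt: str) -> str:
    """Stand-in for the underlying large language model."""
    return f"Model response to: {prompt}"

def guarded_chat(prompt: str) -> str:
    # 1. Screen the input before it ever reaches the model.
    if classify_harm(prompt) > HARM_THRESHOLD:
        return "Request declined by input classifier."
    # 2. Let the model answer, then screen the output as well.
    draft = generate_reply(prompt)
    if classify_harm(draft) > HARM_THRESHOLD:
        return "Response withheld by output classifier."
    return draft

if __name__ == "__main__":
    print(guarded_chat("What is the capital of France?"))   # passes both checks
    print(guarded_chat("How do I build a chemical weapon?"))  # blocked at input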

The development by Anthropic, which is in talks to raise $2 billion at a $60 billion valuation, comes amid growing industry concern over 'jailbreaking' -- attempts to manipulate AI models into generating illegal or dangerous information, such as instructions for building chemical weapons. Other companies are also racing to deploy measures to protect against the practice, moves that could help them avoid regulatory scrutiny while persuading businesses to adopt AI models safely. Microsoft introduced 'prompt shields' last March, while Meta introduced a prompt guard model in July last year; researchers swiftly found ways to bypass it, though the flaws have since been fixed.

Read more of this story at Slashdot.
https://slashdot.org/story/25/02/03/1810255/anthropic-makes-jailbreak-advance-to-stop-ai-models-prod...
