NIST releases new tool to check AI models’ security
Monday, July 29, 2024, 10:16, by InfoWorld
The US Department of Commerce’s National Institute of Standards and Technology (NIST) has released a new open source software package, Dioptra, that allows developers to determine what type of attacks would make an AI model perform less effectively.
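To illustrate the kind of evaluation Dioptra targets, the minimal sketch below (which does not use Dioptra itself and is not its API) trains a toy PyTorch classifier, perturbs its inputs with the fast gradient sign method (FGSM), and reports how far accuracy drops. The model, the synthetic data, and the epsilon value are assumptions chosen purely for illustration.

```python
# Hypothetical sketch (not Dioptra's API): quantify how much a model's
# accuracy degrades under a simple FGSM adversarial perturbation.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy data and a small classifier trained on it (illustrative only).
n, d = 512, 20
x = torch.randn(n, d)
y = (x[:, 0] > 0).long()
model = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
for _ in range(200):
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()

def accuracy(inputs: torch.Tensor) -> float:
    """Fraction of inputs the trained model classifies correctly."""
    with torch.no_grad():
        return (model(inputs).argmax(dim=1) == y).float().mean().item()

# FGSM: nudge each input in the direction that increases the loss.
epsilon = 0.25  # attack strength; assumed value, tune per use case
x_adv = x.clone().requires_grad_(True)
loss_fn(model(x_adv), y).backward()
x_adv = (x_adv + epsilon * x_adv.grad.sign()).detach()

print(f"clean accuracy:       {accuracy(x):.2%}")
print(f"adversarial accuracy: {accuracy(x_adv):.2%}")  # the measured degradation
```

Comparing the two accuracy figures is the basic measurement a tool like Dioptra automates at scale, across many attack types and strengths.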
“Testing the effects of adversarial attacks on machine learning models is one of the goals of Dioptra, a new software package aimed at helping AI developers and customers determine how well their AI software stands up to a variety of adversarial attacks,” the NIST said in a statement. The software package, available for free download, can also help developers of AI systems quantify how much a model’s performance degrades, so they can learn how often and under what circumstances the system would fail, the NIST explained.

The release of Dioptra is linked to President Biden’s 2023 executive order, which required the NIST to help with model testing. Along with the new software package, the NIST has also released several documents promoting AI safety and standards in line with the executive order. One of those documents is the initial public draft of its guidelines for developing foundation models, dubbed Managing Misuse Risk for Dual-Use Foundation Models. The guidelines outline voluntary practices developers can adopt while designing and building their models to protect them against being misused to cause deliberate harm to individuals, public safety, and national security.

The draft offers seven key approaches for mitigating the risk that models will be misused, along with recommendations on how to implement them and how to be transparent about their implementation. “Together, these practices can help prevent models from enabling harm through activities like developing biological weapons, carrying out offensive cyber operations, and generating child sexual abuse material and nonconsensual intimate imagery,” the NIST said, adding that it was accepting comments on the draft until September 9.

Companion documents to guidelines about gen AI safety released

The other releases include two guidance documents that will serve as companion resources to the NIST’s AI Risk Management Framework (AI RMF) and Secure Software Development Framework (SSDF), designed to help developers manage the risks of generative AI. The first, named the AI RMF Generative AI Profile, lists 12 tentative risks of generative AI and proposes nearly 200 actions that developers can take to manage them. These risks include a lowered barrier to entry for cybersecurity attacks; the production of mis- and disinformation, hate speech, and other harmful content; and generative AI systems confabulating or “hallucinating” output.

The second guidance document, named Secure Software Development Practices for Generative AI and Dual-Use Foundation Models, is designed to be used alongside the SSDF, the NIST said. While the SSDF is broadly concerned with software coding practices, the companion resource expands it in part to address the risk of a model being compromised by malicious training data that adversely affects the AI system’s performance, it added.

As part of its plan to ensure AI safety, the NIST has also proposed a separate plan for US stakeholders to work with counterparts around the globe on developing AI standards. Last November, China and the US agreed to work with at least 25 other countries to mitigate risks arising from the progression of AI.
The two countries, along with several other signatories including the EU, India, Germany, and France, signed an agreement, dubbed the Bletchley Declaration, at the UK AI Safety Summit to establish a common line of thinking on overseeing the evolution of AI and ensuring that the technology advances safely.
https://www.infoworld.com/article/3478308/nist-releases-new-tool-to-check-ai-models-security.html