Search-capable AI agents may cheat on benchmark tests

Saturday 23 August 2025, 16:32, by TheRegister

Data contamination can make models seem more capable than they really are
Researchers with Scale AI have found that search-based AI models may cheat on benchmark tests by fetching the answers directly from online sources rather than deriving those answers through a 'reasoning' process.…

Read more on TheRegister

https://go.theregister.com/feed/www.theregister.com/2025/08/23/searchcapable_ai_agents_may_cheat/

