AI Models Still Struggle To Debug Software, Microsoft Study Shows
Friday, April 11, 2025, 07:20, by Slashdot
The study's co-authors tested nine different models as the backbone for a 'single prompt-based agent' that had access to a number of debugging tools, including a Python debugger. They tasked this agent with solving a curated set of 300 software debugging tasks from SWE-bench Lite. According to the co-authors, even when equipped with stronger and more recent models, their agent rarely completed more than half of the debugging tasks successfully. Claude 3.7 Sonnet had the highest average success rate (48.4%), followed by OpenAI's o1 (30.2%), and o3-mini (22.1%). Read more of this story at Slashdot.
https://developers.slashdot.org/story/25/04/11/0519242/ai-models-still-struggle-to-debug-software-mi...