
AI Systems Solve Just 2% of Advanced Maths Problems in New Benchmark Test

Wednesday, November 13, 2024, 20:22, by Slashdot
Leading AI systems are solving less than 2% of problems in a new advanced mathematics benchmark, revealing significant limitations in their reasoning capabilities, research group Epoch AI reported this week.

The benchmark, called FrontierMath, consists of hundreds of original research-level mathematics problems developed in collaboration with over 60 mathematicians, including Fields Medalists Terence Tao and Timothy Gowers. While top AI models like GPT-4 and Gemini 1.5 Pro achieve over 90% accuracy on traditional math tests, they struggle with FrontierMath's problems, which span fields from computational number theory to algebraic geometry and require complex reasoning.

'These are extremely challenging. The only way to solve them is by a combination of a semi-expert like a graduate student in a related field, maybe paired with some combination of a modern AI and lots of other algebra packages,' Tao said. The problems are designed to be 'guessproof,' with large numerical answers or complex mathematical objects as solutions, making them nearly impossible to solve without proper mathematical reasoning.

Further reading: New secret math benchmark stumps AI models and PhDs alike.

Read more of this story at Slashdot.
https://science.slashdot.org/story/24/11/13/1244216/ai-systems-solve-just-2-of-advanced-maths-proble...
