Navigation
Recherche
|
China's Moonshot Launches Free AI Model Kimi K2 That Outperforms GPT-4 In Key Benchmarks
mardi 15 juillet 2025, 00:50 , par Slashdot
![]() The model's standout feature is its optimization for 'agentic' capabilities -- the ability to autonomously use tools, write and execute code, and complete complex multi-step tasks without human intervention. In benchmark tests, Kimi K2 achieved 65.8% accuracy on SWE-bench Verified, a challenging software engineering benchmark, outperforming most open-source alternatives and matching some proprietary models. On LiveCodeBench, arguably the most realistic coding benchmark available, Kimi K2 achieved 53.7% accuracy, decisively beating DeepSeek-V3's 46.9% and GPT-4.1's 44.7%. More striking still: it scored 97.4% on MATH-500 compared to GPT-4.1's 92.4%, suggesting Moonshot has cracked something fundamental about mathematical reasoning that has eluded larger, better-funded competitors. But here's what the benchmarks don't capture: Moonshot is achieving these results with a model that costs a fraction of what incumbents spend on training and inference. While OpenAI burns through hundreds of millions on compute for incremental improvements, Moonshot appears to have found a more efficient path to the same destination. It's a classic innovator's dilemma playing out in real time -- the scrappy outsider isn't just matching the incumbent's performance, they're doing it better, faster, and cheaper. Read more of this story at Slashdot.
https://developers.slashdot.org/story/25/07/14/1942209/chinas-moonshot-launches-free-ai-model-kimi-k...
Voir aussi |
56 sources (32 en français)
Date Actuelle
mar. 15 juil. - 05:26 CEST
|