Chinese Firm Trains Massive AI Model for Just $5.5 Million

Friday, December 27, 2024, 05:21, by Slashdot
Chinese AI startup DeepSeek has released what appears to be one of the most powerful open-source language models to date, trained at a cost of just $5.5 million using restricted Nvidia H800 GPUs.

The 671-billion-parameter DeepSeek V3, released this week under a permissive commercial license, outperformed both open- and closed-source AI models in internal benchmarks, beating Meta's Llama 3.1 and OpenAI's GPT-4 on coding tasks.

The model was trained on 14.8 trillion tokens of data over two months. At roughly 1.6 times the size of Meta's 405-billion-parameter Llama 3.1, DeepSeek V3 requires substantial computing power to run at reasonable speeds.
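The size claim is plain arithmetic over the two parameter counts; a minimal check in Python, using only figures from the story:

    # Parameter counts as reported in the story.
    deepseek_v3_params = 671e9   # DeepSeek V3
    llama_3_1_params = 405e9     # Meta's Llama 3.1 405B

    # 671 / 405 is about 1.66, matching the "1.6 times the size" claim.
    print(f"Size ratio: {deepseek_v3_params / llama_3_1_params:.2f}x")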

Andrej Karpathy, former OpenAI and Tesla executive, comments: "For reference, this level of capability is supposed to require clusters of closer to 16K GPUs; the ones being brought up today are more around 100K GPUs. E.g. Llama 3 405B used 30.8M GPU-hours, while DeepSeek-V3 looks to be a stronger model at only 2.8M GPU-hours (~11X less compute). If the model also passes vibe checks (e.g. LLM arena rankings are ongoing, my few quick tests went well so far) it will be a highly impressive display of research and engineering under resource constraints.

"Does this mean you don't need large GPU clusters for frontier LLMs? No, but you have to ensure that you're not wasteful with what you have, and this looks like a nice demonstration that there's still a lot to get through with both data and algorithms."
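Karpathy's compute ratio, and the implied dollar figure behind the headline, also check out as simple arithmetic. A minimal sketch, assuming a rental price of about $2 per H800 GPU-hour; that rate is an illustrative assumption, not something stated in the story:

    # GPU-hour figures from Karpathy's comment above.
    llama3_gpu_hours = 30.8e6    # Llama 3 405B
    deepseek_gpu_hours = 2.8e6   # DeepSeek-V3

    # 30.8M / 2.8M = 11, matching the "~11X less compute" claim.
    print(f"Compute ratio: {llama3_gpu_hours / deepseek_gpu_hours:.0f}x")

    # Implied training cost at an ASSUMED ~$2/GPU-hour rental rate;
    # this lands near the $5.5 million headline figure.
    rate_usd_per_gpu_hour = 2.0  # assumption, not from the story
    cost = deepseek_gpu_hours * rate_usd_per_gpu_hour
    print(f"Implied cost: ${cost / 1e6:.1f}M")  # ~$5.6M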

Read more of this story at Slashdot.
https://slashdot.org/story/24/12/27/0420235/chinese-firm-trains-massive-ai-model-for-just-55-million...
