Cerebras gives waferscale chips inferencing twist, claims 1,800 token per sec generation rates
Tuesday, 27 August 2024, 18:00, by TheRegister
Faster than you can read? More like blink and you'll miss the hallucination
Hot Chips Inference performance in many modern generative AI workloads is usually a function of memory bandwidth rather than compute. The faster you can shuttle bits in and out of high-bandwidth memory (HBM), the faster the model can generate a response.…
https://go.theregister.com/feed/www.theregister.com/2024/08/27/cerebras_ai_inference/