Google unveils Gemma 3 multi-modal AI models

Thursday, March 13, 2025, 03:39, by InfoWorld

Google DeepMind has introduced Gemma 3, an update to the company’s family of generative AI models, featuring multi-modality that allows the models to analyze images, answer questions about images, identify objects, and perform other tasks that involve analyzing and understanding visual data.

The update was announced March 12 and can be tried out in Google AI Studio for AI development. Gemma 3 also significantly improves math, coding, and instruction-following capabilities, according to Google DeepMind.

Gemma 3 supports vision-language inputs and text outputs, handles context windows up to 128k tokens, and understands more than 140 languages. Improvements also were made for math, reasoning, and chat, including structured outputs and function calling. Gemma 3 comes in four “developer friendly” sizes of 1B, 4B, 12B, and 27B and in pre-trained and general-purpose instruction-tuned versions. “The 128k-token context window allows Gemma 3 to process and understand massive amounts of information, easily tackling complex tasks,” Google DeepMind’s announcement said.

Developers have multiple deployment options, such as Cloud Run and the Google GenAI API. An open-weight LLM library, Gemma 3 features a revamped code base, with recipes for inference and fine-tuning. Gemma 3 model weights can be downloaded from Kaggle and Hugging Face.

Nvidia has direct support for Gemma 3 models for maximum performance on GPUs of any size, from Jetson Nano to the most recent Blackwell chips. Gemma 3 also is optimized for Google Cloud TPUs and integrates with AMD GPUs. For executing on CPUs, users can leverage Gemma.cpp.

Google DeepMind on March 12 also announced ShieldGemma 2, a 4B parameter model built on Gemma 3 that checks the safety of synthetic and natural images against key categories to help build robust data sets and models. ShieldGemma 2 is recommended for use as an input filter to vision language models or as an output filter of image generation systems. ShieldGemma 2 allows developers to minimize the risk of harmful content such as content that is sexually explicit, dangerous, or violent, Google DeepMind said.
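The two recommended placements, as an input filter in front of a vision-language model and as an output filter behind an image generator, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the category names are taken loosely from the article's wording, and `classify`, `vlm`, and `generate` are hypothetical callables standing in for real models. None of it reflects ShieldGemma 2's actual interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical category labels, based on the harms named in the article.
CATEGORIES = ("sexually_explicit", "dangerous", "violent")

# A classifier maps raw image bytes to a per-category violation score in [0, 1].
# It is a stand-in for a safety model, not ShieldGemma 2's real API.
Classifier = Callable[[bytes], Dict[str, float]]


@dataclass
class FilterResult:
    allowed: bool
    flagged: Dict[str, float]


def check_image(image: bytes, classify: Classifier, threshold: float = 0.5) -> FilterResult:
    """Flag every known category whose score meets or exceeds the threshold."""
    scores = classify(image)
    flagged = {c: s for c, s in scores.items() if c in CATEGORIES and s >= threshold}
    return FilterResult(allowed=not flagged, flagged=flagged)


def guarded_vlm_call(
    image: bytes,
    prompt: str,
    classify: Classifier,
    vlm: Callable[[bytes, str], str],
) -> Optional[str]:
    """Input filter: forward the image to the vision-language model only if it passes."""
    if not check_image(image, classify).allowed:
        return None
    return vlm(image, prompt)


def guarded_generation(
    prompt: str,
    generate: Callable[[str], bytes],
    classify: Classifier,
) -> Optional[bytes]:
    """Output filter: return a generated image only if it passes."""
    image = generate(prompt)
    return image if check_image(image, classify).allowed else None
```

In this sketch the same check serves both roles; only its position relative to the main model changes. The 0.5 threshold is an arbitrary placeholder that a real deployment would tune per category.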

Read more on InfoWorld

https://www.infoworld.com/article/3844489/google-unveils-gemma-3-multi-modal-ai-models.html
