Cohere’s Embed 4 model helps enterprises search dynamic documents, ‘messy’ data
Wednesday, April 16, 2025, 03:23, by ComputerWorld
Embedding models help transform complex data — text, images, audio, and video — into numerical representations that computers can understand. The embeddings capture the semantic meaning of the data, making them useful for tasks like search, recommendation systems, and natural language processing.
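As a concrete illustration of that idea, the short sketch below ranks documents against a query by cosine similarity of their embedding vectors. The toy embed() function is only a hash-based stand-in (it does not actually capture meaning); in practice, a real embedding model such as Embed 4 would supply the vectors.

```python
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    """Stand-in for a real embedding model: hashes text into a unit vector.
    A real model would map semantically similar texts to nearby vectors."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

def search(query: str, docs: list[str], top_k: int = 2) -> list[tuple[str, float]]:
    """Rank documents by cosine similarity to the query embedding."""
    q = embed(query)
    scored = [(d, float(embed(d) @ q)) for d in docs]  # unit vectors: dot product == cosine
    return sorted(scored, key=lambda x: x[1], reverse=True)[:top_k]

docs = ["Q3 revenue summary", "Employee onboarding guide", "Supply chain risk report"]
print(search("quarterly financial results", docs))
```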
Still, embedding models can struggle with more complex materials, such as documents comprising a mix of text and images, so enterprises often have to build pre-processing pipelines to get data ready for AI to use. Canadian AI company Cohere hopes to solve this problem with Embed 4, its latest multimodal model supporting frontier search and retrieval capabilities. The model can quickly search documents, whether they are solely text-based or include images, diagrams, graphs, tables, code, and other components.

“Enterprise IT buyers will certainly be interested in Cohere if they are looking for technology that can process large materials for companies with global operations, including multilingual annual reports or legal documents,” said Thomas Randall, director of AI market research at Info-Tech Research Group.

Multimodal, multilingual, able to understand ‘messy’ data

Multimodal AI systems can process and make sense of various types of data (text, images, audio, and video) simultaneously, giving them a more comprehensive understanding of a given situation. Multimodality is important because unstructured data comes in many unpredictable formats, noted Amy Machado, IDC’s senior research manager for enterprise content and knowledge management strategies. Business data is diverse, and nearly 90% of it is estimated to be unstructured, residing in text, PDFs, images, tables, audio, and presentations, she pointed out. “Multimodality allows for a more complete search and retrieval experience, unlocking more assets, not just text, with a consolidated vectorized data set,” she explained.

Embed 4’s ability to handle different types of input differentiates it from other embedding models that focus solely on text, Randall noted. This enables stronger capabilities for semantic search, retrieval-augmented generation (RAG), and intelligent document understanding, he said.

Embed 4 can generate embeddings for documents of up to 128K tokens (roughly 200 pages) and was designed to output compressed embeddings, which Cohere says can help enterprises save up to 83% on storage costs. It is multilingual, supporting more than 100 languages including Arabic, Japanese, Korean, and French, and can search across languages, so employees can find critical data regardless of the language they speak.

Embed 4 was specifically trained to handle what Cohere calls “noisy real-world data,” such as the spelling mistakes and formatting issues found in documents like invoices and legal paperwork. It can search scanned documents as well as handwritten ones. “The model is designed to handle imperfect real-world data, including fuzzy images and poorly oriented documents,” said Randall, noting that organizations using Embed 4 will save “huge amounts of time” because they will not need to perform data preprocessing.

Embed 4 can be deployed in a virtual private cloud (VPC) or on-premises. It is integrated with Cohere’s work platform, North, and is also available on Microsoft’s developer hub, Azure AI Foundry, and on Amazon SageMaker.
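In practice, capabilities like these are exercised through Cohere’s embed endpoint. The sketch below shows roughly what that could look like with the cohere Python SDK; the “embed-v4.0” model identifier, the base64 image input handling, and the int8 compressed output type are assumptions based on Cohere’s published SDK conventions rather than details stated in this article, so verify them against Cohere’s documentation before relying on them.

```python
# Hedged sketch: the model id, input_type values, and embedding_types shown here
# are assumptions drawn from Cohere's public SDK conventions, not from the article.
import base64
import cohere

co = cohere.ClientV2(api_key="YOUR_API_KEY")  # placeholder key

# Embed a text passage for indexing (input_type distinguishes documents from queries).
doc_resp = co.embed(
    model="embed-v4.0",                 # assumed Embed 4 model identifier
    input_type="search_document",
    embedding_types=["int8"],           # compressed output, per the storage-savings claim
    texts=["FY2024 annual report: revenue grew 12% year over year..."],
)

# Embed a scanned page or chart supplied as a base64 data URI.
with open("invoice_scan.png", "rb") as f:   # hypothetical local file
    data_uri = "data:image/png;base64," + base64.b64encode(f.read()).decode()

img_resp = co.embed(
    model="embed-v4.0",
    input_type="image",                 # assumed input_type for image inputs
    embedding_types=["int8"],
    images=[data_uri],
)
```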
Handling specific enterprise use cases

In addition to its general business knowledge, Embed 4 is optimized with domain-specific understanding of finance, healthcare, and manufacturing. The model can identify insights in common documents, including investor presentations, annual financial reports, and M&A due diligence files in finance; product specification documents, repair guides, and supply chain plans in manufacturing; and medical records, procedural charts, and clinical trial reports in healthcare. This domain-specific understanding is important for “greater accuracy and trust, which is paramount for regulated industries and companies that are risk-averse,” said Machado.

She pointed to many potential enterprise use cases, including:

- Compiling financial data, which is often found in lengthy PDFs with unpredictable table structures and formats
- Deep research for life sciences or R&D
- Self-service knowledge bases for tech and customer support that rely on standard operating procedures and manuals full of images
- Developing dynamic sales decks or analysis that requires visual output

Cohere can differentiate itself, but the price could be hefty

Having a choice of models is beneficial for enterprises, as it allows them to experiment and identify the most reliable tools for their unique business needs, said Machado. “We are in the very early days, with significant experimentation, and Cohere has the opportunity to differentiate itself by delivering trusted outcomes directly linked to key business metrics,” she said.

However, IT buyers should be wary of Embed 4’s per-image embedding pricing, Randall pointed out: at $0.47 per million image tokens, it is relatively expensive compared to text embeddings ($0.12 per million tokens). “For image-heavy workloads, this could outpace quarter-by-quarter budgets if usage scales,” he said.

Moreover, he added, Cohere lacks the “massive developer ecosystem” enjoyed by the likes of OpenAI, Meta, and Google. This could mean fewer plug-and-play integrations, third-party tutorials, or off-the-shelf wrappers for niche use cases. “These issues are especially pronounced, given Embed 4 is a new model without independent benchmark validations,” Randall noted.
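To put Randall’s pricing point in perspective, here is a rough back-of-envelope comparison using only the per-million-token rates quoted above; the 500-million-token monthly volume is a made-up figure chosen purely for illustration.

```python
# Back-of-envelope cost comparison using the rates quoted in the article.
# The monthly token volume below is hypothetical, for illustration only.
IMAGE_RATE = 0.47   # USD per million image tokens
TEXT_RATE = 0.12    # USD per million text tokens

monthly_tokens_millions = 500  # hypothetical: 500M tokens embedded per month

image_cost = monthly_tokens_millions * IMAGE_RATE
text_cost = monthly_tokens_millions * TEXT_RATE

print(f"Image-heavy workload: ${image_cost:,.2f}/month")
print(f"Text-only workload:   ${text_cost:,.2f}/month")
print(f"Image embeddings cost {IMAGE_RATE / TEXT_RATE:.1f}x more per token")
```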
https://www.computerworld.com/article/3963256/coheres-embed-4-model-helps-enterprises-search-dynamic...