Building an analytics architecture for unstructured data and multimodal AI

Wednesday, June 11, 2025, 15:10, by InfoWorld

Data scientists today face a perfect storm: an explosion of inconsistent, unstructured, multimodal data scattered across silos – and mounting pressure to turn it into accessible, AI-ready insights. The challenge isn’t just dealing with diverse data types, but also the need for scalable, automated processes to prepare, analyze, and use this data effectively.

Many organizations fall into predictable traps when updating their data pipelines for AI. The most common: treating data preparation as a series of one-off tasks rather than designing for repeatability and scale. For example, hardcoding product categories in advance can make a system brittle and hard to adapt to new products. A more flexible approach is to infer categories dynamically from unstructured content, like product descriptions, using a foundation model, allowing the system to evolve with the business.
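As a rough illustration of inferring categories instead of hardcoding them, the sketch below passes a product description to a model and treats the existing categories as hints rather than a closed list. The names `infer_category` and `stub_model` are hypothetical and do not come from the article. `stub_model` is a keyword-based stand-in so the example runs offline; in practice it would be replaced by a call to whichever foundation model the team uses.

```python
from typing import Callable


def infer_category(description: str, known: list[str],
                   model: Callable[[str], str]) -> str:
    """Ask a model for a category; known categories are hints, not a fixed set."""
    prompt = (
        "Assign a short product category to the text below. "
        f"Existing categories (reuse one if it fits): {', '.join(known) or 'none'}.\n"
        f"Description: {description}\nCategory:"
    )
    category = model(prompt).strip().lower()
    # A category the system has not seen before is recorded, so the
    # taxonomy grows with the catalog instead of being fixed in advance.
    if category and category not in known:
        known.append(category)
    return category


def stub_model(prompt: str) -> str:
    # Hypothetical stand-in for a foundation model call (assumption, not a real API).
    text = prompt.split("Description:")[1].lower()
    if "headphone" in text:
        return "Audio"
    if "kettle" in text:
        return "Kitchen"
    return "Uncategorized"


categories = ["audio"]
print(infer_category("Wireless noise-cancelling headphones", categories, stub_model))
print(infer_category("Stainless steel electric kettle", categories, stub_model))
print(categories)
```

Because the model is passed in as a plain callable, the same function can be exercised with a stub in tests and with a hosted model in production.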

Forward-looking teams are rethinking pipelines with adaptability in mind. Market leaders use AI-powered analytics to extract insights from this diverse data, transforming customer experiences and operational efficiency. The shift demands a tailored, priority-based approach to data processing and analytics that embraces the diverse nature of modern data, while optimizing for different computational needs across the AI/ML lifecycle.

Tooling for unstructured and multimodal data projects

Different data types benefit from specialized approaches. For example:

Text analysis leverages contextual understanding and embedding capabilities to extract meaning;

Video pipeline processing employs computer vision models for classification;

Time-series analysis uses forecasting engines.

Platforms must match workloads to optimal processing methods while maintaining data access, governance, and resource efficiency.
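One simple way to picture matching workloads to processing methods is a registry that routes each item by its data type. This is only a sketch under assumptions: the `register` and `route` helpers and the handler names are hypothetical, and each handler returns a label where a real platform would invoke an embedding model, a vision model, or a forecasting engine.

```python
from typing import Any, Callable

# Maps a data type to the handler that should process it.
PROCESSORS: dict[str, Callable[[Any], str]] = {}


def register(data_type: str):
    """Decorator that registers a handler for one data type."""
    def decorator(fn: Callable[[Any], str]) -> Callable[[Any], str]:
        PROCESSORS[data_type] = fn
        return fn
    return decorator


@register("text")
def handle_text(item: Any) -> str:
    return f"embedding model <- {item}"


@register("video")
def handle_video(item: Any) -> str:
    return f"computer vision classifier <- {item}"


@register("time_series")
def handle_time_series(item: Any) -> str:
    return f"forecasting engine <- {item}"


def route(data_type: str, item: Any) -> str:
    """Send an item to the handler registered for its data type."""
    try:
        handler = PROCESSORS[data_type]
    except KeyError:
        raise ValueError(f"no processor registered for {data_type!r}") from None
    return handler(item)


print(route("video", "clip.mp4"))
print(route("text", "support ticket #1"))
```

Keeping routing in one place also gives a natural point to enforce access and governance checks before any handler runs.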

Consider text analytics on customer support data. Initial processing might use lightweight natural language processing (NLP) for classification. Deeper analysis could employ large language models (LLMs) for sentiment detection, while production deployment might require specialized vector databases for semantic search. Each stage requires different computational resources, yet all must work together seamlessly in production.
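To make the semantic search stage concrete, here is a toy in-memory index that ranks support tickets by cosine similarity to a query. It is an illustrative sketch, not a production design: `embed` uses bag-of-words counts as a stand-in for a real embedding model, and `TinyVectorIndex` is a hypothetical class standing in for a vector database.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: term counts per lowercase token.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class TinyVectorIndex:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add(self, doc_id: str, text: str) -> None:
        self._items.append((doc_id, embed(text)))

    def search(self, query: str, k: int = 1) -> list[str]:
        q = embed(query)
        ranked = sorted(self._items, key=lambda it: cosine(q, it[1]), reverse=True)
        return [doc_id for doc_id, _ in ranked[:k]]


index = TinyVectorIndex()
index.add("t1", "refund not received for my order")
index.add("t2", "app crashes on login screen")
index.add("t3", "how do I change my shipping address")
print(index.search("login crashes"))
```

The linear scan is fine for a handful of tickets; at production scale, a vector database replaces it with approximate nearest-neighbor indexing, which is one reason this stage has different resource needs from the earlier ones.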

Representative AI Workloads

AI Workload Type | Storage | Network | Compute | Scaling Characteristics
Real-time NLP classification | In-memory data stores; vector databases for embedding storage | Low-latency (

Read more on InfoWorld