
Snowflake launches Openflow to tackle AI-era data ingestion challenges

Tuesday, June 3, 2025, 15:00, by InfoWorld
Snowflake (Nasdaq:SNOW) has introduced a new multi-modal data ingestion service — Openflow — designed to help enterprises solve challenges around data integration and engineering in the wake of demand for generative AI and agentic AI use cases.

Ingestion of unstructured data, such as audio and images, is critical to any advanced analytics or AI use case, as it complements structured data to provide more context and insight for large language models.

To that end, Openflow drives value for Snowflake and its customers as it supports batch, streaming, and change data capture (CDC) pipelines from both structured and unstructured sources, according to Marlanna Bozicevich, research analyst at IDC.

“When combined with Snowpipe Streaming, file-based ingest, and third-party connector support, Openflow also provides a scalable ‘data-in-motion’ experience across any data system,” Bozicevich said.

The research analyst also expects Openflow’s streaming data ingestion capabilities to prove useful for enterprises, as real-time processing is set to become more crucial with the proliferation of agentic AI.

“Data integration and data engineering continue to be the biggest challenges in AI and analytics initiatives,” said David Menninger, executive director of software research at ISG. According to Menninger, to build an accurate generative AI-driven or agentic application quickly, enterprises need to gather all their data, both structured and unstructured, and transform and harmonize it. Better still, the entire process can be automated while offering observability and governance.

Snowflake’s EVP of product, Christian Kleinerman, pointed out that Openflow does all of the above with little intervention because it is a managed service, unlike the existing Snowflake connectors, which enterprises have to maintain themselves for data ingestion.

Support for unstructured data ingestion

Until now, Snowflake has lagged behind other data platform vendors in data integration capabilities, and has been largely incapable of moving and processing unstructured data because it relied mostly on SQL processing and partners, Menninger pointed out.

Another advantage for enterprises is the reduction of cost, complexity, and maintenance overhead of using data ingestion tools, according to Chris Deaner, managing director of the technology and experience practice at consulting firm West Monroe. Previously, when Snowflake had limited native ingestion capabilities, enterprises had to add tools like Fivetran or Matillion to their data engineering processes, Deaner said.

Openflow’s origins and how it works

Openflow is based on open source Apache NiFi, a dataflow system built on the concepts of flow-based programming and aimed at securely automating event streams as well as generative AI data pipelines and distribution.

To integrate NiFi into its product stack, Snowflake last year acquired Datavolo — a company founded by NiFi co-creators that offered data ingestion based on open-source technology.  

Openflow, which gets its extensibility and observability features from NiFi, works by ingesting, transforming, and persisting data to Snowflake tables, from where it can be used for AI and analytics.

However, unlike most data ingestion services or tools, Openflow’s data transformation process includes semantic chunking, and these chunks can later be used for retrieval by AI-based applications or models, the company said.
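Snowflake has not published the details of its chunking step, but the general idea of semantic chunking can be sketched as follows: split a document into sentences and group consecutive sentences into the same chunk as long as they stay topically similar. The bag-of-words embedding and the 0.2 threshold below are illustrative assumptions only; a production pipeline would use a learned sentence-embedding model.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: a bag-of-words term-frequency vector.
    # (A real pipeline would use a learned embedding model.)
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_chunks(text: str, threshold: float = 0.2) -> list[str]:
    # Group consecutive sentences into chunks; start a new chunk
    # when a sentence is dissimilar to the running chunk.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[list[str]] = []
    for sent in sentences:
        if chunks and cosine(embed(" ".join(chunks[-1])), embed(sent)) >= threshold:
            chunks[-1].append(sent)
        else:
            chunks.append([sent])
    return [" ".join(c) for c in chunks]
```

Called on "Snowflake ingests data. Snowflake also transforms data. Cats sleep all day.", this groups the two related sentences into one chunk and isolates the unrelated one, which is the property that makes chunks useful as retrieval units.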

To accelerate the data transformation phase, Openflow uses Arctic large language models (LLMs) for steps such as summarizing chunks and generating descriptions of images within documents, it added.

The service also takes note of any metadata changes, especially authorization metadata, at source and persists them in Snowflake.
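The article does not describe how that authorization metadata is used downstream, but a plausible (and here entirely hypothetical) pattern is to carry source access-control metadata alongside each ingested chunk, so that retrieval can filter results per user before anything reaches a model. A minimal sketch, with invented field names:

```python
# Hypothetical: each ingested chunk carries the authorization metadata
# captured at the source, so retrieval can filter per user.
chunks = [
    {"text": "Q3 revenue summary", "allowed_groups": {"finance"}},
    {"text": "Public product FAQ", "allowed_groups": {"everyone"}},
]

def authorized(chunks: list[dict], user_groups: set[str]) -> list[str]:
    # Return only chunks whose allowed groups intersect the user's groups.
    return [c["text"] for c in chunks if c["allowed_groups"] & user_groups]
```

Persisting these permissions at ingest time, and refreshing them when the source changes, is what keeps a retrieval-augmented application from leaking documents a user could not open in the source system.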

Openflow faces competition from Databricks’ Lakeflow, which can also ingest, transform, and integrate data, including unstructured and streaming data. Lakeflow, which connects to Databricks’ Data Intelligence platform, also gets governance capabilities via the data analytics platform provider’s Unity Catalog, said Bradley Shimmin, lead for the data and analytics practice at The Futurum Group.

Ability to build custom connectors

Although Openflow is a managed service, enterprises and data professionals will be able to build their own custom connectors within the service, according to Snowflake’s Kleinerman.

“Within this managed service, developers and data engineers can use hundreds of first-party Openflow processors (NiFi building blocks) to build custom connectors in minutes,” Kleinerman said, adding that developers can also use the Apache NiFi SDK to build custom processors and deploy them in Openflow. 

Snowflake is also partnering with major vendors, such as Salesforce, ServiceNow, Oracle, Microsoft, Adobe, Box, and Zendesk, to accelerate data ingestion via Openflow. These partnerships, according to Shimmin, add a level of enterprise-grade assurance for customers that data will move smoothly and at scale between the two systems.

Availability and pricing

Enterprises have the option of running Openflow inside their Snowflake virtual private cloud (VPC) via Snowpark Container Services, or in a VPC hosted on AWS, Azure, or Google Cloud.

“The option to choose a VPC supported by a hyperscaler will offer enterprise customers flexibility and choice over where their integration pipelines are deployed and where the runtimes are,” said Saptarshi Mukherjee, director of product management at Snowflake.

“It also enables greater customization of private networking topologies, and allows customers to leverage their existing cloud provider pricing while pre-processing data locally to adhere to specific data privacy regulations,” Mukherjee added.

Currently, Openflow deployment via Snowpark Container Services, Azure, and Google Cloud is in private preview, while deployment on a VPC via AWS is generally available. In deployments where a hyperscaler is involved, enterprises pay the hyperscaler for compute and infrastructure, while Snowflake charges for data ingestion and telemetry.
https://www.infoworld.com/article/4000742/snowflake-launches-openflow-to-tackle-ai-era-data-ingestio...

