Apache Flink integrates AI for real-time decision-making

Saturday, August 2, 2025, 00:12, by InfoWorld

The Apache Flink Project Management Committee (PMC) has released Apache Flink 2.1.0, a major upgrade to the real-time data processing engine that adds support for defining and managing AI models and for invoking them in real time within Flink SQL. The latter capability lays the foundation for building end-to-end, real-time AI workflows, according to the Apache Flink PMC.

Announced July 31, Apache Flink 2.1.0 can be downloaded from flink.apache.org.

For AI support, Apache Flink 2.1 adds Table API support for Model DDL (Data Definition Language), enabling users to define and manage AI models programmatically in both Java and Python. This provides a flexible, code-driven alternative to SQL for model management and integration within Flink applications. Additionally, the ML_PREDICT table-valued function (TVF) has been expanded to perform real-time model inference in SQL queries, applying machine learning models directly to data streams, according to the PMC. The implementation supports both Flink’s built-in model providers (OpenAI) and interfaces for defining custom model providers, accelerating Flink’s evolution from a real-time data processing engine to a unified real-time AI platform, the PMC said.
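
To make the inference step concrete, here is a minimal Python sketch of the behavior described above: each incoming row is handed to a model and comes back with the prediction attached. It does not use Flink or any Flink API. The names ml_predict and toy_sentiment_model, and the column names, are hypothetical, and the toy model merely stands in for a real provider such as OpenAI.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def toy_sentiment_model(features: List[object]) -> Dict[str, object]:
    """Hypothetical stand-in for a call to a remote model provider."""
    text = str(features[0]).lower()
    return {"label": "positive" if "good" in text else "negative"}


def ml_predict(
    rows: Iterable[Row],
    model: Callable[[List[object]], Dict[str, object]],
    input_columns: List[str],
) -> Iterator[Row]:
    """Yield each input row extended with the model's output columns."""
    for row in rows:
        features = [row[column] for column in input_columns]
        yield {**row, **model(features)}


# A small in-memory list plays the role of the data stream.
stream = [
    {"id": 1, "review": "Good latency"},
    {"id": 2, "review": "Too slow"},
]
predictions = list(ml_predict(stream, toy_sentiment_model, ["review"]))
```

Because ml_predict is a generator, rows are scored one at a time as they arrive rather than collected first, which is the property that matters for real-time use.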

With the 2.1 release, Apache Flink also now supports Process Table Functions (PTFs), the most powerful kind of function for Flink SQL and Table API, the PMC said. Conceptually, a PTF is a superset of all other user-defined functions, mapping zero, one, or multiple tables to zero, one, or multiple rows. This enables implementing user-defined operators that can be as feature-rich as built-in operations, the PMC said. PTFs have access to Flink’s managed state, event time, table changelogs, and timer services.
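
The idea of a function that keeps state and emits a variable number of rows can be illustrated with a short Python sketch. This is a conceptual analogy only, not the PTF API: the function running_total_alerts, the threshold parameter, and the plain dictionary used as per-key state are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[str, int]  # (key, amount)


def running_total_alerts(rows: Iterable[Event], threshold: int) -> Iterator[Event]:
    """Emit (key, total) whenever a key's running total reaches the threshold.

    Most input rows produce no output; a row that pushes its key's total to
    the threshold produces one output row and resets that key's state.
    """
    totals: Dict[str, int] = defaultdict(int)  # per-key state (illustrative)
    for key, amount in rows:
        totals[key] += amount
        if totals[key] >= threshold:
            yield (key, totals[key])
            totals[key] = 0


events = [("a", 40), ("b", 10), ("a", 70), ("b", 20)]
alerts = list(running_total_alerts(events, threshold=100))
```

Here four input rows yield a single output row, for key "a", because only that key's total crosses the threshold.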

Apache Flink 2.1 also adds VARIANT as a data type for semi-structured data such as JSON. This new type supports storing any semi-structured data, including ARRAY, MAP (with STRING keys), and scalar types, while preserving field type information in a JSON-like structure. Compared with the ROW and STRUCTURED types, VARIANT offers more flexibility for handling deeply nested and evolving schemas. Users can use PARSE_JSON or TRY_PARSE_JSON to convert JSON-formatted VARCHAR data to VARIANT.
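
The difference between the two conversion functions can be sketched in Python, assuming (as the names suggest) that PARSE_JSON fails on malformed input while TRY_PARSE_JSON returns NULL instead. The functions below are hypothetical stand-ins built on Python's json module, not Flink code.

```python
import json
from typing import Any, Optional


def parse_json(text: str) -> Any:
    """Strict variant: raises json.JSONDecodeError on malformed input."""
    return json.loads(text)


def try_parse_json(text: str) -> Optional[Any]:
    """Lenient variant: returns None on malformed input.

    Caveat of this sketch: the JSON literal null also maps to None in Python,
    so the two cases are indistinguishable here.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


# Nested structure and scalar types survive parsing without a declared schema.
value = parse_json('{"user": {"tags": ["a", "b"], "age": 30}}')
second_tag = value["user"]["tags"][1]
age = value["user"]["age"]

# A truncated document is tolerated by the lenient variant.
broken = try_parse_json('{"user": ')
```

In a streaming job the lenient form is usually the safer default, since one malformed record would otherwise fail the whole query.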

Also in Apache Flink 2.1:

A DeltaJoin operator has been introduced for stream processing jobs, along with optimizations for simple streaming join pipelines.

Smile binary format support has been added for compiled plans, providing a memory-efficient alternative to JSON for serialization and deserialization.

For the runtime, a pluggable batching mechanism for Async Sink has been introduced that allows users to define custom batching write strategies tailored to specific requirements.

A new connector for keyed state allows users to query keyed state directly from a checkpoint or savepoint using Flink SQL, making it easier to inspect, debug, and validate the state of Flink jobs without custom tooling.
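
The pluggable batching idea mentioned above for Async Sink can be sketched as a strategy function that decides which buffered records go into the next write. The Python below is a conceptual illustration under stated assumptions, not the Flink interface: BatchStrategy, fifo_batch, and size_capped_batch are hypothetical names.

```python
from collections import deque
from typing import Callable, Deque, List

# A strategy removes records from the buffer and returns the next batch.
BatchStrategy = Callable[[Deque[str], int], List[str]]


def fifo_batch(buffer: Deque[str], max_batch_size: int) -> List[str]:
    """Default-style strategy: take records in arrival order up to a count."""
    batch: List[str] = []
    while buffer and len(batch) < max_batch_size:
        batch.append(buffer.popleft())
    return batch


def size_capped_batch(max_bytes: int) -> BatchStrategy:
    """Custom strategy: also cap the batch by total encoded size."""

    def strategy(buffer: Deque[str], max_batch_size: int) -> List[str]:
        batch: List[str] = []
        used = 0
        while buffer and len(batch) < max_batch_size:
            size = len(buffer[0].encode("utf-8"))
            # Always accept the first record so an oversized one cannot stall.
            if batch and used + size > max_bytes:
                break
            used += size
            batch.append(buffer.popleft())
        return batch

    return strategy


first = fifo_batch(deque(["aa", "bbb", "c", "dddd"]), 3)
capped = size_capped_batch(max_bytes=5)(deque(["aa", "bbb", "c"]), 10)
```

Keeping the strategy separate from the sink means the write path stays unchanged while the batching rule is swapped to suit a destination's limits.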
