One application or data pipeline for multiple processing needs, running on a number of runtimes: on-premise, in the cloud, or locally

Big data pipelines are now at the core of our digital lives, where immature code is not tolerated: pipelines need to scale better, run at lower latency, cost less, and complete faster.

This means that rewriting pipelines to adopt engine-specific APIs, or maintaining separate implementations for streaming and batch scenarios, is no longer acceptable.

Thanks to the Dataflow Java SDK by Google, the Apache Flink runner by data Artisans, and the Apache Spark runner by Cloudera, you can move your application or data pipeline to the appropriate engine, or to the appropriate environment (e.g., from on-premise to the cloud), while keeping the business logic intact.
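
To make this concrete, here is a minimal sketch of such a portable pipeline, written against the Apache Beam API (the open-source successor of the Dataflow Java SDK); the input and output paths are placeholders. The pipeline logic is defined once, and the execution engine is chosen only at launch time:

```java
import java.util.Arrays;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.FlatMapElements;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.TypeDescriptors;

public class PortableWordCount {
  public static void main(String[] args) {
    // The runner is picked from the command line (e.g. --runner=FlinkRunner,
    // --runner=SparkRunner, --runner=DataflowRunner, or the local
    // DirectRunner); the pipeline definition below never changes.
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline p = Pipeline.create(options);

    p.apply("ReadLines", TextIO.read().from("input.txt"))
     .apply("SplitWords", FlatMapElements
         .into(TypeDescriptors.strings())
         .via((String line) -> Arrays.asList(line.split("\\W+"))))
     .apply("CountWords", Count.perElement())
     .apply("FormatResults", MapElements
         .into(TypeDescriptors.strings())
         .via((KV<String, Long> kv) -> kv.getKey() + ": " + kv.getValue()))
     .apply("WriteCounts", TextIO.write().to("word-counts"));

    p.run().waitUntilFinish();
  }
}
```

The same program can then be launched with --runner=DirectRunner for local testing, --runner=FlinkRunner or --runner=SparkRunner on an on-premise cluster, or --runner=DataflowRunner in the cloud, without touching the business logic.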

As a result, we can talk about defining one data pipeline for multiple processing needs, without tradeoffs, that can run on a number of runtimes: on-premise, in the cloud, or locally.
