Big data pipelines are now at the core of our digital lives, where immature code is not tolerated and where pipelines need to scale better, achieve lower latency, run more cheaply, and complete faster.
This means that rewriting pipelines to adopt engine-specific APIs, or maintaining separate implementations for streaming and batch scenarios, is no longer acceptable.
Thanks to the Dataflow Java SDK from Google, the Apache Flink runner from data Artisans, and the Apache Spark runner from Cloudera, you can move your application or data pipeline to the appropriate engine, or to the appropriate environment (e.g., from on-premise to cloud), while keeping the business logic intact.
As a result, we can define one data pipeline for multiple processing needs, without tradeoffs, and run it on a number of runtimes, on-premise, in the cloud, or locally.
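As a minimal sketch of what this portability looks like in practice, the following word count uses the Apache Beam Java SDK (which grew out of the Dataflow Java SDK). The class name and file paths are hypothetical; the point is that the execution engine is chosen at launch time via a --runner flag, while the pipeline code itself stays unchanged.

import java.util.Arrays;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.FlatMapElements;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.TypeDescriptors;

public class PortableWordCount {
  public static void main(String[] args) {
    // The runner (DirectRunner, FlinkRunner, SparkRunner, DataflowRunner, ...)
    // is selected from the command line, e.g. --runner=FlinkRunner.
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline p = Pipeline.create(options);

    p.apply("ReadLines", TextIO.read().from("input.txt"))        // hypothetical input path
        .apply("SplitWords", FlatMapElements
            .into(TypeDescriptors.strings())
            .via((String line) -> Arrays.asList(line.split("\\s+"))))
        .apply("CountWords", Count.perElement())
        .apply("FormatResults", MapElements
            .into(TypeDescriptors.strings())
            .via((KV<String, Long> kv) -> kv.getKey() + ": " + kv.getValue()))
        .apply("WriteCounts", TextIO.write().to("counts"));      // hypothetical output prefix

    p.run().waitUntilFinish();
  }
}

To move this pipeline from a local test run to a Flink cluster, only the launch command changes, for example from --runner=DirectRunner to --runner=FlinkRunner; the business logic above is never touched.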