Apache Hadoop MapReduce, Apache Spark, and Apache Flink, adjusted for your business logic
Everyone knows Apache Hadoop MapReduce and Apache Spark as the obvious engines for all things big data, and Apache Flink as a streaming-native engine.
More often than not, however, these
modern engines require rewriting pipelines to adopt engine-specific APIs, frequently
with separate implementations for streaming and batch scenarios.
In the name of scalability,
lower latency, flexibility, performance, and agility, you can now, thanks to the Dataflow Java SDK from
Google, the Apache Flink runner from data Artisans, and the Apache Spark runner from Cloudera, move your application or data pipeline
to the appropriate engine, or to the appropriate environment (from on-prem to
cloud), while keeping the business logic intact.
You can write one portable data pipeline that serves both
batch and streaming, and run it on a number of runtimes, including Flink, Spark,
Google Cloud Dataflow, or the local direct runner, as the sketch below illustrates.
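To make that concrete, here is a minimal word-count sketch in Java, assuming the Apache Beam Java SDK (the project that unified the Dataflow Java SDK with the Flink and Spark runners named above); the class name PortableWordCount and the input/output paths are hypothetical. Note that the pipeline code never names an engine; the runner is picked at launch time.

    import java.util.Arrays;

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.TextIO;
    import org.apache.beam.sdk.options.PipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.Count;
    import org.apache.beam.sdk.transforms.FlatMapElements;
    import org.apache.beam.sdk.transforms.MapElements;
    import org.apache.beam.sdk.values.TypeDescriptors;

    public class PortableWordCount {
      public static void main(String[] args) {
        // The engine is chosen at launch time via --runner=DirectRunner,
        // FlinkRunner, SparkRunner, or DataflowRunner; nothing below changes.
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
        Pipeline p = Pipeline.create(options);

        p.apply("ReadLines", TextIO.read().from("input.txt"))        // hypothetical input path
         .apply("SplitWords", FlatMapElements
             .into(TypeDescriptors.strings())
             .via(line -> Arrays.asList(line.split("\\W+"))))        // crude tokenizer for the sketch
         .apply("CountWords", Count.perElement())
         .apply("FormatResults", MapElements
             .into(TypeDescriptors.strings())
             .via(kv -> kv.getKey() + ": " + kv.getValue()))
         .apply("WriteCounts", TextIO.write().to("word-counts"));    // hypothetical output prefix

        p.run().waitUntilFinish();
      }
    }

The same program can then be submitted with, say, --runner=FlinkRunner for an on-prem Flink cluster or --runner=DataflowRunner for Google Cloud Dataflow, with the business logic left untouched.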