Apache Hadoop MapReduce, Apache Spark and Apache Flink: switch engines and keep your business logic



Most people know Apache Hadoop MapReduce and Apache Spark as the obvious engines for all things big data, and Apache Flink as a streaming-native engine.

 

More often than not, however, these modern engines require rewriting pipelines against engine-specific APIs, often with separate implementations for streaming and batch scenarios.

That no longer has to be the case. Thanks to the Dataflow Java SDK by Google, and to the work of data Artisans for Apache Flink and Cloudera for Apache Spark, you can move your application or data pipeline to the appropriate engine, or to the appropriate environment (from on-premises to cloud), while keeping the business logic intact. The payoff is scalability, lower latency, flexibility, performance and agility.

You can write one portable data pipeline that serves both batch and streaming, and execute it on a number of runtimes, including Flink, Spark, Google Cloud Dataflow or the local direct pipeline.
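
To make the idea concrete, here is a minimal, self-contained sketch in plain Python. It is not the Dataflow SDK API: the names `Pipeline`, `BatchRunner`, `StreamingRunner` and `tokenize` are hypothetical and exist only for illustration. It only shows the separation the paragraph above describes: the business logic is written once, and the choice of runner decides how it is executed.

```python
from collections import Counter
from typing import Callable, Iterable, Iterator, List


def tokenize(line: str) -> List[str]:
    """Business logic: a pure function with no engine-specific calls."""
    return line.lower().split()


class Pipeline:
    """Hypothetical engine-neutral pipeline: an ordered list of steps."""

    def __init__(self) -> None:
        self.steps: List[Callable[[str], Iterable[str]]] = []

    def flat_map(self, fn: Callable[[str], Iterable[str]]) -> "Pipeline":
        self.steps.append(fn)
        return self  # allow chaining


def apply_steps(pipeline: Pipeline, element: str) -> List[str]:
    """Push one input element through every step of the pipeline."""
    items = [element]
    for step in pipeline.steps:
        items = [out for item in items for out in step(item)]
    return items


class BatchRunner:
    """Toy batch runner: consumes the whole input, returns one result."""

    def run(self, pipeline: Pipeline, source: Iterable[str]) -> Counter:
        counts: Counter = Counter()
        for element in source:
            counts.update(apply_steps(pipeline, element))
        return counts


class StreamingRunner:
    """Toy streaming runner: yields a running snapshot per element."""

    def run(self, pipeline: Pipeline, source: Iterable[str]) -> Iterator[Counter]:
        counts: Counter = Counter()
        for element in source:
            counts.update(apply_steps(pipeline, element))
            yield Counter(counts)  # copy, so earlier snapshots stay fixed


# The same pipeline object is handed to both runners unchanged.
pipeline = Pipeline().flat_map(tokenize)
lines = ["Spark Flink", "flink dataflow"]

batch_result = BatchRunner().run(pipeline, lines)
stream_results = list(StreamingRunner().run(pipeline, iter(lines)))
```

In this toy version the last streaming snapshot equals the batch result, which is the property a portable pipeline is meant to preserve. A real runner would translate the steps into the target engine's own operators instead of looping in-process.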
