The New AWS Data Pipeline: Stakes, Opportunities, and Challenges for Companies


The number and variety of data sources that companies, governments, and organizations must process keep growing, and handling them is becoming increasingly complex.

The issues most frequently encountered include growing data volume, a variety of formats, disparate storage, and the need for distributed, scalable processing.
On November 29, 2012, Amazon Web Services took a step toward addressing these issues with the new AWS Data Pipeline.
The concept of a pipeline here covers a set of data sources, preconditions, destinations, processing steps, and an operational schedule, all of which are defined in a Pipeline Definition.

The definition specifies where the data comes from, what to do with it, and where to store it. Once you define and activate a pipeline, it runs according to its schedule. You could, for example, arrange to copy log files from a cluster of Amazon EC2 instances to an S3 bucket every day, and then launch a massively parallel data analysis job on an Amazon Elastic MapReduce cluster once a week.
You can create a Pipeline Definition in the AWS Management Console or externally, in text form.
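To make this concrete, here is a rough sketch of how the daily log-copy example above might be assembled programmatically with Python and the boto3 SDK, which exposes the service's create_pipeline, put_pipeline_definition, and activate_pipeline calls. The bucket name, command, and IAM role names are hypothetical placeholders, not values from the announcement.

import boto3

# Client for the Data Pipeline service (assumes AWS credentials are configured).
client = boto3.client("datapipeline", region_name="us-east-1")

# Register an empty pipeline; uniqueId guards against accidental duplicates.
pipeline_id = client.create_pipeline(
    name="daily-log-copy",
    uniqueId="daily-log-copy-v1",
)["pipelineId"]

# The Pipeline Definition: a schedule, a resource to run on, and one activity.
# Each object is a set of key/value fields; refValue links objects together.
objects = [
    {
        "id": "Default",
        "name": "Default",
        "fields": [
            {"key": "scheduleType", "stringValue": "cron"},
            {"key": "schedule", "refValue": "DailySchedule"},
            # Hypothetical IAM roles and log location:
            {"key": "role", "stringValue": "DataPipelineDefaultRole"},
            {"key": "resourceRole", "stringValue": "DataPipelineDefaultResourceRole"},
            {"key": "pipelineLogUri", "stringValue": "s3://example-bucket/pipeline-logs/"},
        ],
    },
    {
        "id": "DailySchedule",
        "name": "DailySchedule",
        "fields": [
            {"key": "type", "stringValue": "Schedule"},
            {"key": "period", "stringValue": "1 day"},
            {"key": "startAt", "stringValue": "FIRST_ACTIVATION_DATE_TIME"},
        ],
    },
    {
        "id": "WorkerInstance",
        "name": "WorkerInstance",
        "fields": [
            {"key": "type", "stringValue": "Ec2Resource"},
            {"key": "terminateAfter", "stringValue": "30 Minutes"},
        ],
    },
    {
        "id": "CopyLogs",
        "name": "CopyLogs",
        "fields": [
            {"key": "type", "stringValue": "ShellCommandActivity"},
            # Hypothetical command: copy local log files to an S3 bucket.
            {"key": "command", "stringValue": "aws s3 cp /var/log/app/ s3://example-bucket/logs/ --recursive"},
            {"key": "runsOn", "refValue": "WorkerInstance"},
            {"key": "schedule", "refValue": "DailySchedule"},
        ],
    },
]

# Upload the definition, then activate; the service now runs it once a day.
result = client.put_pipeline_definition(pipelineId=pipeline_id, pipelineObjects=objects)
if not result["errored"]:
    client.activate_pipeline(pipelineId=pipeline_id)

Once activated, the schedule is handed over to the service, which provisions the EC2 resource for each run, executes the command, and terminates the instance afterward.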
AWS Data Pipeline is currently in a limited private beta; if you are interested, you can contact AWS sales.
