Amazon Redshift is now natively supported within AWS Data Pipeline.

For those who are unfamiliar, AWS Data Pipeline is a web service that helps you integrate and process data across compute and storage services at specified intervals. With it, you can transform and process data stored in the cloud or on-premises in a highly scalable fashion, without having to worry about resource availability, inter-task dependencies, transient failures, or timeouts.
Amazon Redshift, for its part, is a fast, fully managed, petabyte-scale data warehouse optimized for datasets ranging from a few hundred gigabytes to a petabyte or more.
Now that Amazon Redshift is natively supported within AWS Data Pipeline, two new activities are available:
The RedshiftCopyActivity is used to bulk copy data from Amazon DynamoDB or Amazon S3 to a new or existing Redshift table. You can use this new capability in a variety of ways. If you are using Amazon RDS to store relational data or Amazon Elastic MapReduce to do Hadoop-style parallel processing, you can stage data in S3 before loading it into Redshift.
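For illustration, here is a minimal sketch of the pipeline objects behind such a copy, expressed in the pipelineObjects format accepted by boto3's Data Pipeline client. The object IDs, bucket path, and table name are hypothetical, and the supporting Schedule, Ec2Resource, and RedshiftDatabase objects are omitted.

```python
# Minimal sketch of a RedshiftCopyActivity definition. All IDs, paths,
# and table names below are hypothetical.
import boto3

client = boto3.client("datapipeline")

copy_objects = [
    # Source: data staged in S3 (a DynamoDBDataNode works here as well).
    {"id": "StagedInput", "name": "StagedInput", "fields": [
        {"key": "type", "stringValue": "S3DataNode"},
        {"key": "directoryPath", "stringValue": "s3://my-bucket/staging/"},
    ]},
    # Destination: a new or existing Redshift table.
    {"id": "SalesTable", "name": "SalesTable", "fields": [
        {"key": "type", "stringValue": "RedshiftDataNode"},
        {"key": "tableName", "stringValue": "sales"},
        {"key": "database", "refValue": "MyRedshiftDatabase"},
    ]},
    # The bulk copy itself; insertMode controls how existing rows are treated.
    {"id": "CopyToRedshift", "name": "CopyToRedshift", "fields": [
        {"key": "type", "stringValue": "RedshiftCopyActivity"},
        {"key": "input", "refValue": "StagedInput"},
        {"key": "output", "refValue": "SalesTable"},
        {"key": "insertMode", "stringValue": "KEEP_EXISTING"},
        {"key": "runsOn", "refValue": "MyEc2Resource"},
    ]},
]
# The Schedule, Ec2Resource, and RedshiftDatabase objects referenced above
# are omitted for brevity; they would be defined alongside these objects.
```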
The SqlActivity is used to run SQL queries on data stored in Redshift. You specify the input and output tables, along with the query to be run. You can create a new table for the output, or you can merge the results of the query into an existing table.
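A SqlActivity can be sketched in the same format. The rollup query and the MyRedshiftDatabase and MyEc2Resource references below are illustrative assumptions, not a prescribed pattern.

```python
# Sketch of a SqlActivity that merges query results into an existing table.
sql_objects = [
    {"id": "NightlyRollup", "name": "NightlyRollup", "fields": [
        {"key": "type", "stringValue": "SqlActivity"},
        {"key": "database", "refValue": "MyRedshiftDatabase"},
        {"key": "script", "stringValue":
            "INSERT INTO daily_totals "
            "SELECT saledate, SUM(amount) FROM sales GROUP BY saledate;"},
        {"key": "runsOn", "refValue": "MyEc2Resource"},
    ]},
]
```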
Connectikpeople recalls that you can access these new activities using the graphical pipeline editor in the AWS Management Console, the AWS CLI, and the AWS Data Pipeline APIs.
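As a rough sketch of the API route, the definitions from the snippets above could be registered and activated with boto3 as follows; the pipeline name and uniqueId here are made up.

```python
# Continuing the snippets above: register the definition and activate the
# pipeline through the Data Pipeline API.
pipeline = client.create_pipeline(name="redshift-load",
                                  uniqueId="redshift-load-1")
pipeline_id = pipeline["pipelineId"]

client.put_pipeline_definition(
    pipelineId=pipeline_id,
    pipelineObjects=copy_objects + sql_objects,  # plus the omitted support objects
)
client.activate_pipeline(pipelineId=pipeline_id)
```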
