Amazon Redshift is now natively supported within the AWS Data Pipeline.
For those who are unfamiliar with it, AWS Data Pipeline is a web service that helps you integrate and process
data across compute and storage services at specified intervals. With it,
you can transform and process data that is stored in the cloud or on-premises
in a highly scalable fashion, without having to worry about resource
availability, inter-task dependencies, transient failures, or
timeouts.
Amazon Redshift, for its part, is a fast, fully managed, petabyte-scale data warehouse
optimized for datasets that range from a few hundred gigabytes to a petabyte or
more.
Which brings us back to the news: this native support takes the form of two new activities.
The RedshiftCopyActivity
is used to bulk copy data from Amazon
DynamoDB or Amazon S3 to a new or existing Redshift table. You can use this new power in a
variety of ways. If you are using Amazon
RDS to store relational data or Amazon Elastic MapReduce to do Hadoop-style parallel processing, you can stage the data in S3 before
loading it into Redshift.
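To make this concrete, here is a minimal sketch of what such a copy pipeline could look like when defined through the AWS Data Pipeline API using the boto3 Python SDK. The bucket, cluster, table, and credential values are placeholders, and the schedule and Ec2Resource objects a real pipeline also needs are elided for brevity:

import boto3

dp = boto3.client('datapipeline')

# Placeholder pipeline objects -- substitute your own values.
pipeline_objects = [
    # Source: files staged in an S3 bucket (hypothetical path)
    {'id': 'S3Input', 'name': 'S3Input', 'fields': [
        {'key': 'type', 'stringValue': 'S3DataNode'},
        {'key': 'directoryPath', 'stringValue': 's3://my-staging-bucket/clicks/'},
    ]},
    # Connection details for the target Redshift cluster
    {'id': 'RedshiftDB', 'name': 'RedshiftDB', 'fields': [
        {'key': 'type', 'stringValue': 'RedshiftDatabase'},
        {'key': 'clusterId', 'stringValue': 'my-cluster'},
        {'key': 'databaseName', 'stringValue': 'analytics'},
        {'key': 'username', 'stringValue': 'admin'},
        {'key': '*password', 'stringValue': 'REDACTED'},
    ]},
    # The Redshift table the data will land in
    {'id': 'RedshiftTable', 'name': 'RedshiftTable', 'fields': [
        {'key': 'type', 'stringValue': 'RedshiftDataNode'},
        {'key': 'tableName', 'stringValue': 'clicks'},
        {'key': 'database', 'refValue': 'RedshiftDB'},
    ]},
    # The new activity: bulk copy from S3 into Redshift,
    # keeping any rows already present in the table
    {'id': 'CopyToRedshift', 'name': 'CopyToRedshift', 'fields': [
        {'key': 'type', 'stringValue': 'RedshiftCopyActivity'},
        {'key': 'input', 'refValue': 'S3Input'},
        {'key': 'output', 'refValue': 'RedshiftTable'},
        {'key': 'insertMode', 'stringValue': 'KEEP_EXISTING'},
    ]},
]

pipeline_id = dp.create_pipeline(name='s3-to-redshift',
                                 uniqueId='s3-to-redshift-demo')['pipelineId']
dp.put_pipeline_definition(pipelineId=pipeline_id,
                           pipelineObjects=pipeline_objects)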
The SqlActivity is used
to run SQL queries on data stored in Redshift. You specify the input and output
tables, along with the query to be run. You can create a new table for the
output, or you can merge the results of the query into an existing table.
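A SqlActivity is expressed in the same object-and-fields style. This hedged fragment, which reuses the hypothetical RedshiftDB object from the sketch above, would run an aggregation query inside the cluster and insert the results into an existing table:

# Hypothetical SqlActivity object, appended to pipeline_objects above;
# 'script' holds the SQL that Data Pipeline runs against RedshiftDB.
sql_activity = {
    'id': 'DailyRollup',
    'name': 'DailyRollup',
    'fields': [
        {'key': 'type', 'stringValue': 'SqlActivity'},
        {'key': 'database', 'refValue': 'RedshiftDB'},
        {'key': 'script', 'stringValue':
            'INSERT INTO daily_clicks '
            'SELECT url, COUNT(*) FROM clicks GROUP BY url;'},
    ],
}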
As Connectikpeople's readers will recall,
you can access these new activities using the graphical pipeline editor
in the AWS Management Console, the AWS CLI, or the AWS
Data Pipeline APIs.
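From the AWS CLI, for instance, the same workflow comes down to three commands (the pipeline ID and the definition file here are placeholders):

aws datapipeline create-pipeline --name s3-to-redshift --unique-id s3-to-redshift-demo
aws datapipeline put-pipeline-definition --pipeline-id df-EXAMPLE --pipeline-definition file://pipeline.json
aws datapipeline activate-pipeline --pipeline-id df-EXAMPLE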