5.23.2. RedShift Bulk Upload through an S3 Bucket


 

Icon: clip1184

 

Function: BulkUpload

 

Property window:

 

clip1187

 

Short description:

 

Upload a table to RedShift.

 

Long Description:

 

Inserting rows into RedShift using the standard ODBC connector is horribly slow.
 

To get more speed, you have two options:

You can use the clip1185 PostGres Writer action from Section 5.26.24. Indeed, the RedShift engine is based on the PostgreSQL engine and still uses the same access protocol as a normal PostgreSQL engine. This might be the simplest solution.

You can use the current clip1186 Action. This action proceeds in 5 steps:
 

1.Using the user-provided .gel file (parameter P1), it creates a temporary csv file on the local hard drive containing all the data to copy into RedShift.

2.It copies the temporary csv file from the local hard drive into an S3 bucket.

3.It copies the temporary csv file from the S3 bucket into a Staging Table (parameter P4) inside RedShift.

4.It copies the content of the Staging Table into the Final Table (parameter P3) inside RedShift. Before the copy starts, to avoid duplicate Primary Keys inside the Final Table, it removes from the Final Table all the rows whose Primary Keys also appear inside the Staging Table. The name(s) of the Primary Key column(s) is(are) given in parameter P2.

5.Cleaning: it deletes the Staging Table, the csv file inside the S3 bucket and the temporary csv file on the local hard drive.
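As an illustration, the de-duplication performed in step 4 amounts to two SQL statements: a DELETE that removes the matching Primary Keys from the Final Table, followed by an INSERT that appends the staging rows. The sketch below is a hypothetical Python helper (not Anatella's actual implementation); the table and column names are placeholders:

```python
def merge_sql(final_table, staging_table, pk_columns):
    """Build the two SQL statements implied by step 4: delete from the
    final table every row whose primary key also appears in the staging
    table, then append all staging rows into the final table."""
    # Join condition over the primary-key column(s) given in parameter P2.
    on = " AND ".join(
        f"{final_table}.{c} = {staging_table}.{c}" for c in pk_columns
    )
    # RedShift supports the PostgreSQL-style "DELETE ... USING" syntax.
    delete = f"DELETE FROM {final_table} USING {staging_table} WHERE {on};"
    insert = f"INSERT INTO {final_table} SELECT * FROM {staging_table};"
    return delete, insert
```

Running both statements inside one transaction keeps the Final Table consistent if the copy is interrupted.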

 

 
To be able to connect to your S3 bucket storage, you need to obtain these 4 parameters from Amazon:
 

Parameter P5: your “Bucket Name”

Parameter P6: your “Region”

Parameter P7: your “Access Key ID”

Parameter P8: your “Secret Access Key”

 
Please refer to Section 5.22.6 for the procedure to get these 4 parameters.
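For illustration, the COPY statement issued in step 3 typically resembles the one built below. This is a hypothetical sketch (the exact COPY options Anatella uses are not documented here), assembled from the Staging Table name (parameter P4) and the four S3 parameters (P5–P8):

```python
def build_copy_command(staging_table, bucket_name, object_key,
                       region, access_key_id, secret_access_key):
    """Build a RedShift COPY statement that loads a csv file stored in
    an S3 bucket into the staging table (an assumed sketch, not the
    literal command generated by Anatella)."""
    return (
        f"COPY {staging_table} "
        f"FROM 's3://{bucket_name}/{object_key}' "
        f"CREDENTIALS 'aws_access_key_id={access_key_id};"
        f"aws_secret_access_key={secret_access_key}' "
        f"REGION '{region}' FORMAT AS CSV;"
    )
```

The “Region” (P6) must match the region of the S3 bucket, otherwise RedShift refuses to read the file.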
 

 

To be able to connect to your RedShift database (to run the SQL commands that import the data from the S3 bucket), you need to:
 

Download & install the ODBC drivers for RedShift. You can download these drivers from us here: http://download.timi.eu/ODBC/ODBC_drivers_Redshift/
 

Get the ODBC connection URL to connect to your RedShift database and enter it as the Anatella parameter P9. The procedure to obtain this ODBC connection URL is given here:
 

https://docs.aws.amazon.com/redshift/latest/mgmt/configure-odbc-connection.html#obtain-odbc-url

 
Typically, this ODBC connection URL looks like this:
 
Driver={Amazon Redshift (x64)}; Server=redshift-cluster-1.chhydacago64.eu-west-3.redshift.amazonaws.com; Database=dev; UID=<my_user>; PWD=<my_password>
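A DSN-less ODBC connection string like the one above is simply a list of key=value pairs separated by semicolons. The small helper below (a hypothetical sketch, not part of Anatella) shows how the example URL decomposes; remember that the <my_user>/<my_password> placeholders must be replaced with your real credentials before connecting:

```python
def parse_odbc_url(conn_str):
    """Split a DSN-less ODBC connection string into its key/value pairs."""
    parts = {}
    for chunk in conn_str.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        # Only split on the first '=' so values may themselves contain '='.
        key, _, value = chunk.partition("=")
        parts[key.strip()] = value.strip()
    return parts

# The example ODBC connection URL (parameter P9) from the text above.
url = ("Driver={Amazon Redshift (x64)}; "
       "Server=redshift-cluster-1.chhydacago64.eu-west-3"
       ".redshift.amazonaws.com; "
       "Database=dev; UID=<my_user>; PWD=<my_password>")
```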