
Create an Amazon Redshift Connection in Airflow

Amazon Redshift is a data warehouse product from AWS. Integrating Redshift with Airflow allows you to automate, schedule, and monitor a variety of tasks, including creating, deleting, and resuming a cluster; ingesting or exporting data to and from Redshift; and running SQL queries against Redshift.

This document covers three different methods to connect Airflow to Amazon Redshift:

  • Using database (DB) user credentials
  • Using IAM credentials
  • Using an IAM role
tip

If you're an Astro user, Astronomer recommends using workload identity to authorize your Deployments to access Redshift. This eliminates the need to specify secrets in your Airflow connections or copy credential files to your Airflow project. See Authorize Deployments to your cloud.

Prerequisites

Get connection details

DB user credentials can be used to establish a connection to an Amazon Redshift cluster. While straightforward to use, this approach lacks the strong security and user access controls provided by identity and access management (IAM). Connecting this way requires the following information:

  • Cluster identifier
  • Database name
  • Port
  • User
  • Password

Complete the following steps to retrieve these values:

  1. In your AWS console, select the region that contains your Redshift cluster, open the Redshift cluster dashboard, then open your cluster.

  2. From the General information section, copy the Cluster identifier and Endpoint.

  3. Open the Properties tab and copy the Database name and Port.

  4. Create a Redshift user and grant it the roles or privileges that Airflow needs to access Redshift. Copy the username and password.
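The Endpoint copied in step 2 bundles the host, port, and database name into a single string. The following sketch splits it into the individual fields used later when filling out the connection form; the endpoint value is a hypothetical example, not a real cluster:

```python
# The Endpoint from the Redshift console has the form host:port/database.
# This value is a placeholder; substitute your own cluster's endpoint.
endpoint = "examplecluster.abc123xyz.us-east-1.redshift.amazonaws.com:5439/dev"

host, _, rest = endpoint.partition(":")   # hostname for the Host field
port, _, database = rest.partition("/")   # Port and Database fields

print(host)      # examplecluster.abc123xyz.us-east-1.redshift.amazonaws.com
print(port)      # 5439
print(database)  # dev
```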

Create your connection

  1. Open your Astro project and add the following line to your requirements.txt file:

    apache-airflow-providers-amazon

    This installs the Amazon provider package, which makes the Amazon Redshift connection type available in Airflow.

  2. Run astro dev restart to restart your local Airflow environment and apply your changes in requirements.txt.

  3. In the Airflow UI for your local Airflow environment, go to Admin > Connections. Click + to add a new connection, then select Amazon Redshift as the connection type.

  4. Fill out the following connection fields using the information you retrieved from Get connection details:

    • Connection Id: Enter a name for the connection.
    • Host: Enter the cluster Endpoint hostname, without the trailing port and database segments.
    • Database: Enter the Database name.
    • User: Enter the DB user username.
    • Password: Enter the DB user password.
    • Port: Enter the Port.
  5. Click Test. After the connection test succeeds, click Save.
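As an alternative to the UI, Airflow can also read a connection from an environment variable named `AIRFLOW_CONN_<CONN_ID>`. A minimal sketch assembling such a URI from the same fields, assuming placeholder values throughout; the password is percent-encoded in case it contains characters that URIs reserve:

```python
from urllib.parse import quote

# All values are placeholders; substitute those collected above.
user = "airflow_user"
password = "s3cr3t/p@ss"  # quoted below because / and @ are reserved in URIs
host = "examplecluster.abc123xyz.us-east-1.redshift.amazonaws.com"
port = 5439
database = "dev"

uri = f"redshift://{user}:{quote(password, safe='')}@{host}:{port}/{database}"
print(uri)
```

Exporting this value as, for example, `AIRFLOW_CONN_REDSHIFT_DEFAULT` makes the connection available to Airflow without creating it in the UI.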


How it works

Airflow uses the Amazon Redshift Python Connector to connect to Redshift through the RedshiftSQLHook.
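Once the connection is saved, hooks and operators can reference it by its Connection Id. A hedged sketch of querying Redshift through the RedshiftSQLHook; the `conn_id` default, table name, and helper function are illustrative assumptions, not part of this guide's steps:

```python
def redshift_row_count(table: str, conn_id: str = "redshift_default") -> int:
    """Return the row count of `table` using a saved Airflow connection.

    The conn_id and table name are placeholders; substitute your own.
    """
    # Imported inside the function to keep the sketch self-contained;
    # RedshiftSQLHook ships with apache-airflow-providers-amazon.
    from airflow.providers.amazon.aws.hooks.redshift_sql import RedshiftSQLHook

    hook = RedshiftSQLHook(redshift_conn_id=conn_id)
    return hook.get_first(f"SELECT COUNT(*) FROM {table}")[0]
```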

See also
