Create a BigQuery connection in Airflow

BigQuery is Google's fully managed and serverless data warehouse. Integrating BigQuery with Airflow lets you execute BigQuery jobs from a DAG.
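
For example, once a connection is in place, a minimal DAG can submit a query job with the BigQueryInsertJobOperator. The connection id my_gcp_connection below is a placeholder for the connection you create later in this guide:

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.google.cloud.operators.bigquery import (
        BigQueryInsertJobOperator,
    )

    with DAG(
        dag_id="bigquery_example",
        start_date=datetime(2024, 1, 1),
        schedule=None,
        catchup=False,
    ) as dag:
        # Submit a query job to BigQuery using the Airflow connection
        # created later in this guide.
        run_query = BigQueryInsertJobOperator(
            task_id="run_query",
            gcp_conn_id="my_gcp_connection",  # placeholder connection id
            configuration={
                "query": {
                    "query": "SELECT 1 AS health_check",
                    "useLegacySql": False,
                }
            },
        )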

There are multiple ways to connect Airflow and BigQuery, all of which require a GCP Service Account:

  • Use the contents of a service account key file directly in an Airflow connection.
  • Copy the service account key file to your Airflow project.
  • Store the contents of a service account key file in a secrets backend.
  • Use a Kubernetes service account to integrate Airflow and BigQuery. This is possible only if you run Airflow on Astro or Google Kubernetes Engine (GKE).

Using a Kubernetes service account is the most secure method because it doesn't require storing a secret in Airflow's metadata database, on disk, or in a secrets backend. The next most secure connection method is to store the contents of your service account key file in a secrets backend.

tip

If you're an Astro user, Astronomer recommends using workload identity to authorize your Deployments to BigQuery. This eliminates the need to store secrets in your Airflow connections or copy a credentials file to your Astro project. See Authorize Deployments to your cloud.

Prerequisites

  • The Astro CLI and a local Astro project.
  • A Google Cloud account with access to BigQuery and permission to create service accounts and keys.

Get connection details

A connection from Airflow to Google BigQuery requires the following information:

  • Service account name
  • Service account key file
  • Google Cloud Project ID

Complete the following steps to retrieve these values. This method requires you to save the contents of your service account key file in your Airflow connection.

  1. In your Google Cloud console, select your Google Cloud project and copy its ID.
  2. Create a new service account.
  3. Grant roles to your service account so that it can access BigQuery. See BigQuery roles for a list of available roles and the permissions each includes.
  4. Add a new JSON key file to the service account.
  5. Copy the contents of the key file.
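
If you prefer the command line, the same steps can be sketched with the gcloud CLI. The service account name airflow-bq, the role roles/bigquery.jobUser, and the key file name key.json are placeholder choices, not requirements:

    # 1. Create a service account in your project.
    gcloud iam service-accounts create airflow-bq \
        --project=<your-project-id> \
        --display-name="Airflow BigQuery"

    # 2. Grant it the roles it needs to access BigQuery.
    gcloud projects add-iam-policy-binding <your-project-id> \
        --member="serviceAccount:airflow-bq@<your-project-id>.iam.gserviceaccount.com" \
        --role="roles/bigquery.jobUser"

    # 3. Create a JSON key file, then copy its contents.
    gcloud iam service-accounts keys create key.json \
        --iam-account=airflow-bq@<your-project-id>.iam.gserviceaccount.com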

Create your connection

  1. Open your Astro project and add the following line to your requirements.txt file:

    apache-airflow-providers-google

    This installs the Google Cloud provider package, which makes the Google Cloud connection type available in Airflow.

  2. Run astro dev restart to restart your local Airflow environment and apply your changes in requirements.txt.

  3. In the Airflow UI for your local Airflow environment, go to Admin > Connections. Click + to add a new connection, then choose the Google Cloud connection type.

  4. Fill out the following connection fields using the information you retrieved from Get connection details:

    • Connection Id: Enter a name for the connection.
    • Keyfile JSON: Enter the contents of the key file.
  5. Click Test. After the connection test succeeds, click Save.

    [Screenshot: the completed Google Cloud connection in the Airflow UI, with the key file contents pasted into the Keyfile JSON field]
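
You can also define the connection outside of the Airflow UI. As a sketch, assuming Airflow 2.3+ (which accepts JSON-serialized connections) and the same placeholder connection id, the equivalent environment variable looks like this:

    # Airflow reads connections from AIRFLOW_CONN_<CONN_ID> environment variables.
    # keyfile_dict must hold the key file contents as an escaped JSON string.
    export AIRFLOW_CONN_MY_GCP_CONNECTION='{
        "conn_type": "google_cloud_platform",
        "extra": {
            "keyfile_dict": "<key file contents as a JSON string>",
            "project": "<your-project-id>"
        }
    }'

Connections defined this way don't appear in the Airflow UI, but they work the same way in your DAGs.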

How it works

Airflow uses the google-cloud-bigquery Python library to connect to Google BigQuery through the BigQueryHook. If you don't define key credentials in the connection, the Google client libraries fall back to Application Default Credentials (ADC). This means that when you use Workload Identity to connect to BigQuery, Airflow relies on ADC to authenticate.
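
As a minimal sketch of this behavior, assuming the placeholder connection id my_gcp_connection and that ADC is available on the host (for example, through Workload Identity or a GOOGLE_APPLICATION_CREDENTIALS key file):

    from airflow.providers.google.cloud.hooks.bigquery import BigQueryHook

    # If the connection contains no key material, the Google client libraries
    # resolve credentials through Application Default Credentials (ADC).
    hook = BigQueryHook(gcp_conn_id="my_gcp_connection", use_legacy_sql=False)

    # get_client() returns a google.cloud.bigquery.Client built from those
    # credentials; query() submits a job, just like the operator does.
    client = hook.get_client()
    rows = client.query("SELECT 1 AS health_check").result()
    print(list(rows))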
