
Create a Databricks connection in Airflow

Databricks is a SaaS product for data processing using Apache Spark. Integrating Databricks with Airflow lets you manage Databricks clusters, as well as execute and monitor Databricks jobs from an Airflow DAG.

This guide provides the basic setup for creating a Databricks connection. For a complete integration tutorial, see Orchestrate Databricks jobs with Airflow.

Prerequisites

Get connection details

A connection from Airflow to Databricks requires the following information:

  • Databricks URL
  • Personal access token

Complete the following steps to retrieve these values:

  1. In the Databricks Cloud UI, copy the URL of your Databricks workspace. The URL should be formatted like https://dbc-75fc7ab7-96a6.cloud.databricks.com/ or https://your-org.cloud.databricks.com/.
  2. To use a personal access token for a user, follow the Databricks documentation to generate a new token. To generate a personal access token for a service principal, see Manage personal access tokens for a service principal. Copy the personal access token.

Create your connection

  1. Open your Astro project and add the following line to your requirements.txt file:

    apache-airflow-providers-databricks

    This will install the Databricks provider package, which makes the Databricks connection type available in Airflow.

  2. Run astro dev restart to restart your local Airflow environment and apply the changes you made to requirements.txt.

  3. In the Airflow UI for your local Airflow environment, go to Admin > Connections. Click + to add a new connection, then select Databricks as the connection type.

  4. Fill out the following connection fields using the information you retrieved from Get connection details:

    • Connection Id: Enter a name for the connection.
    • Host: Enter the Databricks URL.
    • Password: Enter your personal access token.

  5. Click Test. After the connection test succeeds, click Save.
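
Airflow can also read connections from environment variables named AIRFLOW_CONN_<CONN_ID>, and Airflow 2.3 and later accept a JSON value for them. The following is a minimal sketch of building such a value from the same three fields used in the steps above. The helper name build_databricks_conn_env, the connection ID databricks_default, and the host and token values are illustrative placeholders, not values taken from this guide's environment.

```python
import json


def build_databricks_conn_env(conn_id: str, host: str, token: str) -> tuple[str, str]:
    """Return (variable name, JSON value) for an Airflow connection env var.

    Hypothetical helper for illustration; it is not part of Airflow.
    """
    # Airflow looks for connections in variables named AIRFLOW_CONN_<CONN_ID>,
    # with the connection ID in upper case.
    name = f"AIRFLOW_CONN_{conn_id.upper()}"
    value = json.dumps(
        {
            "conn_type": "databricks",
            "host": host,       # corresponds to the Host field (Databricks URL)
            "password": token,  # corresponds to the Password field (access token)
        }
    )
    return name, value


name, value = build_databricks_conn_env(
    "databricks_default",
    "https://your-org.cloud.databricks.com/",
    "<your-personal-access-token>",  # placeholder; never commit a real token
)
print(f"{name}={value}")
```

The printed line could then be set as an environment variable in your Airflow environment. Connections defined this way are typically not listed under Admin > Connections, so the UI steps above remain the simplest way to confirm the connection visually.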


How it works

Airflow uses Python's requests library to connect to Databricks through the BaseDatabricksHook.
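
To make this more concrete, the sketch below shows the general shape of a token-authenticated HTTPS request to the Databricks REST API. It is not the hook's actual code: it uses the standard library's urllib.request instead of requests so that it runs without extra dependencies, and it only builds the request without sending it. The helper name build_databricks_request and the host and token values are hypothetical, and the api/2.1/jobs/list endpoint (the Databricks Jobs API list call) is chosen purely as an example.

```python
import urllib.request


def build_databricks_request(host: str, token: str, endpoint: str) -> urllib.request.Request:
    """Build, but do not send, a bearer-token request to a Databricks REST endpoint.

    Hypothetical helper for illustration; not part of the Databricks provider.
    """
    # Join host and endpoint without doubling or dropping the slash between them.
    url = f"{host.rstrip('/')}/{endpoint.lstrip('/')}"
    return urllib.request.Request(
        url,
        # The personal access token is sent as a bearer token.
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


req = build_databricks_request(
    "https://your-org.cloud.databricks.com/",
    "<your-personal-access-token>",  # placeholder token
    "api/2.1/jobs/list",
)
print(req.get_method(), req.full_url)
```

Passing req to urllib.request.urlopen would send the request. In a DAG, you would normally rely on the hook and the Databricks operators instead of building requests by hand.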

