Skip to main content
Version: 0.36

Customize your image on Astronomer Software

The Astro CLI is intended to make it easier to develop with Apache Airflow, whether you're developing on your local machine or deploying code to Astronomer. The following guidelines describe a few of the methods you can use to customize the Docker Image that gets pushed to Airflow every time you rebuild your image locally using $ astro dev start or deploy to Astronomer using $ astro deploy.

More specifically, this doc includes instructions for how to:

  • Add Python and OS-level packages
  • Add dependencies
  • Run commands on build
  • Access the Airflow CLI
  • Add environment variables locally
  • Build from a private repository

Note: The guidelines below assume that you've initialized a project on Astronomer via $ astro dev init. If you haven't done so already, refer to our "CLI Quickstart" doc.

Add Python and OS-level dependencies

To add Python packages to an Airflow Deployment, add the packages to the Deployment requirements.txt file. To add OS-level packages to an Airflow Deployment, add them to the Deployment packages.txt file. The requirements.txt and packages.txt files were automatically generated after running astro dev init to initialize your local Astro project.

Add Python dependencies

To add all Python packages to an Airflow Deployment, add the packages to the Deployment requirements.txt file.

To pin a version of that package, use the following syntax:

<package-name>==<version>

If you'd like to exclusively use Pymongo 3.7.2, for example, you'd add the following in your requirements.txt:

pymongo==3.7.2

If you do not pin a package to a version, the latest version of the package that's publicly available will be installed by default.

Add OS-level dependencies

To add OS-level packages to an Airflow Deployment, add them to the Deployment packages.txt file.

Rebuild your image

Once you've saved those packages in your text editor or version control tool, rebuild your image by running:

astro dev stop

followed by

astro dev start

This process stops your running Docker containers and restarts them with your updated image.

Confirm your package was installed (Optional)

If you added pymongo to your requirements.txt file, for example, you can confirm that it was properly installed by running a $ docker exec command into your scheduler.

  1. Run $ docker ps to identify the 3 running docker containers on your machine
  2. Grab the container ID of your scheduler container
  3. Run the following:
docker exec -it <scheduler-container-id> pip freeze | grep pymongo

pymongo==3.7.2

Add other dependencies

In the same way you can build Python and OS-level packages into your image, you're free to build additional dependencies and files for your DAGs to use.

In the example below, we'll add a folder of helper_functions with a file (or set of files) that our Airflow DAGs can then use.

Add the folder into your project directory

.
├── airflow_settings.yaml
├── dags
│ └── example-dag.py
├── Dockerfile
├── helper_functions
│ └── helper.py
├── include
├── packages.txt
├── plugins
│ └── example-plugin.py
└── requirements.txt

Rebuild your image

Follow the instructions in the "Rebuild your Image" section above.

Confirm your files were added (Optional)

Similar to the pymongo example above, you can confirm that helper.py was properly built into your image by running a $ docker exec command into your scheduler.

  1. Run $ docker ps to identify the 3 running docker containers on your machine
  2. Grab the container ID of your scheduler container
  3. Run the following:
docker exec -it <scheduler-container-id> /bin/bash
bash-4.4$ ls
Dockerfile airflow_settings.yaml helper_functions logs plugins unittests.cfg
airflow.cfg dags include packages.txt requirements.txt

Notice that helper_functions folder has been built into your image.

Configure airflow_settings.yaml

When you first initialize a new Astro project on Astronomer, a file titled airflow_settings.yaml will be automatically generated. With this file you can configure and programmatically generate Airflow connections, Pools, and Variables when you're developing locally.

For security reasons, the airflow_settings.yaml file is currently only for local development and should not be used for pushing up code to Astronomer via $ astro deploy. For the same reason, we'd recommend adding this file to your .gitignore.

tip

If you're interested in programmatically managing Airflow connections, Variables or environment variables on Astronomer Software, we recommend integrating a Secret Backend.

Add Airflow connections, pools, variables

By default, the airflow_settings.yaml file includes the following template:

airflow:
connections: ## conn_id and conn_type are required
- conn_id: my_new_connection
conn_type: postgres
conn_host: 123.0.0.4
conn_schema: airflow
conn_login: user
conn_password: pw
conn_port: 5432
conn_extra:
pools: ## pool_name, pool_slot, and pool_description are required
- pool_name: my_new_pool
pool_slot: 5
pool_description:
variables: ## variable_name and variable_value are required
- variable_name: my_variable
variable_value: my_value

Make sure to specify all required fields that correspond to the objects you create. If you don't specify them, you will see a build error on $ astro dev start.

Additional entries

If you want to add a second Connection/Pool/Variable, copy the existing fields and make a new entry like so:

variables:
- variable_name: my_first_variable
variable_value: value123
- variable_name: my_second_variable
variable_value: value987

Run commands on build

If you're interested in running any extra commands when your Airflow image builds, it can be added to your Dockerfile as a RUN command. These will run as the last step in the image build process.

For example, if you wanted to run ls when your image builds, your Dockerfile would look like this:

FROM quay.io/astronomer/astro-runtime-5.0.6
RUN ls

Docker Compose override

The Astro CLI is built on top of Docker Compose, a tool for defining and running multi-container Docker applications. To override the default CLI configurations, add a docker-compose.override.yml file to your Astro project directory. The values in this file override the default settings when you run $ astro dev start.

To add another volume mount for a directory named custom_dependencies, for example, add the following to your docker-compose.override.yml:

version: "3.1"
services:
scheduler:
volumes:
- /home/astronomer_project/custom_dependencies:/usr/local/airflow/custom_dependencies:ro

Make sure to specify version: "3.1" and mimic the format of the source code file linked above.

When your image builds on $ astro dev start, any changes made within the custom_dependencies directory will be picked up automatically the same way they are with files in your dags directory:

$ docker exec -it astronomer_project239673_scheduler_1 ls -al
total 76
drwxr-xr-x 1 astro astro 4096 Dec 30 17:21 .
drwxr-xr-x 1 root root 4096 Dec 14 2018 ..
-rw-rw-r-- 1 root root 38 Oct 8 00:07 .dockerignore
-rw-rw-r-- 1 root root 31 Oct 8 00:07 .gitignore
-rw-rw-r-- 1 root root 50 Oct 8 00:10 Dockerfile
-rw-r--r-- 1 astro astro 20770 Dec 30 17:21 airflow.cfg
drwxrwxr-x 2 1000 1000 4096 Oct 8 00:07 dags
-rw-r--r-- 1 root root 153 Dec 30 17:21 docker-compose.override.yml
drwxrwxr-x 2 1000 1000 4096 Oct 8 00:07 include
drwxr-xr-x 4 astro astro 4096 Oct 8 00:11 logs
drwxr-xr-x 2 1000 1000 4096 Dec 30 17:15 custom_dependencies
-rw-rw-r-- 1 root root 0 Oct 8 00:07 packages.txt
drwxrwxr-x 2 1000 1000 4096 Oct 8 00:07 plugins
-rw-rw-r-- 1 root root 0 Oct 8 00:07 requirements.txt
-rw-r--r-- 1 astro astro 2338 Dec 30 17:21 unittests.cfg

Note: The Astro CLI does not support overrides to Environment Variables. For more information on how to set, configure and customize those values, see "Environment Variables" doc.

Access to the Airflow CLI

You're free to use native Airflow CLI commands on Astronomer when developing locally by wrapping them around docker commands.

To add a connection, for example, you can run:

docker exec -it SCHEDULER_CONTAINER bash -c "airflow connections -a --conn_id test_three  --conn_type ' ' --conn_login etl --conn_password pw --conn_extra {"account":"blah"}"

Refer to the native Airflow CLI for a list of all commands.

Add environment variables locally

To import environment variables from a specific file, run $ astro dev start with an --env flag:

astro dev start --env .env

Note: This feature is limited to local development only. Whatever .env you use locally will not be bundled up when you deploy to Astronomer.

For more detail on how to add environment variables both locally and on Astronomer, refer to our Environment Variables doc.

Install Python packages from private sources

Python packages can be installed from public and private locations into your image. To install public packages listed on PyPI, follow the steps in Add Python and OS-level Packages. To install packages listed on private PyPI indices or a private git-based repository, you need to complete additional configuration in your project.

Depending on where your private packages are stored, use one of the following setups to install your packages to an Astro project by customizing your Runtime image.

Install Python packages from a private GitHub repository

This topic provides instructions for building your Astro project with Python packages from a private GitHub repository. At a high level, this setup entails specifying your private packages in requirements.txt, creating a custom Docker image that mounts a GitHub SSH key for your private GitHub repositories, and building your project with this Docker image.

Although this setup is based on GitHub, the general steps can be completed with any hosted Git repository.

info

The following setup has been validated only with a single SSH key. Due to the nature of ssh-agent, you might need to modify this setup when using more than one SSH key per Docker image.

Prerequisites

To install Python packages from a private GitHub repository on Astronomer Software, you need:

danger

If your organization enforces SAML single sign-on (SSO), you must first authorize your key to be used with that authentication method. For instructions, see GitHub documentation.

This setup assumes that each custom Python package is hosted within its own private GitHub repository. Installing multiple packages from a single private GitHub repository is not supported.

Step 1: Specify the private repository in your project

To add a Python package from a private repository to your Software project, specify the repository's SSH URL in your project's requirements.txt file. This URL should be formatted as git+ssh://git@github.com/<your-github-organization-name>/<your-private-repository>.git.

For example, to install the mypackage1 & mypackage2 from myorganization, as well as numpy v 1.22.1, you would add the following to requirements.txt:

git+ssh://git@github.com/myorganization/mypackage1.git
git+ssh://git@github.com/myorganization/mypackage2.git
numpy==1.22.1

This example assumes that the name of each of your Python packages is identical to the name of its corresponding GitHub repository. In other words,mypackage1 is both the name of the package and the name of the repository.

Step 2. Create Dockerfile.build

  1. In your Astro project, create a duplicate of your Dockerfile and name it Dockerfile.build.

  2. In Dockerfile.build, add AS stage to the FROM line which specifies your Astronomer image. For example, if you use Astro Runtime 5.0.6, your FROM line would be:

    FROM quay.io/astronomer/astro-runtime:5.0.6-base AS stage1
warning

Make sure to use the -base version of the Astro Runtime image that you want to customize. This image tag is customizable and does not include default build logic. For more information, see Distributions.

  1. In Dockerfile.build after the FROM line specifying your image, add the following configuration:

    LABEL maintainer="Astronomer <humans@astronomer.io>"
    ARG BUILD_NUMBER=-1
    LABEL io.astronomer.docker=true
    LABEL io.astronomer.docker.build.number=$BUILD_NUMBER
    LABEL io.astronomer.docker.airflow.onbuild=true
    # Install Python and OS-Level Packages
    COPY packages.txt .
    RUN apt-get update && cat packages.txt | xargs apt-get install -y

    FROM stage1 AS stage2
    USER root
    RUN apt-get -y install git python3 openssh-client \
    && mkdir -p -m 0600 ~/.ssh && ssh-keyscan github.com >> ~/.ssh/known_hosts
    # Install Python Packages
    COPY requirements.txt .
    RUN --mount=type=ssh,id=github pip install --no-cache-dir -q -r requirements.txt

    FROM stage1 AS stage3
    # Copy requirements directory
    COPY --from=stage2 /usr/local/lib/python3.9/site-packages /usr/local/lib/python3.9/site-packages
    COPY --from=stage2 /usr/local/bin /home/astro/.local/bin
    ENV PATH='/home/astro/.local/bin:$PATH'

    COPY . .

    In order, these commands:

    • Install any OS-level packages specified in packages.txt.
    • Securely mount your GitHub SSH key at build time. This ensures that the key itself is not stored in the resulting Docker image filesystem or metadata.
    • Install all Python-level packages in requirements.txt file, including those from a private GitHub repository.
info

If your repository is hosted somewhere other than GitHub, replace the domain in the ssh-keyscan command with the domain where the package is hosted.

Step 3. Build a custom Docker image

  1. Run the following command to create a new Docker image from your Dockerfile.build file, making sure to replace <ssh-key> with your SSH private key file name and <your-image> with your Astronomer image:

    DOCKER_BUILDKIT=1 docker build -f Dockerfile.build --progress=plain --ssh=github="$HOME/.ssh/<ssh-key>" -t custom-<your-image> .

    For example, if you have quay.io/astronomer/runtime:5.0.6-base in your Dockerfile.build, this command would be:

    DOCKER_BUILDKIT=1 docker build -f Dockerfile.build --progress=plain --ssh=github="$HOME/.ssh/<authorized-key>" -t custom-runtime:5.0.6-base .
info

If your repository is hosted somewhere other than GitHub, replace the location of your SSH key in the --ssh flag.

  1. Replace the contents of your Software project's Dockerfile with the following:

    FROM custom-<your-image>

    For example, if your base image was quay.io/astronomer/runtime:5.0.6-base, this line would be:

    FROM custom-runtime:5.0.6-base

Your Software project can now utilize Python packages from your private GitHub repository.

Add a CA certificate to an Astro Runtime image

If you need your Astro deployment to communicate securely with a remote service using a certificate signed by an untrusted or internal certificate authority (CA), you need to add the CA certificate to the trust store inside the image.

  1. In your Astro project Dockerfile, add the following entry below the existing FROM statement which specifies your Astro Runtime image version:

    USER root
    COPY <path-for-internal-cert.crt> /usr/local/share/ca-certificates/<your-company-name>/
    RUN update-ca-certificates
    USER astro
  2. Optional. Add additional COPY statements before the RUN update-ca-certificates stanza for each CA certificate your organization is using for external access.

  3. Save your changes and test them locally, or deploy them to a test Deployment.

Was this page helpful?