Create DAG documentation in Apache Airflow
One of the more powerful and lesser-known features of Airflow is that you can create Markdown-based DAG documentation that appears in the Airflow UI.
After you complete this tutorial, you'll be able to:
- Add custom doc strings to an Airflow DAG.
- Add custom doc strings to an Airflow task.
Time to complete
This tutorial takes approximately 15 minutes to complete.
Assumed knowledge
- Basic Airflow concepts. See Introduction to Apache Airflow.
- Basic Python. See the Python Documentation.
Prerequisites
- The Astro CLI.
Step 1: Create an Astro project
To run Airflow locally, you first need to create an Astro project.
- Create a new directory for your Astro project:
mkdir <your-astro-project-name> && cd <your-astro-project-name>
- Run the following Astro CLI command to initialize an Astro project in the directory:
astro dev init
- To enable raw HTML in your Markdown DAG descriptions, add the following Airflow config environment variable to your .env file. If you don't want to enable this setting due to security concerns, you can still use Markdown in your DAG descriptions, and the HTML shown in this tutorial will be displayed as raw content.
AIRFLOW__WEBSERVER__ALLOW_RAW_HTML_DESCRIPTIONS=True
- Start your Airflow instance by running:
astro dev start
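The variable name in the .env step follows Airflow's general pattern for setting config options through the environment: AIRFLOW__{SECTION}__{KEY}, with double underscores as separators. As a minimal sketch of that pattern, the helper below builds the name from a section and a key. The function name airflow_env_var is hypothetical and is not part of Airflow or the Astro CLI.

```python
def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable name for an Airflow config option.

    Hypothetical helper for illustration; it only formats a string.
    """
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# The option used in this tutorial lives in the [webserver] config section.
var_name = airflow_env_var("webserver", "allow_raw_html_descriptions")
env_line = f"{var_name}=True"
print(env_line)
```

The printed line matches the entry you added to your .env file.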
Step 2: Create a new DAG
- In your dags folder, create a file named docs_example_dag.py.
- Copy and paste one of the following DAGs based on which coding style you're most comfortable with.
TaskFlow API:

from airflow.decorators import task, dag
from pendulum import datetime
import requests


@dag(
    start_date=datetime(2022, 11, 1),
    schedule="@daily",
    catchup=False,
)
def docs_example_dag():
    @task
    def tell_me_what_to_do():
        # Query the API and return the suggested activity
        response = requests.get("https://bored-api.appbrewery.com/random")
        return response.json()["activity"]

    tell_me_what_to_do()


# Calling the decorated function registers the DAG
docs_example_dag()

Traditional syntax:

from airflow.models.dag import DAG
from airflow.operators.python import PythonOperator
from pendulum import datetime
import requests


def query_api():
    # Query the API and return the suggested activity
    response = requests.get("https://bored-api.appbrewery.com/random")
    return response.json()["activity"]


with DAG(
    dag_id="docs_example_dag",
    start_date=datetime(2022, 11, 1),
    schedule="@daily",
    catchup=False,
):
    tell_me_what_to_do = PythonOperator(
        task_id="tell_me_what_to_do",
        python_callable=query_api,
    )
This DAG has one task called tell_me_what_to_do, which queries an API that provides a random activity for the day and prints it to the logs.
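The task above reads the "activity" key straight from the response and fails with a KeyError if the key is missing. As a minimal sketch of a more defensive version, the helper below validates the response body before returning the activity. The name extract_activity and the sample payload are assumptions made for illustration; the real API response may contain different fields.

```python
import json


def extract_activity(body: str) -> str:
    """Return the "activity" field from a JSON response body.

    Hypothetical helper; raises ValueError when the field is missing or empty.
    """
    data = json.loads(body)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    activity = data.get("activity")
    if not isinstance(activity, str) or not activity:
        raise ValueError("response has no usable 'activity' field")
    return activity


# Made-up payload for illustration; only the "activity" key comes from the tutorial.
sample_body = '{"activity": "Learn to juggle", "type": "recreational"}'
suggestion = extract_activity(sample_body)
print(suggestion)
```

Inside the task, you could call such a helper on response.text instead of indexing response.json() directly.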
Step 3: Add docs to your DAG
You can add Markdown-based documentation to your DAGs that will render in the Grid, Graph and Calendar pages of the Airflow UI.
- In your docs_example_dag.py file, add the following doc string above the definition of your DAG:

doc_md_DAG = """
### The Activity DAG

This DAG will help me decide what to do today. It uses the [BoredAPI](https://bored-api.appbrewery.com/random) to do so.

Before I get to do the activity I will have to:

- Clean up the kitchen.
- Check on my pipelines.
- Water the plants.

Here are some happy plants:

<img src="https://www.publicdomainpictures.net/pictures/80000/velka/succulent-roses-echeveria.jpg" alt="plants" width="300"/>
"""

This doc string is written in Markdown. It includes a title, a link to an external website, a bulleted list, and an image formatted using HTML. To learn more about Markdown, see The Markdown Guide.
- Add the documentation to your DAG by passing doc_md_DAG to the doc_md parameter of your DAG definition as shown in the code snippet below:

TaskFlow API:

@dag(
    start_date=datetime(2022, 11, 1),
    schedule="@daily",
    catchup=False,
    doc_md=doc_md_DAG,
)
def docs_example_dag():
    ...  # the rest of the DAG is unchanged

Traditional syntax:

with DAG(
    dag_id="docs_example_dag",
    start_date=datetime(2022, 11, 1),
    schedule="@daily",
    catchup=False,
    doc_md=doc_md_DAG,
):
    ...  # the rest of the DAG is unchanged
- Go to the Grid view and click on the DAG Docs banner to view the rendered documentation.
Airflow automatically picks up a doc string written directly beneath the definition of the DAG context and adds it as DAG Docs.
Additionally, with DAG() lets you pass the filepath of a Markdown file to the doc_md parameter. This is useful if you want to add the same documentation to several of your DAGs.
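If you keep shared documentation in a Markdown file, another option is to read the file into a string yourself and reuse that string wherever you need it. The following is a minimal sketch of that approach using only the standard library. The helper load_shared_docs and the file name shared_dag_docs.md are hypothetical, and the temporary directory exists only so the example runs on its own.

```python
from pathlib import Path
import tempfile


def load_shared_docs(path: Path) -> str:
    """Read a Markdown file and return its contents as a string."""
    if path.suffix != ".md":
        raise ValueError(f"expected a .md file, got: {path.name}")
    return path.read_text(encoding="utf-8")


# Demo with a temporary file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp_dir:
    docs_file = Path(tmp_dir) / "shared_dag_docs.md"  # hypothetical file name
    docs_file.write_text(
        "### Shared DAG docs\n\nOwned by the data team.\n", encoding="utf-8"
    )
    shared_doc_md = load_shared_docs(docs_file)

print(shared_doc_md)
```

The resulting string can be passed to doc_md in the same way as doc_md_DAG. Loading the file yourself also lets you fail early with a clear error if the file is missing or has the wrong extension.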
Step 4: Add docs to a task
You can also add docs to specific Airflow tasks using Markdown, Monospace, JSON, YAML or reStructuredText. Note that only Markdown will be rendered and other formats will be displayed as rich content.
To add documentation to your task, follow these steps:
- Add the following code with a string in Markdown format:

doc_md_task = """
### Purpose of this task

This task **boldly** suggests a daily activity.
"""

- Add the following code with a string written in monospace format:

doc_monospace_task = """
If you don't like the suggested activity you can always just go to the park instead.
"""

- Add the following code with a string in JSON format:

doc_json_task = """
{
    "previous_suggestions": {
        "go to the gym": {"frequency": 2, "rating": 8},
        "mow your lawn": {"frequency": 1, "rating": 2},
        "read a book": {"frequency": 3, "rating": 10}
    }
}
"""

- Add the following code with a string written in YAML format:

doc_yaml_task = """
clothes_to_wear: sports
gear: |
    - climbing: true
    - swimming: false
"""

- Add the following code containing reStructuredText:

doc_rst_task = """
===========================
This feature is pretty neat
===========================

* there are many ways to add docs
* luckily Airflow supports a lot of them

.. note:: `Learn more about rst here! <https://gdal.org/contributing/rst_style.html#>`__
"""

- Create a task definition as shown in the following snippet. The task definition includes parameters for specifying each of the documentation strings you created. Pick the coding style you're most comfortable with.
TaskFlow API:

@task(
    doc_md=doc_md_task,
    doc=doc_monospace_task,
    doc_json=doc_json_task,
    doc_yaml=doc_yaml_task,
    doc_rst=doc_rst_task,
)
def tell_me_what_to_do():
    response = requests.get("https://bored-api.appbrewery.com/random")
    return response.json()["activity"]

tell_me_what_to_do()

Traditional syntax:

tell_me_what_to_do = PythonOperator(
    task_id="tell_me_what_to_do",
    python_callable=query_api,
    doc_md=doc_md_task,
    doc=doc_monospace_task,
    doc_json=doc_json_task,
    doc_yaml=doc_yaml_task,
    doc_rst=doc_rst_task,
)
- Go to the Airflow UI and run your DAG.
- In the Grid view, click on the green square for your task instance.
- Click on Task Instance Details.
- See the docs under their respective attribute.
In Airflow 2.10+, task docs provided to doc_md or as a doc string in a @task decorated task are rendered in the task details in the Airflow UI.
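Hand-written JSON inside a Python string is easy to get subtly wrong, and because doc_json is displayed rather than parsed, a mistake will not raise an error. One way to guarantee a valid string is to build it from a Python dictionary with json.dumps. The following is a minimal sketch, assuming the same suggestion data as the doc_json_task string in this step, with each entry modeled as a nested object.

```python
import json

# Same data as the tutorial's JSON doc string, expressed as a Python dict.
previous_suggestions = {
    "previous_suggestions": {
        "go to the gym": {"frequency": 2, "rating": 8},
        "mow your lawn": {"frequency": 1, "rating": 2},
        "read a book": {"frequency": 3, "rating": 10},
    }
}

# json.dumps always emits valid JSON; indent keeps it readable in the UI.
doc_json_task = json.dumps(previous_suggestions, indent=4)

# Round-trip check: the string parses back to the original data.
assert json.loads(doc_json_task) == previous_suggestions
print(doc_json_task)
```

The generated doc_json_task can be passed to the doc_json parameter exactly like the hand-written string.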
Step 5: Add notes to a task instance and DAG run
You can add notes to task instances and DAG runs from the Grid view in the Airflow UI. This feature is useful if you need to share contextual information about a DAG or task run with your team, such as why a specific run failed.
- Go to the Grid view of the docs_example_dag DAG you created in Step 2.
- Select a task instance or DAG run.
- Click Details > Task Instance Notes or DAG Run notes > Add Note.
- Write a note and click Save Note.
Conclusion
Congratulations! You now know how to add fancy documentation to both your DAGs and your Airflow tasks.