Use a MySQL or PostgreSQL database for metadata or result backend storage
You can use the Houston API to create Astronomer Software Deployments that use pre-created databases, external to the Airflow Deployment, as the metadata database, the result backend, or both.
Prerequisites
- Workspace Admin user privileges and a Workspace ID
- (Optional) A MySQL or PostgreSQL database
- (Optional) An existing Deployment
If you create a new connection to an external database from a Deployment with existing DAG data, you must migrate that historic data to the new database. Otherwise, information about your historic Deployment activity, such as task instances and DAG runs, isn't displayed, because the database where that information was stored has changed.
Step 1: (Optional) Create your database
If you need to create a new database, substitute `astro-db-name` with your own database name.

```sql
CREATE DATABASE astro-db-name;
```
Step 2: Add a user account to your database for the connection
Substitute `astro-user-name` and `astro-user-password` with your own information. You can use an existing database for this step.
For PostgreSQL:
- Create a user with a password for Astronomer Software to use to access the database.

  ```sql
  CREATE USER astro-user-name WITH PASSWORD 'astro-user-password';
  ```
- Grant all privileges on the database to the user.

  ```sql
  GRANT ALL PRIVILEGES ON DATABASE astro-db-name TO astro-user-name;
  ```
- Grant `USAGE` and `CREATE` privileges on the `public` schema to `astro-user-name`:

  ```sql
  GRANT USAGE, CREATE ON SCHEMA public TO astro-user-name;
  ```
Now, connect to the database you created, which is `astro-db-name` in this example, and run the following queries:
- Grant all privileges on all tables, sequences, and functions to the user.

  ```sql
  GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO astro-user-name;
  GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO astro-user-name;
  GRANT ALL PRIVILEGES ON ALL FUNCTIONS IN SCHEMA public TO astro-user-name;
  ```
- Set default privileges for the user, so any new tables, sequences, or functions automatically grant the user access.

  ```sql
  ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL PRIVILEGES ON TABLES TO astro-user-name;
  ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL PRIVILEGES ON SEQUENCES TO astro-user-name;
  ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL PRIVILEGES ON FUNCTIONS TO astro-user-name;
  ```

- Grant `USAGE` and `CREATE` privileges on the `public` schema of this database to the user.

  ```sql
  GRANT USAGE, CREATE ON SCHEMA public TO astro-user-name;
  ```
For MySQL:

- Create a new user and password.

  ```sql
  CREATE USER 'astro-user-name'@'%' IDENTIFIED BY 'astro-user-password';
  ```
- Assign privileges.

  ```sql
  GRANT ALL PRIVILEGES ON astro-db-name.* TO 'astro-user-name'@'%';
  ```
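Before wiring the new account into a Deployment, you can optionally run a quick connection test as the new user. The following is a minimal sketch and not part of the Astronomer setup: it assumes the `psycopg2-binary` (PostgreSQL) or `PyMySQL` (MySQL) driver is installed and that the placeholder values are replaced with your own.

```python
# Minimal connectivity check for the new database user (illustrative only).
# Assumes psycopg2-binary is installed for PostgreSQL, or PyMySQL for MySQL,
# and that the placeholder host, database, user, and password values below
# are replaced with your own.

def check_postgres():
    import psycopg2  # pip install psycopg2-binary
    conn = psycopg2.connect(
        host="host",
        port=5432,
        dbname="astro-db-name",
        user="astro-user-name",
        password="astro-user-password",
    )
    with conn.cursor() as cur:
        cur.execute("SELECT current_database(), current_user;")
        print(cur.fetchone())
    conn.close()

def check_mysql():
    import pymysql  # pip install PyMySQL
    conn = pymysql.connect(
        host="host",
        port=3306,
        database="astro-db-name",
        user="astro-user-name",
        password="astro-user-password",
    )
    with conn.cursor() as cur:
        cur.execute("SELECT DATABASE(), CURRENT_USER();")
        print(cur.fetchone())
    conn.close()
```

If either check fails with a permissions error, revisit the grant statements above before continuing.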
Step 3: Retrieve database host information
Retrieve the connection information for your external database. For example, with AWS, you can retrieve your endpoint information by following Finding the connection information for an RDS for MySQL DB instance in the AWS documentation.
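If your database runs on Amazon RDS, a short script like the following can look up the endpoint and port to use as the `host` value in the next step. This is an illustrative sketch, not part of the Astronomer tooling; it assumes the boto3 SDK is installed, AWS credentials are configured, and `astro-external-db` is a hypothetical DB instance identifier.

```python
# Look up the endpoint of an RDS instance to use as the `host` value in your
# connection strings. Assumes boto3 is installed, AWS credentials are
# configured, and "astro-external-db" is a hypothetical instance identifier.
import boto3

rds = boto3.client("rds")
response = rds.describe_db_instances(DBInstanceIdentifier="astro-external-db")
endpoint = response["DBInstances"][0]["Endpoint"]
print(endpoint["Address"], endpoint["Port"])
```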
Step 4: Compose connection strings for your database
You need connection strings that define how Astronomer Software configures the connection to your external databases from your Airflow Deployment. The values of these strings are used when you define your `metadataConnection` or `resultBackendConnection` when you create, update, or upsert your Deployment.

Use the values for your `astro-user-name`, `astro-user-password`, `astro-db-name`, and the host information you retrieved to compose the connection strings in the following format, depending on whether you want to define a result backend connection or a metadata database connection.
For PostgreSQL:
With PGBouncer disabled

- `metadataConnection`: `postgresql://astro-user-name:astro-user-password@host:5432/astro-db-name`
- `resultBackendConnection`: `db+postgresql://astro-user-name:astro-user-password@host:5432/astro-db-name`
The connection string format validation regex doesn't cover the result backend connection string format, which includes the `db+` prefix. This prefix is specifically required by the Celery executor workers. If the connection string doesn't include `db+`, the Celery worker Pods fail. The regex validation doesn't cover this format because it would complicate the format validation logic in different scenarios.
With PGBouncer enabled

If you have PGBouncer enabled and are using PostgreSQL, you must configure `metadataConnectionJson` and `resultBackendConnectionJson` instead.

Use the values for your `astro-user-name`, `astro-user-password`, `astro-db-name`, and the host information you retrieved to compose the connection details in the following format, depending on whether you want to define a result backend connection or a metadata database connection.
- `metadataConnectionJson`:

  ```json
  "metadataConnectionJson": {
    "user": "astro-user-name",
    "pass": "astro-user-password",
    "protocol": "postgresql",
    "host": "host",
    "port": 5432,
    "db": "astro-db-name"
  },
  ```
- `resultBackendConnectionJson`:

  ```json
  "resultBackendConnectionJson": {
    "user": "astro-user-name",
    "pass": "astro-user-password",
    "protocol": "postgresql",
    "host": "host",
    "port": 5432,
    "db": "astro-db-name"
  },
  ```
For MySQL:

- `metadataConnection`: `mysql+mysqldb://astro-user-name:astro-user-password@host:3306/astro-db-name`
- `resultBackendConnection`: `db+mysql+mysqldb://astro-user-name:astro-user-password@host:3306/astro-db-name`
The connection string format validation regex doesn't cover the result backend connection string format, which includes the `db+` prefix. This prefix is specifically required by the Celery executor workers. If the connection string doesn't include `db+`, the Celery worker Pods fail. The regex validation doesn't cover this format because it would complicate the format validation logic in different scenarios.
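If you script your Deployment setup, you can assemble these strings from the same components used in the JSON form. The helper below is an illustrative sketch, not part of Astronomer's tooling; it only demonstrates the expected URL shape, including the required `db+` prefix on the result backend string.

```python
# Illustrative helper (not part of Astronomer tooling) that assembles the
# metadata and result backend connection strings from their components.
# The result backend string must carry the extra "db+" prefix required by
# the Celery executor workers.
from urllib.parse import quote

def build_connection_strings(user, password, host, db, protocol="postgresql", port=5432):
    # URL-encode the password in case it contains special characters.
    base = f"{protocol}://{user}:{quote(password, safe='')}@{host}:{port}/{db}"
    return {
        "metadataConnection": base,
        "resultBackendConnection": f"db+{base}",  # "db+" prefix for Celery
    }

# Example with the placeholder values used throughout this guide.
# For MySQL, use protocol="mysql+mysqldb" and port=3306 instead.
strings = build_connection_strings(
    user="astro-user-name",
    password="astro-user-password",
    host="host",
    db="astro-db-name",
)
print(strings["metadataConnection"])
print(strings["resultBackendConnection"])
```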
Step 5: Add the connection to your Deployment configuration
Use the Houston API to create your Deployment configuration.
The following example shows the mutation and query variables for using `createDeployment`. See Houston API code examples for examples of how to use the `update` and `upsert` options for configuring your Deployment.
Create a new Deployment
```graphql
mutation createDeployment(
$workspaceUuid: Uuid!
$releaseName: String
$namespace: String!
$type: String!
$label: String!
$description: String
$version: String
$airflowVersion: String
$runtimeVersion: String
$executor: ExecutorType
$workers: Workers
$webserver: Webserver
$scheduler: Scheduler
$triggerer: Triggerer
$config: JSON
$properties: JSON
$dagDeployment: DagDeployment
$astroUnitsEnabled: Boolean
$rollbackEnabled: Boolean
$metadataConnection: String
$resultBackendConnection: String
$metadataConnectionJson: JSON
$resultBackendConnectionJson: JSON
) {
createDeployment(
workspaceUuid: $workspaceUuid
releaseName: $releaseName
namespace: $namespace
type: $type
label: $label
airflowVersion: $airflowVersion
description: $description
version: $version
executor: $executor
workers: $workers
webserver: $webserver
scheduler: $scheduler
triggerer: $triggerer
config: $config
properties: $properties
runtimeVersion: $runtimeVersion
dagDeployment: $dagDeployment
astroUnitsEnabled: $astroUnitsEnabled
rollbackEnabled: $rollbackEnabled
metadataConnection: $metadataConnection
resultBackendConnection: $resultBackendConnection
metadataConnectionJson: $metadataConnectionJson
resultBackendConnectionJson: $resultBackendConnectionJson
) {
id
config
urls {
type
url
__typename
}
properties
description
label
releaseName
namespace
status
type
version
workspace {
id
label
__typename
}
airflowVersion
runtimeVersion
desiredAirflowVersion
dagDeployment {
type
nfsLocation
repositoryUrl
branchName
syncInterval
syncTimeout
ephemeralStorage
dagDirectoryLocation
rev
sshKey
knownHosts
__typename
}
createdAt
updatedAt
__typename
}
}
```
JSON Query example
```json
{
"workspaceUuid": "cm3g0cjd2000008l74jigb54y",
"metadataConnectionJson": {
"user": "astro-user-name",
"pass": "astro-password",
"protocol": "postgresql",
"host": "host",
"port": 5432,
"db": "astro-db-name"
},
"resultBackendConnectionJson": {
"user": "astro-user-name",
"pass": "astro-password",
"protocol": "postgresql",
"host": "postgres-db-lb.external-postgres.svc.cluster.local",
"port": 5432,
"db": "astro-db-name"
},
"namespace": "",
"type": "airflow",
"config": {
"executor": "CeleryExecutor",
"workers": {},
"webserver": {},
"scheduler": {
"replicas": 1
},
"triggerer": {}
},
"executor": "CeleryExecutor",
"workers": {},
"webserver": {},
"scheduler": {
"replicas": 1
},
"triggerer": {},
"label": "Rt1160-Celery-Pgbouncer-Enabled-Json-5",
"description": "",
"runtimeVersion": "11.6.0",
"properties": {
"extra_capacity": {
"cpu": 1000,
"memory": 3840
}
},
"astroUnitsEnabled": false,
"rollbackEnabled": true,
"dagDeployment": {
"type": "dag_deploy",
"nfsLocation": "",
"repositoryUrl": "",
"branchName": "",
"syncInterval": 1,
"syncTimeout": 120,
"ephemeralStorage": 2,
"dagDirectoryLocation": "",
"rev": "",
"sshKey": "",
"knownHosts": ""
}
}
```
Example query string variables
```json
{
"workspaceUuid": "cm3g0cjd2000008l74jigb54y",
"metadataConnection": "postgresql://astro-user-name:astro-user-password@host:5432/astro-db-name"
"resultBackendConnection": "db+postgresql://astro-user-name:astro-user-password@host:5432/astro-db-name"
"namespace": "",
"type": "airflow",
"config": {
"executor": "CeleryExecutor",
"workers": {},
"webserver": {},
"scheduler": {
"replicas": 1
},
"triggerer": {}
},
"executor": "CeleryExecutor",
"workers": {},
"webserver": {},
"scheduler": {
"replicas": 1
},
"triggerer": {},
"label": "Rt1160-Celery-Pgbouncer-Enabled-Json-5",
"description": "",
"runtimeVersion": "11.6.0",
"properties": {
"extra_capacity": {
"cpu": 1000,
"memory": 3840
}
},
"astroUnitsEnabled": false,
"rollbackEnabled": true,
"dagDeployment": {
"type": "dag_deploy",
"nfsLocation": "",
"repositoryUrl": "",
"branchName": "",
"syncInterval": 1,
"syncTimeout": 120,
"ephemeralStorage": 2,
"dagDirectoryLocation": "",
"rev": "",
"sshKey": "",
"knownHosts": ""
}
}
```
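To run the mutation, send it and your variables to your installation's Houston API endpoint. The following sketch assumes the endpoint is available at `https://houston.<BASE_DOMAIN>/v1`, that you have a valid API token, and that the mutation and variables shown above are saved to hypothetical local files named `create_deployment.graphql` and `variables.json`; adjust all of these for your installation.

```python
# Send the createDeployment mutation to the Houston API. Illustrative sketch:
# assumes the "requests" package, a Houston endpoint at
# https://houston.<BASE_DOMAIN>/v1, and a valid API token. The token header
# format may vary by installation.
import json
import requests

HOUSTON_URL = "https://houston.<BASE_DOMAIN>/v1"  # replace with your base domain
API_TOKEN = "<your-api-token>"                    # replace with your token

# Hypothetical files containing the mutation and variables shown above.
with open("create_deployment.graphql") as f:
    mutation = f.read()
with open("variables.json") as f:
    variables = json.load(f)

response = requests.post(
    HOUSTON_URL,
    headers={"Authorization": API_TOKEN, "Content-Type": "application/json"},
    json={"query": mutation, "variables": variables},
    timeout=60,
)
response.raise_for_status()
print(json.dumps(response.json(), indent=2))
```

A successful response includes the new Deployment's `id`, `releaseName`, and `config`, which you can use to confirm that the external database connection values were applied.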