SnowPatrol: Snowflake Usage Anomaly Detection & Alerting System
SnowPatrol is an anomaly detection and alerting system for Snowflake built with Apache Airflow®. It uses an Isolation Forest model to detect anomalies in the Snowflake compute usage that drives your costs. Airflow DAGs orchestrate feature engineering, model training, and prediction. Anomaly records are stored in a Snowflake table, and alerts are sent to a Slack channel. The full source code is open-source and available on GitHub.
SnowPatrol serves a dual purpose:
- Resource for Airflow + Snowflake users: Many organizations use Snowflake and have experienced challenges in managing the associated costs, especially those incurred by virtual warehouse compute. SnowPatrol alerts organizations to anomalies in their Snowflake compute usage so they can take action to reduce costs.
- Learning Tool: SnowPatrol is an MLOps reference implementation, showing how you can use Airflow to train, test, deploy, and monitor predictive models. The structure of the DAGs can be adapted to other use cases, such as fraud detection.
Architecture
SnowPatrol performs the following steps (a condensed sketch follows the list):
- Data Ingestion and Feature Engineering: Snowflake usage data is persisted in a Snowflake table to create a cost time series. STL decomposition is used to extract trends, seasonality, and residuals.
- Model Training: An Isolation Forest model is trained on the features to detect anomalies. Training runs and model versions are tracked using Weights & Biases.
- Predictions: The trained model is used to predict anomalies in Snowflake usage data.
- Reporting and Alerting: Anomalies are stored in a Snowflake table and alerts are sent to a Slack channel.
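The following is a minimal, self-contained sketch of the feature-engineering and detection steps. It is not the exact code from the repository: the input layout (a daily `credits_used` series), the weekly seasonality period, and the `contamination` value are illustrative assumptions you would adapt to your own usage table.

```python
import pandas as pd
from sklearn.ensemble import IsolationForest
from statsmodels.tsa.seasonal import STL


def detect_cost_anomalies(usage: pd.DataFrame) -> pd.DataFrame:
    """Flag anomalous days in a daily credit-consumption time series.

    `usage` is expected to have a DatetimeIndex and a `credits_used` column.
    """
    # Decompose the cost series into trend, weekly seasonality, and residual.
    decomposition = STL(usage["credits_used"], period=7).fit()

    # The residual is what remains after removing trend and seasonality;
    # anomalous days show up as unusually large residuals.
    features = decomposition.resid.to_frame(name="residual")

    # Fit an Isolation Forest on the residuals. `contamination` is the
    # assumed share of anomalous days and is a tunable hyperparameter.
    model = IsolationForest(contamination=0.05, random_state=42)
    features["is_anomaly"] = model.fit_predict(features[["residual"]]) == -1
    return usage.join(features)


# Example with synthetic data: a weekly pattern plus one injected spike.
if __name__ == "__main__":
    idx = pd.date_range("2024-01-01", periods=90, freq="D")
    credits = pd.Series(100 + 20 * (idx.dayofweek < 5), index=idx, dtype=float)
    credits.iloc[45] = 400  # simulated runaway-warehouse day
    result = detect_cost_anomalies(credits.to_frame(name="credits_used"))
    print(result[result["is_anomaly"]])
```

Training one model per warehouse and logging runs to Weights & Biases, as SnowPatrol does, would wrap this same fit-and-predict logic.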
Airflow features
The DAGs that power SnowPatrol highlight several key Airflow best practices and features (see the condensed DAG sketch after this list):
- Dynamic task mapping: Every time the DAGs run, the list of warehouses in Snowflake is determined at runtime and Airflow automatically creates a dynamically mapped task instance per warehouse for model training and prediction to run in parallel.
- Data-driven scheduling: While the first DAG runs on a time-based `@daily` schedule, all other DAGs are triggered based on updates to the datasets they depend upon.
- Notifications: If any task fails, a Slack notification is automatically sent using an `on_failure_callback` in the `default_args` of the DAGs.
- Airflow retries: All tasks are configured to automatically retry after a set delay.
- Modularization: SQL statements are stored in the `include` folder and executed by the `SQLExecuteQueryOperator` in the DAG. This makes the DAG code more readable and offers the ability to reuse SQL queries across multiple DAGs.
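The sketch below shows how these features can fit together in a single DAG. It is a condensed illustration, not the repository's actual code: the dataset URI, warehouse names, Slack connection id, and channel are hypothetical, and it assumes the `send_slack_notification` notifier from the Airflow Slack provider package.

```python
from datetime import timedelta

import pendulum
from airflow.datasets import Dataset
from airflow.decorators import dag, task
from airflow.providers.slack.notifications.slack import send_slack_notification

# Hypothetical dataset URI; the upstream ingestion DAG would update it.
usage_dataset = Dataset("snowflake://usage/metering_history")

default_args = {
    # Airflow retries: every task retries automatically after a delay.
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
    # Notifications: send a Slack message whenever a task fails.
    "on_failure_callback": send_slack_notification(
        slack_conn_id="slack_default",
        text="Task {{ ti.task_id }} in {{ dag.dag_id }} failed.",
        channel="#snowpatrol-alerts",
    ),
}


@dag(
    # Data-driven scheduling: run whenever the upstream DAG updates the dataset.
    schedule=[usage_dataset],
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    catchup=False,
    default_args=default_args,
)
def train_anomaly_models():
    @task
    def list_warehouses() -> list[str]:
        # In SnowPatrol this list is queried from Snowflake at runtime;
        # the names here are placeholders.
        return ["TRANSFORMING_WH", "REPORTING_WH", "LOADING_WH"]

    @task
    def train_model(warehouse: str) -> str:
        # Placeholder for per-warehouse model training.
        return f"trained model for {warehouse}"

    # Dynamic task mapping: one mapped task instance per warehouse,
    # created at runtime and executed in parallel.
    train_model.expand(warehouse=list_warehouses())


train_anomaly_models()
```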
Next Steps
If you want to lower your Snowflake spend or build a similar anomaly detection system, fork the SnowPatrol repository and follow the steps in the README to set up the detection system. We recommend using a free trial of Astro to run SnowPatrol.