SnowPatrol: Snowflake Usage Anomaly Detection & Alerting System

SnowPatrol is an anomaly detection and alerting system for Snowflake built with Apache Airflow®. It uses an Isolation Forest model to detect anomalies in the Snowflake compute usage that drives your costs. Airflow DAGs orchestrate feature engineering, model training, and prediction. Anomaly records are stored in a Snowflake table and alerts are sent to a Slack channel. The full source code is open source and available on GitHub.

Screenshot of a chart depicting anomalies detected for different warehouses in Snowflake.

SnowPatrol serves a dual purpose:

  • Resource for Airflow + Snowflake users: Many organizations use Snowflake and have experienced challenges in managing the associated costs, especially those incurred by virtual warehouse compute. SnowPatrol alerts organizations to anomalies in their Snowflake compute usage so they can take action to reduce costs.
  • Learning Tool: SnowPatrol is an MLOps reference implementation, showing how you can use Airflow to train, test, deploy, and monitor predictive models. The structure of the DAGs can be adapted to other use cases, such as fraud detection.

Architecture

SnowPatrol reference architecture diagram.

SnowPatrol performs the following steps:

  • Data Ingestion and Feature Engineering: Snowflake usage data is persisted in a Snowflake table to create a cost time series. STL decomposition is used to extract trends, seasonality, and residuals.
  • Model Training: An Isolation Forest model is trained on the features to detect anomalies (see the sketch after this list). Model training runs and versions are tracked using Weights & Biases.
  • Predictions: The trained model is used to predict anomalies in Snowflake usage data.
  • Reporting and Alerting: Anomalies are stored in a Snowflake table and alerts are sent to a Slack channel.
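
The sketch below illustrates the feature engineering and detection steps, assuming a daily per-warehouse credit usage series loaded from Snowflake. The DataFrame and column names (usage_df, usage_date, credits_used) and the model parameters are illustrative, not SnowPatrol's actual implementation.

```python
# Minimal sketch: STL decomposition for feature engineering, Isolation Forest for
# anomaly detection. Names and parameters are illustrative, not SnowPatrol's code.
import pandas as pd
from sklearn.ensemble import IsolationForest
from statsmodels.tsa.seasonal import STL


def detect_anomalies(usage_df: pd.DataFrame) -> pd.DataFrame:
    """Flag anomalous days in one warehouse's daily credit usage."""
    series = (
        usage_df.set_index("usage_date")["credits_used"]
        .asfreq("D")
        .fillna(0)
    )

    # STL decomposition separates trend and weekly seasonality from the residual,
    # so the model scores what remains after expected patterns are removed.
    decomposition = STL(series, period=7).fit()
    features = pd.DataFrame(
        {
            "trend": decomposition.trend,
            "seasonal": decomposition.seasonal,
            "resid": decomposition.resid,
        }
    )

    # Isolation Forest isolates outliers with short average path lengths;
    # contamination sets the expected share of anomalous days (placeholder value).
    model = IsolationForest(contamination=0.05, random_state=42)
    features["is_anomaly"] = model.fit_predict(features) == -1
    return features
```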

Airflow features

The DAGs that power SnowPatrol highlight several key Airflow best practices and features, illustrated in the sketch after this list:

  • Dynamic task mapping: Every time the DAGs run, the list of Snowflake warehouses is determined at runtime, and Airflow creates one dynamically mapped task instance per warehouse so that model training and prediction run in parallel.
  • Data-driven scheduling: While the first DAG runs on a time-based @daily schedule, all other DAGs are triggered based on updates to the datasets they depend upon.
  • Notifications: If any task fails, a Slack notification is automatically sent using an on_failure_callback in the default_args of the DAGs.
  • Airflow retries: All tasks are configured to automatically retry after a set delay.
  • Modularization: SQL statements are stored in the include folder and executed by SQLExecuteQueryOperator in the DAG. This makes the DAG code more readable and offers the ability to reuse SQL queries across multiple DAGs.
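
The condensed DAG sketch below shows how these features fit together. The dataset URI, connection IDs, SQL file path, schedule, and task bodies are placeholders rather than SnowPatrol's actual code.

```python
# Hypothetical DAG sketch combining dataset scheduling, dynamic task mapping,
# Slack failure notifications, retries, and a modularized SQL query.
from datetime import datetime, timedelta

from airflow.datasets import Dataset
from airflow.decorators import dag, task
from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator
from airflow.providers.slack.notifications.slack import send_slack_notification

# Placeholder dataset URI; the real DAGs define their own upstream datasets.
usage_features = Dataset("snowflake://usage_features")


@dag(
    schedule=[usage_features],  # data-driven scheduling: run when the dataset updates
    start_date=datetime(2024, 1, 1),
    catchup=False,
    default_args={
        "retries": 2,  # automatic retries for every task
        "retry_delay": timedelta(minutes=5),
        # Slack notification sent whenever a task fails
        "on_failure_callback": send_slack_notification(
            slack_conn_id="slack_default",  # placeholder connection ID
            text="Task {{ ti.task_id }} in DAG {{ dag.dag_id }} failed.",
            channel="#alerts",
        ),
    },
)
def train_anomaly_models():
    @task
    def get_warehouses() -> list[str]:
        # In practice, this queries Snowflake for the current list of warehouses.
        return ["TRANSFORMING", "REPORTING"]

    @task
    def train_model(warehouse: str) -> None:
        # Train one Isolation Forest per warehouse (body omitted).
        ...

    # Dynamic task mapping: one mapped task instance per warehouse, run in parallel.
    trained = train_model.expand(warehouse=get_warehouses())

    # Modularization: the SQL statement lives in the include folder and can be
    # reused by other DAGs. File path and connection ID are placeholders.
    log_run = SQLExecuteQueryOperator(
        task_id="log_training_run",
        conn_id="snowflake_default",
        sql="sql/log_training_run.sql",
    )

    trained >> log_run


train_anomaly_models()
```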

Next Steps

If you want to lower your Snowflake spend or build a similar anomaly detection system, fork the SnowPatrol repository and follow the steps in the README to set it up. We recommend using a free trial of Astro to run SnowPatrol.