
Better Data Engineering Part 5: Mastering Snapshots in dbt
A practical guide to dbt snapshots and how to track historical changes in your warehouse.
Why?
Because data changes, and if you do not track those changes, you lose the truth.
Introduction
Most warehouses store only the latest version of a record. But businesses often need to answer questions like:
- What was the customer's plan last month?
- When did this price change?
- How many users downgraded over time?
This is where dbt snapshots shine.
Snapshots let you track historical changes automatically, without writing complex SCD logic.
Rule 1: Understand the Problem - SCDs Are Everywhere
Slowly Changing Dimensions (SCDs) appear in:
- customer profiles
- subscription plans
- product catalogs
- pricing tables
- employee records
If you overwrite data, you lose history. Snapshots preserve it.
Rule 2: Use dbt's Built-In Snapshot Framework
dbt snapshots give you:
- automatic change detection
- versioning of records
- start and end timestamps
- simple configuration
A typical snapshot looks like:
{% snapshot customers_snapshot %}
{{
config(
target_schema='snapshots',
unique_key='customer_id',
strategy='timestamp',
updated_at='updated_at'
)
}}
select * from {{ source('crm', 'customers') }}
{% endsnapshot %}dbt handles the rest.
Rule 3: Choose the Right Strategy
dbt supports two strategies:
-
timestamp
Use when the source table has a reliableupdated_atcolumn. -
check
Use when you want dbt to compare columns directly.
Example:
strategy='check',
check_cols=['email', 'plan', 'status']Rule 4: Store Snapshots in Their Own Schema
Keep snapshots separate:
- easier to manage
- easier to query
- easier to clean up
Use a dedicated schema like:
snapshots/Rule 5: Use Snapshots for Business Logic, Not Everything
Snapshots are powerful, but do not snapshot:
- high-volume event tables
- logs
- metrics
- ephemeral data
Use them only for slowly changing business entities.
Part 6 covers CI/CD, testing, and observability.
