Data Modeling & Warehousing

Slowly Changing Dimensions

4 min read

Slowly Changing Dimensions (SCDs) handle historical tracking in dimensional models. This is a classic interview topic—expect to explain types, trade-offs, and implementation.

SCD Types Overview

Type Description History Storage
Type 0 Retain original value None Minimal
Type 1 Overwrite old value None Minimal
Type 2 Add new row with versioning Full High
Type 3 Add column for previous value Limited Medium
Type 4 Separate history table Full Medium
Type 6 Hybrid (1+2+3) Full Highest

Type 1: Overwrite

No history maintained—simply replace old values.

Implementation

-- Before update
SELECT * FROM dim_customer WHERE customer_id = 100;
-- customer_key=1, customer_id=100, email='old@email.com'

-- Type 1 Update
UPDATE dim_customer
SET email = 'new@email.com',
    updated_at = CURRENT_TIMESTAMP
WHERE customer_id = 100;

-- After update
-- customer_key=1, customer_id=100, email='new@email.com'

Use Cases

  • Correction of data entry errors
  • Non-analytically significant attributes
  • When history is not required

Type 2: Add New Row

Full history preserved with versioning columns.

Schema Design

CREATE TABLE dim_customer (
    customer_key INT PRIMARY KEY,        -- Surrogate key
    customer_id VARCHAR(50),             -- Natural key
    customer_name VARCHAR(100),
    segment VARCHAR(50),
    region VARCHAR(50),
    effective_date DATE,                 -- Row valid from
    expiration_date DATE,                -- Row valid to
    is_current BOOLEAN,                  -- Current record flag
    version INT                          -- Version number
);

Implementation

-- Step 1: Expire current record
UPDATE dim_customer
SET expiration_date = CURRENT_DATE - 1,
    is_current = FALSE
WHERE customer_id = 100
AND is_current = TRUE;

-- Step 2: Insert new version
INSERT INTO dim_customer (
    customer_key, customer_id, customer_name, segment, region,
    effective_date, expiration_date, is_current, version
)
SELECT
    nextval('customer_key_seq'),
    100,
    'John Smith',
    'Premium',           -- Changed from 'Standard'
    'West',
    CURRENT_DATE,
    '9999-12-31',        -- High date for current
    TRUE,
    (SELECT MAX(version) + 1 FROM dim_customer WHERE customer_id = 100);

Query Patterns

-- Current state only
SELECT * FROM dim_customer WHERE is_current = TRUE;

-- Point-in-time lookup
SELECT * FROM dim_customer
WHERE customer_id = 100
AND '2024-06-15' BETWEEN effective_date AND expiration_date;

-- Historical analysis with facts
SELECT
    f.sale_date,
    f.amount,
    d.segment  -- Segment at time of sale
FROM fact_sales f
JOIN dim_customer d ON f.customer_key = d.customer_key;

Interview Question: "Why use surrogate keys with SCD Type 2?"

Answer: "Surrogate keys (customer_key) allow each version to have a unique identifier. Fact tables reference the surrogate key, automatically capturing the dimension state at the time of the transaction. Without surrogate keys, you'd need complex date-based joins for every query."

Type 3: Previous Value Column

Track limited history with additional columns.

Schema Design

CREATE TABLE dim_customer (
    customer_key INT PRIMARY KEY,
    customer_id VARCHAR(50),
    current_segment VARCHAR(50),
    previous_segment VARCHAR(50),
    segment_change_date DATE
);

Implementation

UPDATE dim_customer
SET previous_segment = current_segment,
    current_segment = 'Premium',
    segment_change_date = CURRENT_DATE
WHERE customer_id = 100;

Use Cases

  • When only one previous value matters
  • Analyzing "before and after" scenarios
  • Storage-constrained environments

Type 4: History Table

Separate table for historical records.

Schema Design

-- Current dimension (Type 1 updates)
CREATE TABLE dim_customer (
    customer_key INT PRIMARY KEY,
    customer_id VARCHAR(50),
    customer_name VARCHAR(100),
    segment VARCHAR(50),
    updated_at TIMESTAMP
);

-- Historical tracking
CREATE TABLE dim_customer_history (
    history_id SERIAL PRIMARY KEY,
    customer_key INT,
    customer_name VARCHAR(100),
    segment VARCHAR(50),
    effective_date DATE,
    expiration_date DATE,
    change_type VARCHAR(20)  -- INSERT, UPDATE, DELETE
);

Benefits

  • Current queries remain simple and fast
  • History available when needed
  • Cleaner separation of concerns

Type 6: Hybrid (1+2+3)

Combines multiple approaches for maximum flexibility.

Schema Design

CREATE TABLE dim_customer (
    customer_key INT PRIMARY KEY,
    customer_id VARCHAR(50),
    customer_name VARCHAR(100),
    -- Type 3 columns
    current_segment VARCHAR(50),
    previous_segment VARCHAR(50),
    original_segment VARCHAR(50),    -- Type 0
    -- Type 2 columns
    effective_date DATE,
    expiration_date DATE,
    is_current BOOLEAN
);

Use Cases

  • Need both current comparison and full history
  • Complex analytical requirements
  • When different attributes need different SCD treatment

Implementation with dbt

Type 2 Snapshot

# models/snapshots/dim_customer_snapshot.sql
{% snapshot dim_customer_snapshot %}

{{
    config(
      target_schema='snapshots',
      unique_key='customer_id',
      strategy='timestamp',
      updated_at='updated_at',
    )
}}

SELECT
    customer_id,
    customer_name,
    email,
    segment,
    region,
    updated_at
FROM {{ source('raw', 'customers') }}

{% endsnapshot %}

Result Schema (auto-generated by dbt)

customer_id | customer_name | segment | dbt_scd_id | dbt_valid_from | dbt_valid_to | dbt_updated_at

Performance Considerations

Aspect Type 1 Type 2 Type 3 Type 4
Storage Low High Medium Medium
Current Query Fast Medium Fast Fast
Historical Query N/A Complex Limited Medium
ETL Complexity Low High Medium Medium
Fact Table Joins Simple Simple Simple Complex

Interview Design Question

Question: "Design an SCD strategy for an e-commerce customer dimension"

Answer:

Attribute SCD Type Rationale
customer_name Type 1 Name corrections, not analytically significant
email Type 1 Current contact info needed
segment Type 2 Critical for historical analysis
lifetime_value_tier Type 2 Drives marketing analysis
address Type 2 or 4 Depends on shipping analysis needs
phone Type 1 Current contact only
registration_date Type 0 Never changes

Implementation Choice: "I'd use Type 2 for segment and tier, Type 1 for contact info, and consider Type 4 if we need detailed address change history for fraud detection."

Interview Tip: When discussing SCDs, always connect your choice to business requirements. There's no universally "correct" type—the best choice depends on analytical needs and storage constraints.

Next module covers ETL pipelines and orchestration patterns. :::

Quiz

Module 3: Data Modeling & Warehousing

Take Quiz