Data Modeling & Warehousing
Dimensional Modeling Fundamentals
Dimensional modeling is the backbone of data warehouse design. Interview questions frequently test your understanding of star schemas, snowflake schemas, and modeling best practices.
Star Schema
The most common dimensional modeling approach for analytical workloads.
Structure
┌─────────────┐
│ dim_date │
└──────┬──────┘
│
┌─────────────┐ │ ┌─────────────┐
│dim_customer ├──────┼──────┤dim_product │
└─────────────┘ │ └─────────────┘
│
┌──────┴──────┐
│ fact_sales │
└──────┬──────┘
│
┌──────┴──────┐
│ dim_store │
└─────────────┘
Components
| Component | Purpose | Example |
|---|---|---|
| Fact Table | Measures/metrics | fact_sales (quantity, revenue, cost) |
| Dimension Table | Context/attributes | dim_customer (name, email, segment) |
| Foreign Keys | Links facts to dimensions | customer_id, product_id |
| Surrogate Keys | Stable identifiers | Auto-increment integers |
Example Star Schema
-- Fact table
CREATE TABLE fact_sales (
sale_id BIGINT PRIMARY KEY,
date_key INT REFERENCES dim_date(date_key),
customer_key INT REFERENCES dim_customer(customer_key),
product_key INT REFERENCES dim_product(product_key),
store_key INT REFERENCES dim_store(store_key),
quantity INT,
unit_price DECIMAL(10,2),
total_amount DECIMAL(12,2),
discount_amount DECIMAL(10,2)
);
-- Dimension table
CREATE TABLE dim_customer (
customer_key INT PRIMARY KEY, -- Surrogate key
customer_id VARCHAR(50), -- Natural/business key
customer_name VARCHAR(100),
email VARCHAR(100),
segment VARCHAR(50),
region VARCHAR(50),
created_date DATE
);
Snowflake Schema
Normalized dimension tables—reduces redundancy but increases query complexity.
Structure
┌─────────────┐
│ dim_date │
└──────┬──────┘
│
┌─────────────┐ │ ┌─────────────┐
│dim_customer ├──────┼──────┤dim_product │
└──────┬──────┘ │ └──────┬──────┘
│ │ │
┌──────┴──────┐┌─────┴─────┐┌──────┴──────┐
│dim_geography││fact_sales ││dim_category │
└─────────────┘└───────────┘└─────────────┘
When to Use Each
| Aspect | Star Schema | Snowflake Schema |
|---|---|---|
| Query Performance | Faster (fewer joins) | Slower (more joins) |
| Storage | More (denormalized) | Less (normalized) |
| Maintenance | Easier | More complex |
| ETL Complexity | Lower | Higher |
| Best For | Analytics, BI tools | Storage-constrained, OLTP hybrid |
Interview Answer: "I prefer star schemas for analytics because they minimize joins and are optimized for BI tool patterns. I'd consider snowflake only when dimension tables are very large with high redundancy."
Fact Table Types
Transaction Facts
Record individual events at the most granular level.
-- Each row = one transaction
CREATE TABLE fact_order_line (
order_line_id BIGINT,
order_id BIGINT,
date_key INT,
product_key INT,
quantity INT,
unit_price DECIMAL(10,2)
);
Periodic Snapshot Facts
Capture state at regular intervals.
-- Each row = account state at month end
CREATE TABLE fact_account_monthly_snapshot (
snapshot_date_key INT,
account_key INT,
balance DECIMAL(15,2),
transactions_count INT,
average_balance DECIMAL(15,2)
);
Accumulating Snapshot Facts
Track workflow progress with multiple dates.
-- Each row = order lifecycle
CREATE TABLE fact_order_fulfillment (
order_key INT,
order_date_key INT,
ship_date_key INT,
delivery_date_key INT,
return_date_key INT,
order_amount DECIMAL(12,2),
days_to_ship INT,
days_to_deliver INT
);
Dimension Design Patterns
Role-Playing Dimensions
Same dimension used multiple times in a fact table.
-- Date dimension used for multiple date contexts
SELECT
f.order_id,
order_date.full_date as order_date,
ship_date.full_date as ship_date,
delivery_date.full_date as delivery_date
FROM fact_orders f
JOIN dim_date order_date ON f.order_date_key = order_date.date_key
JOIN dim_date ship_date ON f.ship_date_key = ship_date.date_key
JOIN dim_date delivery_date ON f.delivery_date_key = delivery_date.date_key;
Junk Dimensions
Combine low-cardinality flags into a single dimension.
-- Instead of multiple flag columns in fact table
CREATE TABLE dim_order_flags (
flag_key INT PRIMARY KEY,
is_rush_order BOOLEAN,
is_gift BOOLEAN,
is_online BOOLEAN,
payment_method VARCHAR(20)
);
-- Fact table references single key
CREATE TABLE fact_orders (
order_key BIGINT,
flag_key INT REFERENCES dim_order_flags(flag_key),
...
);
Degenerate Dimensions
Dimension attributes stored directly in fact table (no separate dimension).
-- order_number is a degenerate dimension
CREATE TABLE fact_order_line (
order_line_key BIGINT,
order_number VARCHAR(20), -- Degenerate dimension
product_key INT,
quantity INT,
...
);
Interview Design Questions
Question: "Design a data model for an e-commerce platform"
Answer Structure:
-
Identify business processes: Orders, inventory, returns, customer interactions
-
Define grain: One row = one order line item
-
Identify dimensions:
dim_customer(customer attributes, segment)dim_product(product attributes, category, brand)dim_date(calendar attributes)dim_store/dim_channel(sales channel)
-
Identify facts/measures:
quantity,unit_price,discount,tax- Derived:
net_amount,gross_amount
-
Consider SCDs: Customer segment changes, product price changes
Interview Tip: When asked to design a data model, always start by clarifying the grain (what does one row represent?) before diving into tables.
Next, we'll explore modern warehouse architectures and their trade-offs. :::