SQL Mastery for Interviews
Window Functions Deep Dive
Window functions are the most tested advanced SQL concept in data science interviews. They let you calculate across rows while keeping individual row detail - something regular GROUP BY cannot do.
The Window Function Syntax
function_name(column) OVER (
PARTITION BY partition_column -- Optional: group rows
ORDER BY order_column -- Optional: order within partition
ROWS/RANGE frame_specification -- Optional: define window frame
)
Ranking Functions
Three ranking functions serve different purposes:
| Function | Ties | Gaps |
|---|---|---|
| ROW_NUMBER() | Arbitrary order | No gaps |
| RANK() | Same rank | Gaps after ties |
| DENSE_RANK() | Same rank | No gaps |
Example: Ranking salespeople by revenue
SELECT
name,
revenue,
ROW_NUMBER() OVER (ORDER BY revenue DESC) as row_num,
RANK() OVER (ORDER BY revenue DESC) as rank,
DENSE_RANK() OVER (ORDER BY revenue DESC) as dense_rank
FROM sales;
-- Results:
-- name | revenue | row_num | rank | dense_rank
-- Alice | 100 | 1 | 1 | 1
-- Bob | 100 | 2 | 1 | 1 (tie with Alice)
-- Charlie | 90 | 3 | 3 | 2 (rank skips 2, dense_rank doesn't)
Classic interview question: "Find the top 3 products by sales in each category"
WITH ranked AS (
SELECT
category,
product_name,
sales,
DENSE_RANK() OVER (
PARTITION BY category
ORDER BY sales DESC
) as rank
FROM products
)
SELECT category, product_name, sales
FROM ranked
WHERE rank <= 3;
LAG and LEAD
Access previous or next row values - essential for time-series analysis.
-- Calculate day-over-day change in revenue
SELECT
date,
revenue,
LAG(revenue, 1) OVER (ORDER BY date) as prev_day_revenue,
revenue - LAG(revenue, 1) OVER (ORDER BY date) as daily_change,
ROUND(
100.0 * (revenue - LAG(revenue, 1) OVER (ORDER BY date))
/ LAG(revenue, 1) OVER (ORDER BY date),
2
) as pct_change
FROM daily_sales;
Interview pattern: "Identify users whose spending decreased for 3 consecutive months"
WITH monthly_spending AS (
SELECT
user_id,
DATE_TRUNC('month', order_date) as month,
SUM(amount) as spending,
LAG(SUM(amount), 1) OVER (PARTITION BY user_id ORDER BY DATE_TRUNC('month', order_date)) as prev_month,
LAG(SUM(amount), 2) OVER (PARTITION BY user_id ORDER BY DATE_TRUNC('month', order_date)) as prev_2_month
FROM orders
GROUP BY user_id, DATE_TRUNC('month', order_date)
)
SELECT DISTINCT user_id
FROM monthly_spending
WHERE spending < prev_month
AND prev_month < prev_2_month;
Running Totals and Moving Averages
Window frames let you calculate cumulative or moving calculations:
-- Running total of revenue
SELECT
date,
revenue,
SUM(revenue) OVER (
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) as running_total
FROM daily_sales;
-- 7-day moving average
SELECT
date,
revenue,
AVG(revenue) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) as moving_avg_7d
FROM daily_sales;
PARTITION BY for Segmented Analysis
PARTITION BY restarts calculations for each group:
-- Running total PER customer
SELECT
customer_id,
order_date,
amount,
SUM(amount) OVER (
PARTITION BY customer_id
ORDER BY order_date
) as customer_running_total
FROM orders;
Interview Pro Tips
- PARTITION BY vs GROUP BY: PARTITION BY keeps all rows, GROUP BY collapses them
- ORDER BY in OVER: Required for ranking and LAG/LEAD, optional for simple aggregates
- Default frame: If ORDER BY is specified, default is
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW - Multiple windows: You can use different OVER() clauses in the same query
Window functions separate intermediate from senior candidates. Master these patterns and you'll stand out. :::