SQL Mastery & Query Optimization
Advanced SQL Patterns
Master the advanced SQL patterns that appear in 90%+ of data engineering interviews. These patterns form the foundation of technical assessments.
Window Functions Deep-Dive
Window functions are the most tested advanced SQL concept. Master these patterns.
ROW_NUMBER, RANK, DENSE_RANK
-- Classic interview question: Find top N per group
SELECT
department,
employee_name,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as rn,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) as rnk,
DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dense_rnk
FROM employees;
| Function | Ties Handling | Gaps |
|---|---|---|
| ROW_NUMBER | Different numbers | No gaps |
| RANK | Same number | Gaps after ties |
| DENSE_RANK | Same number | No gaps |
Interview Pattern: "Get the top 3 products by sales in each category"
WITH ranked AS (
SELECT
category_id,
product_id,
total_sales,
DENSE_RANK() OVER (
PARTITION BY category_id
ORDER BY total_sales DESC
) as sales_rank
FROM product_sales
)
SELECT * FROM ranked WHERE sales_rank <= 3;
LAG and LEAD
For time-series and sequential analysis.
-- Month-over-month growth
SELECT
month,
revenue,
LAG(revenue, 1) OVER (ORDER BY month) as prev_month_revenue,
revenue - LAG(revenue, 1) OVER (ORDER BY month) as mom_change,
ROUND(
(revenue - LAG(revenue, 1) OVER (ORDER BY month)) * 100.0 /
NULLIF(LAG(revenue, 1) OVER (ORDER BY month), 0), 2
) as mom_growth_pct
FROM monthly_revenue;
Interview Pattern: "Find users who had a decrease in activity from previous month"
Running Totals and Moving Averages
-- Running total and 7-day moving average
SELECT
date,
daily_sales,
SUM(daily_sales) OVER (ORDER BY date) as running_total,
AVG(daily_sales) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) as moving_avg_7d
FROM daily_metrics;
Frame Specifications:
| Syntax | Meaning |
|---|---|
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW |
All rows from start to current |
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW |
Last 7 rows including current |
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING |
Sliding window of 3 rows |
Common Table Expressions (CTEs)
CTEs improve readability and enable complex query decomposition.
Basic CTE Structure
WITH
daily_totals AS (
SELECT
date,
SUM(amount) as total_amount
FROM transactions
GROUP BY date
),
weekly_avg AS (
SELECT
AVG(total_amount) as avg_daily_amount
FROM daily_totals
)
SELECT
d.date,
d.total_amount,
w.avg_daily_amount,
d.total_amount - w.avg_daily_amount as variance
FROM daily_totals d
CROSS JOIN weekly_avg w;
Recursive CTEs
Essential for hierarchical data and graph traversal.
-- Employee hierarchy (org chart)
WITH RECURSIVE org_tree AS (
-- Base case: CEO (no manager)
SELECT
employee_id,
name,
manager_id,
1 as level,
CAST(name AS VARCHAR(1000)) as path
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive case: employees with managers
SELECT
e.employee_id,
e.name,
e.manager_id,
t.level + 1,
t.path || ' -> ' || e.name
FROM employees e
INNER JOIN org_tree t ON e.manager_id = t.employee_id
)
SELECT * FROM org_tree ORDER BY level, name;
Interview Pattern: "Find all subordinates (direct and indirect) of a manager"
WITH RECURSIVE subordinates AS (
SELECT employee_id, name, manager_id
FROM employees
WHERE manager_id = 101 -- Target manager
UNION ALL
SELECT e.employee_id, e.name, e.manager_id
FROM employees e
INNER JOIN subordinates s ON e.manager_id = s.employee_id
)
SELECT * FROM subordinates;
Complex Joins and Set Operations
Self-Joins
-- Find employees earning more than their manager
SELECT
e.name as employee,
e.salary as employee_salary,
m.name as manager,
m.salary as manager_salary
FROM employees e
INNER JOIN employees m ON e.manager_id = m.employee_id
WHERE e.salary > m.salary;
EXCEPT and INTERSECT
-- Find customers who ordered in 2024 but not in 2025
SELECT customer_id FROM orders WHERE YEAR(order_date) = 2024
EXCEPT
SELECT customer_id FROM orders WHERE YEAR(order_date) = 2025;
-- Find customers who ordered in both years
SELECT customer_id FROM orders WHERE YEAR(order_date) = 2024
INTERSECT
SELECT customer_id FROM orders WHERE YEAR(order_date) = 2025;
Aggregation Patterns
GROUPING SETS, CUBE, ROLLUP
-- Multiple grouping levels in one query
SELECT
COALESCE(region, 'All Regions') as region,
COALESCE(product_category, 'All Categories') as category,
SUM(sales) as total_sales
FROM sales_data
GROUP BY GROUPING SETS (
(region, product_category),
(region),
(product_category),
()
);
| Clause | Result |
|---|---|
ROLLUP(a, b) |
(a,b), (a), () |
CUBE(a, b) |
(a,b), (a), (b), () |
GROUPING SETS((a,b), (a)) |
Exactly those combinations |
Conditional Aggregation
-- Pivot without PIVOT keyword
SELECT
product_id,
SUM(CASE WHEN YEAR(sale_date) = 2024 THEN amount ELSE 0 END) as sales_2024,
SUM(CASE WHEN YEAR(sale_date) = 2025 THEN amount ELSE 0 END) as sales_2025,
COUNT(DISTINCT CASE WHEN status = 'returned' THEN order_id END) as return_count
FROM sales
GROUP BY product_id;
Interview Tip: When given a complex SQL problem, start by identifying which pattern applies (window function, recursive CTE, self-join, etc.). This framework helps you structure your solution quickly.
Next, we'll explore query optimization and execution plan analysis. :::