SQL Mastery & Query Optimization

Advanced SQL Patterns

4 min read

Master the advanced SQL patterns that appear in 90%+ of data engineering interviews. These patterns form the foundation of technical assessments.

Window Functions Deep-Dive

Window functions are the most tested advanced SQL concept. Master these patterns.

ROW_NUMBER, RANK, DENSE_RANK

-- Classic interview question: Find top N per group
SELECT
    department,
    employee_name,
    salary,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as rn,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) as rnk,
    DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dense_rnk
FROM employees;
Function Ties Handling Gaps
ROW_NUMBER Different numbers No gaps
RANK Same number Gaps after ties
DENSE_RANK Same number No gaps

Interview Pattern: "Get the top 3 products by sales in each category"

WITH ranked AS (
    SELECT
        category_id,
        product_id,
        total_sales,
        DENSE_RANK() OVER (
            PARTITION BY category_id
            ORDER BY total_sales DESC
        ) as sales_rank
    FROM product_sales
)
SELECT * FROM ranked WHERE sales_rank <= 3;

LAG and LEAD

For time-series and sequential analysis.

-- Month-over-month growth
SELECT
    month,
    revenue,
    LAG(revenue, 1) OVER (ORDER BY month) as prev_month_revenue,
    revenue - LAG(revenue, 1) OVER (ORDER BY month) as mom_change,
    ROUND(
        (revenue - LAG(revenue, 1) OVER (ORDER BY month)) * 100.0 /
        NULLIF(LAG(revenue, 1) OVER (ORDER BY month), 0), 2
    ) as mom_growth_pct
FROM monthly_revenue;

Interview Pattern: "Find users who had a decrease in activity from previous month"

Running Totals and Moving Averages

-- Running total and 7-day moving average
SELECT
    date,
    daily_sales,
    SUM(daily_sales) OVER (ORDER BY date) as running_total,
    AVG(daily_sales) OVER (
        ORDER BY date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) as moving_avg_7d
FROM daily_metrics;

Frame Specifications:

Syntax Meaning
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW All rows from start to current
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW Last 7 rows including current
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING Sliding window of 3 rows

Common Table Expressions (CTEs)

CTEs improve readability and enable complex query decomposition.

Basic CTE Structure

WITH
    daily_totals AS (
        SELECT
            date,
            SUM(amount) as total_amount
        FROM transactions
        GROUP BY date
    ),
    weekly_avg AS (
        SELECT
            AVG(total_amount) as avg_daily_amount
        FROM daily_totals
    )
SELECT
    d.date,
    d.total_amount,
    w.avg_daily_amount,
    d.total_amount - w.avg_daily_amount as variance
FROM daily_totals d
CROSS JOIN weekly_avg w;

Recursive CTEs

Essential for hierarchical data and graph traversal.

-- Employee hierarchy (org chart)
WITH RECURSIVE org_tree AS (
    -- Base case: CEO (no manager)
    SELECT
        employee_id,
        name,
        manager_id,
        1 as level,
        CAST(name AS VARCHAR(1000)) as path
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive case: employees with managers
    SELECT
        e.employee_id,
        e.name,
        e.manager_id,
        t.level + 1,
        t.path || ' -> ' || e.name
    FROM employees e
    INNER JOIN org_tree t ON e.manager_id = t.employee_id
)
SELECT * FROM org_tree ORDER BY level, name;

Interview Pattern: "Find all subordinates (direct and indirect) of a manager"

WITH RECURSIVE subordinates AS (
    SELECT employee_id, name, manager_id
    FROM employees
    WHERE manager_id = 101  -- Target manager

    UNION ALL

    SELECT e.employee_id, e.name, e.manager_id
    FROM employees e
    INNER JOIN subordinates s ON e.manager_id = s.employee_id
)
SELECT * FROM subordinates;

Complex Joins and Set Operations

Self-Joins

-- Find employees earning more than their manager
SELECT
    e.name as employee,
    e.salary as employee_salary,
    m.name as manager,
    m.salary as manager_salary
FROM employees e
INNER JOIN employees m ON e.manager_id = m.employee_id
WHERE e.salary > m.salary;

EXCEPT and INTERSECT

-- Find customers who ordered in 2024 but not in 2025
SELECT customer_id FROM orders WHERE YEAR(order_date) = 2024
EXCEPT
SELECT customer_id FROM orders WHERE YEAR(order_date) = 2025;

-- Find customers who ordered in both years
SELECT customer_id FROM orders WHERE YEAR(order_date) = 2024
INTERSECT
SELECT customer_id FROM orders WHERE YEAR(order_date) = 2025;

Aggregation Patterns

GROUPING SETS, CUBE, ROLLUP

-- Multiple grouping levels in one query
SELECT
    COALESCE(region, 'All Regions') as region,
    COALESCE(product_category, 'All Categories') as category,
    SUM(sales) as total_sales
FROM sales_data
GROUP BY GROUPING SETS (
    (region, product_category),
    (region),
    (product_category),
    ()
);
Clause Result
ROLLUP(a, b) (a,b), (a), ()
CUBE(a, b) (a,b), (a), (b), ()
GROUPING SETS((a,b), (a)) Exactly those combinations

Conditional Aggregation

-- Pivot without PIVOT keyword
SELECT
    product_id,
    SUM(CASE WHEN YEAR(sale_date) = 2024 THEN amount ELSE 0 END) as sales_2024,
    SUM(CASE WHEN YEAR(sale_date) = 2025 THEN amount ELSE 0 END) as sales_2025,
    COUNT(DISTINCT CASE WHEN status = 'returned' THEN order_id END) as return_count
FROM sales
GROUP BY product_id;

Interview Tip: When given a complex SQL problem, start by identifying which pattern applies (window function, recursive CTE, self-join, etc.). This framework helps you structure your solution quickly.

Next, we'll explore query optimization and execution plan analysis. :::

Quiz

Module 2: SQL Mastery & Query Optimization

Take Quiz