SQL Mastery for Interviews

Window Functions Deep Dive

4 min read

Window functions are the most tested advanced SQL concept in data science interviews. They let you calculate across rows while keeping individual row detail - something regular GROUP BY cannot do.

The Window Function Syntax

function_name(column) OVER (
    PARTITION BY partition_column    -- Optional: group rows
    ORDER BY order_column            -- Optional: order within partition
    ROWS/RANGE frame_specification   -- Optional: define window frame
)

Ranking Functions

Three ranking functions serve different purposes:

Function Ties Gaps
ROW_NUMBER() Arbitrary order No gaps
RANK() Same rank Gaps after ties
DENSE_RANK() Same rank No gaps

Example: Ranking salespeople by revenue

SELECT
    name,
    revenue,
    ROW_NUMBER() OVER (ORDER BY revenue DESC) as row_num,
    RANK() OVER (ORDER BY revenue DESC) as rank,
    DENSE_RANK() OVER (ORDER BY revenue DESC) as dense_rank
FROM sales;

-- Results:
-- name    | revenue | row_num | rank | dense_rank
-- Alice   | 100     | 1       | 1    | 1
-- Bob     | 100     | 2       | 1    | 1  (tie with Alice)
-- Charlie | 90      | 3       | 3    | 2  (rank skips 2, dense_rank doesn't)

Classic interview question: "Find the top 3 products by sales in each category"

WITH ranked AS (
    SELECT
        category,
        product_name,
        sales,
        DENSE_RANK() OVER (
            PARTITION BY category
            ORDER BY sales DESC
        ) as rank
    FROM products
)
SELECT category, product_name, sales
FROM ranked
WHERE rank <= 3;

LAG and LEAD

Access previous or next row values - essential for time-series analysis.

-- Calculate day-over-day change in revenue
SELECT
    date,
    revenue,
    LAG(revenue, 1) OVER (ORDER BY date) as prev_day_revenue,
    revenue - LAG(revenue, 1) OVER (ORDER BY date) as daily_change,
    ROUND(
        100.0 * (revenue - LAG(revenue, 1) OVER (ORDER BY date))
        / LAG(revenue, 1) OVER (ORDER BY date),
        2
    ) as pct_change
FROM daily_sales;

Interview pattern: "Identify users whose spending decreased for 3 consecutive months"

WITH monthly_spending AS (
    SELECT
        user_id,
        DATE_TRUNC('month', order_date) as month,
        SUM(amount) as spending,
        LAG(SUM(amount), 1) OVER (PARTITION BY user_id ORDER BY DATE_TRUNC('month', order_date)) as prev_month,
        LAG(SUM(amount), 2) OVER (PARTITION BY user_id ORDER BY DATE_TRUNC('month', order_date)) as prev_2_month
    FROM orders
    GROUP BY user_id, DATE_TRUNC('month', order_date)
)
SELECT DISTINCT user_id
FROM monthly_spending
WHERE spending < prev_month
  AND prev_month < prev_2_month;

Running Totals and Moving Averages

Window frames let you calculate cumulative or moving calculations:

-- Running total of revenue
SELECT
    date,
    revenue,
    SUM(revenue) OVER (
        ORDER BY date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) as running_total
FROM daily_sales;

-- 7-day moving average
SELECT
    date,
    revenue,
    AVG(revenue) OVER (
        ORDER BY date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) as moving_avg_7d
FROM daily_sales;

PARTITION BY for Segmented Analysis

PARTITION BY restarts calculations for each group:

-- Running total PER customer
SELECT
    customer_id,
    order_date,
    amount,
    SUM(amount) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
    ) as customer_running_total
FROM orders;

Interview Pro Tips

  1. PARTITION BY vs GROUP BY: PARTITION BY keeps all rows, GROUP BY collapses them
  2. ORDER BY in OVER: Required for ranking and LAG/LEAD, optional for simple aggregates
  3. Default frame: If ORDER BY is specified, default is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  4. Multiple windows: You can use different OVER() clauses in the same query

Window functions separate intermediate from senior candidates. Master these patterns and you'll stand out. :::

Quiz

Module 2: SQL Mastery for Interviews

Take Quiz