Advanced SQL Patterns

These patterns appear in senior data science interviews. They test your ability to solve complex problems efficiently and write maintainable queries.

Self-Joins for Hierarchical Data

Self-joins let you compare rows within the same table - common for employee hierarchies and time-series comparisons.

Example: Find employees and their managers

SELECT
    e.name as employee,
    e.salary as employee_salary,
    m.name as manager,
    m.salary as manager_salary
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.employee_id;

Interview question: "Find employees who earn more than their manager"

SELECT e.name, e.salary, m.name as manager, m.salary as manager_salary
FROM employees e
INNER JOIN employees m ON e.manager_id = m.employee_id
WHERE e.salary > m.salary;

CTEs vs Subqueries

Common Table Expressions (CTEs) make complex queries readable. Modern SQL engines optimize them equally to subqueries.

Subquery approach (harder to read):

SELECT customer_id, order_count
FROM (
    SELECT customer_id, COUNT(*) as order_count
    FROM orders
    GROUP BY customer_id
) sub
WHERE order_count > (
    SELECT AVG(order_count)
    FROM (
        SELECT customer_id, COUNT(*) as order_count
        FROM orders
        GROUP BY customer_id
    ) sub2
);

CTE approach (cleaner):

WITH customer_orders AS (
    SELECT customer_id, COUNT(*) as order_count
    FROM orders
    GROUP BY customer_id
),
avg_orders AS (
    SELECT AVG(order_count) as avg_count
    FROM customer_orders
)
SELECT co.customer_id, co.order_count
FROM customer_orders co
CROSS JOIN avg_orders ao
WHERE co.order_count > ao.avg_count;

When to use CTEs:

Query logic needs to be referenced multiple times
Complex multi-step transformations
Recursive queries (org charts, category trees)

Recursive CTEs

Handle hierarchical data like org charts or category trees:

-- Get all subordinates under a manager
WITH RECURSIVE subordinates AS (
    -- Base case: direct reports
    SELECT employee_id, name, manager_id, 1 as level
    FROM employees
    WHERE manager_id = 100  -- Starting manager

    UNION ALL

    -- Recursive case: reports of reports
    SELECT e.employee_id, e.name, e.manager_id, s.level + 1
    FROM employees e
    INNER JOIN subordinates s ON e.manager_id = s.employee_id
)
SELECT * FROM subordinates;

Date/Time Manipulation

Date functions vary by database, but core patterns are universal:

-- Extracting date parts
SELECT
    order_date,
    EXTRACT(YEAR FROM order_date) as year,
    EXTRACT(MONTH FROM order_date) as month,
    EXTRACT(DOW FROM order_date) as day_of_week,  -- 0=Sunday
    DATE_TRUNC('month', order_date) as month_start
FROM orders;

-- Date arithmetic
SELECT
    order_date,
    order_date + INTERVAL '7 days' as week_later,
    order_date - INTERVAL '1 month' as month_ago,
    CURRENT_DATE - order_date as days_since_order
FROM orders;

Common interview pattern: "Calculate 7-day retention rate"

WITH first_purchase AS (
    SELECT
        user_id,
        MIN(order_date) as first_order_date
    FROM orders
    GROUP BY user_id
),
retained AS (
    SELECT
        fp.user_id,
        fp.first_order_date,
        CASE WHEN EXISTS (
            SELECT 1 FROM orders o
            WHERE o.user_id = fp.user_id
            AND o.order_date BETWEEN fp.first_order_date + INTERVAL '1 day'
                                 AND fp.first_order_date + INTERVAL '7 days'
        ) THEN 1 ELSE 0 END as is_retained
    FROM first_purchase fp
)
SELECT
    ROUND(100.0 * SUM(is_retained) / COUNT(*), 2) as retention_rate_7d
FROM retained;

Performance Considerations

Interviewers may ask about query optimization:

Pattern	Performance Impact
SELECT *	Bad - fetches unnecessary columns
Missing indexes on JOIN columns	Bad - full table scans
Subquery in WHERE vs JOIN	Subquery often slower
DISTINCT on large datasets	Expensive - consider GROUP BY
Functions on indexed columns	Prevents index usage

Example: Index-friendly vs index-breaking

-- Index-breaking (function on column)
WHERE YEAR(order_date) = 2025

-- Index-friendly (range comparison)
WHERE order_date >= '2025-01-01' AND order_date < '2026-01-01'

In interviews, mention performance considerations even if not asked. It shows you think about production systems. :::

Self-Joins for Hierarchical Data

CTEs vs Subqueries

Recursive CTEs

Date/Time Manipulation

Performance Considerations

Quiz

Stay on the Nerd Track