SQL Mastery for Interviews
Advanced SQL Patterns
These patterns appear in senior data science interviews. They test your ability to solve complex problems efficiently and write maintainable queries.
Self-Joins for Hierarchical Data
Self-joins let you compare rows within the same table - common for employee hierarchies and time-series comparisons.
Example: Find employees and their managers
SELECT
e.name as employee,
e.salary as employee_salary,
m.name as manager,
m.salary as manager_salary
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.employee_id;
Interview question: "Find employees who earn more than their manager"
SELECT e.name, e.salary, m.name as manager, m.salary as manager_salary
FROM employees e
INNER JOIN employees m ON e.manager_id = m.employee_id
WHERE e.salary > m.salary;
CTEs vs Subqueries
Common Table Expressions (CTEs) make complex queries readable. Modern SQL engines optimize them equally to subqueries.
Subquery approach (harder to read):
SELECT customer_id, order_count
FROM (
SELECT customer_id, COUNT(*) as order_count
FROM orders
GROUP BY customer_id
) sub
WHERE order_count > (
SELECT AVG(order_count)
FROM (
SELECT customer_id, COUNT(*) as order_count
FROM orders
GROUP BY customer_id
) sub2
);
CTE approach (cleaner):
WITH customer_orders AS (
SELECT customer_id, COUNT(*) as order_count
FROM orders
GROUP BY customer_id
),
avg_orders AS (
SELECT AVG(order_count) as avg_count
FROM customer_orders
)
SELECT co.customer_id, co.order_count
FROM customer_orders co
CROSS JOIN avg_orders ao
WHERE co.order_count > ao.avg_count;
When to use CTEs:
- Query logic needs to be referenced multiple times
- Complex multi-step transformations
- Recursive queries (org charts, category trees)
Recursive CTEs
Handle hierarchical data like org charts or category trees:
-- Get all subordinates under a manager
WITH RECURSIVE subordinates AS (
-- Base case: direct reports
SELECT employee_id, name, manager_id, 1 as level
FROM employees
WHERE manager_id = 100 -- Starting manager
UNION ALL
-- Recursive case: reports of reports
SELECT e.employee_id, e.name, e.manager_id, s.level + 1
FROM employees e
INNER JOIN subordinates s ON e.manager_id = s.employee_id
)
SELECT * FROM subordinates;
Date/Time Manipulation
Date functions vary by database, but core patterns are universal:
-- Extracting date parts
SELECT
order_date,
EXTRACT(YEAR FROM order_date) as year,
EXTRACT(MONTH FROM order_date) as month,
EXTRACT(DOW FROM order_date) as day_of_week, -- 0=Sunday
DATE_TRUNC('month', order_date) as month_start
FROM orders;
-- Date arithmetic
SELECT
order_date,
order_date + INTERVAL '7 days' as week_later,
order_date - INTERVAL '1 month' as month_ago,
CURRENT_DATE - order_date as days_since_order
FROM orders;
Common interview pattern: "Calculate 7-day retention rate"
WITH first_purchase AS (
SELECT
user_id,
MIN(order_date) as first_order_date
FROM orders
GROUP BY user_id
),
retained AS (
SELECT
fp.user_id,
fp.first_order_date,
CASE WHEN EXISTS (
SELECT 1 FROM orders o
WHERE o.user_id = fp.user_id
AND o.order_date BETWEEN fp.first_order_date + INTERVAL '1 day'
AND fp.first_order_date + INTERVAL '7 days'
) THEN 1 ELSE 0 END as is_retained
FROM first_purchase fp
)
SELECT
ROUND(100.0 * SUM(is_retained) / COUNT(*), 2) as retention_rate_7d
FROM retained;
Performance Considerations
Interviewers may ask about query optimization:
| Pattern | Performance Impact |
|---|---|
| SELECT * | Bad - fetches unnecessary columns |
| Missing indexes on JOIN columns | Bad - full table scans |
| Subquery in WHERE vs JOIN | Subquery often slower |
| DISTINCT on large datasets | Expensive - consider GROUP BY |
| Functions on indexed columns | Prevents index usage |
Example: Index-friendly vs index-breaking
-- Index-breaking (function on column)
WHERE YEAR(order_date) = 2025
-- Index-friendly (range comparison)
WHERE order_date >= '2025-01-01' AND order_date < '2026-01-01'
In interviews, mention performance considerations even if not asked. It shows you think about production systems. :::