Database Design & SQL Mastery
Indexing, Query Optimization & Transactions
Indexes, query plans, and transaction safety are the topics where backend interviews go from surface-level to deep. This lesson covers how indexes work internally, how to read query plans, and how databases guarantee data integrity under concurrent access.
B-tree Indexes: The Default Workhorse
The B-tree is the default index type in PostgreSQL, MySQL, and most relational databases. It keeps data sorted and supports searches, insertions, and deletions in O(log n) time.
How a B-tree Works
A B-tree is a self-balancing tree where:
- Internal nodes store keys and pointers to child nodes
- Leaf nodes store keys and pointers to actual table rows (heap tuples)
- All leaf nodes are at the same depth, guaranteeing O(log n) lookups
- Leaf nodes are linked together in a doubly-linked list, enabling efficient range scans
For a table with 10 million rows, a B-tree index is typically only 3-4 levels deep, so finding any row requires reading just 3-4 pages from disk.
-- Creates a B-tree index (default type)
CREATE INDEX idx_orders_created_at ON orders(created_at);
-- B-tree supports: =, <, >, <=, >=, BETWEEN, IN, IS NULL
-- Also supports ORDER BY without an additional sort step
SELECT * FROM orders
WHERE created_at BETWEEN '2026-01-01' AND '2026-01-31'
ORDER BY created_at DESC;
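You can verify the depth claim on a real index. A minimal sketch, assuming the pageinspect contrib extension (shipped with PostgreSQL) is available:
-- bt_metap reports B-tree metadata; 'level' is the tree height, counted from 0 at the leaves
CREATE EXTENSION IF NOT EXISTS pageinspect;
SELECT level FROM bt_metap('idx_orders_created_at');
-- level = 2 means leaves + one internal level + root: 3 levels in total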
Hash Indexes
Hash indexes use a hash function to map keys to buckets. They are only useful for equality lookups and cannot support range queries or sorting.
CREATE INDEX idx_users_email_hash ON users USING hash(email);
-- Good: exact match (O(1) average)
SELECT * FROM users WHERE email = 'user@example.com';
-- CANNOT use hash index: range query
SELECT * FROM users WHERE email > 'a' AND email < 'm';
When to use hash indexes: Only when you exclusively query by exact equality on a column and need slightly faster lookups than B-tree. In practice, B-tree is almost always preferred because the performance difference is marginal while B-tree is far more versatile.
Composite Indexes and the Leftmost Prefix Rule
A composite index covers multiple columns. The column order matters critically due to the leftmost prefix rule.
CREATE INDEX idx_orders_user_status_date
ON orders(user_id, status, created_at);
This single index can serve these queries:
| Query Filter | Uses Index? | Reason |
|---|---|---|
| WHERE user_id = 5 | Yes | Leftmost column |
| WHERE user_id = 5 AND status = 'shipped' | Yes | Left two columns |
| WHERE user_id = 5 AND status = 'shipped' AND created_at > '2026-01-01' | Yes | All three columns |
| WHERE status = 'shipped' | No | Skips leftmost column |
| WHERE created_at > '2026-01-01' | No | Skips leftmost columns |
| WHERE user_id = 5 AND created_at > '2026-01-01' | Partial | Uses user_id, skips status for range |
Interview tip: When asked about composite index design, think about your most common query patterns. Place equality columns first, then range columns last. The column with the highest cardinality (most distinct values) should generally come first among equality columns.
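For example, given a hypothetical query that filters by exact status plus a date range, the tip translates directly into column order (the index name here is illustrative):
-- Equality column first, range column last
CREATE INDEX idx_orders_status_date ON orders(status, created_at);
-- Served efficiently: seek to status = 'shipped', then range-scan created_at
SELECT * FROM orders
WHERE status = 'shipped' AND created_at > '2026-01-01';
-- Reversed order -- orders(created_at, status) -- would force checking every
-- status value inside the date range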
Specialized Index Types
GIN Indexes (Generalized Inverted Index)
GIN indexes are designed for values that contain multiple elements -- arrays, JSONB documents, and full-text search vectors.
-- Full-text search on product descriptions
CREATE INDEX idx_products_description_gin
ON products USING gin(to_tsvector('english', description));
SELECT * FROM products
WHERE to_tsvector('english', description) @@ to_tsquery('wireless & headphones');
-- JSONB containment queries
CREATE INDEX idx_events_metadata_gin ON events USING gin(metadata);
SELECT * FROM events
WHERE metadata @> '{"type": "purchase", "platform": "mobile"}';
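A tuning note for JSONB: if a column is queried only with containment (@>), the jsonb_path_ops operator class builds a smaller, faster GIN index than the default, at the cost of supporting fewer operators:
-- Supports only containment queries, but is more compact than the default jsonb_ops
CREATE INDEX idx_events_metadata_path
ON events USING gin(metadata jsonb_path_ops);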
Covering Indexes (INCLUDE)
A covering index includes extra columns that are not part of the search key but are needed by the query. This enables index-only scans, avoiding the table lookup entirely.
-- Without covering index: index scan + table lookup for email and name
CREATE INDEX idx_users_username ON users(username);
-- With covering index: index-only scan (no table lookup needed)
CREATE INDEX idx_users_username_covering
ON users(username) INCLUDE (email, display_name);
-- This query can be answered entirely from the index
SELECT email, display_name FROM users WHERE username = 'johndoe';
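You can confirm this with EXPLAIN. One caveat worth knowing: index-only scans depend on the visibility map, so a freshly loaded table may still fall back to heap checks until it has been vacuumed. A minimal sketch:
VACUUM users;  -- refreshes the visibility map so heap visibility checks can be skipped
EXPLAIN
SELECT email, display_name FROM users WHERE username = 'johndoe';
-- Expect a plan node like:
--   Index Only Scan using idx_users_username_covering on users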
Reading EXPLAIN ANALYZE Output
EXPLAIN ANALYZE shows the actual execution plan and timing of a query. This is the most important debugging tool for slow queries.
EXPLAIN ANALYZE
SELECT u.username, COUNT(p.id) as post_count
FROM users u
JOIN posts p ON p.author_id = u.id
WHERE u.created_at > '2025-01-01'
GROUP BY u.username
ORDER BY post_count DESC
LIMIT 10;
Example output (simplified):
Limit  (cost=1250.00..1250.02 rows=10) (actual time=45.2..45.3 rows=10)
  ->  Sort  (cost=1250.00..1262.00 rows=4800) (actual time=45.2..45.2 rows=10)
        Sort Key: (count(p.id)) DESC
        Sort Method: top-N heapsort  Memory: 25kB
        ->  HashAggregate  (cost=1100.00..1148.00 rows=4800) (actual time=42.1..43.8 rows=4800)
              Group Key: u.username
              ->  Hash Join  (cost=180.00..950.00 rows=30000) (actual time=5.2..35.6 rows=30000)
                    Hash Cond: (p.author_id = u.id)
                    ->  Seq Scan on posts p  (cost=0.00..620.00 rows=50000) (actual time=0.01..12.3 rows=50000)
                    ->  Hash  (cost=150.00..150.00 rows=4800) (actual time=4.8..4.8 rows=4800)
                          ->  Index Scan using idx_users_created_at on users u  (cost=0.29..150.00 rows=4800) (actual time=0.03..3.2 rows=4800)
                                Index Cond: (created_at > '2025-01-01')
Key scan types to know:
| Scan Type | What It Means | Performance |
|---|---|---|
| Seq Scan | Reads entire table row by row | Slow for large tables |
| Index Scan | Uses index to find rows, then fetches from table | Fast for selective queries |
| Index Only Scan | Reads from index only, no table access | Fastest |
| Bitmap Index Scan | Builds bitmap of matching pages, then fetches | Good for medium selectivity |
| Hash Join | Builds hash table from smaller relation | Fast for equi-joins |
| Nested Loop | For each row in outer, scan inner | Fast when inner has index |
| Merge Join | Merges two sorted inputs | Fast when both inputs are sorted |
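In practice it is worth asking for buffer statistics as well. A small sketch reusing the orders index from earlier -- the BUFFERS option annotates each node with cache hits versus disk reads:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders
WHERE created_at BETWEEN '2026-01-01' AND '2026-01-31';
-- Each node gains a line like "Buffers: shared hit=... read=...":
-- 'hit' pages came from shared memory, 'read' pages came from disk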
ACID Properties
Relational databases guarantee the ACID properties for transactions:
Atomicity -- A transaction is all-or-nothing. If any statement fails, all changes are rolled back.
BEGIN;
UPDATE accounts SET balance = balance - 500 WHERE id = 1;
UPDATE accounts SET balance = balance + 500 WHERE id = 2;
-- If the second UPDATE fails, the first is also rolled back
COMMIT;
Consistency -- A transaction moves the database from one valid state to another. Constraints (CHECK, FOREIGN KEY, UNIQUE) are enforced.
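As a small sketch of consistency enforcement, assuming the accounts table from the atomicity example, a CHECK constraint makes overdrafts unrepresentable:
ALTER TABLE accounts
ADD CONSTRAINT balance_non_negative CHECK (balance >= 0);
BEGIN;
-- This UPDATE errors if it would drive the balance below zero, leaving the
-- transaction in an aborted state that can only be rolled back
UPDATE accounts SET balance = balance - 500 WHERE id = 1;
COMMIT;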
Isolation -- Concurrent transactions do not interfere with each other (degree depends on isolation level).
Durability -- Once committed, changes survive system crashes. This is achieved through write-ahead logging (WAL).
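In PostgreSQL, the WAL flush behavior behind durability is controlled by the synchronous_commit setting. A hedged sketch of the trade-off:
SHOW synchronous_commit;  -- 'on' by default: COMMIT returns only after the WAL record is flushed to disk
-- Turning it off trades a small window of possible transaction loss on crash
-- (never corruption) for lower commit latency:
SET synchronous_commit = off;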
Isolation Levels and Concurrency Anomalies
| Isolation Level | Dirty Read | Non-Repeatable Read | Phantom Read |
|---|---|---|---|
| Read Uncommitted | Possible | Possible | Possible |
| Read Committed | Prevented | Possible | Possible |
| Repeatable Read | Prevented | Prevented | Possible (prevented in PostgreSQL) |
| Serializable | Prevented | Prevented | Prevented |
Dirty Read: Transaction A reads data that Transaction B has modified but not yet committed. If B rolls back, A has read data that never officially existed.
Non-Repeatable Read: Transaction A reads a row, Transaction B updates and commits it, then A reads the same row again and gets a different value.
Phantom Read: Transaction A queries rows matching a condition, Transaction B inserts a new row matching that condition and commits, then A runs the same query and gets an extra row.
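A two-session sketch makes the non-repeatable read concrete (the balances are hypothetical; under the default Read Committed, the second read would return the new value instead):
-- Session A:
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM accounts WHERE id = 1;  -- returns 1000
-- Session B (concurrent): updates the balance to 900 and commits
SELECT balance FROM accounts WHERE id = 1;  -- still 1000: A keeps its snapshot
COMMIT;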
-- PostgreSQL default is Read Committed
-- To use Serializable:
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT balance FROM accounts WHERE id = 1;
-- PostgreSQL does not lock the row here; a concurrent writer can still
-- modify it, but then one of the conflicting transactions will fail at
-- COMMIT with a serialization error
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;
Important: PostgreSQL implements Repeatable Read using MVCC snapshots, which also prevents phantom reads -- making it behave more like Serializable in other databases. True Serializable in PostgreSQL uses Serializable Snapshot Isolation (SSI).
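The practical consequence of SSI: conflicting Serializable transactions are not blocked, one of them is aborted, so the application must be prepared to retry. A sketch of the shape:
BEGIN ISOLATION LEVEL SERIALIZABLE;
-- reads and writes here
COMMIT;
-- On conflict this fails with SQLSTATE 40001 (serialization_failure);
-- catch that error in application code and rerun the whole transaction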
MVCC: How PostgreSQL Handles Concurrency
Multi-Version Concurrency Control (MVCC) means that readers never block writers and writers never block readers. PostgreSQL achieves this by keeping multiple versions of each row.
When a row is updated, PostgreSQL does not overwrite the original. Instead:
- The old row version is marked with an `xmax` (the transaction ID that deleted or updated it)
- A new row version is created with an `xmin` (the transaction ID that created it)
- Each transaction sees only row versions that were committed before the transaction's snapshot
This is why PostgreSQL needs VACUUM -- dead row versions accumulate and must be cleaned up periodically.
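You can observe MVCC directly: xmin and xmax are system columns present on every table, selectable by name:
SELECT xmin, xmax, id, balance FROM accounts WHERE id = 1;
-- xmax = 0 on the version you see means it is live; after an UPDATE, the
-- superseded version carries the updater's transaction ID in xmax until
-- VACUUM reclaims it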
Deadlock Detection and Prevention
A deadlock occurs when two or more transactions are each waiting for a lock held by the other.
-- Transaction A:
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1; -- locks row 1
UPDATE accounts SET balance = balance + 100 WHERE id = 2; -- waits for row 2
-- Transaction B (concurrent):
BEGIN;
UPDATE accounts SET balance = balance - 50 WHERE id = 2; -- locks row 2
UPDATE accounts SET balance = balance + 50 WHERE id = 1; -- waits for row 1
-- DEADLOCK: A waits for B, B waits for A
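Detection: PostgreSQL does not prevent this cycle up front. Once a backend has waited on a lock for deadlock_timeout (1 second by default), it checks for a cycle and, if one exists, aborts one of the transactions with "ERROR: deadlock detected" so the others can proceed:
SHOW deadlock_timeout;           -- '1s' by default
SET deadlock_timeout = '500ms';  -- detect sooner, at the cost of more frequent checks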
Prevention strategies:
- Consistent lock ordering -- Always acquire locks in the same order (e.g., by ascending ID). Both transactions should lock id=1 first, then id=2
- Lock timeout -- Set `lock_timeout` to fail fast instead of waiting indefinitely (see the sketch below)
- Keep transactions short -- Minimize the time locks are held
- Use SELECT ... FOR UPDATE NOWAIT -- Fail immediately if the row is locked instead of waiting
-- Safe pattern: always lock in ascending ID order
BEGIN;
SELECT * FROM accounts WHERE id IN (1, 2) ORDER BY id FOR UPDATE;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
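And a short sketch of the fail-fast options from the list above:
-- Give up on any lock wait after 2 seconds instead of queueing indefinitely
SET lock_timeout = '2s';
-- Or error out immediately if the row is already locked
SELECT * FROM accounts WHERE id = 1 FOR UPDATE NOWAIT;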
Interview tip: When discussing transactions, always mention: isolation level choice depends on your consistency requirements versus throughput needs. Most production systems use Read Committed (the PostgreSQL default) and handle edge cases with explicit locking where needed, rather than using Serializable globally.
Next: NoSQL databases -- when to choose document stores, key-value stores, column-family stores, or graph databases.