Database Design & SQL Mastery

Indexing, Query Optimization & Transactions

5 min read

Indexes, query plans, and transaction safety are the topics where backend interviews go from surface-level to deep. This lesson covers how indexes work internally, how to read query plans, and how databases guarantee data integrity under concurrent access.

B-tree Indexes: The Default Workhorse

B-tree (balanced tree) is the default index type in PostgreSQL, MySQL, and most relational databases. It keeps data sorted and allows searches, insertions, and deletions in O(log n) time.

How a B-tree Works

A B-tree is a self-balancing tree where:

  • Internal nodes store keys and pointers to child nodes
  • Leaf nodes store keys and pointers to actual table rows (heap tuples)
  • All leaf nodes are at the same depth, guaranteeing O(log n) lookups
  • Leaf nodes are linked together in a doubly-linked list, enabling efficient range scans

For a table with 10 million rows, a B-tree index is typically 3-4 levels deep. Finding any row therefore requires reading only 3-4 pages from disk.

-- Creates a B-tree index (default type)
CREATE INDEX idx_orders_created_at ON orders(created_at);

-- B-tree supports: =, <, >, <=, >=, BETWEEN, IN, IS NULL
-- Also supports ORDER BY without an additional sort step
SELECT * FROM orders
WHERE created_at BETWEEN '2026-01-01' AND '2026-01-31'
ORDER BY created_at DESC;
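
You can check the depth claim on a real index: PostgreSQL's pageinspect extension exposes the B-tree meta page. This is a sketch assuming the idx_orders_created_at index created above exists; the level column is the number of levels above the leaves.

```sql
-- Requires appropriate privileges (typically superuser)
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- bt_metap reads the B-tree meta page; level = 0 means the
-- index fits entirely in a single leaf page
SELECT level FROM bt_metap('idx_orders_created_at');
```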

Hash Indexes

Hash indexes use a hash function to map keys to buckets. They are only useful for equality lookups and cannot support range queries or sorting.

CREATE INDEX idx_users_email_hash ON users USING hash(email);

-- Good: exact match (O(1) average)
SELECT * FROM users WHERE email = 'user@example.com';

-- CANNOT use hash index: range query
SELECT * FROM users WHERE email > 'a' AND email < 'm';

When to use hash indexes: Only when you exclusively query by exact equality on a column and need slightly faster lookups than B-tree. In practice, B-tree is almost always preferred because the performance difference is marginal while B-tree is far more versatile.

Composite Indexes and the Leftmost Prefix Rule

A composite index covers multiple columns. Column order matters because of the leftmost prefix rule: the index can only be used when the query constrains a left-to-right prefix of its columns.

CREATE INDEX idx_orders_user_status_date
ON orders(user_id, status, created_at);

This single index can serve these queries:

| Query Filter | Uses Index? | Reason |
| --- | --- | --- |
| WHERE user_id = 5 | Yes | Leftmost column |
| WHERE user_id = 5 AND status = 'shipped' | Yes | Left two columns |
| WHERE user_id = 5 AND status = 'shipped' AND created_at > '2026-01-01' | Yes | All three columns |
| WHERE status = 'shipped' | No | Skips leftmost column |
| WHERE created_at > '2026-01-01' | No | Skips leftmost columns |
| WHERE user_id = 5 AND created_at > '2026-01-01' | Partial | Uses user_id, skips status for the range |

Interview tip: When asked about composite index design, think about your most common query patterns. Place equality columns first, then range columns last. The column with the highest cardinality (most distinct values) should generally come first among equality columns.
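
Applying that rule: suppose the dominant query pattern filters status by equality and created_at by range (a hypothetical example, not a specific production schema). Equality column first, range column last:

```sql
-- Equality column (status) first, range column (created_at) last
CREATE INDEX idx_orders_status_created_at
ON orders(status, created_at);

-- The index positions directly on status = 'shipped', then
-- range-scans the created_at portion in sorted order
SELECT * FROM orders
WHERE status = 'shipped'
  AND created_at > '2026-01-01';
```

Reversing the column order would force the planner to scan the entire created_at range and filter status row by row.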

Specialized Index Types

GIN Indexes (Generalized Inverted Index)

GIN indexes are designed for values that contain multiple elements -- arrays, JSONB documents, and full-text search vectors.

-- Full-text search on product descriptions
CREATE INDEX idx_products_description_gin
ON products USING gin(to_tsvector('english', description));

SELECT * FROM products
WHERE to_tsvector('english', description) @@ to_tsquery('wireless & headphones');

-- JSONB containment queries
CREATE INDEX idx_events_metadata_gin ON events USING gin(metadata);

SELECT * FROM events
WHERE metadata @> '{"type": "purchase", "platform": "mobile"}';

Covering Indexes (INCLUDE)

A covering index includes extra columns that are not part of the search key but are needed by the query. This enables index-only scans, avoiding the table lookup entirely.

-- Without covering index: index scan + table lookup for email and name
CREATE INDEX idx_users_username ON users(username);

-- With covering index: index-only scan (no table lookup needed)
CREATE INDEX idx_users_username_covering
ON users(username) INCLUDE (email, display_name);

-- This query can be answered entirely from the index
SELECT email, display_name FROM users WHERE username = 'johndoe';

Reading EXPLAIN ANALYZE Output

EXPLAIN ANALYZE shows the actual execution plan and timing of a query. This is the most important debugging tool for slow queries.

EXPLAIN ANALYZE
SELECT u.username, COUNT(p.id) as post_count
FROM users u
JOIN posts p ON p.author_id = u.id
WHERE u.created_at > '2025-01-01'
GROUP BY u.username
ORDER BY post_count DESC
LIMIT 10;

Example output (simplified):

Limit (cost=1250.00..1250.02 rows=10) (actual time=45.2..45.3 rows=10)
  -> Sort (cost=1250.00..1262.00 rows=4800) (actual time=45.2..45.2 rows=10)
        Sort Key: (count(p.id)) DESC
        Sort Method: top-N heapsort  Memory: 25kB
        -> HashAggregate (cost=1100.00..1148.00 rows=4800) (actual time=42.1..43.8 rows=4800)
              Group Key: u.username
              -> Hash Join (cost=180.00..950.00 rows=30000) (actual time=5.2..35.6 rows=30000)
                    Hash Cond: (p.author_id = u.id)
                    -> Seq Scan on posts p (cost=0.00..620.00 rows=50000) (actual time=0.01..12.3 rows=50000)
                    -> Hash (cost=150.00..150.00 rows=4800) (actual time=4.8..4.8 rows=4800)
                          -> Index Scan using idx_users_created_at on users u
                             (cost=0.29..150.00 rows=4800) (actual time=0.03..3.2 rows=4800)
                                Index Cond: (created_at > '2025-01-01')

Key scan types to know:

| Scan Type | What It Means | Performance |
| --- | --- | --- |
| Seq Scan | Reads entire table row by row | Slow for large tables |
| Index Scan | Uses index to find rows, then fetches from table | Fast for selective queries |
| Index Only Scan | Reads from index only, no table access | Fastest |
| Bitmap Index Scan | Builds bitmap of matching pages, then fetches | Good for medium selectivity |
| Hash Join | Builds hash table from smaller relation | Fast for equi-joins |
| Nested Loop | For each row in outer, scan inner | Fast when inner has index |
| Merge Join | Merges two sorted inputs | Fast when both inputs are sorted |
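
When diagnosing why the planner picked one scan type over another, you can temporarily disable a plan node type for the current session and compare the resulting plans. The enable_* settings are real PostgreSQL planner GUCs; use them for investigation only, never in production code paths:

```sql
-- Force the planner to avoid sequential scans in this session only
SET enable_seqscan = off;
EXPLAIN SELECT * FROM posts WHERE author_id = 42;

-- Restore the default planner behavior
RESET enable_seqscan;
```

If the plan still shows a Seq Scan after this, no usable index exists for the predicate; if the plan switches but the estimated cost rises sharply, the planner's original choice was probably right.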

ACID Properties

Every relational database guarantees ACID properties for transactions:

Atomicity -- A transaction is all-or-nothing. If any statement fails, all changes are rolled back.

BEGIN;
UPDATE accounts SET balance = balance - 500 WHERE id = 1;
UPDATE accounts SET balance = balance + 500 WHERE id = 2;
-- If the second UPDATE fails, the first is also rolled back
COMMIT;
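
Atomicity can also be applied at a finer grain with savepoints, which undo part of a transaction without abandoning all of it. A sketch using the same accounts table:

```sql
BEGIN;
UPDATE accounts SET balance = balance - 500 WHERE id = 1;
SAVEPOINT transfer_step;
UPDATE accounts SET balance = balance + 500 WHERE id = 2;
-- Undo only the work done since the savepoint; the first
-- UPDATE is still part of the transaction
ROLLBACK TO SAVEPOINT transfer_step;
COMMIT;
```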

Consistency -- A transaction moves the database from one valid state to another. Constraints (CHECK, FOREIGN KEY, UNIQUE) are enforced.

Isolation -- Concurrent transactions do not interfere with each other (degree depends on isolation level).

Durability -- Once committed, changes survive system crashes. This is achieved through write-ahead logging (WAL).
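
PostgreSQL exposes a knob on the durability guarantee: with synchronous_commit = off (a real setting), COMMIT returns before the WAL record is flushed to disk, trading a small window of possible transaction loss after a crash (never corruption) for higher throughput. A sketch using the events table from earlier:

```sql
BEGIN;
SET LOCAL synchronous_commit = off;  -- applies to this transaction only
INSERT INTO events (metadata) VALUES ('{"type": "page_view"}');
COMMIT;  -- returns before the WAL record reaches disk
```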

Isolation Levels and Concurrency Anomalies

Isolation Level Dirty Read Non-Repeatable Read Phantom Read
Read Uncommitted Possible Possible Possible
Read Committed Prevented Possible Possible
Repeatable Read Prevented Prevented Possible (prevented in PostgreSQL)
Serializable Prevented Prevented Prevented

Dirty Read: Transaction A reads data that Transaction B has modified but not yet committed. If B rolls back, A has read data that never officially existed.

Non-Repeatable Read: Transaction A reads a row, Transaction B updates and commits it, then A reads the same row again and gets a different value.

Phantom Read: Transaction A queries rows matching a condition, Transaction B inserts a new row matching that condition and commits, then A runs the same query and gets an extra row.
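
These anomalies are easiest to see as two interleaved sessions. At Read Committed (the PostgreSQL default), this sketch exhibits a non-repeatable read (the balance values are illustrative):

```sql
-- Session A (Read Committed, the default):
BEGIN;
SELECT balance FROM accounts WHERE id = 1;  -- returns e.g. 1000

-- Session B (concurrent):
BEGIN;
UPDATE accounts SET balance = 900 WHERE id = 1;
COMMIT;

-- Session A, same transaction:
SELECT balance FROM accounts WHERE id = 1;  -- now returns 900
COMMIT;
-- Under REPEATABLE READ, the second read would still see 1000
```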

-- PostgreSQL default is Read Committed
-- To use Serializable:
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT balance FROM accounts WHERE id = 1;
-- A concurrent conflicting write does not block; instead, one of the
-- two transactions aborts at commit with a serialization failure
-- (SQLSTATE 40001) and must be retried by the application
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;

Important: PostgreSQL implements Repeatable Read using MVCC snapshots, which also prevents phantom reads -- making it behave more like Serializable in other databases. True Serializable in PostgreSQL uses Serializable Snapshot Isolation (SSI).
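
The classic anomaly that only Serializable prevents is write skew: two transactions each read a condition and then write based on it. A sketch of the standard on-call example (the doctors table is an illustration, not part of this lesson's schema):

```sql
-- Application invariant: at least one doctor stays on call

-- Session A:
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM doctors WHERE on_call;              -- sees 2
UPDATE doctors SET on_call = false WHERE name = 'alice';

-- Session B (concurrent):
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM doctors WHERE on_call;              -- also sees 2
UPDATE doctors SET on_call = false WHERE name = 'bob';

-- Under Repeatable Read both would commit, leaving nobody on call.
-- Under Serializable (SSI), one COMMIT fails with SQLSTATE 40001
-- and the application retries it.
COMMIT;
```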

MVCC: How PostgreSQL Handles Concurrency

Multi-Version Concurrency Control (MVCC) means that readers never block writers and writers never block readers. PostgreSQL achieves this by keeping multiple versions of each row.

When a row is updated, PostgreSQL does not overwrite the original. Instead:

  1. The old row version is marked with an xmax (the transaction ID that deleted/updated it)
  2. A new row version is created with an xmin (the transaction ID that created it)
  3. Each transaction sees only row versions that were committed before the transaction's snapshot

This is why PostgreSQL needs VACUUM -- dead row versions accumulate and must be cleaned up periodically.
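
You can watch dead row versions accumulate through the statistics views; n_dead_tup in pg_stat_user_tables is a real column (the table name is illustrative):

```sql
-- How many dead row versions is VACUUM waiting to reclaim?
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'accounts';

-- Reclaim dead versions manually (autovacuum usually handles this)
VACUUM accounts;
```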

Deadlock Detection and Prevention

A deadlock occurs when two or more transactions are each waiting for a lock held by the other. PostgreSQL detects the cycle automatically (after deadlock_timeout, 1 second by default) and aborts one of the transactions with a deadlock error; the other proceeds.

-- Transaction A:
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- locks row 1
UPDATE accounts SET balance = balance + 100 WHERE id = 2;  -- waits for row 2

-- Transaction B (concurrent):
BEGIN;
UPDATE accounts SET balance = balance - 50 WHERE id = 2;   -- locks row 2
UPDATE accounts SET balance = balance + 50 WHERE id = 1;   -- waits for row 1
-- DEADLOCK: A waits for B, B waits for A

Prevention strategies:

  1. Consistent lock ordering -- Always acquire locks in the same order (e.g., by ascending ID). Both transactions should lock id=1 first, then id=2
  2. Lock timeout -- Set lock_timeout to fail fast instead of waiting indefinitely
  3. Keep transactions short -- Minimize the time locks are held
  4. Use SELECT ... FOR UPDATE NOWAIT -- Fail immediately if the row is locked instead of waiting

-- Safe pattern: always lock in ascending ID order
BEGIN;
SELECT * FROM accounts WHERE id IN (1, 2) ORDER BY id FOR UPDATE;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
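
Strategies 2 and 4 above can be sketched directly; lock_timeout and FOR UPDATE NOWAIT are real PostgreSQL features, and the error handling described in the comments is the application's responsibility:

```sql
BEGIN;
-- Strategy 2: give up on any lock wait after 2 seconds
-- (raises a lock_not_available error instead of queueing forever)
SET LOCAL lock_timeout = '2s';

-- Strategy 4: fail immediately if the row is already locked
SELECT * FROM accounts WHERE id = 1 FOR UPDATE NOWAIT;

UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;
```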

Interview tip: When discussing transactions, always mention: isolation level choice depends on your consistency requirements versus throughput needs. Most production systems use Read Committed (the PostgreSQL default) and handle edge cases with explicit locking where needed, rather than using Serializable globally.

Next: NoSQL databases -- when to choose document stores, key-value stores, column-family stores, or graph databases.
