System Design Framework

System design interviews test your ability to design large-scale systems. Unlike coding rounds, there is no single correct answer: interviewers evaluate your thought process, trade-off analysis, and communication.

The 4-Step Framework

Use this structure for every system design interview:

Step 1: Clarify Requirements (5 minutes)

Ask questions to narrow the scope. Split the requirements into two groups:

Functional Requirements (what the system does):

- What are the core features?
- Who are the users?
- What are the main use cases?

Non-Functional Requirements (how the system performs):

- How many users? (scale)
- What latency is acceptable?
- Is consistency or availability more important?
- What's the read-to-write ratio?

Example: "Design a URL shortener"

Functional: Create short URL, redirect to long URL, optional custom aliases

Non-functional: 100M URLs/month, <100ms redirect latency, 99.9% availability

Step 2: Back-of-Envelope Estimation (5 minutes)

Quick math to understand scale:

URL Shortener estimates:
- 100M new URLs/month ÷ ~2.6M seconds/month ≈ 40 URLs/second (write)
- Read:write ratio of 100:1 → ~4,000 reads/second
- Storage: 100M × 500 bytes per record = 50 GB/month
- 5 years: 50 GB × 60 months = 3 TB total
- Bandwidth: 4,000 reads/second × 500 bytes = 2 MB/s of read traffic

Key Formulas:

QPS (Queries Per Second) = Total requests / Seconds in period
Storage = Records × Average record size
Bandwidth = QPS × Average response size
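
The formulas above can be checked with a few lines of Python. This is a minimal sketch that reuses the URL shortener numbers from this step; the function and variable names are illustrative, and the 30-day month is an assumption made to keep the arithmetic simple.

```python
# Back-of-envelope estimates for the URL shortener example.
# Assumption: a month is 30 days (2,592,000 seconds).
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def qps(total_requests: float, seconds: float) -> float:
    """QPS = total requests / seconds in period."""
    return total_requests / seconds


def storage_bytes(records: float, avg_record_bytes: float) -> float:
    """Storage = records x average record size."""
    return records * avg_record_bytes


def bandwidth_bytes_per_sec(qps_value: float, avg_response_bytes: float) -> float:
    """Bandwidth = QPS x average response size."""
    return qps_value * avg_response_bytes


new_urls_per_month = 100_000_000
record_bytes = 500          # average record size used in the estimates
read_write_ratio = 100      # 100 reads for every write

write_qps = qps(new_urls_per_month, SECONDS_PER_MONTH)   # rounds up to ~40
read_qps = write_qps * read_write_ratio                   # rounds up to ~4,000
monthly_storage = storage_bytes(new_urls_per_month, record_bytes)
five_year_storage = monthly_storage * 60                  # 60 months
read_bandwidth = bandwidth_bytes_per_sec(read_qps, record_bytes)

print(f"write QPS:        {write_qps:,.1f}")
print(f"read QPS:         {read_qps:,.0f}")
print(f"storage / month:  {monthly_storage / 1e9:,.0f} GB")
print(f"storage / 5 yrs:  {five_year_storage / 1e12:,.0f} TB")
print(f"read bandwidth:   {read_bandwidth / 1e6:,.2f} MB/s")
```

The exact values come out slightly below the rounded figures in the list; in an interview, rounding to one significant digit is usually enough.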

Step 3: High-Level Design (10-15 minutes)

Draw the main components and their interactions:

Client → Load Balancer → API Servers → Database
                                    ↓
                              Cache (Redis)

For the URL shortener:

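- API Layer: POST /shorten (create), GET /:code (redirect)
- ID Generation: Base62-encode a unique numeric ID, or derive the code from a hash of the long URL
- Storage: Database for URL mappings
- Cache: Redis for hot URLs (access typically follows a power-law distribution, so a small share of URLs receives most of the traffic)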
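
To make the Base62 option concrete, here is a minimal sketch that turns a numeric ID into a short code and back. It assumes the ID comes from some unique counter or sequence (not shown), and the alphabet order is an arbitrary choice for this example rather than a standard.

```python
import string

# Assumption: digits, then lowercase, then uppercase. Any fixed order of
# 62 distinct characters works as long as encode and decode agree.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62


def encode_base62(n: int) -> str:
    """Convert a non-negative integer ID into a Base62 short code."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, remainder = divmod(n, BASE)
        chars.append(ALPHABET[remainder])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(chars))


def decode_base62(code: str) -> int:
    """Convert a Base62 short code back into its integer ID."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


print(encode_base62(125))                 # "21" because 125 = 2 * 62 + 1
print(decode_base62(encode_base62(125)))  # 125
```

Seven Base62 characters give 62^7 (about 3.5 trillion) distinct codes, which is far more than the 6 billion URLs implied by 100M URLs/month over 5 years.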

Step 4: Deep Dive (15-20 minutes)

The interviewer picks areas to explore. Be ready to discuss:

- Database choice and schema
- Caching strategy
- Handling edge cases
- Scaling bottlenecks
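
As one example of a caching strategy worth discussing, the sketch below shows a cache-aside lookup for the redirect path. It is an illustration under stated assumptions, not a production design: a small in-memory LRU cache stands in for Redis, a plain dict stands in for the database, and the names LRUCache and resolve are hypothetical.

```python
from collections import OrderedDict
from typing import Dict, Optional


class LRUCache:
    """Tiny in-memory LRU cache (a stand-in for Redis in this sketch)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: str) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used


def resolve(code: str, cache: LRUCache, database: Dict[str, str]) -> Optional[str]:
    """Cache-aside read: try the cache, fall back to the database, then fill the cache."""
    long_url = cache.get(code)
    if long_url is not None:
        return long_url               # cache hit
    long_url = database.get(code)     # cache miss: read from the source of truth
    if long_url is not None:
        cache.put(code, long_url)
    return long_url


database = {
    "a": "https://example.com/a",
    "b": "https://example.com/b",
    "c": "https://example.com/c",
}
cache = LRUCache(capacity=2)
for code in ["a", "b", "a", "c"]:
    resolve(code, cache, database)
print(cache.get("b"))  # None: "b" was least recently used and got evicted
```

With a power-law access pattern, a cache much smaller than the full dataset can still absorb most reads, which is why the hit path matters most here.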

CAP Theorem

A distributed system can guarantee at most two of the following three properties at the same time:

Property	Meaning	Example
Consistency	Every read returns the latest write	Banking transactions
Availability	Every request gets a response	Social media feeds
Partition Tolerance	System works despite network failures	Any distributed system

In practice: network partitions are unavoidable, so the real choice during a partition is between CP (stays consistent but may reject requests) and AP (stays available but may serve stale data).
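
The CP versus AP choice can be illustrated with a toy replica. This is a deliberately simplified sketch, not how any real database implements replication: the Replica class, its mode flag, and the Unavailable exception are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Optional


class Unavailable(Exception):
    """Raised when a CP replica refuses to answer during a partition."""


@dataclass
class Replica:
    mode: str                     # "CP" or "AP"
    value: Optional[str] = None   # last value received from the primary
    partitioned: bool = False     # True when the primary is unreachable

    def replicate(self, value: str) -> bool:
        """Apply a write sent by the primary; it is lost if partitioned."""
        if self.partitioned:
            return False
        self.value = value
        return True

    def read(self) -> Optional[str]:
        # A CP replica cannot prove its value is the latest, so it refuses.
        if self.partitioned and self.mode == "CP":
            raise Unavailable("cannot confirm latest value during partition")
        # An AP replica answers anyway, possibly with stale data.
        return self.value


cp, ap = Replica("CP"), Replica("AP")
for replica in (cp, ap):
    replica.replicate("v1")
    replica.partitioned = True   # the network splits
    replica.replicate("v2")      # this write never arrives

print(ap.read())                 # "v1": available, but stale
try:
    cp.read()
except Unavailable as exc:
    print("CP replica refused:", exc)
```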

PACELC Theorem

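An extension of CAP: if there is a Partition (P), a system chooses between Availability (A) and Consistency (C); Else (E), during normal operation, it still faces a trade-off between Latency (L) and Consistency (C).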

System	During Partition	Normal Operation
DynamoDB	AP (available)	EL (low latency)
PostgreSQL	CP (consistent)	EC (consistent)
Cassandra	AP (available)	EL (low latency; consistency is tunable per query)
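
The latency versus consistency trade-off during normal operation can be sketched with simple arithmetic. The model below is an assumption made for illustration: replicas are contacted in parallel, a synchronous write waits for the slowest one, and an asynchronous write acknowledges after the local write only. The function names and latency numbers are hypothetical.

```python
from typing import Sequence


def sync_write_latency_ms(local_ms: float, replica_rtts_ms: Sequence[float]) -> float:
    """Favor consistency (EC): acknowledge only after every replica confirms.

    Replicas are contacted in parallel, so the wait equals the slowest round trip.
    """
    return local_ms + max(replica_rtts_ms, default=0.0)


def async_write_latency_ms(local_ms: float, replica_rtts_ms: Sequence[float]) -> float:
    """Favor latency (EL): acknowledge after the local write; replicas catch up later."""
    return local_ms


# Hypothetical numbers: a 2 ms local write and three replicas at increasing distance.
local_write_ms = 2.0
replica_rtts_ms = [5.0, 40.0, 120.0]

print(sync_write_latency_ms(local_write_ms, replica_rtts_ms))   # 122.0
print(async_write_latency_ms(local_write_ms, replica_rtts_ms))  # 2.0
```

The asynchronous path is much faster, but a read from a replica that has not caught up yet may return stale data, which is the consistency cost of choosing latency.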

Next, let's cover the core building blocks that appear in every system design.