System Design Fundamentals
System Design Framework
System design interviews test your ability to design large-scale systems. Unlike coding rounds, there's no single correct answer; interviewers evaluate your thought process, trade-off analysis, and communication.
The 4-Step Framework
Use this structure for every system design interview:
Step 1: Clarify Requirements (5 minutes)
Ask questions to narrow scope. Split into:
Functional Requirements (what the system does):
- What are the core features?
- Who are the users?
- What are the main use cases?
Non-Functional Requirements (how the system performs):
- How many users? (scale)
- What latency is acceptable?
- Is consistency or availability more important?
- What's the read-to-write ratio?
Example: "Design a URL shortener"
- Functional: Create short URL, redirect to long URL, optional custom aliases
- Non-functional: 100M URLs/month, <100ms redirect latency, 99.9% availability
Step 2: Back-of-Envelope Estimation (5 minutes)
Quick math to understand scale:
URL Shortener estimates:
- Writes: 100M new URLs/month ÷ ~2.6M seconds/month ≈ 40 URLs/second
- Reads: Read:Write ratio = 100:1 → ~4,000 reads/second
- Storage: 100M × 500 bytes per record = 50 GB/month
- 5 years: 50 GB × 60 months = 3 TB total
- Read bandwidth: 4,000 reads/second × 500 bytes = 2 MB/s
Key Formulas:
- QPS (Queries Per Second) = Total requests / Seconds in period
- Storage = Records × Average record size
- Bandwidth = QPS × Average response size
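The three formulas above can be checked with a few lines of Python. This is a minimal sketch: the function and constant names are made up for illustration, and it assumes a 30-day month and the 500-byte record size used in the estimates.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds, assuming a 30-day month


def qps(total_requests: float, period_seconds: float) -> float:
    """QPS = total requests / seconds in period."""
    return total_requests / period_seconds


def storage_bytes(records: int, avg_record_bytes: int) -> int:
    """Storage = records x average record size."""
    return records * avg_record_bytes


def bandwidth_bytes_per_second(queries_per_second: float, avg_response_bytes: int) -> float:
    """Bandwidth = QPS x average response size."""
    return queries_per_second * avg_response_bytes


# Inputs taken from the URL shortener example.
NEW_URLS_PER_MONTH = 100_000_000
RECORD_BYTES = 500
READS_PER_WRITE = 100
RETENTION_MONTHS = 5 * 12

write_qps = qps(NEW_URLS_PER_MONTH, SECONDS_PER_MONTH)
read_qps = write_qps * READS_PER_WRITE
monthly_storage = storage_bytes(NEW_URLS_PER_MONTH, RECORD_BYTES)
total_storage = monthly_storage * RETENTION_MONTHS
read_bandwidth = bandwidth_bytes_per_second(read_qps, RECORD_BYTES)

print(f"write QPS:       {write_qps:.1f}")
print(f"read QPS:        {read_qps:.0f}")
print(f"monthly storage: {monthly_storage / 1e9:.0f} GB")
print(f"5-year storage:  {total_storage / 1e12:.1f} TB")
print(f"read bandwidth:  {read_bandwidth / 1e6:.2f} MB/s")
```

The exact figures come out slightly below the rounded ones (about 38.6 writes/second and 1.9 MB/s). In an interview, rounding to 40 and 2 MB/s is the point: the goal is the order of magnitude, not precision.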
Step 3: High-Level Design (10-15 minutes)
Draw the main components and their interactions:
Client → Load Balancer → API Servers → Database
                              ↓
                        Cache (Redis)
For the URL shortener:
- API Layer: POST /shorten (create), GET /:code (redirect)
- ID Generation: Base62-encode a unique numeric ID, or derive the code from a hash of the long URL
- Storage: Database for URL mappings
- Cache: Redis for hot URLs (URL traffic typically follows a power-law distribution, so a small fraction of URLs receives most of the reads)
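To make the ID generation bullet concrete, here is a minimal sketch of the Base62 option. It assumes each new URL is assigned a unique, non-negative integer ID (for example from a database sequence); the alphabet ordering and the function names are illustrative choices, not a fixed standard.

```python
import string

# 0-9, a-z, A-Z: 62 URL-safe characters. The ordering is an arbitrary choice.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62


def base62_encode(number: int) -> str:
    """Convert a non-negative integer ID into a short Base62 code."""
    if number < 0:
        raise ValueError("ID must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, BASE)
        digits.append(ALPHABET[remainder])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(digits))


def base62_decode(code: str) -> int:
    """Convert a Base62 code back into the integer ID."""
    number = 0
    for char in code:
        number = number * BASE + ALPHABET.index(char)
    return number


print(base62_encode(125))                # "21", since 125 = 2 * 62 + 1
print(base62_decode(base62_encode(10**9)))
print(BASE ** 7)                         # number of distinct 7-character codes
```

Seven characters give 62^7 (about 3.5 trillion) distinct codes, far more than the 6 billion URLs created over five years at 100M/month. The hash-based alternative avoids a central counter but needs a strategy for collisions.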
Step 4: Deep Dive (15-20 minutes)
The interviewer picks areas to explore. Be ready to discuss:
- Database choice and schema
- Caching strategy
- Handling edge cases
- Scaling bottlenecks
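For the caching strategy, one common option to discuss is cache-aside with LRU eviction: check the cache first, fall back to the database on a miss, then populate the cache. The sketch below uses in-memory stand-ins (a dict for the database, an `OrderedDict` for Redis); the class names `LRUCache` and `UrlStore` are hypothetical.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Tiny in-memory stand-in for a cache such as Redis with LRU eviction."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: str) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry


class UrlStore:
    """Cache-aside read path: cache first, database on a miss."""

    def __init__(self, cache_capacity: int = 2) -> None:
        self.db = {}  # stand-in for the URL mapping table
        self.cache = LRUCache(cache_capacity)
        self.db_reads = 0  # counts how often the database is hit

    def save(self, code: str, long_url: str) -> None:
        self.db[code] = long_url

    def resolve(self, code: str) -> Optional[str]:
        cached = self.cache.get(code)
        if cached is not None:
            return cached
        self.db_reads += 1
        long_url = self.db.get(code)
        if long_url is not None:
            self.cache.put(code, long_url)
        return long_url


store = UrlStore(cache_capacity=2)
store.save("abc", "https://example.com/a")
store.resolve("abc")   # miss: reads the database and fills the cache
store.resolve("abc")   # hit: served from the cache
print(store.db_reads)  # 1
```

Because redirects are read-heavy and mappings rarely change, cache-aside keeps the write path simple. A follow-up worth raising is invalidation: what happens to cached entries if a short URL is deleted or edited.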
CAP Theorem
In a distributed system, you can guarantee at most two of the following three properties at the same time:
| Property | Meaning | Example |
|---|---|---|
| Consistency | Every read returns the latest write | Banking transactions |
| Availability | Every request gets a response | Social media feeds |
| Partition Tolerance | System works despite network failures | Any distributed system |
In practice: Network partitions cannot be ruled out in a distributed system, so the real choice during a partition is between CP (stay consistent but reject some requests) and AP (stay available but possibly serve stale data).
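The CP versus AP choice can be illustrated with a toy replica that is cut off from its primary. This is a deliberately simplified sketch, not how any real database is implemented; the `Replica` class, the `Unavailable` exception, and the `read_during_partition` helper are all hypothetical.

```python
class Unavailable(Exception):
    """Raised when a CP replica refuses to answer during a partition."""


class Replica:
    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value = "v1"         # last value replicated from the primary
        self.partitioned = False  # True when cut off from the primary

    def apply_replication(self, value: str) -> None:
        """Update sent by the primary; lost if the network is partitioned."""
        if not self.partitioned:
            self.value = value

    def read(self) -> str:
        if self.partitioned and self.mode == "CP":
            # Cannot confirm this is the latest write, so refuse to answer.
            raise Unavailable("replica cannot reach the primary")
        # AP: always answer, even if the value may be stale.
        return self.value


def read_during_partition(mode: str) -> str:
    replica = Replica(mode)
    replica.partitioned = True
    replica.apply_replication("v2")  # the primary accepted v2; the replica never sees it
    try:
        return replica.read()
    except Unavailable:
        return "error"


print(read_during_partition("CP"))  # error (consistent, but unavailable)
print(read_during_partition("AP"))  # v1 (available, but stale)
```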
PACELC Theorem
An extension of CAP: if there is a Partition (P), a system trades off Availability (A) against Consistency (C); Else (E), during normal operation, it trades off Latency (L) against Consistency (C).
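| System | During Partition (A or C) | Normal Operation (L or C) |
|---|---|---|
| DynamoDB | AP (available) | EL (low latency by default) |
| PostgreSQL | CP (consistent) | EC (consistent) |
| Cassandra | AP (available) | EL by default (tunable per query) |
These classifications reflect typical default configurations. DynamoDB and Cassandra both offer stronger consistency settings at the cost of higher latency.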
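To see what "tunable" means in the latency versus consistency trade-off, consider the standard quorum rule for replicated stores: with N replicas, a read contacting R replicas is guaranteed to overlap a write acknowledged by W replicas when R + W > N. The sketch below only checks that inequality; the function name is made up, and real systems add further details (timeouts, repair, and so on) that are out of scope here.

```python
def read_sees_latest_write(n_replicas: int, write_acks: int, read_replicas: int) -> bool:
    """Return True when every read set must overlap every write set (R + W > N)."""
    if not (1 <= write_acks <= n_replicas and 1 <= read_replicas <= n_replicas):
        raise ValueError("W and R must be between 1 and N")
    return read_replicas + write_acks > n_replicas


N = 3
# Fast setting: wait for one replica on each side (low latency, may read stale data).
print(read_sees_latest_write(N, write_acks=1, read_replicas=1))  # False
# Quorum setting: wait for a majority on each side (higher latency, consistent reads).
print(read_sees_latest_write(N, write_acks=2, read_replicas=2))  # True
```

Waiting for more replicas raises latency, which is exactly the Else-branch trade-off that PACELC describes.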
Next, let's cover the core building blocks that appear in every system design.