Backend System Design
System Design Framework & Building Blocks
System design interviews test your ability to architect large-scale backend systems under ambiguity. The key is not memorizing solutions but demonstrating a structured approach to breaking down problems and making trade-off decisions. This lesson covers the 4-step framework interviewers expect and the core building blocks you will use in every design.
The 4-Step System Design Framework
Every system design interview should follow this structure. Interviewers are evaluating your process as much as your solution.
Step 1: Requirements Clarification (3-5 minutes)
Never start designing immediately. Ask targeted questions to narrow scope.
Functional requirements define what the system does:
- Who are the users? How many?
- What are the core features? (List 3-5, then prioritize)
- What are the input/output formats?
Non-functional requirements define how well the system performs:
| Property | Question to Ask | Typical Target |
|---|---|---|
| Latency | What is acceptable response time? | p99 < 200ms for reads |
| Throughput | How many requests per second? | Derived from DAU |
| Availability | What uptime is required? | 99.9% ≈ 8.76 hours downtime/year |
| Consistency | Is eventual consistency acceptable? | Depends on domain |
| Durability | Can we lose data? | 99.999999999% for storage |
Step 2: Back-of-Envelope Estimation (3-5 minutes)
Demonstrate that you can reason about scale. Use simple formulas:
- QPS (queries per second) = DAU x requests_per_user / 86,400 (seconds per day)
- Storage per year = daily_new_records x record_size x 365
- Bandwidth = QPS x average_response_size
Example: Social media feed service
- DAU = 100M users
- Reads: each user loads the feed 5 times/day → 500M reads/day
- Read QPS = 500M / 86,400 ≈ 5,800
- Peak QPS ≈ 3x average ≈ 17,400
- Writes: each user posts 0.5 times/day → 50M writes/day
- Write QPS = 50M / 86,400 ≈ 580
- Storage per post ≈ 1 KB text + 500 KB media (media dominates) ≈ 500 KB
- Daily storage = 50M x 500 KB = 25 TB/day
- Yearly storage = 25 TB x 365 ≈ 9 PB/year
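These estimates are easy to sanity-check in code. A minimal sketch of the same arithmetic, where every input is the assumed example value from above, not real traffic data:

```python
SECONDS_PER_DAY = 86_400

# Assumed example inputs from the estimate above.
dau = 100_000_000          # 100M daily active users
reads_per_user = 5         # feed loads per user per day
writes_per_user = 0.5      # posts per user per day
post_size_bytes = 500_000  # ~500 KB per post (media dominates)

read_qps = dau * reads_per_user / SECONDS_PER_DAY
peak_read_qps = 3 * read_qps                       # assume peak is 3x average
write_qps = dau * writes_per_user / SECONDS_PER_DAY
daily_storage_tb = dau * writes_per_user * post_size_bytes / 1e12
yearly_storage_pb = daily_storage_tb * 365 / 1_000

print(f"Read QPS ~{read_qps:,.0f}, peak ~{peak_read_qps:,.0f}")
print(f"Write QPS ~{write_qps:,.0f}")
print(f"Storage ~{daily_storage_tb:,.0f} TB/day, ~{yearly_storage_pb:.1f} PB/year")
```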
Step 3: High-Level Architecture (5-10 minutes)
Sketch the main components. Start with this universal backend template:
```
                  ┌─────────────┐
                  │   Clients   │
                  │ (Web/Mobile)│
                  └──────┬──────┘
                         │
                  ┌──────▼──────┐
                  │     CDN     │
                  │ (Static/Img)│
                  └──────┬──────┘
                         │
                  ┌──────▼──────┐
                  │    Load     │
                  │  Balancer   │
                  └──────┬──────┘
                         │
        ┌────────────────┼────────────────┐
        │                │                │
 ┌──────▼──────┐  ┌──────▼──────┐  ┌──────▼──────┐
 │ API Server  │  │ API Server  │  │ API Server  │
 └──────┬──────┘  └──────┬──────┘  └──────┬──────┘
        │                │                │
        └────────────────┼────────────────┘
                ┌────────┼────────┐
                │        │        │
         ┌──────▼───┐ ┌──▼───┐ ┌──▼──────────┐
         │  Cache   │ │  DB  │ │   Message   │
         │ (Redis)  │ │(SQL) │ │    Queue    │
         └──────────┘ └──────┘ └─────────────┘
```
Step 4: Deep Dive (15-25 minutes)
Pick 2-3 components based on the interviewer's interest or the system's bottleneck. Go deep on data models, algorithms, failure handling, and scaling strategies.
Pro tip: Interviewers often ask "What would break first at 10x scale?" This is your cue to discuss the component under most pressure.
Building Block: Load Balancers
Load balancers distribute traffic across multiple servers to ensure no single server is overwhelmed.
L4 vs L7 Load Balancers
| Feature | L4 (Transport) | L7 (Application) |
|---|---|---|
| Layer | TCP/UDP | HTTP/HTTPS |
| Speed | Faster (no payload inspection) | Slower (inspects headers/body) |
| Routing | IP + port only | URL path, headers, cookies |
| SSL | Pass-through or terminate | Always terminates |
| Use case | Raw throughput, TCP services | API routing, A/B testing |
| Examples | AWS NLB, HAProxy (TCP mode) | AWS ALB, Nginx, Envoy |
Load Balancing Algorithms
| Algorithm | How It Works | Best For |
|---|---|---|
| Round-robin | Rotate through servers sequentially | Equal-capacity servers |
| Weighted round-robin | Higher-capacity servers get more requests | Mixed server sizes |
| Least connections | Route to server with fewest active connections | Varied request durations |
| IP hash | Hash client IP to pick server | Session affinity without cookies |
| Consistent hashing | Hash-ring-based distribution | Caching layers, stateful services |
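To make the mechanics concrete, here is a minimal sketch of three of these algorithms. The `Server` class and server names are illustrative assumptions, not a real load balancer API:

```python
import hashlib
import itertools
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    active_connections: int = 0


servers = [Server("api-1"), Server("api-2"), Server("api-3")]
rr = itertools.cycle(servers)


def pick_round_robin() -> Server:
    # Round-robin: rotate through servers sequentially.
    return next(rr)


def pick_least_connections() -> Server:
    # Least connections: favor the server with the fewest in-flight requests.
    return min(servers, key=lambda s: s.active_connections)


def pick_ip_hash(client_ip: str) -> Server:
    # IP hash: the same client IP always maps to the same server
    # (until the server list changes; consistent hashing fixes that).
    digest = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return servers[digest % len(servers)]
```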
Health Checks
Load balancers send periodic health checks (HTTP GET to /health or TCP connect) and remove unhealthy nodes from the pool. Typical configuration: check every 10 seconds, mark unhealthy after 3 consecutive failures, re-add after 2 consecutive successes.
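The failure/success counting can be sketched as a small state machine; `HealthTracker` is an illustrative name, and the thresholds mirror the numbers above:

```python
class HealthTracker:
    """Tracks one backend's health from periodic probe results."""

    UNHEALTHY_AFTER = 3  # consecutive failures before removal from the pool
    HEALTHY_AFTER = 2    # consecutive successes before re-adding

    def __init__(self) -> None:
        self.healthy = True
        self.failures = 0
        self.successes = 0

    def record(self, probe_ok: bool) -> bool:
        """Feed in one probe result (e.g. GET /health); return current health."""
        if probe_ok:
            self.failures = 0
            self.successes += 1
            if not self.healthy and self.successes >= self.HEALTHY_AFTER:
                self.healthy = True
        else:
            self.successes = 0
            self.failures += 1
            if self.healthy and self.failures >= self.UNHEALTHY_AFTER:
                self.healthy = False
        return self.healthy
```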
Building Block: Caching Strategies
Caching is often the single most impactful optimization in backend systems. You must know these four patterns and their trade-offs.
Caching Strategy Comparison
| Strategy | Read Path | Write Path | Consistency | Use Case |
|---|---|---|---|---|
| Cache-aside | App checks cache → miss → read DB → populate cache | App writes to DB only | Eventual (stale reads possible) | General-purpose, read-heavy |
| Write-through | App reads from cache | App writes to cache and DB simultaneously | Strong (always up to date) | Read-heavy with consistency needs |
| Write-back | App reads from cache | App writes to cache only, async flush to DB | Eventual (risk of data loss) | Write-heavy, can tolerate loss |
| Write-around | App checks cache → miss → read DB → populate cache | App writes to DB only, invalidate cache | Eventual | Data rarely re-read after write |
Cache-Aside Pattern (Most Common)
```python
import json

import redis

r = redis.Redis()  # assumes a reachable Redis instance; `db` is a placeholder DB handle


def get_user(user_id: str) -> dict:
    # Step 1: Check the cache.
    cached = r.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)
    # Step 2: Cache miss, so read from the database.
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)
    # Step 3: Populate the cache with a 1-hour TTL.
    r.setex(f"user:{user_id}", 3600, json.dumps(user))
    return user
```
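For contrast, a write-through write path under the same assumptions (the `db` handle, table, and column are placeholders) would look roughly like this:

```python
def update_user(user_id: str, user: dict) -> None:
    # Write-through: update the database and the cache together,
    # so subsequent reads never see stale data.
    db.execute("UPDATE users SET profile = %s WHERE id = %s",
               json.dumps(user), user_id)
    r.setex(f"user:{user_id}", 3600, json.dumps(user))
```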
Cache Eviction Policies
| Policy | Mechanism | Best For |
|---|---|---|
| LRU (Least Recently Used) | Evict the item not accessed for the longest time | General-purpose — most popular choice |
| LFU (Least Frequently Used) | Evict the item accessed the fewest times | Workloads with stable hot keys |
| TTL (Time to Live) | Evict after a fixed expiration time | Data with known staleness tolerance |
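LRU is simple enough to sketch on top of an ordered dictionary. This toy version is illustrative (real caches such as Redis use an approximate, sampled LRU rather than exact bookkeeping):

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)        # mark as most recently used
        return self.items[key]

    def put(self, key, value) -> None:
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used item
```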
Building Block: CDN (Content Delivery Network)
CDNs cache static content (images, JS, CSS, videos) at edge locations close to users, reducing latency and offloading origin servers.
| Model | How It Works | Best For |
|---|---|---|
| Push CDN | Origin pushes content to CDN nodes proactively | Small, rarely-changing content (logos, CSS bundles) |
| Pull CDN | CDN fetches from origin on first request, then caches | Large catalogs, user-generated content |
Interview tip: Most real-world CDNs use the pull model with TTL-based invalidation. Mention CloudFront, Cloudflare, or Akamai to show practical knowledge.
Building Block: Message Queues
Message queues decouple producers from consumers, enabling asynchronous processing, load leveling, and fault tolerance.
Point-to-Point vs Pub-Sub
| Feature | Point-to-Point | Pub-Sub |
|---|---|---|
| Delivery | One consumer per message | All subscribers get every message |
| Use case | Task distribution, job queues | Event broadcasting, notifications |
| Example | SQS, Celery task queue | Kafka topics, SNS, Redis Pub/Sub |
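An in-process sketch of the point-to-point model using only the standard library; a real system would put SQS, RabbitMQ, or similar between the two sides, but the decoupling works the same way:

```python
import queue
import threading

jobs: queue.Queue = queue.Queue()


def producer() -> None:
    for i in range(5):
        jobs.put(f"task-{i}")  # returns immediately; work happens later


def consumer() -> None:
    while True:
        task = jobs.get()      # each message goes to exactly one consumer
        print(f"processing {task}")
        jobs.task_done()


threading.Thread(target=consumer, daemon=True).start()
producer()
jobs.join()                    # block until every task has been processed
```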
Message Queue Comparison
| Feature | Kafka | RabbitMQ | SQS |
|---|---|---|---|
| Model | Distributed log (pub-sub) | Message broker (point-to-point + pub-sub) | Managed queue (point-to-point) |
| Throughput | Very high (millions/sec) | Moderate (tens of thousands/sec) | High (managed, auto-scales) |
| Ordering | Per-partition ordering | Per-queue ordering | Best-effort (FIFO available) |
| Retention | Configurable (days/weeks) | Until consumed | 14 days max |
| Replay | Yes (consumer offsets) | No (once consumed, gone) | No |
| Best for | Event streaming, log aggregation | Task queues, RPC | Simple async jobs, serverless |
Building Block: Consistent Hashing
When distributing data across multiple nodes (cache servers, DB shards), naive modular hashing (hash(key) % N) breaks when nodes are added or removed — almost all keys remap. Consistent hashing solves this.
How It Works
1. Map the hash space to a ring (0 to 2^32 - 1)
2. Place each server at a position on the ring: hash(server_id)
3. For each key, hash it and walk clockwise to the nearest server
```
            Server A
               │
        ───────●───────
       /               \
      ● Key X           ● Server C
       \ (goes to A)   /
        ───────●───────
               │
            Server B
```
Virtual Nodes
Physical servers vary in capacity, and a single hash position per server can cluster unevenly on the ring. Virtual nodes fix this by placing each physical server at multiple positions on the ring (e.g., 150-200 virtual nodes per server). This ensures even distribution and smooth rebalancing.
When a node is added: Only keys between the new node and its predecessor on the ring need to move — roughly K / N keys instead of all keys (where K = total keys, N = total nodes).
When a node is removed: Only its keys move to the next node clockwise on the ring.
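A compact sketch of a hash ring with virtual nodes; the MD5 hash and the vnode count of 150 are illustrative choices, not a specific system's implementation:

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, nodes=(), vnodes: int = 150):
        self.vnodes = vnodes
        self.ring: dict[int, str] = {}  # ring position -> physical node
        self.positions: list[int] = []  # sorted ring positions
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each physical node occupies many positions on the ring.
        for i in range(self.vnodes):
            pos = ring_hash(f"{node}#{i}")
            self.ring[pos] = node
            bisect.insort(self.positions, pos)

    def remove_node(self, node: str) -> None:
        for i in range(self.vnodes):
            pos = ring_hash(f"{node}#{i}")
            del self.ring[pos]
            self.positions.remove(pos)

    def get_node(self, key: str) -> str:
        # Walk clockwise: first position >= hash(key), wrapping past the top.
        idx = bisect.bisect(self.positions, ring_hash(key)) % len(self.positions)
        return self.ring[self.positions[idx]]


ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
print(ring.get_node("user:42"))  # stable unless nearby vnodes change
```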
Used by: DynamoDB (partition routing), Cassandra (token ring), Memcached client-side sharding, Akamai CDN.
Next: We will walk through four classic backend design problems step by step — URL shortener, rate limiter, notification service, and chat system.