Backend System Design
System Design Framework & Building Blocks
System design interviews test your ability to architect large-scale backend systems under ambiguity. The key is not memorizing solutions but demonstrating a structured approach to breaking down problems and making trade-off decisions. This lesson covers the 4-step framework interviewers expect and the core building blocks you will use in every design.
The 4-Step System Design Framework
Every system design interview should follow this structure. Interviewers are evaluating your process as much as your solution.
Step 1: Requirements Clarification (3-5 minutes)
Never start designing immediately. Ask targeted questions to narrow scope.
Functional requirements define what the system does:
- Who are the users? How many?
- What are the core features? (List 3-5, then prioritize)
- What are the input/output formats?
Non-functional requirements define how well the system performs:
| Property | Question to Ask | Typical Target |
|---|---|---|
| Latency | What is acceptable response time? | p99 < 200ms for reads |
| Throughput | How many requests per second? | Derived from DAU |
| Availability | What uptime is required? | 99.9% ≈ 8.76 hours downtime/year |
| Consistency | Is eventual consistency acceptable? | Depends on domain |
| Durability | Can we lose data? | 99.999999999% for storage |
Step 2: Back-of-Envelope Estimation (3-5 minutes)
Demonstrate that you can reason about scale. Use simple formulas:
- QPS (queries per second) = DAU x requests_per_user / 86,400 (seconds per day)
- Storage per year = daily_new_records x record_size x 365
- Bandwidth = QPS x average_response_size
Example: Social media feed service
- DAU = 100M users
- Reads: each user loads the feed 5 times/day → 500M reads/day
- Read QPS = 500M / 86,400 ≈ 5,800
- Peak QPS ≈ 3x average ≈ 17,400
- Writes: each user posts 0.5 times/day → 50M writes/day
- Write QPS = 50M / 86,400 ≈ 580
- Storage per post ≈ 1 KB text + 500 KB media (media dominates) ≈ 500 KB
- Daily storage = 50M x 500 KB = 25 TB/day
- Yearly storage = 25 TB x 365 ≈ 9 PB/year
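These estimates are easy to sanity-check in code. A minimal sketch of the same arithmetic, where every input is the assumed example value from above, not real traffic data:

```python
SECONDS_PER_DAY = 86_400

# Assumed example inputs from the estimate above.
dau = 100_000_000          # 100M daily active users
reads_per_user = 5         # feed loads per user per day
writes_per_user = 0.5      # posts per user per day
post_size_bytes = 500_000  # ~500 KB per post (media dominates)

read_qps = dau * reads_per_user / SECONDS_PER_DAY
peak_read_qps = 3 * read_qps                       # assume peak is 3x average
write_qps = dau * writes_per_user / SECONDS_PER_DAY
daily_storage_tb = dau * writes_per_user * post_size_bytes / 1e12
yearly_storage_pb = daily_storage_tb * 365 / 1_000

print(f"Read QPS ~{read_qps:,.0f}, peak ~{peak_read_qps:,.0f}")
print(f"Write QPS ~{write_qps:,.0f}")
print(f"Storage ~{daily_storage_tb:,.0f} TB/day, ~{yearly_storage_pb:.1f} PB/year")
```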
Step 3: High-Level Architecture (5-10 minutes)
Sketch the main components. Start with this universal backend template:
```
                  ┌─────────────┐
                  │   Clients   │
                  │ (Web/Mobile)│
                  └──────┬──────┘
                         │
                  ┌──────▼──────┐
                  │     CDN     │
                  │ (Static/Img)│
                  └──────┬──────┘
                         │
                  ┌──────▼──────┐
                  │    Load     │
                  │  Balancer   │
                  └──────┬──────┘
                         │
        ┌────────────────┼────────────────┐
        │                │                │
 ┌──────▼──────┐  ┌──────▼──────┐  ┌──────▼──────┐
 │ API Server  │  │ API Server  │  │ API Server  │
 └──────┬──────┘  └──────┬──────┘  └──────┬──────┘
        │                │                │
        └────────────────┼────────────────┘
                ┌────────┼────────┐
                │        │        │
         ┌──────▼───┐ ┌──▼───┐ ┌──▼──────────┐
         │  Cache   │ │  DB  │ │   Message   │
         │ (Redis)  │ │(SQL) │ │    Queue    │
         └──────────┘ └──────┘ └─────────────┘
```
Step 4: Deep Dive (15-25 minutes)
Pick 2-3 components based on the interviewer's interest or the system's bottleneck. Go deep on data models, algorithms, failure handling, and scaling strategies.
Pro tip: Interviewers often ask "What would break first at 10x scale?" This is your cue to discuss the component under most pressure.
Building Block: Load Balancers
Load balancers distribute traffic across multiple servers to ensure no single server is overwhelmed.
L4 vs L7 Load Balancers
| Feature | L4 (Transport) | L7 (Application) |
|---|---|---|
| Layer | TCP/UDP | HTTP/HTTPS |
| Speed | Faster (no payload inspection) | Slower (inspects headers/body) |
| Routing | IP + port only | URL path, headers, cookies |
| SSL | Pass-through or terminate | Always terminates |
| Use case | Raw throughput, TCP services | API routing, A/B testing |
| Examples | AWS NLB, HAProxy (TCP mode) | AWS ALB, Nginx, Envoy |
Load Balancing Algorithms
| Algorithm | How It Works | Best For |
|---|---|---|
| Round-robin | Rotate through servers sequentially | Equal-capacity servers |
| Weighted round-robin | Higher-capacity servers get more requests | Mixed server sizes |
| Least connections | Route to server with fewest active connections | Varied request durations |
| IP hash | Hash client IP to pick server | Session affinity without cookies |
| Consistent hashing | Hash-ring-based distribution | Caching layers, stateful services |
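To make the mechanics concrete, here is a minimal sketch of three of these algorithms. The `Server` class and server names are illustrative assumptions, not a real load balancer API:

```python
import hashlib
import itertools
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    active_connections: int = 0


servers = [Server("api-1"), Server("api-2"), Server("api-3")]
rr = itertools.cycle(servers)


def pick_round_robin() -> Server:
    # Round-robin: rotate through servers sequentially.
    return next(rr)


def pick_least_connections() -> Server:
    # Least connections: favor the server with the fewest in-flight requests.
    return min(servers, key=lambda s: s.active_connections)


def pick_ip_hash(client_ip: str) -> Server:
    # IP hash: the same client IP always maps to the same server
    # (until the server list changes; consistent hashing fixes that).
    digest = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return servers[digest % len(servers)]
```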
Health Checks
Load balancers send periodic health checks (HTTP GET to /health or TCP connect) and remove unhealthy nodes from the pool. Typical configuration: check every 10 seconds, mark unhealthy after 3 consecutive failures, re-add after 2 consecutive successes.
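The failure/success counting can be sketched as a small state machine; `HealthTracker` is an illustrative name, and the thresholds mirror the numbers above:

```python
class HealthTracker:
    """Tracks one backend's health from periodic probe results."""

    UNHEALTHY_AFTER = 3  # consecutive failures before removal from the pool
    HEALTHY_AFTER = 2    # consecutive successes before re-adding

    def __init__(self) -> None:
        self.healthy = True
        self.failures = 0
        self.successes = 0

    def record(self, probe_ok: bool) -> bool:
        """Feed in one probe result (e.g. GET /health); return current health."""
        if probe_ok:
            self.failures = 0
            self.successes += 1
            if not self.healthy and self.successes >= self.HEALTHY_AFTER:
                self.healthy = True
        else:
            self.successes = 0
            self.failures += 1
            if self.healthy and self.failures >= self.UNHEALTHY_AFTER:
                self.healthy = False
        return self.healthy
```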
Building Block: Caching Strategies
Caching is often the single most impactful optimization in backend systems. You must know these four patterns and their trade-offs.
Caching Strategy Comparison
| Strategy | Read Path | Write Path | Consistency | Use Case |
|---|---|---|---|---|
| Cache-aside | App checks cache → miss → read DB → populate cache | App writes to DB only | Eventual (stale reads possible) | General-purpose, read-heavy |
| Write-through | App reads from cache | App writes to cache and DB simultaneously | Strong (always up to date) | Read-heavy with consistency needs |
| Write-back | App reads from cache | App writes to cache only, async flush to DB | Eventual (risk of data loss) | Write-heavy, can tolerate loss |
| Write-around | App checks cache → miss → read DB → populate cache | App writes to DB only, invalidate cache | Eventual | Data rarely re-read after write |
Cache-Aside Pattern (Most Common)
```python
import json

import redis

r = redis.Redis()  # assumes a reachable Redis instance; `db` is a placeholder DB handle


def get_user(user_id: str) -> dict:
    # Step 1: Check the cache.
    cached = r.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)
    # Step 2: Cache miss, so read from the database.
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)
    # Step 3: Populate the cache with a 1-hour TTL.
    r.setex(f"user:{user_id}", 3600, json.dumps(user))
    return user
```
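For contrast, a write-through write path under the same assumptions (the `db` handle, table, and column are placeholders) would look roughly like this:

```python
def update_user(user_id: str, user: dict) -> None:
    # Write-through: update the database and the cache together,
    # so subsequent reads never see stale data.
    db.execute("UPDATE users SET profile = %s WHERE id = %s",
               json.dumps(user), user_id)
    r.setex(f"user:{user_id}", 3600, json.dumps(user))
```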
Cache Eviction Policies
| Policy | Mechanism | Best For |
|---|---|---|
| LRU (Least Recently Used) | Evict the item not accessed for the longest time | General-purpose — most popular choice |
| LFU (Least Frequently Used) | Evict the item accessed the fewest times | Workloads with stable hot keys |
| TTL (Time to Live) | Evict after a fixed expiration time | Data with known staleness tolerance |
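LRU is simple enough to sketch on top of an ordered dictionary. This toy version is illustrative (real caches such as Redis use an approximate, sampled LRU rather than exact bookkeeping):

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)        # mark as most recently used
        return self.items[key]

    def put(self, key, value) -> None:
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used item
```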
Building Block: CDN (Content Delivery Network)
CDNs cache static content (images, JS, CSS, videos) at edge locations close to users, reducing latency and offloading origin servers.
| Model | How It Works | Best For |
|---|---|---|
| Push CDN | Origin pushes content to CDN nodes proactively | Small, rarely-changing content (logos, CSS bundles) |
| Pull CDN | CDN fetches from origin on first request, then caches | Large catalogs, user-generated content |
Interview tip: Most real-world CDNs use the pull model with TTL-based invalidation. Mention CloudFront, Cloudflare, or Akamai to show practical knowledge.
Building Block: Message Queues
Message queues decouple producers from consumers, enabling asynchronous processing, load leveling, and fault tolerance.
Point-to-Point vs Pub-Sub
| Feature | Point-to-Point | Pub-Sub |
|---|---|---|
| Delivery | One consumer per message | All subscribers get every message |
| Use case | Task distribution, job queues | Event broadcasting, notifications |
| Example | SQS, Celery task queue | Kafka topics, SNS, Redis Pub/Sub |
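An in-process sketch of the point-to-point model using only the standard library; a real system would put SQS, RabbitMQ, or similar between the two sides, but the decoupling works the same way:

```python
import queue
import threading

jobs: queue.Queue = queue.Queue()


def producer() -> None:
    for i in range(5):
        jobs.put(f"task-{i}")  # returns immediately; work happens later


def consumer() -> None:
    while True:
        task = jobs.get()      # each message goes to exactly one consumer
        print(f"processing {task}")
        jobs.task_done()


threading.Thread(target=consumer, daemon=True).start()
producer()
jobs.join()                    # block until every task has been processed
```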
Message Queue Comparison
| Feature | Kafka | RabbitMQ | SQS |
|---|---|---|---|
| Model | Distributed log (pub-sub) | Message broker (point-to-point + pub-sub) | Managed queue (point-to-point) |
| Throughput | Very high (millions/sec) | Moderate (tens of thousands/sec) | High (managed, auto-scales) |
| Ordering | Per-partition ordering | Per-queue ordering | Best-effort (FIFO available) |
| Retention | Configurable (days/weeks) | Until consumed | 14 days max |
| Replay | Yes (consumer offsets) | No (once consumed, gone) | No |
| Best for | Event streaming, log aggregation | Task queues, RPC | Simple async jobs, serverless |
Building Block: Consistent Hashing
When distributing data across multiple nodes (cache servers, DB shards), naive modular hashing (hash(key) % N) breaks when nodes are added or removed — almost all keys remap. Consistent hashing solves this.
How It Works
1. Map the hash space to a ring (0 to 2^32 - 1)
2. Place each server at a position on the ring: hash(server_id)
3. For each key, hash it and walk clockwise to the nearest server
```
            Server A
               │
        ───────●───────
       /               \
      ● Key X           ● Server C
       \ (goes to A)   /
        ───────●───────
               │
            Server B
```
Virtual Nodes
Physical servers vary in capacity, and a single hash position per server can cluster unevenly on the ring. Virtual nodes fix this by placing each physical server at multiple positions on the ring (e.g., 150-200 virtual nodes per server). This ensures even distribution and smooth rebalancing.
When a node is added: Only keys between the new node and its predecessor on the ring need to move — roughly K / N keys instead of all keys (where K = total keys, N = total nodes).
When a node is removed: Only its keys move to the next node clockwise on the ring.
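A compact sketch of a hash ring with virtual nodes; the MD5 hash and the vnode count of 150 are illustrative choices, not a specific system's implementation:

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, nodes=(), vnodes: int = 150):
        self.vnodes = vnodes
        self.ring: dict[int, str] = {}  # ring position -> physical node
        self.positions: list[int] = []  # sorted ring positions
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each physical node occupies many positions on the ring.
        for i in range(self.vnodes):
            pos = ring_hash(f"{node}#{i}")
            self.ring[pos] = node
            bisect.insort(self.positions, pos)

    def remove_node(self, node: str) -> None:
        for i in range(self.vnodes):
            pos = ring_hash(f"{node}#{i}")
            del self.ring[pos]
            self.positions.remove(pos)

    def get_node(self, key: str) -> str:
        # Walk clockwise: first position >= hash(key), wrapping past the top.
        idx = bisect.bisect(self.positions, ring_hash(key)) % len(self.positions)
        return self.ring[self.positions[idx]]


ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
print(ring.get_node("user:42"))  # stable unless nearby vnodes change
```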
Used by: DynamoDB (partition routing), Cassandra (token ring), Memcached client-side sharding, Akamai CDN.
Next: We will walk through four classic backend design problems step by step — URL shortener, rate limiter, notification service, and chat system.