Backend System Design

System Design Framework & Building Blocks

5 min read

System design interviews test your ability to architect large-scale backend systems under ambiguity. The key is not memorizing solutions but demonstrating a structured approach to breaking down problems and making trade-off decisions. This lesson covers the 4-step framework interviewers expect and the core building blocks you will use in every design.

The 4-Step System Design Framework

Every system design interview should follow this structure. Interviewers are evaluating your process as much as your solution.

Step 1: Requirements Clarification (3-5 minutes)

Never start designing immediately. Ask targeted questions to narrow scope.

Functional requirements define what the system does:

  • Who are the users? How many?
  • What are the core features? (List 3-5, then prioritize)
  • What are the input/output formats?

Non-functional requirements define how well the system performs:

Property       Question to Ask                       Typical Target
--------       ---------------                       --------------
Latency        What is acceptable response time?     p99 < 200 ms for reads
Throughput     How many requests per second?         Derived from DAU
Availability   What uptime is required?              99.9% ≈ 8.7 hours downtime/year
Consistency    Is eventual consistency acceptable?   Depends on domain
Durability     Can we lose data?                     99.999999999% (11 nines) for storage

Step 2: Back-of-Envelope Estimation (3-5 minutes)

Demonstrate that you can reason about scale. Use simple formulas:

QPS (queries per second) = DAU x requests_per_user / 86,400

Storage per year = daily_new_records x record_size x 365

Bandwidth = QPS x average_response_size

Example: Social media feed service

DAU = 100M users
Each user loads feed 5 times/day → reads = 500M/day
Read QPS = 500M / 86,400 ≈ 5,800 QPS
Peak QPS = 5,800 x 3 ≈ 17,400 QPS (3x average for peak)

Each user posts 0.5 times/day → writes = 50M/day
Write QPS = 50M / 86,400 ≈ 580 QPS

Storage per post ≈ 1 KB text + 500 KB media on average ≈ 500 KB (media dominates)
Daily storage = 50M x 500 KB = 25 TB/day
Yearly storage = 25 TB x 365 ≈ 9 PB/year
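
These estimates are easy to sanity-check in code. Below is a minimal sketch of the arithmetic, with every input taken from the example's assumptions rather than measured traffic:

# Back-of-envelope numbers for the example feed service
# (all inputs are the assumptions stated above, not real data)
DAU = 100_000_000           # daily active users
READS_PER_USER = 5          # feed loads per user per day
WRITES_PER_USER = 0.5       # posts per user per day
POST_SIZE_BYTES = 500_000   # ~500 KB per post, media-dominated
SECONDS_PER_DAY = 86_400

read_qps = DAU * READS_PER_USER / SECONDS_PER_DAY             # ≈ 5,787
peak_read_qps = 3 * read_qps                                  # ≈ 17,400
write_qps = DAU * WRITES_PER_USER / SECONDS_PER_DAY           # ≈ 579
daily_tb = DAU * WRITES_PER_USER * POST_SIZE_BYTES / 1e12     # 25 TB/day
yearly_pb = daily_tb * 365 / 1_000                            # ≈ 9.1 PB/year

print(f"reads: {read_qps:,.0f} QPS avg, {peak_read_qps:,.0f} QPS peak")
print(f"writes: {write_qps:,.0f} QPS")
print(f"storage: {daily_tb:.0f} TB/day, {yearly_pb:.1f} PB/year")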

Step 3: High-Level Architecture (5-10 minutes)

Sketch the main components. Start with this universal backend template:

                        ┌─────────────┐
                        │   Clients   │
                        │ (Web/Mobile)│
                        └──────┬──────┘
                        ┌──────▼──────┐
                        │     CDN     │
                        │ (Static/Img)│
                        └──────┬──────┘
                        ┌──────▼──────┐
                        │    Load     │
                        │  Balancer   │
                        └──────┬──────┘
              ┌────────────────┼────────────────┐
              │                │                │
       ┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
       │  API Server │ │  API Server │ │  API Server │
       └──────┬──────┘ └──────┬──────┘ └──────┬──────┘
              │                │                │
              └────────────────┼────────────────┘
                      ┌────────┼────────┐
                      │        │        │
               ┌──────▼──┐ ┌──▼───┐ ┌──▼──────────┐
               │  Cache   │ │  DB  │ │ Message     │
               │ (Redis)  │ │(SQL) │ │ Queue       │
               └──────────┘ └──────┘ └─────────────┘

Step 4: Deep Dive (15-25 minutes)

Pick 2-3 components based on the interviewer's interest or the system's bottleneck. Go deep on data models, algorithms, failure handling, and scaling strategies.

Pro tip: Interviewers often ask "What would break first at 10x scale?" This is your cue to discuss the component under the most pressure.

Building Block: Load Balancers

Load balancers distribute traffic across multiple servers to ensure no single server is overwhelmed.

L4 vs L7 Load Balancers

Feature    L4 (Transport)                   L7 (Application)
-------    --------------                   ----------------
Layer      TCP/UDP                          HTTP/HTTPS
Speed      Faster (no payload inspection)   Slower (inspects headers/body)
Routing    IP + port only                   URL path, headers, cookies
SSL        Pass-through or terminate        Always terminates
Use case   Raw throughput, TCP services     API routing, A/B testing
Examples   AWS NLB, HAProxy (TCP mode)      AWS ALB, Nginx, Envoy

Load Balancing Algorithms

Algorithm              How It Works                                         Best For
---------              ------------                                         --------
Round-robin            Rotate through servers sequentially                  Equal-capacity servers
Weighted round-robin   Higher-capacity servers get more requests            Mixed server sizes
Least connections      Route to the server with fewest active connections   Varied request durations
IP hash                Hash the client IP to pick a server                  Session affinity without cookies
Consistent hashing     Hash-ring-based distribution                         Caching layers, stateful services
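
To make the mechanics concrete, here is a minimal sketch of round-robin and least-connections selection over an assumed in-memory backend pool; a real load balancer updates connection counts from live traffic rather than a dictionary:

import itertools

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]   # assumed backend pool

# Round-robin: rotate through the pool sequentially
_rotation = itertools.cycle(servers)

def round_robin() -> str:
    return next(_rotation)

# Least connections: pick the backend with the fewest in-flight requests
active = {s: 0 for s in servers}   # updated as connections open and close

def least_connections() -> str:
    server = min(active, key=active.get)
    active[server] += 1            # caller decrements when the request finishes
    return server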

Health Checks

Load balancers send periodic health checks (HTTP GET to /health or TCP connect) and remove unhealthy nodes from the pool. Typical configuration: check every 10 seconds, mark unhealthy after 3 consecutive failures, re-add after 2 consecutive successes.
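
A health checker is straightforward to sketch. The loop below uses the thresholds above and assumes each server exposes an HTTP /health endpoint; the URL, timeout, and pool structure are illustrative:

import time
import urllib.request

CHECK_INTERVAL = 10    # seconds between checks
UNHEALTHY_AFTER = 3    # consecutive failures before removal
HEALTHY_AFTER = 2      # consecutive successes before re-adding

def probe(server: str) -> bool:
    """One health check: HTTP GET to /health with a short timeout."""
    try:
        with urllib.request.urlopen(f"http://{server}/health", timeout=2) as resp:
            return resp.status == 200
    except OSError:
        return False

def health_check_loop(server: str, pool: set) -> None:
    failures = successes = 0
    while True:
        if probe(server):
            failures, successes = 0, successes + 1
            if successes >= HEALTHY_AFTER:
                pool.add(server)       # re-add after consecutive successes
        else:
            successes, failures = 0, failures + 1
            if failures >= UNHEALTHY_AFTER:
                pool.discard(server)   # remove after consecutive failures
        time.sleep(CHECK_INTERVAL)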

Building Block: Caching Strategies

Caching is often the single highest-leverage optimization in backend systems. You must know these four patterns and their trade-offs.

Caching Strategy Comparison

Strategy        Read Path                                 Write Path                               Consistency                       Use Case
--------        ---------                                 ----------                               -----------                       --------
Cache-aside     Check cache → miss → read DB → populate   Write to DB only                         Eventual (stale reads possible)   General-purpose, read-heavy
Write-through   Read from cache                           Write to cache and DB simultaneously     Strong (always up to date)        Read-heavy with consistency needs
Write-back      Read from cache                           Write to cache only, async flush to DB   Eventual (risk of data loss)      Write-heavy, can tolerate loss
Write-around    Check cache → miss → read DB → populate   Write to DB only, invalidate cache       Eventual                          Data rarely re-read after write

Cache-Aside Pattern (Most Common)

import json

import redis  # assumes a running Redis instance

redis_client = redis.Redis(decode_responses=True)

def get_user(user_id: str) -> dict:
    # Step 1: Check cache
    cached = redis_client.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)

    # Step 2: Cache miss — read from database
    # (db is an application-level query wrapper, assumed to exist)
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)

    # Step 3: Populate cache with a 1-hour TTL
    redis_client.setex(f"user:{user_id}", 3600, json.dumps(user))

    return user
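
For contrast, the write path under write-through updates the database and the cache in the same step, so subsequent reads never see stale data. A minimal sketch reusing the same assumed redis_client and db wrapper (the SQL is illustrative):

def update_user(user_id: str, user: dict) -> None:
    # Write-through: persist to the DB and refresh the cache together
    db.execute("UPDATE users SET profile = %s WHERE id = %s",
               json.dumps(user), user_id)   # assumed db wrapper
    redis_client.setex(f"user:{user_id}", 3600, json.dumps(user))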

Cache Eviction Policies

Policy                        Mechanism                                           Best For
------                        ---------                                           --------
LRU (Least Recently Used)     Evict the item not accessed for the longest time    General-purpose; the most popular choice
LFU (Least Frequently Used)   Evict the item accessed the fewest times            Workloads with stable hot keys
TTL (Time to Live)            Evict after a fixed expiration time                 Data with known staleness tolerance
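
LRU is simple enough to implement by hand. A minimal sketch using Python's OrderedDict, where insertion order doubles as recency order (capacity and key types are illustrative):

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)   # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict the least recently used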

Building Block: CDN (Content Delivery Network)

CDNs cache static content (images, JS, CSS, videos) at edge locations close to users, reducing latency and offloading origin servers.

Model      How It Works                                            Best For
-----      ------------                                            --------
Push CDN   Origin pushes content to CDN nodes proactively          Small, rarely-changing content (logos, CSS bundles)
Pull CDN   CDN fetches from origin on first request, then caches   Large catalogs, user-generated content

Interview tip: Most real-world CDNs use a pull model with TTL-based invalidation. Mention CloudFront, Cloudflare, or Akamai to show practical knowledge.
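
With a pull CDN, the origin controls caching through standard HTTP headers: max-age applies to browsers, while s-maxage applies to shared caches such as CDN edges. A sketch of illustrative origin response headers (values are examples, not recommendations):

# Origin response headers that control a pull CDN's caching
headers = {
    # Browsers cache for 1 day; CDN edges cache for 7 days
    "Cache-Control": "public, max-age=86400, s-maxage=604800",
    # Lets the edge revalidate with the origin instead of re-downloading
    "ETag": '"v1-abc123"',
}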

Building Block: Message Queues

Message queues decouple producers from consumers, enabling asynchronous processing, load leveling, and fault tolerance.

Point-to-Point vs Pub-Sub

Feature    Point-to-Point                  Pub-Sub
-------    --------------                  -------
Delivery   One consumer per message        All subscribers get every message
Use case   Task distribution, job queues   Event broadcasting, notifications
Examples   SQS, Celery task queue          Kafka topics, SNS, Redis Pub/Sub
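
The delivery difference is easy to demonstrate with an in-process toy, using Python's standard queue module to stand in for a broker (names and messages are illustrative):

import queue

# Point-to-point: each message is consumed by exactly one worker
jobs = queue.Queue()
jobs.put("resize-image-42")
job = jobs.get()          # one worker takes the job; no other worker sees it

# Pub-sub: every subscriber receives its own copy of each message
subscribers = [queue.Queue(), queue.Queue()]   # one queue per subscriber

def publish(message: str) -> None:
    for sub in subscribers:
        sub.put(message)

publish("user-signed-up")  # both subscriber queues now hold the event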

Message Queue Comparison

Feature      Kafka                              RabbitMQ                                    SQS
-------      -----                              --------                                    ---
Model        Distributed log (pub-sub)          Broker (point-to-point + pub-sub)           Managed queue (point-to-point)
Throughput   Very high (millions/sec)           Moderate (tens of thousands/sec)            High (managed, auto-scales)
Ordering     Per-partition ordering             Per-queue ordering                          Best-effort (FIFO available)
Retention    Configurable (days/weeks)          Until consumed                              14 days max
Replay       Yes (consumer offsets)             No (once consumed, gone)                    No
Best for     Event streaming, log aggregation   Task queues, RPC                            Simple async jobs, serverless

Building Block: Consistent Hashing

When distributing data across multiple nodes (cache servers, DB shards), naive modular hashing (hash(key) % N) breaks when nodes are added or removed — almost all keys remap. Consistent hashing solves this.

How It Works

1. Map the hash space to a ring (0 to 2^32 - 1)
2. Place each server at a position on the ring: hash(server_id)
3. For each key, hash it and walk clockwise to the nearest server

        Server A
           |
    ───────●───────
   /                \
  ●  Key X           ●  Server C
   \   (goes to A)  /
    ───────●───────
           |
        Server B

Virtual Nodes

Real servers are uneven in capacity and hash positions can cluster. Virtual nodes fix this by placing each physical server at multiple positions on the ring (e.g., 150-200 virtual nodes per server). This ensures even distribution and smooth rebalancing.

When a node is added: Only keys between the new node and its predecessor on the ring need to move — roughly K / N keys instead of all keys (where K = total keys, N = total nodes).

When a node is removed: Only its keys move to the next node clockwise on the ring.
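
A minimal hash ring with virtual nodes, as a sketch; MD5 and 150 virtual nodes per server are common illustrative choices, not requirements of any particular system:

import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, nodes=(), vnodes: int = 150):
        self.vnodes = vnodes   # virtual nodes per physical server
        self.ring = []         # sorted list of (position, node) pairs
        for node in nodes:
            self.add(node)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node: str) -> None:
        # Place the server at many positions on the ring
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self.ring = [(pos, n) for pos, n in self.ring if n != node]

    def get(self, key: str) -> str:
        # Walk clockwise: first position at or after the key's hash,
        # wrapping around to the start of the ring if necessary
        i = bisect.bisect(self.ring, (self._hash(key),))
        return self.ring[i % len(self.ring)][1]

ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
owner = ring.get("user:42")   # stable as nodes join or leave the ring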

Used by: DynamoDB (partition routing), Cassandra (token ring), Memcached client-side sharding, Akamai CDN.

Next: We will walk through four classic backend design problems step by step — URL shortener, rate limiter, notification service, and chat system.

Take Quiz