Backend System Design

System Design Framework & Building Blocks

System design interviews test your ability to architect large-scale backend systems under ambiguity. The key is not memorizing solutions but demonstrating a structured approach to breaking down problems and making trade-off decisions. This lesson covers the 4-step framework interviewers expect and the core building blocks you will use in every design.

The 4-Step System Design Framework

Every system design interview should follow this structure. Interviewers are evaluating your process as much as your solution.

Step 1: Requirements Clarification (3-5 minutes)

Never start designing immediately. Ask targeted questions to narrow scope.

Functional requirements define what the system does:

  • Who are the users? How many?
  • What are the core features? (List 3-5, then prioritize)
  • What are the input/output formats?

Non-functional requirements define how well the system performs:

| Property | Question to Ask | Typical Target |
|---|---|---|
| Latency | What is acceptable response time? | p99 < 200 ms for reads |
| Throughput | How many requests per second? | Derived from DAU |
| Availability | What uptime is required? | 99.9% ≈ 8.7 hours downtime/year |
| Consistency | Is eventual consistency acceptable? | Depends on domain |
| Durability | Can we lose data? | 99.999999999% (11 nines) for storage |

Step 2: Back-of-Envelope Estimation (3-5 minutes)

Demonstrate that you can reason about scale. Use simple formulas:

QPS (queries per second) = DAU x requests_per_user / 86,400

Storage per year = daily_new_records x record_size x 365

Bandwidth = QPS x average_response_size

Example: Social media feed service

DAU = 100M users
Each user loads feed 5 times/day → reads = 500M/day
Read QPS = 500M / 86,400 ≈ 5,800 QPS
Peak QPS = 5,800 x 3 ≈ 17,400 QPS (3x average for peak)

Each user posts 0.5 times/day → writes = 50M/day
Write QPS = 50M / 86,400 ≈ 580 QPS

Storage per post = 1 KB text + 500 KB media avg = ~500 KB
Daily storage = 50M x 500 KB = 25 TB/day
Yearly storage = 25 TB x 365 ≈ 9 PB/year
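As a sanity check, the arithmetic above fits in a few lines of Python (constants mirror the example; the 3x peak multiplier and ~500 KB post size are the same assumptions as above):

```python
SECONDS_PER_DAY = 86_400

dau = 100_000_000
reads_per_user = 5          # feed loads per user per day
writes_per_user = 0.5       # posts per user per day

read_qps = dau * reads_per_user / SECONDS_PER_DAY
peak_read_qps = read_qps * 3                    # assume 3x average at peak
write_qps = dau * writes_per_user / SECONDS_PER_DAY

post_size_bytes = 500 * 1_000                   # ~500 KB per post (text + media)
daily_storage_tb = dau * writes_per_user * post_size_bytes / 1e12
yearly_storage_pb = daily_storage_tb * 365 / 1_000

print(f"read QPS ≈ {read_qps:,.0f}, peak ≈ {peak_read_qps:,.0f}")
print(f"write QPS ≈ {write_qps:,.0f}")
print(f"storage ≈ {daily_storage_tb:.0f} TB/day ≈ {yearly_storage_pb:.1f} PB/year")
```

In an interview you would do this on a whiteboard, but rounding aggressively (86,400 → ~100,000) keeps the mental math fast without changing the conclusions.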

Step 3: High-Level Architecture (5-10 minutes)

Sketch the main components. Start with this universal backend template:

                        ┌─────────────┐
                        │   Clients   │
                        │ (Web/Mobile)│
                        └──────┬──────┘
                        ┌──────▼──────┐
                        │     CDN     │
                        │ (Static/Img)│
                        └──────┬──────┘
                        ┌──────▼──────┐
                        │    Load     │
                        │  Balancer   │
                        └──────┬──────┘
              ┌────────────────┼────────────────┐
              │                │                │
       ┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
       │  API Server │ │  API Server │ │  API Server │
       └──────┬──────┘ └──────┬──────┘ └──────┬──────┘
              │                │                │
              └────────────────┼────────────────┘
                      ┌────────┼────────┐
                      │        │        │
               ┌──────▼──┐ ┌──▼───┐ ┌──▼──────────┐
               │  Cache   │ │  DB  │ │ Message     │
               │ (Redis)  │ │(SQL) │ │ Queue       │
               └──────────┘ └──────┘ └─────────────┘

Step 4: Deep Dive (15-25 minutes)

Pick 2-3 components based on the interviewer's interest or the system's bottleneck. Go deep on data models, algorithms, failure handling, and scaling strategies.

Pro tip: Interviewers often ask "What would break first at 10x scale?" This is your cue to discuss the component under most pressure.

Building Block: Load Balancers

Load balancers distribute traffic across multiple servers to ensure no single server is overwhelmed.

L4 vs L7 Load Balancers

| Feature | L4 (Transport) | L7 (Application) |
|---|---|---|
| Layer | TCP/UDP | HTTP/HTTPS |
| Speed | Faster (no payload inspection) | Slower (inspects headers/body) |
| Routing | IP + port only | URL path, headers, cookies |
| SSL | Pass-through or terminate | Always terminates |
| Use case | Raw throughput, TCP services | API routing, A/B testing |
| Examples | AWS NLB, HAProxy (TCP mode) | AWS ALB, Nginx, Envoy |

Load Balancing Algorithms

| Algorithm | How It Works | Best For |
|---|---|---|
| Round-robin | Rotate through servers sequentially | Equal-capacity servers |
| Weighted round-robin | Higher-capacity servers get more requests | Mixed server sizes |
| Least connections | Route to server with fewest active connections | Varied request durations |
| IP hash | Hash client IP to pick server | Session affinity without cookies |
| Consistent hashing | Hash-ring-based distribution | Caching layers, stateful services |
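Weighted round-robin is worth being able to sketch on the spot. A minimal version (function name and weights are illustrative) simply expands each server into the rotation in proportion to its weight:

```python
import itertools

def weighted_round_robin(servers: dict):
    """Yield server names in proportion to their integer weights.

    A minimal sketch: expand each server into the rotation `weight`
    times, then cycle forever. Real load balancers interleave picks
    (smooth WRR) to avoid bursts to one server.
    """
    pool = [name for name, weight in servers.items() for _ in range(weight)]
    return itertools.cycle(pool)

# "big" has 3x the capacity of "small", so it receives 3 of every 4 requests.
lb = weighted_round_robin({"big": 3, "small": 1})
picks = [next(lb) for _ in range(8)]
```

Nginx and HAProxy implement a smoothed variant of this idea so the weighted picks are spread evenly rather than sent in runs.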

Health Checks

Load balancers send periodic health checks (HTTP GET to /health or TCP connect) and remove unhealthy nodes from the pool. Typical configuration: check every 10 seconds, mark unhealthy after 3 consecutive failures, re-add after 2 consecutive successes.
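The failure/success thresholds above are just counters with hysteresis. A sketch of the state transition (the probe uses a plain HTTP GET; names and thresholds are illustrative):

```python
import urllib.request

UNHEALTHY_AFTER = 3   # consecutive failures before removal from the pool
HEALTHY_AFTER = 2     # consecutive successes before re-adding

def probe(url: str, timeout: float = 2.0) -> bool:
    """One health check: HTTP GET, healthy iff a 2xx response arrives in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        return False

def update_state(healthy: bool, fails: int, oks: int, in_pool: bool):
    """Apply the thresholds; returns the new (fails, oks, in_pool) state."""
    if healthy:
        fails, oks = 0, oks + 1
        if not in_pool and oks >= HEALTHY_AFTER:
            in_pool = True      # re-add after enough consecutive successes
    else:
        oks, fails = 0, fails + 1
        if in_pool and fails >= UNHEALTHY_AFTER:
            in_pool = False     # remove after enough consecutive failures
    return fails, oks, in_pool
```

The hysteresis (3 failures to remove, 2 successes to re-add) prevents a flapping server from bouncing in and out of the pool on every probe.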

Building Block: Caching Strategies

Caching is the single most impactful optimization in backend systems. You must know these four patterns and their trade-offs.

Caching Strategy Comparison

| Strategy | Read Path | Write Path | Consistency | Use Case |
|---|---|---|---|---|
| Cache-aside | App checks cache → miss → read DB → populate cache | App writes to DB only | Eventual (stale reads possible) | General-purpose, read-heavy |
| Write-through | App reads from cache | App writes to cache and DB simultaneously | Strong (always up to date) | Read-heavy with consistency needs |
| Write-back | App reads from cache | App writes to cache only, async flush to DB | Eventual (risk of data loss) | Write-heavy, can tolerate loss |
| Write-around | App checks cache → miss → read DB → populate cache | App writes to DB only, invalidates cache | Eventual | Data rarely re-read after write |

Cache-Aside Pattern (Most Common)

import json

# Assumes `redis` is a connected redis-py client and `db` a thin database
# wrapper whose `query` runs a parameterized SQL statement.

def get_user(user_id: str) -> dict:
    # Step 1: Check cache
    cached = redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)

    # Step 2: Cache miss — read from database
    user = db.query("SELECT * FROM users WHERE id = %s", (user_id,))

    # Step 3: Populate cache with a 1-hour TTL
    redis.setex(f"user:{user_id}", 3600, json.dumps(user))

    return user
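For contrast, here is a write-through sketch. In-memory dicts stand in for the cache and database so the example is self-contained; a real service would issue a SQL `UPDATE` and a `redis.setex` instead (function names are illustrative):

```python
import json

# In-memory stand-ins for the cache and database (dicts, for illustration only).
cache: dict = {}
database: dict = {}

def save_user(user_id: str, user: dict) -> None:
    # Write-through: update the DB and the cache in the same operation,
    # so a read that follows immediately never sees stale data.
    database[user_id] = user
    cache[f"user:{user_id}"] = json.dumps(user)

def read_user(user_id: str) -> dict:
    # Reads are served straight from the cache, which write-through keeps current.
    return json.loads(cache[f"user:{user_id}"])

save_user("42", {"name": "Ada"})
```

The trade-off versus cache-aside: every write pays the cache-update cost, even for data that is never read back, in exchange for strong read consistency.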

Cache Eviction Policies

| Policy | Mechanism | Best For |
|---|---|---|
| LRU (Least Recently Used) | Evict the item not accessed for the longest time | General-purpose — most popular choice |
| LFU (Least Frequently Used) | Evict the item accessed the fewest times | Workloads with stable hot keys |
| TTL (Time to Live) | Evict after a fixed expiration time | Data with known staleness tolerance |
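LRU is also a classic coding-interview question, so it is worth knowing the standard trick: an ordered map gives O(1) gets and puts. A minimal sketch using Python's `OrderedDict`:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU eviction sketch: an OrderedDict tracks access order,
    so the least recently used key always sits at the front."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")        # "a" is now most recently used
cache.put("c", 3)     # over capacity: evicts "b", not "a"
```

Redis approximates LRU by sampling a few keys per eviction rather than maintaining a perfect ordering, which is cheaper at scale.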

Building Block: CDN (Content Delivery Network)

CDNs cache static content (images, JS, CSS, videos) at edge locations close to users, reducing latency and offloading origin servers.

| Model | How It Works | Best For |
|---|---|---|
| Push CDN | Origin pushes content to CDN nodes proactively | Small, rarely-changing content (logos, CSS bundles) |
| Pull CDN | CDN fetches from origin on first request, then caches | Large catalogs, user-generated content |

Interview tip: Most real-world CDNs use pull model with TTL-based invalidation. Mention CloudFront, Cloudflare, or Akamai to show practical knowledge.

Building Block: Message Queues

Message queues decouple producers from consumers, enabling asynchronous processing, load leveling, and fault tolerance.

Point-to-Point vs Pub-Sub

| Feature | Point-to-Point | Pub-Sub |
|---|---|---|
| Delivery | One consumer per message | All subscribers get every message |
| Use case | Task distribution, job queues | Event broadcasting, notifications |
| Example | SQS, Celery task queue | Kafka topics, SNS, Redis Pub/Sub |

Message Queue Comparison

| Feature | Kafka | RabbitMQ | SQS |
|---|---|---|---|
| Model | Distributed log (pub-sub) | Message broker (point-to-point + pub-sub) | Managed queue (point-to-point) |
| Throughput | Very high (millions/sec) | Moderate (tens of thousands/sec) | High (managed, auto-scales) |
| Ordering | Per-partition ordering | Per-queue ordering | Best-effort (FIFO available) |
| Retention | Configurable (days/weeks) | Until consumed | 14 days max |
| Replay | Yes (consumer offsets) | No (once consumed, gone) | No |
| Best for | Event streaming, log aggregation | Task queues, RPC | Simple async jobs, serverless |
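The point-to-point vs pub-sub distinction is easy to demonstrate with in-process queues (a toy sketch using the standard library, not a real broker — message strings are illustrative):

```python
import queue

# Point-to-point: one shared queue; each message is consumed by exactly
# one worker, whichever dequeues it first.
jobs = queue.Queue()
jobs.put("resize:img1.png")
jobs.put("resize:img2.png")
first_job = jobs.get()           # this job is now gone from the queue

# Pub-sub: every subscriber has its own queue; publishing fans the
# message out so each subscriber receives its own copy.
subscribers = [queue.Queue(), queue.Queue()]

def publish(message: str) -> None:
    for sub in subscribers:
        sub.put(message)         # every subscriber gets the message

publish("user.signed_up")
```

Kafka blurs this line: consumers in the same consumer group split a topic point-to-point style, while separate groups each get the full stream pub-sub style.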

Building Block: Consistent Hashing

When distributing data across multiple nodes (cache servers, DB shards), naive modular hashing (hash(key) % N) breaks when nodes are added or removed — almost all keys remap. Consistent hashing solves this.

How It Works

1. Map the hash space to a ring (0 to 2^32 - 1)
2. Place each server at a position on the ring: hash(server_id)
3. For each key, hash it and walk clockwise to the nearest server

        Server A
           |
    ───────●───────
   /                \
  ●  Key X           ●  Server C
   \   (goes to A)  /
    ───────●───────
           |
        Server B

Virtual Nodes

Real servers are uneven in capacity and hash positions can cluster. Virtual nodes fix this by placing each physical server at multiple positions on the ring (e.g., 150-200 virtual nodes per server). This ensures even distribution and smooth rebalancing.

When a node is added: Only keys between the new node and its predecessor on the ring need to move — roughly K / N keys instead of all keys (where K = total keys, N = total nodes).

When a node is removed: Only its keys move to the next node clockwise on the ring.
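The whole mechanism fits in a short class. A sketch with virtual nodes, using a sorted list and binary search for the clockwise walk (MD5 is used only for its uniform spread here; class and method names are illustrative, not any library's API):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Hash-ring sketch with virtual nodes."""

    def __init__(self, vnodes: int = 150):
        self.vnodes = vnodes
        self.ring = []      # sorted virtual-node positions on the ring
        self.owners = {}    # position -> physical server name

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, server: str) -> None:
        # Place the server at `vnodes` positions to smooth the distribution.
        for i in range(self.vnodes):
            pos = self._hash(f"{server}#{i}")
            bisect.insort(self.ring, pos)
            self.owners[pos] = server

    def remove(self, server: str) -> None:
        for i in range(self.vnodes):
            pos = self._hash(f"{server}#{i}")
            self.ring.remove(pos)
            del self.owners[pos]

    def get(self, key: str) -> str:
        # Walk clockwise: first virtual node at or after the key's hash
        # (wrapping around the ring via the modulo).
        idx = bisect.bisect(self.ring, self._hash(key)) % len(self.ring)
        return self.owners[self.ring[idx]]
```

Removing a server reassigns only the keys it owned; every other key maps to the same server as before, which is exactly the property naive modular hashing lacks.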

Used by: DynamoDB (partition routing), Cassandra (token ring), Memcached client-side sharding, Akamai CDN.

Next: We will walk through four classic backend design problems step by step — URL shortener, rate limiter, notification service, and chat system.
