Backend System Design

Classic Backend Design Problems

This lesson walks through four systems that appear repeatedly in backend interviews. For each, we follow the 4-step framework: requirements, estimation, high-level design, and deep dive into the most interesting components.

1. URL Shortener (TinyURL)

This is the most common system design question for mid-level backend engineers. It tests database design, encoding, caching, and analytics.

Requirements

Functional: Generate short URL from long URL, redirect short URL to original, optional custom aliases, link analytics (clicks, geo, referrer).

Non-functional: Low redirect latency (< 50ms), high availability (99.99%), short URLs should not be guessable.

Estimation

Write: 100M new URLs/month ≈ 40 URLs/sec
Read: 10:1 read:write ratio → 400 redirects/sec
Peak: 400 x 3 = 1,200 redirects/sec

Storage per URL: ~500 bytes (short code + long URL + metadata)
5-year storage: 100M x 12 x 5 x 500B = 3 TB

Cache: Top 20% of URLs handle 80% of traffic (Pareto)
Cache size: 400 req/sec x 86,400 sec x 500B x 0.2 ≈ 3.5 GB/day

High-Level Design

  Client                                             Analytics
    │                                                Consumer
    ▼                                                   ▲
┌────────┐    ┌──────────┐    ┌──────────┐    ┌────────┴─────┐
│  Load  │───▶│   API    │───▶│  Redis   │    │   Kafka      │
│Balancer│    │  Server  │    │  Cache   │    │  (click logs)│
└────────┘    └────┬─────┘    └──────────┘    └──────────────┘
                   │                ▲
                   ▼                │ cache miss
              ┌──────────┐         │
              │    DB    │─────────┘
              │(Postgres)│
              └──────────┘

Deep Dive: Base62 Encoding

Base62 uses characters a-z, A-Z, 0-9 (62 characters). A 7-character code gives:

62^7 = 3,521,614,606,208 ≈ 3.5 trillion unique URLs

Approach 1: Hash and truncate

import hashlib

def shorten(long_url: str) -> str:
    # MD5 gives a 128-bit hash; 7 Base62 chars encode ~42 bits (62^7 ≈ 2^41.7),
    # so we keep only the low-order bits of the hash.
    hash_hex = hashlib.md5(long_url.encode()).hexdigest()
    hash_int = int(hash_hex, 16)

    chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
    code = ""
    for _ in range(7):
        code += chars[hash_int % 62]
        hash_int //= 62

    # Collisions are possible (two long URLs can map to the same code),
    # so the insert must check for an existing code and retry on conflict.
    return code

Approach 2: Pre-generated key service (preferred at scale)

A separate Key Generation Service (KGS) pre-generates millions of unique 7-character codes and stores them in a database. When the API server needs a new code, it takes one from the pool atomically. This eliminates collision checking entirely.
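
A minimal sketch of the hand-off, assuming the KGS pre-loads codes into a Redis list named keypool (the list name and client setup are illustrative):

import redis

r = redis.Redis()

def get_short_code() -> str:
    """Atomically take one pre-generated code from the pool."""
    # LPOP is atomic, so two API servers can never receive the same code.
    code = r.lpop("keypool")
    if code is None:
        raise RuntimeError("Key pool exhausted; the refill job has fallen behind")
    return code.decode()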

The read path checks the Redis cache first and falls back to the database on a miss:

def redirect(short_code: str) -> str:
    # Step 1: Check Redis cache (sub-millisecond)
    long_url = redis.get(f"url:{short_code}")
    if long_url:
        # Fire-and-forget analytics event
        kafka.send("clicks", {"code": short_code, "timestamp": now()})
        return long_url

    # Step 2: Cache miss — query database
    long_url = db.query(
        "SELECT long_url FROM urls WHERE short_code = %s", short_code
    )
    if not long_url:
        raise NotFoundError("Short URL not found")

    # Step 3: Populate cache with 24-hour TTL
    redis.setex(f"url:{short_code}", 86400, long_url)

    kafka.send("clicks", {"code": short_code, "timestamp": now()})
    return long_url

2. Rate Limiter

Rate limiters protect APIs from abuse and ensure fair usage. This question tests algorithmic thinking and distributed systems knowledge.

Requirements

Functional: Limit requests per API key/user/IP, return HTTP 429 when limit exceeded, configurable rules per endpoint.

Non-functional: Low latency overhead (< 1ms per check), highly available, eventually consistent across distributed nodes is acceptable.

Token Bucket Algorithm

The most common rate limiting algorithm. Each user has a bucket that fills with tokens at a fixed rate and drains one token per request.

Bucket capacity: 10 tokens (burst limit)
Refill rate: 1 token/second (sustained rate)

Timeline:
  t=0:  bucket=10  → request arrives → bucket=9  ✓
  t=0:  bucket=9   → request arrives → bucket=8  ✓
  ...
  t=0:  bucket=1   → request arrives → bucket=0  ✓
  t=0:  bucket=0   → request arrives → REJECT    ✗ (429)
  t=1:  bucket=1   → (1 token refilled)

Implementation with Redis (atomic operation):

import time
import redis

r = redis.Redis()

def is_allowed(user_id: str, capacity: int, refill_rate: float) -> bool:
    """Token bucket rate limiter using Redis."""
    key = f"ratelimit:{user_id}"
    now = time.time()

    # Lua script for atomicity — runs entirely on Redis server
    lua_script = """
    local key = KEYS[1]
    local capacity = tonumber(ARGV[1])
    local refill_rate = tonumber(ARGV[2])
    local now = tonumber(ARGV[3])

    local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
    local tokens = tonumber(bucket[1]) or capacity
    local last_refill = tonumber(bucket[2]) or now

    -- Refill tokens based on elapsed time
    local elapsed = now - last_refill
    tokens = math.min(capacity, tokens + elapsed * refill_rate)

    if tokens >= 1 then
        tokens = tokens - 1
        redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
        redis.call('EXPIRE', key, math.ceil(capacity / refill_rate) * 2)
        return 1  -- allowed
    else
        redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
        redis.call('EXPIRE', key, math.ceil(capacity / refill_rate) * 2)
        return 0  -- rejected
    end
    """
    result = r.eval(lua_script, 1, key, capacity, refill_rate, now)
    return result == 1

Algorithm Comparison

Algorithm               Accuracy      Memory                          Burst Handling                 Complexity
Token bucket            Good          Low (2 values per user)         Allows controlled bursts       Simple
Sliding window log      Exact         High (stores every timestamp)   No bursts                      Moderate
Sliding window counter  Approximate   Low (2 counters per user)       Weighted approximation         Simple
Fixed window counter    Approximate   Very low (1 counter)            Double burst at window edge    Simplest

Distributed Rate Limiting

For multi-region deployments, each region keeps a local Redis counter and periodically syncs to a global count. Accept small over-limit inaccuracy (e.g., allow 105 requests instead of exactly 100) in exchange for lower cross-region latency.
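
A minimal single-window sketch of this trade-off; the class, counters, and sync mechanism are illustrative, and a real deployment would reconcile through Redis replication or a gossip protocol:

GLOBAL_LIMIT = 100  # illustrative: requests allowed per window, all regions

class RegionalLimiter:
    """Decides locally; reconciles with other regions only at sync time."""

    def __init__(self):
        self.synced_total = 0   # total across all regions as of last sync
        self.local_count = 0    # requests admitted here since last sync

    def is_allowed(self) -> bool:
        # No cross-region call on the hot path: decide from local state.
        if self.synced_total + self.local_count >= GLOBAL_LIMIT:
            return False
        self.local_count += 1
        return True

    def sync(self, other_regions_total: int):
        # Fold local admissions into the shared total. Requests admitted
        # elsewhere since the last sync are invisible until now, which is
        # why the system can overshoot slightly (105 instead of 100).
        self.synced_total = other_regions_total + self.local_count
        self.local_count = 0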


3. Notification Service

This tests your ability to handle multi-channel delivery, async processing, and reliability at scale.

Requirements

Functional: Send push notifications, email, SMS, and in-app messages. Support templates, scheduling, user preferences, priority levels.

Non-functional: At-least-once delivery, < 30 second delivery for urgent notifications, handle 1M+ notifications/minute.

High-Level Design

┌───────────┐     ┌──────────────┐     ┌───────────────────┐
│  API /    │────▶│  Validation  │────▶│  Priority Router  │
│ Triggers  │     │  & Template  │     │                   │
└───────────┘     └──────────────┘     └─────┬────┬────┬───┘
                                             │    │    │
                              ┌──────────────┘    │    └──────────────┐
                              ▼                   ▼                   ▼
                     ┌──────────────┐   ┌──────────────┐   ┌──────────────┐
                     │ Urgent Queue │   │ Normal Queue │   │ Batch Queue  │
                     │  (< 30 sec)  │   │  (< 5 min)   │   │  (< 1 hour)  │
                     └──────┬───────┘   └──────┬───────┘   └──────┬───────┘
                            │                  │                  │
                            ▼                  ▼                  ▼
                     ┌──────────────────────────────────────────────┐
                     │               Channel Workers                │
                     │  ┌──────┐  ┌───────┐  ┌─────┐  ┌──────────┐  │
                     │  │ Push │  │ Email │  │ SMS │  │ In-App   │  │
                     │  │(APNs/│  │(SES/  │  │(Twi-│  │(WebSocket│  │
                     │  │ FCM) │  │Mailgun│  │lio) │  │/ SSE)    │  │
                     │  └──────┘  └───────┘  └─────┘  └──────────┘  │
                     └──────────────────────┬───────────────────────┘
                                            │
                                            ▼
                                    ┌──────────────┐
                                    │  Delivery    │
                                    │  Tracker DB  │
                                    └──────────────┘
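
A minimal sketch of the priority routing step; the queue names and the priority field are illustrative, and kafka stands for any producer client, as in the earlier snippets:

QUEUE_BY_PRIORITY = {
    "urgent": "notifications.urgent",  # target: < 30 sec
    "normal": "notifications.normal",  # target: < 5 min
    "batch":  "notifications.batch",   # target: < 1 hour
}

def route(notification: dict):
    # Unknown or missing priorities fall back to the normal queue.
    queue = QUEUE_BY_PRIORITY.get(notification.get("priority"),
                                  "notifications.normal")
    kafka.send(queue, notification)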

Deep Dive: Retry with Exponential Backoff

import time
import random

def send_with_retry(channel: str, payload: dict, max_retries: int = 5):
    """Send notification with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            response = channel_clients[channel].send(payload)
            # Record successful delivery
            db.update_delivery_status(payload["id"], "delivered")
            return response
        except TemporaryError:
            if attempt == max_retries - 1:
                # Move to dead letter queue after final failure
                dead_letter_queue.send(payload)
                db.update_delivery_status(payload["id"], "failed")
                raise

            # Exponential backoff: 1s, 2s, 4s, 8s between the five attempts
            base_delay = 2 ** attempt
            # Add jitter to prevent thundering herd
            jitter = random.uniform(0, base_delay * 0.5)
            time.sleep(base_delay + jitter)

Deep Dive: Deduplication

Prevent the same notification from being sent twice using an idempotency key:

def process_notification(notification: dict):
    idempotency_key = f"notif:{notification['user_id']}:{notification['template_id']}:{notification['dedup_window']}"

    # Check if already sent (Redis SET with NX and TTL)
    already_sent = redis.set(idempotency_key, "1", nx=True, ex=3600)
    if not already_sent:
        return  # Duplicate — skip silently

    # Proceed with delivery
    route_to_channels(notification)

4. Chat System (1-on-1 and Group)

This tests WebSocket management, message ordering, presence, and storage trade-offs.

Requirements

Functional: 1-on-1 and group messaging, online/offline presence, read receipts, message history, media sharing.

Non-functional: Message delivery < 200ms for online users, 100% delivery (no lost messages), message ordering within a conversation.

High-Level Design

┌────────┐   WebSocket   ┌──────────────┐    ┌──────────────┐
│Client A│──────────────▶│   Gateway    │───▶│ Chat Service │
└────────┘               │  (WS Mgmt)   │    │              │
                         └──────┬───────┘    └──────┬───────┘
┌────────┐   WebSocket          │                   │
│Client B│──────────────────────┘                   │
└────────┘                                          │
                              ┌─────────────────────┤
                              │                     │
                              ▼                     ▼
                     ┌──────────────┐      ┌──────────────┐
                     │   Presence   │      │   Message    │
                     │   Service    │      │    Store     │
                     │   (Redis)    │      │              │
                     └──────────────┘      │ Recent: Redis│
                                           │ History:     │
                                           │  Cassandra   │
                                           └──────────────┘

Deep Dive: WebSocket Connection Management

# Connection registry (in Redis for distributed deployment)
class ConnectionManager:
    def __init__(self):
        self.redis = redis.Redis()

    def register(self, user_id: str, server_id: str, ws_id: str):
        """Register a WebSocket connection for a user."""
        self.redis.hset(f"connections:{user_id}", ws_id, server_id)
        self.redis.expire(f"connections:{user_id}", 300)  # 5-min TTL

    def heartbeat(self, user_id: str, ws_id: str):
        """Refresh connection TTL on heartbeat (every 30 seconds)."""
        self.redis.expire(f"connections:{user_id}", 300)

    def get_servers(self, user_id: str) -> list:
        """Find which servers hold connections for a user."""
        return self.redis.hvals(f"connections:{user_id}")

    def unregister(self, user_id: str, ws_id: str):
        """Remove connection on disconnect."""
        self.redis.hdel(f"connections:{user_id}", ws_id)

Deep Dive: Message Ordering

Each conversation maintains a monotonically increasing sequence number. The server assigns the sequence, not the client, to prevent conflicts.

def send_message(conversation_id: str, sender_id: str, content: str):
    # Atomic increment ensures unique, ordered sequence per conversation
    seq = redis.incr(f"seq:{conversation_id}")

    message = {
        "id": generate_uuid(),
        "conversation_id": conversation_id,
        "sender_id": sender_id,
        "content": content,
        "sequence": seq,
        "timestamp": datetime.utcnow().isoformat(),
        "status": "sent"
    }

    # Write to recent messages cache (Redis sorted set, scored by sequence)
    redis.zadd(
        f"messages:{conversation_id}",
        {json.dumps(message): seq}
    )

    # Async write to persistent storage (Cassandra)
    kafka.send("message-persist", message)

    # Deliver to all online participants
    deliver_to_recipients(conversation_id, sender_id, message)

    return message
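
A matching read path for recent history, assuming the same redis client and json module as above (the helper name is illustrative):

def get_recent_messages(conversation_id: str, since_seq: int = 0) -> list:
    # Pull everything after the client's last-seen sequence number;
    # results are already ordered because the set is scored by sequence.
    raw = redis.zrangebyscore(
        f"messages:{conversation_id}", since_seq + 1, "+inf"
    )
    return [json.loads(m) for m in raw]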

Deep Dive: Presence Service

Online/Offline Detection:

1. Client connects   → presence = "online"
2. Heartbeat every 30s → refresh "online" TTL
3. Missed 2 heartbeats (60s) → presence = "away"
4. Missed 3 heartbeats (90s) → presence = "offline"
5. Explicit disconnect → presence = "offline" immediately
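
A minimal sketch of this state machine, assuming heartbeat timestamps live in Redis and status is derived on read (key names and thresholds are illustrative):

import time
import redis

r = redis.Redis()

def on_heartbeat(user_id: str):
    # Record when we last heard from the client.
    r.set(f"last_seen:{user_id}", time.time())

def on_disconnect(user_id: str):
    # Explicit disconnect: drop the record so the user reads as offline.
    r.delete(f"last_seen:{user_id}")

def get_presence(user_id: str) -> str:
    last_seen = r.get(f"last_seen:{user_id}")
    if last_seen is None:
        return "offline"
    silence = time.time() - float(last_seen.decode())
    if silence < 60:
        return "online"   # fewer than 2 missed heartbeats
    if silence < 90:
        return "away"     # 2 missed heartbeats
    return "offline"      # 3+ missed heartbeats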

Group Chat: Fan-Out Strategies

Strategy          How It Works                                        Pros                          Cons
Fan-out on write  Write the message to every recipient's inbox        Fast reads (inbox pre-built)  Expensive writes for large groups
Fan-out on read   Store the message once; recipients query on load    Cheap writes                  Slower reads, complex queries
Hybrid            Fan-out on write for small groups (< 100),          Balanced                      More complex logic
                  fan-out on read for large groups/channels

Interview tip: Most real-world chat systems (WhatsApp, Slack) use the hybrid approach. Small group chats use fan-out on write for instant delivery. Channels with thousands of members use fan-out on read to avoid write amplification.
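
A minimal sketch of the hybrid cut-over, reusing the redis client and json module from the earlier snippets; the threshold and key names are illustrative:

import json

GROUP_SIZE_THRESHOLD = 100  # illustrative cut-over point

def deliver(conversation_id: str, message: dict, member_ids: list):
    if len(member_ids) < GROUP_SIZE_THRESHOLD:
        # Fan-out on write: copy into every member's inbox so reading
        # is a single per-user lookup.
        for member_id in member_ids:
            redis.zadd(f"inbox:{member_id}",
                       {json.dumps(message): message["sequence"]})
    else:
        # Fan-out on read: store once in the conversation timeline;
        # clients pull it when they open the channel.
        redis.zadd(f"timeline:{conversation_id}",
                   {json.dumps(message): message["sequence"]})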

Next: We will cover scaling patterns, database sharding, search infrastructure, and data pipeline design for handling growth.
