Backend System Design
Classic Backend Design Problems
This lesson walks through four systems that appear repeatedly in backend interviews. For each, we follow the 4-step framework: requirements, estimation, high-level design, and deep dive into the most interesting components.
1. URL Shortener (TinyURL)
This is one of the most common system design questions for mid-level backend engineers. It tests database design, encoding, caching, and analytics.
Requirements
Functional: Generate short URL from long URL, redirect short URL to original, optional custom aliases, link analytics (clicks, geo, referrer).
Non-functional: Low redirect latency (< 50ms), high availability (99.99%), short URLs should not be guessable.
Estimation
Write: 100M new URLs/month ≈ 40 URLs/sec
Read: 10:1 read:write ratio → 400 redirects/sec
Peak: 400 x 3 = 1,200 redirects/sec
Storage per URL: ~500 bytes (short code + long URL + metadata)
5-year storage: 100M x 12 x 5 x 500B = 3 TB
Cache: Top 20% of URLs handle 80% of traffic (Pareto)
Cache size: 400 req/sec x 86,400 sec x 500B x 0.2 ≈ 3.5 GB/day
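These estimates are easy to sanity-check; the snippet below simply replays the arithmetic above.

import math

# Back-of-envelope check of the numbers above
writes_per_sec = 100_000_000 / (30 * 24 * 3600)      # ≈ 38.6, round to 40
reads_per_sec = 40 * 10                              # 10:1 read:write → 400
storage_5yr_tb = 100e6 * 12 * 5 * 500 / 1e12         # 3.0 TB
cache_gb_per_day = 400 * 86_400 * 500 * 0.2 / 1e9    # ≈ 3.5 GB
print(writes_per_sec, reads_per_sec, storage_5yr_tb, cache_gb_per_day)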
High-Level Design
 Client                                            Analytics
    │                                              Consumer
    ▼                                                  ▲
┌────────┐    ┌──────────┐    ┌──────────┐    ┌────────┴─────┐
│  Load  │───▶│   API    │───▶│  Redis   │    │    Kafka     │
│Balancer│    │  Server  │    │  Cache   │    │ (click logs) │
└────────┘    └────┬─────┘    └──────────┘    └──────────────┘
                   │   ▲
                   ▼   │ cache miss
              ┌────────┴─┐
              │    DB    │
              │(Postgres)│
              └──────────┘
Deep Dive: Base62 Encoding
Base62 uses characters a-z, A-Z, 0-9 (62 characters). A 7-character code gives:
62^7 = 3,521,614,606,208 ≈ 3.5 trillion unique URLs
Approach 1: Hash and truncate
import hashlib

def shorten(long_url: str) -> str:
    # MD5 produces a 128-bit hash; 7 Base62 digits consume ~42 bits of it
    # (log2(62^7) ≈ 41.7), so collisions are possible — check the DB and
    # retry with a salt on conflict
    hash_hex = hashlib.md5(long_url.encode()).hexdigest()
    hash_int = int(hash_hex, 16)
    chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
    code = ""
    for _ in range(7):
        code += chars[hash_int % 62]
        hash_int //= 62
    return code
Approach 2: Pre-generated key service (preferred at scale)
A separate Key Generation Service (KGS) pre-generates millions of unique 7-character codes and stores them in a database. When the API server needs a new code, it takes one from the pool atomically. This eliminates collision checking entirely.
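A minimal sketch of the checkout path, assuming the pool lives in a Redis set named keypool (the key name and client wiring are illustrative, not part of the original design):

import redis

r = redis.Redis()

def get_unique_code() -> str:
    # SPOP removes and returns one random member atomically, so two
    # API servers can never be handed the same code
    code = r.spop("keypool")
    if code is None:
        raise RuntimeError("key pool exhausted — KGS must refill it")
    return code.decode()

In practice the KGS also marks handed-out codes as used and refills the pool in the background before it runs dry.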
Deep Dive: Caching Popular URLs
def redirect(short_code: str) -> str:
    # Step 1: Check Redis cache (sub-millisecond)
    long_url = redis.get(f"url:{short_code}")
    if long_url:
        # Fire-and-forget analytics event
        kafka.send("clicks", {"code": short_code, "timestamp": now()})
        return long_url

    # Step 2: Cache miss — query database
    long_url = db.query(
        "SELECT long_url FROM urls WHERE short_code = %s", short_code
    )
    if not long_url:
        raise NotFoundError("Short URL not found")

    # Step 3: Populate cache with a 24-hour TTL
    redis.setex(f"url:{short_code}", 86400, long_url)
    kafka.send("clicks", {"code": short_code, "timestamp": now()})
    return long_url
2. Rate Limiter
Rate limiters protect APIs from abuse and ensure fair usage. This question tests algorithmic thinking and distributed systems knowledge.
Requirements
Functional: Limit requests per API key/user/IP, return HTTP 429 when limit exceeded, configurable rules per endpoint.
Non-functional: Low latency overhead (< 1ms per check), highly available, eventually consistent across distributed nodes is acceptable.
Token Bucket Algorithm
The most common rate limiting algorithm. Each user has a bucket that fills with tokens at a fixed rate and drains one token per request.
Bucket capacity: 10 tokens (burst limit)
Refill rate: 1 token/second (sustained rate)
Timeline:
t=0: bucket=10 → request arrives → bucket=9 ✓
t=0: bucket=9 → request arrives → bucket=8 ✓
...
t=0: bucket=1 → request arrives → bucket=0 ✓
t=0: bucket=0 → request arrives → REJECT ✗ (429)
t=1: bucket=1 → (1 token refilled)
Implementation with Redis (atomic operation):
import time
import redis

r = redis.Redis()

# Lua script for atomicity — runs entirely on the Redis server
LUA_SCRIPT = """
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])

local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1]) or capacity
local last_refill = tonumber(bucket[2]) or now

-- Refill tokens based on elapsed time
local elapsed = now - last_refill
tokens = math.min(capacity, tokens + elapsed * refill_rate)

if tokens >= 1 then
    tokens = tokens - 1
    redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
    redis.call('EXPIRE', key, math.ceil(capacity / refill_rate) * 2)
    return 1 -- allowed
else
    redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
    redis.call('EXPIRE', key, math.ceil(capacity / refill_rate) * 2)
    return 0 -- rejected
end
"""

def is_allowed(user_id: str, capacity: int, refill_rate: float) -> bool:
    """Token bucket rate limiter using Redis."""
    key = f"ratelimit:{user_id}"
    now = time.time()
    result = r.eval(LUA_SCRIPT, 1, key, capacity, refill_rate, now)
    return result == 1
Algorithm Comparison
| Algorithm | Accuracy | Memory | Burst Handling | Complexity |
|---|---|---|---|---|
| Token bucket | Good | Low (2 values per user) | Allows controlled bursts | Simple |
| Sliding window log | Exact | High (stores every timestamp) | No bursts | Moderate |
| Sliding window counter | Approximate | Low (2 counters per user) | Weighted approximation | Simple |
| Fixed window counter | Approximate | Very low (1 counter) | Double burst at window edge | Simplest |
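For contrast with the token bucket, here is a minimal sketch of the fixed window counter from the last row of the table (the key naming and 60-second window are illustrative):

import time
import redis

r = redis.Redis()

def fixed_window_allowed(user_id: str, limit: int, window_sec: int = 60) -> bool:
    # One counter per user per window; the counter expires with its window
    window = int(time.time() // window_sec)
    key = f"fixedwin:{user_id}:{window}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, window_sec)
    return count <= limit

Its weakness is the one named in the table: a client can send `limit` requests at the end of one window and `limit` more at the start of the next.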
Distributed Rate Limiting
For multi-region deployments, each region keeps a local Redis counter and periodically syncs to a global count. Accept small over-limit inaccuracy (e.g., allow 105 requests instead of exactly 100) in exchange for lower cross-region latency.
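A sketch of that local-counter-plus-sync idea (the host names, key names, and one-second cadence are assumptions for illustration):

import redis

local = redis.Redis(host="localhost")              # per-region, on the hot path
global_store = redis.Redis(host="global.example")  # aggregate, synced lazily

def record_request(user_id: str):
    # Hot path touches only the local counter — no cross-region hop
    local.incr(f"rl:{user_id}:local")

def sync_to_global(user_id: str):
    # Run every ~1 second: fold the local delta into the global count
    delta = int(local.getset(f"rl:{user_id}:local", 0) or 0)
    if delta:
        global_store.incrby(f"rl:{user_id}:global", delta)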
3. Notification Service
This tests your ability to handle multi-channel delivery, async processing, and reliability at scale.
Requirements
Functional: Send push notifications, email, SMS, and in-app messages. Support templates, scheduling, user preferences, priority levels.
Non-functional: At-least-once delivery, < 30 second delivery for urgent notifications, handle 1M+ notifications/minute.
High-Level Design
┌───────────┐     ┌──────────────┐     ┌───────────────────┐
│   API /   │────▶│  Validation  │────▶│  Priority Router  │
│ Triggers  │     │  & Template  │     │                   │
└───────────┘     └──────────────┘     └─────┬────┬────┬───┘
                                             │    │    │
              ┌──────────────────────────────┘    │    └──────────────┐
              ▼                                   ▼                   ▼
       ┌──────────────┐                    ┌──────────────┐    ┌──────────────┐
       │ Urgent Queue │                    │ Normal Queue │    │ Batch Queue  │
       │  (< 30 sec)  │                    │  (< 5 min)   │    │  (< 1 hour)  │
       └──────┬───────┘                    └──────┬───────┘    └──────┬───────┘
              │                                   │                   │
              ▼                                   ▼                   ▼
       ┌──────────────────────────────────────────────────────────────────────┐
       │                           Channel Workers                            │
       │            ┌──────┐   ┌───────┐   ┌─────┐   ┌──────────┐             │
       │            │ Push │   │ Email │   │ SMS │   │  In-App  │             │
       │            │(APNs/│   │(SES/  │   │(Twi-│   │(WebSocket│             │
       │            │ FCM) │   │Mailgun│   │lio) │   │ / SSE)   │             │
       │            └──────┘   └───────┘   └─────┘   └──────────┘             │
       └──────────────────────────────────────────────────────────────────────┘
                                           │
                                           ▼
                                   ┌──────────────┐
                                   │   Delivery   │
                                   │  Tracker DB  │
                                   └──────────────┘
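The priority router itself can be little more than a lookup from priority level to queue. The topic names below are placeholders, and kafka stands in for whatever queue client is used:

# Hypothetical topic per priority tier, matching the diagram's targets
QUEUE_FOR_PRIORITY = {
    "urgent": "notifications-urgent",  # < 30 sec
    "normal": "notifications-normal",  # < 5 min
    "batch": "notifications-batch",    # < 1 hour
}

def route(notification: dict):
    # Unknown or missing priority falls back to the normal tier
    topic = QUEUE_FOR_PRIORITY.get(notification.get("priority"), "notifications-normal")
    kafka.send(topic, notification)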
Deep Dive: Retry with Exponential Backoff
import time
import random

def send_with_retry(channel: str, payload: dict, max_retries: int = 5):
    """Send notification with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            response = channel_clients[channel].send(payload)
            # Record successful delivery
            db.update_delivery_status(payload["id"], "delivered")
            return response
        except TemporaryError:
            if attempt == max_retries - 1:
                # Move to dead letter queue after final failure
                dead_letter_queue.send(payload)
                db.update_delivery_status(payload["id"], "failed")
                raise
            # Exponential backoff: base delays of 1s, 2s, 4s, 8s
            base_delay = 2 ** attempt
            # Add jitter to prevent a thundering herd of retries
            jitter = random.uniform(0, base_delay * 0.5)
            time.sleep(base_delay + jitter)
Deep Dive: Deduplication
Prevent the same notification from being sent twice using an idempotency key:
def process_notification(notification: dict):
    idempotency_key = (
        f"notif:{notification['user_id']}:"
        f"{notification['template_id']}:{notification['dedup_window']}"
    )
    # Check if already sent (Redis SET with NX and TTL)
    already_sent = redis.set(idempotency_key, "1", nx=True, ex=3600)
    if not already_sent:
        return  # Duplicate — skip silently
    # Proceed with delivery
    route_to_channels(notification)
4. Chat System (1-on-1 and Group)
This tests WebSocket management, message ordering, presence, and storage trade-offs.
Requirements
Functional: 1-on-1 and group messaging, online/offline presence, read receipts, message history, media sharing.
Non-functional: Message delivery < 200ms for online users, 100% delivery (no lost messages), message ordering within a conversation.
High-Level Design
┌────────┐    WebSocket    ┌──────────────┐    ┌──────────────┐
│Client A│────────────────▶│   Gateway    │───▶│ Chat Service │
└────────┘                 │  (WS Mgmt)   │    │              │
                           └──────┬───────┘    └──────┬───────┘
┌────────┐    WebSocket           │                   │
│Client B│────────────────────────┘                   │
└────────┘                                            │
                                ┌─────────────────────┤
                                │                     │
                                ▼                     ▼
                        ┌──────────────┐      ┌──────────────┐
                        │   Presence   │      │   Message    │
                        │   Service    │      │    Store     │
                        │   (Redis)    │      │              │
                        └──────────────┘      │ Recent: Redis│
                                              │ History:     │
                                              │  Cassandra   │
                                              └──────────────┘
Deep Dive: WebSocket Connection Management
# Connection registry (in Redis for distributed deployment)
class ConnectionManager:
    def __init__(self):
        self.redis = redis.Redis()

    def register(self, user_id: str, server_id: str, ws_id: str):
        """Register a WebSocket connection for a user."""
        self.redis.hset(f"connections:{user_id}", ws_id, server_id)
        self.redis.expire(f"connections:{user_id}", 300)  # 5-min TTL

    def heartbeat(self, user_id: str, ws_id: str):
        """Refresh connection TTL on heartbeat (every 30 seconds)."""
        self.redis.expire(f"connections:{user_id}", 300)

    def get_servers(self, user_id: str) -> list:
        """Find which servers hold connections for a user."""
        return self.redis.hvals(f"connections:{user_id}")

    def unregister(self, user_id: str, ws_id: str):
        """Remove connection on disconnect."""
        self.redis.hdel(f"connections:{user_id}", ws_id)
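A sketch of how the registry gets used at delivery time, assuming each chat server subscribes to a Redis pub/sub channel named after its server_id (the channel scheme and the offline-inbox helper are illustrative):

import json

manager = ConnectionManager()

def deliver(user_id: str, message: dict):
    # Find every server holding a live connection for this user
    servers = manager.get_servers(user_id)
    if not servers:
        enqueue_offline(user_id, message)  # hypothetical offline-inbox helper
        return
    for server_id in set(servers):
        # Each server listens on its own channel and forwards the message
        # down its matching local WebSocket(s)
        manager.redis.publish(f"server:{server_id.decode()}", json.dumps(message))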
Deep Dive: Message Ordering
Each conversation maintains a monotonically increasing sequence number. The server assigns the sequence, not the client, to prevent conflicts.
import json
from datetime import datetime, timezone

def send_message(conversation_id: str, sender_id: str, content: str):
    # Atomic increment ensures a unique, ordered sequence per conversation
    seq = redis.incr(f"seq:{conversation_id}")
    message = {
        "id": generate_uuid(),
        "conversation_id": conversation_id,
        "sender_id": sender_id,
        "content": content,
        "sequence": seq,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "status": "sent",
    }
    # Write to recent-messages cache (Redis sorted set, scored by sequence)
    redis.zadd(
        f"messages:{conversation_id}",
        {json.dumps(message): seq},
    )
    # Async write to persistent storage (Cassandra)
    kafka.send("message-persist", message)
    # Deliver to all online participants
    deliver_to_recipients(conversation_id, sender_id, message)
    return message
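Server-assigned sequences also let clients detect gaps: a client that last saw sequence 41 and then receives 44 can fetch the missing range. A sketch against the same messages:{conversation_id} sorted set:

def fetch_missing(conversation_id: str, after_seq: int, before_seq: int) -> list:
    # Pull messages with sequence strictly between the two known points
    raw = redis.zrangebyscore(
        f"messages:{conversation_id}", after_seq + 1, before_seq - 1
    )
    return [json.loads(m) for m in raw]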
Deep Dive: Presence Service
Online/Offline Detection (a code sketch follows the list):
1. Client connects → presence = "online"
2. Heartbeat every 30s → refresh "online" TTL
3. Missed 2 heartbeats (60s) → presence = "away"
4. Missed 3 heartbeats (90s) → presence = "offline"
5. Explicit disconnect → presence = "offline" immediately
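A minimal sketch of this state machine using a Redis last-seen key (the key name and TTL are illustrative; the thresholds mirror the list above):

import time
import redis

r = redis.Redis()

def heartbeat(user_id: str):
    # Called on connect and every 30 seconds thereafter
    r.set(f"presence:{user_id}", time.time(), ex=120)

def get_presence(user_id: str) -> str:
    last_seen = r.get(f"presence:{user_id}")
    if last_seen is None:
        return "offline"    # never seen, expired, or explicit disconnect
    age = time.time() - float(last_seen)
    if age < 60:
        return "online"     # fewer than 2 missed heartbeats
    if age < 90:
        return "away"       # 2 missed heartbeats
    return "offline"        # 3+ missed heartbeats

def disconnect(user_id: str):
    # Explicit disconnect flips presence to offline immediately
    r.delete(f"presence:{user_id}")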
Group Chat: Fan-Out Strategies
| Strategy | How It Works | Pros | Cons |
|---|---|---|---|
| Fan-out on write | When user sends message, write to every recipient's inbox | Fast reads (inbox pre-built) | Expensive writes for large groups |
| Fan-out on read | Store message once, recipients query on load | Cheap writes | Slower reads, complex queries |
| Hybrid | Fan-out on write for small groups (< 100), fan-out on read for large groups/channels | Balanced | More complex logic |
Interview tip: Most real-world chat systems (WhatsApp, Slack) use the hybrid approach. Small group chats use fan-out on write for instant delivery. Channels with thousands of members use fan-out on read to avoid write amplification.
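A sketch of the hybrid dispatch decision, reusing the sorted-set layout from send_message (the 100-member threshold comes from the table; the inbox and channel key schemes are illustrative):

SMALL_GROUP_LIMIT = 100

def dispatch(conversation_id: str, message: dict, member_ids: list):
    if len(member_ids) < SMALL_GROUP_LIMIT:
        # Fan-out on write: push into every member's inbox immediately
        for member_id in member_ids:
            redis.zadd(f"inbox:{member_id}",
                       {json.dumps(message): message["sequence"]})
    else:
        # Fan-out on read: store once; members pull when they open the channel
        redis.zadd(f"channel:{conversation_id}",
                   {json.dumps(message): message["sequence"]})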
Next: We will cover scaling patterns, database sharding, search infrastructure, and data pipeline design for handling growth.