Backend System Design

Scaling Patterns & Data Infrastructure

Every system design interview eventually reaches the question: "How does this scale to 10x or 100x traffic?" This lesson covers the scaling patterns, database strategies, and specialized data infrastructure you need to answer confidently.

Horizontal vs Vertical Scaling

| Aspect | Vertical (Scale Up) | Horizontal (Scale Out) |
|---|---|---|
| Method | Bigger machine (more CPU, RAM, disk) | More machines |
| Limit | Hardware ceiling (largest available instance) | Virtually unlimited |
| Complexity | Simple (no code changes) | Complex (distributed state, coordination) |
| Cost curve | Exponential (2x CPU != 2x cost) | Linear (2x servers ≈ 2x cost) |
| Downtime | Requires restart to resize | Zero-downtime with load balancer |
| Best for | Databases (before sharding), small services | Stateless API servers, read replicas |

Interview rule of thumb: Start vertical for simplicity, then go horizontal when you hit limits. Stateless services (API servers) scale horizontally trivially. Stateful services (databases) require careful planning.

Database Read Replicas

The simplest scaling pattern for read-heavy workloads. One primary (master) handles all writes, and multiple replicas handle reads.

                    ┌──────────────┐
                    │   Primary    │
        Writes ────▶│   (Master)   │
                    └──────┬───────┘
                           │ Replication
              ┌────────────┼────────────┐
              ▼            ▼            ▼
       ┌──────────┐ ┌──────────┐ ┌──────────┐
       │ Replica 1│ │ Replica 2│ │ Replica 3│
       └──────────┘ └──────────┘ └──────────┘
              ▲            ▲            ▲
              └────────────┼────────────┘
                    Reads (load balanced)

Replication Lag

Replicas are eventually consistent. Typical lag is 10-100ms, but can spike under heavy write load. Handle this in your application:

# primary_db and replica_db are assumed to be connection handles configured elsewhere.
def get_user_profile(user_id: str, just_updated: bool = False) -> dict:
    if just_updated:
        # Read from primary to guarantee seeing the latest write
        return primary_db.query("SELECT * FROM users WHERE id = %s", user_id)
    else:
        # Normal reads go to replica
        return replica_db.query("SELECT * FROM users WHERE id = %s", user_id)

Write-Ahead Logging (WAL)

WAL is the mechanism databases use to ensure durability and crash recovery. Understanding it shows deep database knowledge in interviews.

How WAL works:

  1. Before modifying a data page, the database appends a record describing the change to a sequential log file (the WAL)
  2. The WAL record is flushed to disk (fsync) before the transaction is acknowledged as committed
  3. The data page is then modified in memory (the buffer pool)
  4. Dirty data pages are flushed to disk later in batches (checkpoints)

If the database crashes:

  • Uncommitted changes in memory are lost, but the WAL is on disk
  • On restart, the database replays the WAL to reconstruct the state
  • No data that was committed (WAL flushed) is ever lost

PostgreSQL uses its WAL for crash recovery, streaming replication to replicas, point-in-time recovery (PITR), and logical decoding (the mechanism behind Debezium CDC).
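
To make the mechanism concrete, here is a minimal, hypothetical sketch of the WAL idea in Python: an append-only log that is fsynced before the in-memory state changes, and replayed on restart. Real databases add binary record formats, page management, and checkpointing on top of this.

import json
import os

class TinyWAL:
    """Toy write-ahead log: append + fsync the change, then apply it in memory."""

    def __init__(self, path: str = "wal.log"):
        self.path = path
        self.state = {}      # stands in for the in-memory data pages
        self._recover()

    def _recover(self) -> None:
        # On restart, replay every logged change to rebuild the in-memory state.
        if not os.path.exists(self.path):
            return
        with open(self.path) as f:
            for line in f:
                record = json.loads(line)
                self.state[record["key"]] = record["value"]

    def put(self, key: str, value: str) -> None:
        record = {"key": key, "value": value}
        with open(self.path, "a") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())   # durability point: the change is now on disk
        self.state[key] = value    # only then mutate the in-memory state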

Sharding Strategies

When a single database server cannot handle the write volume or data size, you split data across multiple servers (shards). This is one of the most complex decisions in system design.

Strategy Comparison

| Strategy | How It Works | Pros | Cons |
|---|---|---|---|
| Range-based | Shard by key range (e.g., A-M on shard 1, N-Z on shard 2) | Simple range queries, easy to understand | Hot partitions (some ranges much busier) |
| Hash-based | shard = hash(key) % num_shards | Even distribution | Range queries broken (must scatter-gather) |
| Directory-based | Lookup table maps each key to its shard | Flexible reassignment | Lookup adds latency, directory is SPOF |

Hash-Based Sharding Example

import hashlib

NUM_SHARDS = 8

def get_shard(user_id: str) -> int:
    """Determine which shard holds this user's data."""
    hash_value = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
    return hash_value % NUM_SHARDS

def query_user(user_id: str) -> dict:
    shard_id = get_shard(user_id)
    # shard_connections: shard id -> database connection mapping, assumed configured elsewhere
    connection = shard_connections[shard_id]
    return connection.query("SELECT * FROM users WHERE id = %s", user_id)

Hot Partition Detection and Mitigation

Some keys receive disproportionate traffic (a celebrity's profile, a viral post). Strategies to mitigate:

  1. Consistent hashing with virtual nodes — spreads load more evenly across physical servers (see the sketch after this list)
  2. Salting hot keys — append a random suffix (celebrity_123_0, celebrity_123_1, ... celebrity_123_9) and scatter across 10 shards, then aggregate reads
  3. Dedicated shard for hot data — move detected hot keys to an isolated, beefier shard
  4. Caching layer — cache hot keys in Redis so they never hit the database
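
As an illustration of strategy 1, here is a minimal consistent-hashing sketch; the node names and virtual-node count are made up for the example, and production systems usually rely on a library or the datastore's own ring implementation.

import bisect
import hashlib

NODES = ["db-1", "db-2", "db-3"]   # hypothetical physical shard servers
VNODES_PER_NODE = 100              # more virtual nodes = smoother distribution

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

# Build the ring: each physical node appears many times at pseudo-random positions.
_ring = sorted(
    (_hash(f"{node}#{i}"), node)
    for node in NODES
    for i in range(VNODES_PER_NODE)
)
_ring_keys = [position for position, _ in _ring]

def get_node(key: str) -> str:
    """Walk clockwise from the key's hash to the next virtual node on the ring."""
    idx = bisect.bisect(_ring_keys, _hash(key)) % len(_ring)
    return _ring[idx][1]

Adding or removing a physical node now remaps only the keys that land on that node's virtual-node segments, instead of reshuffling almost every key the way hash(key) % num_shards does.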

Multi-Tenant Data Partitioning

SaaS applications serve multiple tenants (customers). The isolation strategy depends on scale, compliance, and cost:

| Strategy | Description | Isolation | Cost | Best For |
|---|---|---|---|---|
| Shared DB, shared schema | All tenants in same tables, tenant_id column | Lowest | Cheapest | Small tenants, startups |
| Shared DB, separate schemas | Each tenant gets their own schema in one DB | Medium | Low | Medium tenants, moderate data |
| Dedicated schema per tenant | Separate schema, shared server | Good | Medium | Regulated industries |
| Dedicated DB per tenant | Each tenant gets their own database server | Highest | Most expensive | Enterprise, strict compliance |

-- Shared schema approach: every query includes tenant_id
-- Use Row Level Security (RLS) in PostgreSQL for enforcement

ALTER TABLE orders ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON orders
    USING (tenant_id = current_setting('app.current_tenant')::uuid);

-- Application sets tenant context per request
SET app.current_tenant = 'tenant-abc-123';
SELECT * FROM orders;  -- Only sees tenant-abc-123's orders
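
On the application side, one way to set that per-request tenant context is sketched below. This assumes psycopg2 and an illustrative connection string, and uses PostgreSQL's set_config() so the tenant id can be bound as an ordinary query parameter.

import psycopg2

def run_tenant_query(tenant_id: str, sql: str, params: tuple = ()):
    """Run a query with the RLS tenant context set for the current transaction."""
    conn = psycopg2.connect("dbname=app")   # connection details are illustrative
    try:
        with conn, conn.cursor() as cur:
            # The third argument 'true' scopes the setting to this transaction only.
            cur.execute("SELECT set_config('app.current_tenant', %s, true)", (tenant_id,))
            cur.execute(sql, params)
            return cur.fetchall()
    finally:
        conn.close()

orders = run_tenant_query("tenant-abc-123", "SELECT * FROM orders")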

Blob Storage

Large binary objects (images, videos, documents) should never live in your relational database. Use object storage.

| Service | Provider | Key Features |
|---|---|---|
| S3 | AWS | 11 nines durability, lifecycle rules, event notifications |
| Cloud Storage | GCP | Nearline/Coldline tiers, strong consistency |
| Azure Blob | Azure | Hot/Cool/Archive tiers, immutable storage |

Pre-Signed URLs Pattern

import boto3

def generate_upload_url(file_key: str, content_type: str) -> str:
    """Generate a pre-signed URL for direct client-to-S3 upload."""
    s3 = boto3.client("s3")
    url = s3.generate_presigned_url(
        "put_object",
        Params={
            "Bucket": "my-app-uploads",
            "Key": file_key,
            "ContentType": content_type,
        },
        ExpiresIn=300,  # 5 minutes
    )
    return url

# Flow:
# 1. Client requests upload URL from API server
# 2. API server generates pre-signed URL and returns it
# 3. Client uploads directly to S3 (bypasses API server)
# 4. S3 triggers event notification → process/thumbnail/CDN

This pattern keeps large files off your API servers, preventing bandwidth bottlenecks and memory pressure.
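
On the client side, the returned URL can be used directly with any HTTP library; this short sketch assumes the requests library and an example image file.

import requests

upload_url = generate_upload_url("avatars/user-123.png", "image/png")

with open("avatar.png", "rb") as f:
    # The Content-Type must match what the URL was signed with, or S3 rejects the upload.
    response = requests.put(upload_url, data=f, headers={"Content-Type": "image/png"})

response.raise_for_status()   # a 2xx status means the object is now in S3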

Search Infrastructure

When SQL LIKE '%search_term%' is too slow (at scale it always is, because a leading wildcard prevents index use), you need a dedicated search engine.

Elasticsearch Core Concepts

| Concept | Description |
|---|---|
| Inverted index | Maps every unique word to the list of documents containing it — O(1) lookup |
| Mapping | Schema definition: which fields are text (full-text searchable) vs keyword (exact match) |
| Analyzer | Tokenizer + filters: "Running quickly" → ["run", "quick"] (stemming + lowercasing) |
| Relevance scoring | TF-IDF or BM25: documents with rare matching terms score higher |
| Near-real-time | Documents are searchable within ~1 second of indexing (refresh interval) |
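
The inverted index is the core idea. A toy version in Python, without the stemming, relevance scoring, or persistence a real engine adds, looks roughly like this:

from collections import defaultdict

documents = {
    1: "horizontal scaling adds more machines",
    2: "vertical scaling means a bigger machine",
    3: "read replicas help scale read heavy workloads",
}

# Inverted index: token -> set of document ids containing that token.
index = defaultdict(set)
for doc_id, text in documents.items():
    for token in text.lower().split():
        index[token].add(doc_id)

def search(query: str) -> set:
    """Return ids of documents containing every query token (AND semantics)."""
    hits = [index.get(token, set()) for token in query.lower().split()]
    return set.intersection(*hits) if hits else set()

print(search("scaling machine"))   # {2}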

When to Use Elasticsearch vs SQL

| Use Elasticsearch When | Stick with SQL When |
|---|---|
| Full-text search across multiple fields | Exact lookups by primary key |
| Fuzzy matching, synonyms, stemming | Simple WHERE column = value queries |
| Faceted search (filter by category + price + rating) | Data fits in memory, < 1M records |
| Autocomplete/typeahead suggestions | Strong consistency required |
| Log aggregation and analysis | No budget for additional infrastructure |

Time-Series Data

Metrics, IoT sensor data, application logs, and financial ticks all share the pattern: write-heavy, append-only, queried by time range.

| Database | Type | Key Feature |
|---|---|---|
| InfluxDB | Purpose-built time-series | InfluxQL/Flux query language, built-in downsampling |
| TimescaleDB | PostgreSQL extension | Full SQL support, hypertables auto-partition by time |
| ClickHouse | Column-oriented analytics | Extremely fast aggregation queries |

Retention and Downsampling

Raw data at 1-second granularity consumes enormous storage. Use tiered retention:

Raw data (1-sec resolution):     keep 7 days
5-minute aggregates:             keep 30 days
1-hour aggregates:               keep 1 year
1-day aggregates:                keep forever

-- TimescaleDB: create a continuous aggregate for 5-minute rollups
CREATE MATERIALIZED VIEW metrics_5min
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('5 minutes', timestamp) AS bucket,
    sensor_id,
    AVG(value) AS avg_value,
    MAX(value) AS max_value,
    MIN(value) AS min_value,
    COUNT(*) AS sample_count
FROM raw_metrics
GROUP BY bucket, sensor_id;

-- Add a retention policy to drop raw data after 7 days
SELECT add_retention_policy('raw_metrics', INTERVAL '7 days');

ETL Pipeline Design

ETL (Extract, Transform, Load) pipelines move data from operational databases to data warehouses for analytics.

┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│   Source DB  │───▶│   Extract    │───▶│  Transform   │───▶│    Load      │
│ (PostgreSQL) │    │  (Debezium   │    │  (Spark /    │    │(BigQuery /   │
│              │    │   CDC)       │    │   dbt)       │    │ Snowflake /  │
└──────────────┘    └──────────────┘    └──────────────┘    │ Redshift)    │
                                                            └──────────────┘

CDC (Change Data Capture) with Debezium

Instead of periodic batch exports (which miss data between runs and strain the source DB), CDC captures every INSERT, UPDATE, and DELETE as it happens by reading the database's WAL.

Source DB WAL → Debezium Connector → Kafka Topic → Consumer → Data Warehouse

Benefits:
- Near-real-time (seconds, not hours)
- Minimal load on the source database (it reads the WAL, not the tables)
- Captures deletes (batch export misses these)
- Maintains exact ordering of changes
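
A downstream consumer of those change events might look roughly like this sketch. It assumes the kafka-python client, a topic named dbserver1.public.users following Debezium's <server>.<schema>.<table> convention, and prints changes instead of loading them into a warehouse.

import json
from kafka import KafkaConsumer   # kafka-python

consumer = KafkaConsumer(
    "dbserver1.public.users",
    bootstrap_servers=["localhost:9092"],
    value_deserializer=lambda raw: json.loads(raw) if raw else None,
)

for message in consumer:
    if message.value is None:        # tombstone record after a delete; skip it here
        continue
    payload = message.value.get("payload", message.value)
    op = payload["op"]               # "c" = create, "u" = update, "d" = delete, "r" = snapshot read
    if op == "d":
        print("delete:", payload["before"])   # stand-in for loading into the warehouse
    else:
        print("upsert:", payload["after"])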

Transform Layer

| Tool | Type | Best For |
|---|---|---|
| dbt | SQL-first transforms | Analysts who know SQL, declarative transformations |
| Apache Spark | Distributed compute | Large-scale data (TBs+), complex transformations |
| Apache Flink | Stream processing | Real-time transformations, windowed aggregations |

Interview tip: When asked about data pipelines, mention CDC over batch ETL to show modern thinking. Highlight the trade-off: CDC is more complex to set up but provides fresher data and lower source database load.

You have completed all three lessons in this module. Take the quiz to test your understanding of system design frameworks, classic problems, and scaling patterns.
