Data Architecture Patterns for Scale

Event Sourcing, CQRS, and Distributed Transactions

In system design interviews, candidates who can articulate data architecture patterns beyond basic CRUD demonstrate senior-level thinking. This lesson covers the patterns that power systems at companies like Uber, Netflix, and Shopify.

Beyond CRUD: Why Event Sourcing Matters

Traditional CRUD systems store only the current state. When an order changes from "pending" to "shipped," the previous state is lost. Event sourcing flips this model: store every change as an immutable event, and derive the current state by replaying the event log.

| Aspect | CRUD | Event Sourcing |
| --- | --- | --- |
| Storage | Current state only | Full history of changes |
| Audit trail | Requires separate logging | Built-in by design |
| Debugging | "What is the state now?" | "How did we get here?" |
| Storage cost | Lower | Higher (mitigated by snapshots) |
| Complexity | Lower | Higher |

The Append-Only Event Log

Events are immutable facts that describe what happened. They are never updated or deleted:

// Domain events are immutable records of what happened
interface OrderCreated {
  type: "OrderCreated";
  orderId: string;
  customerId: string;
  items: Array<{ productId: string; quantity: number; price: number }>;
  timestamp: string;
}

interface PaymentProcessed {
  type: "PaymentProcessed";
  orderId: string;
  paymentId: string;
  amount: number;
  timestamp: string;
}

type OrderEvent = OrderCreated | PaymentProcessed | OrderShipped | OrderCancelled; // OrderShipped and OrderCancelled follow the same shape
The same events modeled as frozen (immutable) dataclasses in Python:

from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class OrderCreated:
    order_id: str
    customer_id: str
    items: list[dict]
    timestamp: datetime

@dataclass(frozen=True)
class PaymentProcessed:
    order_id: str
    payment_id: str
    amount: float
    timestamp: datetime

Event Replay and Snapshots

To get the current state, replay all events for an aggregate. For performance, take periodic snapshots so you only replay events since the last snapshot:

Events:     [E1] -> [E2] -> [E3] -> [Snapshot@v3] -> [E4] -> [E5]
                                          |
                                   Replay from here (only E4, E5)

A practical rule: create a snapshot every 100 events. This keeps replay time under a few milliseconds even for long-lived aggregates.
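A minimal sketch of snapshot-aware replay, assuming a simple dict-based snapshot store and per-aggregate event lists (the store layout and `apply_event` hook are illustrative, not a specific library's API):

```python
from dataclasses import dataclass

@dataclass
class Snapshot:
    version: int   # number of events already folded into this state
    state: dict

def load_order(order_id, snapshot_store, event_store, apply_event):
    """Rebuild current state from the latest snapshot plus newer events."""
    snap = snapshot_store.get(order_id)            # may be None
    state = dict(snap.state) if snap else {}
    version = snap.version if snap else 0
    for event in event_store[order_id][version:]:  # only events after the snapshot
        state = apply_event(state, event)
        version += 1
    return state, version
```

With a snapshot at version 2, only the third and fourth events are replayed; without one, all four would be.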

CQRS: Separate Read and Write Models

Command Query Responsibility Segregation separates the write model (commands that produce events) from the read model (projections optimized for queries).

                    +-----------------+
  Commands -------> | Command Handler | ----> Event Store
                    +-----------------+           |
                                                  | (publish)
                                                  v
                    +-----------------+     +-----------+
  Queries --------> | Read Model (DB) | <-- | Projector |
                    +-----------------+     +-----------+

Why Separate Models?

  • Write model: Enforces business rules, validates invariants, optimized for consistency
  • Read model: Denormalized, optimized for query performance, can have multiple projections for different use cases

Eventual Consistency

The read model is updated asynchronously after events are written. The delay (typically 10-100ms) is acceptable for most use cases. When strong consistency is needed, read directly from the event store.

// Command handler: validates and produces events
function handlePlaceOrder(command: PlaceOrderCommand): OrderCreated {
  if (command.items.length === 0) {
    throw new Error("Order must have at least one item");
  }
  return {
    type: "OrderCreated",
    orderId: generateId(),
    customerId: command.customerId,
    items: command.items,
    timestamp: new Date().toISOString(),
  };
}

// Projection: materializes a read-optimized view by folding over events.
// (sumItems is an assumed helper that totals item prices.)
function projectOrder(events: OrderEvent[]): OrderReadModel {
  let order: OrderReadModel = { status: "unknown", items: [], total: 0 };
  for (const event of events) {
    switch (event.type) {
      case "OrderCreated":
        order = { status: "pending", items: event.items, total: sumItems(event.items) };
        break;
      case "PaymentProcessed":
        order = { ...order, status: "paid" };
        break;
      case "OrderShipped":
        order = { ...order, status: "shipped" };
        break;
      case "OrderCancelled":
        order = { ...order, status: "cancelled" };
        break;
    }
  }
  return order;
}

Distributed Transactions: The Saga Pattern

In microservices, a single business operation (e.g., placing an order) spans multiple services. You cannot use a traditional database transaction across services. The Saga pattern breaks the operation into a sequence of local transactions with compensating actions on failure.

Orchestration vs. Choreography

| Aspect | Orchestration | Choreography |
| --- | --- | --- |
| Coordination | Central orchestrator | Services react to events |
| Coupling | Low (services don't know each other) | Very low (no central coordinator) |
| Visibility | Easy to track saga state | Harder to trace flow |
| Failure handling | Orchestrator manages compensations | Each service handles its own |
| Best for | Complex workflows (5+ steps) | Simple workflows (2-3 steps) |

Orchestration Saga: Place Order
================================
[Orchestrator] ---> [Payment Service] : charge()
      |                    |
      | <--- success ------+
      |
      +---> [Inventory Service] : reserve()
      |                    |
      | <--- success ------+
      |
      +---> [Shipping Service] : schedule()
      |                    |
      | <--- FAILURE ------+
      |
      | COMPENSATE:
      +---> [Inventory Service] : release()    (undo reserve)
      +---> [Payment Service]   : refund()     (undo charge)
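The flow above can be sketched as a small orchestrator: each step pairs an action with its compensation, and on the first failure the compensations for every completed step run in reverse order. This is an illustrative skeleton, not a production saga framework:

```python
def run_saga(steps):
    """steps: list of (action, compensation) pairs, each a zero-arg callable.
    Returns True if every step succeeds, False if the saga rolled back."""
    completed = []
    for action, compensate in steps:
        try:
            action()
            completed.append(compensate)
        except Exception:
            # Undo completed steps in reverse order (LIFO), mirroring the
            # release-then-refund sequence in the diagram above.
            for undo in reversed(completed):
                undo()
            return False
    return True
```

For the place-order saga, the steps would be (charge, refund), (reserve, release), (schedule, cancel); a shipping failure triggers release() then refund().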

The Outbox Pattern

A common problem: you update your database and publish an event to a message broker. If the publish fails, your system is inconsistent. The Outbox pattern solves this by writing the event to an outbox table in the same database transaction, then a separate process reads the outbox and publishes to the broker:

[Service] ---> BEGIN TRANSACTION
                 UPDATE orders SET status = 'paid'
                 INSERT INTO outbox (event_type, payload) VALUES ('PaymentProcessed', '...')
               COMMIT

[Outbox Relay] ---> SELECT * FROM outbox WHERE published = false
                    PUBLISH to message broker
                    UPDATE outbox SET published = true

This guarantees at-least-once delivery: if the transaction commits, the event will eventually be published. Because the relay can crash after publishing but before marking a row as published, the same event may be delivered twice, so consumers must process events idempotently.
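The pattern fits in a few lines with an embedded database. This sketch uses SQLite for illustration; a real service would use its own database plus a broker client in place of the `deliver` callback:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                         event_type TEXT, payload TEXT,
                         published INTEGER DEFAULT 0);
    INSERT INTO orders VALUES ('o1', 'pending');
""")

# 1) The state change and the outbox row commit in ONE transaction.
with conn:
    conn.execute("UPDATE orders SET status = 'paid' WHERE id = 'o1'")
    conn.execute("INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
                 ("PaymentProcessed", json.dumps({"orderId": "o1"})))

# 2) A separate relay reads unpublished rows, publishes, then marks them.
def relay(conn, deliver):
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox WHERE published = 0"
    ).fetchall()
    for row_id, event_type, payload in rows:
        deliver(event_type, json.loads(payload))  # publish to the broker
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
```

If `deliver` fails, the row stays unpublished and is retried on the next relay pass, which is exactly where the at-least-once (and possible duplicate) delivery comes from.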

Database Selection Framework

Interviews often ask: "Why did you choose this database?" Here is a decision framework:

| Database Type | Examples | Best For | Avoid When |
| --- | --- | --- | --- |
| Relational (SQL) | PostgreSQL, MySQL | ACID transactions, complex joins, structured data | Massive horizontal scale needed |
| Document (NoSQL) | MongoDB, DynamoDB | Flexible schemas, high write throughput | Complex relationships, joins |
| Wide-Column | Cassandra, HBase | Time-series, high write volume, known query patterns | Ad-hoc queries, joins |
| NewSQL | CockroachDB, TiDB | SQL semantics + horizontal scale | Cost-sensitive, simple workloads |
| Time-Series | InfluxDB, TimescaleDB | Metrics, IoT data, time-stamped data | General-purpose queries |
| Graph | Neo4j, Amazon Neptune | Relationship traversals, social networks | Simple CRUD, tabular data |

Interview tip: Always justify your choice with the specific requirements. "I chose PostgreSQL because we need ACID guarantees for payment transactions and the data is highly relational" is stronger than "I chose PostgreSQL because it's popular."

Data Partitioning Deep Dive

Consistent Hashing with Virtual Nodes

Standard hashing (hash(key) % N) breaks when nodes are added or removed, causing massive data reshuffling. Consistent hashing maps both keys and nodes onto a ring, so adding a node only moves a fraction of keys:

Hash Ring with Virtual Nodes:
         Node A (v1)
           |
    Node C (v2) --- Node B (v1)
           |            |
    Node A (v2) --- Node C (v1)
           |
        Node B (v2)

Each physical node gets multiple virtual nodes (e.g., 150-200)
on the ring. This ensures even data distribution.
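A compact ring can be built with the standard library: each node contributes many virtual positions (hashes of "node#replica"), and a key is owned by the first virtual node clockwise from the key's hash. This is an illustrative sketch, with MD5 chosen arbitrarily as a stable hash:

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes=150):
        # Each physical node gets `vnodes` positions on the ring.
        self.ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes for i in range(vnodes)
        )
        self.positions = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s):
        # Stable 64-bit position on the ring (any uniform hash works).
        return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

    def node_for(self, key):
        # First virtual node clockwise from the key, wrapping past the end.
        idx = bisect.bisect(self.positions, self._hash(key)) % len(self.positions)
        return self.ring[idx][1]
```

Adding a fourth node to a three-node ring should move roughly a quarter of the keys, rather than reshuffling nearly everything as `hash(key) % N` would.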

Hash-Based vs. Range-Based Sharding

| Aspect | Hash-Based | Range-Based |
| --- | --- | --- |
| Distribution | Even | Can be uneven |
| Range queries | Not supported | Efficient |
| Hot partitions | Unlikely | Possible (time-based data) |
| Example | User ID % N | Date ranges, alphabetical |

Hot Partition Mitigation

When a single partition receives disproportionate traffic (e.g., a viral product):

  1. Add a random suffix: Spread hot keys across partitions by appending a random number (0-9), then scatter-gather on reads
  2. Dedicated partition: Move known hot keys to their own partition with more resources
  3. Caching layer: Cache hot data in front of the partition
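Technique 1 can be sketched with a pair of hypothetical helpers over a key-value store: writes land on one of N suffix shards chosen at random, and reads scatter-gather across all of them (the store, key format, and fan-out of 10 are illustrative assumptions):

```python
import random

FANOUT = 10  # number of suffix shards per hot key

def write_counter(store, key, amount):
    # Spread writes for a hot key across FANOUT sub-keys, e.g. "product:42#7".
    shard_key = f"{key}#{random.randrange(FANOUT)}"
    store[shard_key] = store.get(shard_key, 0) + amount

def read_counter(store, key):
    # Scatter-gather: sum every suffix shard of the hot key.
    return sum(store.get(f"{key}#{i}", 0) for i in range(FANOUT))
```

The trade-off: writes scale across FANOUT partitions, but every read now costs FANOUT lookups, which is why this is reserved for genuinely hot keys.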

Interview Application: E-Commerce Order Service

"Design an e-commerce order service that handles 100K orders/minute with full audit trail."

Architecture answer using today's patterns:

  1. Event Sourcing for the order aggregate: every state change (created, paid, shipped, cancelled) is an immutable event. This gives you the full audit trail for free.
  2. CQRS with separate projections: one for customer-facing order status (optimized for single-order lookups), one for analytics (aggregating revenue and order counts).
  3. Saga orchestrator for the order workflow: coordinates payment, inventory, and shipping with compensation on failure.
  4. Database choices: Event store on PostgreSQL (ACID for event ordering), read models on DynamoDB (fast key-value lookups at scale), analytics on ClickHouse (columnar for aggregation queries).
  5. Partitioning: Hash-based sharding on orderId across the event store, with consistent hashing for easy rebalancing.
  6. Throughput math: 100K orders/min = ~1,700/sec. With 5 events per order average, that is ~8,500 event writes/sec. A single PostgreSQL instance handles 10K-50K writes/sec, so start with one primary with read replicas, and plan sharding when approaching limits.
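The capacity estimate in step 6 can be reproduced directly (exact division gives ~1,667 orders/sec and ~8,333 writes/sec; the figures in the text round the order rate up to 1,700 first):

```python
orders_per_minute = 100_000
orders_per_second = orders_per_minute / 60             # ~1,667
events_per_order = 5                                   # assumed average
event_writes_per_second = orders_per_second * events_per_order  # ~8,333
```

Either way the result sits comfortably inside a single PostgreSQL primary's 10K-50K writes/sec range, which is what justifies deferring sharding.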

Next: Test your understanding in the module quiz, then apply these patterns in the hands-on lab.

Quiz

Module 2: Data Architecture Patterns Quiz
