Data Architecture Patterns for Scale

Event Sourcing, CQRS, and Distributed Transactions

In system design interviews, candidates who can articulate data architecture patterns beyond basic CRUD demonstrate senior-level thinking. This lesson covers the patterns that power systems at companies like Uber, Netflix, and Shopify.

Beyond CRUD: Why Event Sourcing Matters

Traditional CRUD systems store only the current state. When an order changes from "pending" to "shipped," the previous state is lost. Event sourcing flips this model: store every change as an immutable event, and derive the current state by replaying the event log.

| Aspect | CRUD | Event Sourcing |
| --- | --- | --- |
| Storage | Current state only | Full history of changes |
| Audit trail | Requires separate logging | Built-in by design |
| Debugging | "What is the state now?" | "How did we get here?" |
| Storage cost | Lower | Higher (mitigated by snapshots) |
| Complexity | Lower | Higher |

The Append-Only Event Log

Events are immutable facts that describe what happened. They are never updated or deleted:

// Domain events are immutable records of what happened
interface OrderCreated {
  type: "OrderCreated";
  orderId: string;
  customerId: string;
  items: Array<{ productId: string; quantity: number; price: number }>;
  timestamp: string;
}

interface PaymentProcessed {
  type: "PaymentProcessed";
  orderId: string;
  paymentId: string;
  amount: number;
  timestamp: string;
}

type OrderEvent = OrderCreated | PaymentProcessed | OrderShipped | OrderCancelled; // OrderShipped and OrderCancelled follow the same shape
The same events modeled as frozen (immutable) dataclasses in Python:

from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class OrderCreated:
    order_id: str
    customer_id: str
    items: list[dict]
    timestamp: datetime

@dataclass(frozen=True)
class PaymentProcessed:
    order_id: str
    payment_id: str
    amount: float
    timestamp: datetime

Event Replay and Snapshots

To get the current state, replay all events for an aggregate. For performance, take periodic snapshots so you only replay events since the last snapshot:

Events:     [E1] -> [E2] -> [E3] -> [Snapshot@v3] -> [E4] -> [E5]
                                          |
                                   Replay from here (only E4, E5)

A practical rule: create a snapshot every 100 events. This keeps replay time under a few milliseconds even for long-lived aggregates.
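A minimal sketch of snapshot-aware replay, assuming a simple dict-based snapshot store and per-aggregate event lists (the store layout and `apply_event` hook are illustrative, not a specific library's API):

```python
from dataclasses import dataclass

@dataclass
class Snapshot:
    version: int   # number of events already folded into this state
    state: dict

def load_order(order_id, snapshot_store, event_store, apply_event):
    """Rebuild current state from the latest snapshot plus newer events."""
    snap = snapshot_store.get(order_id)            # may be None
    state = dict(snap.state) if snap else {}
    version = snap.version if snap else 0
    for event in event_store[order_id][version:]:  # only events after the snapshot
        state = apply_event(state, event)
        version += 1
    return state, version
```

With a snapshot at version 2, only the third and fourth events are replayed; without one, all four would be.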

CQRS: Separate Read and Write Models

Command Query Responsibility Segregation separates the write model (commands that produce events) from the read model (projections optimized for queries).

                    +-----------------+
  Commands -------> | Command Handler | ----> Event Store
                    +-----------------+           |
                                                  | (publish)
                                                  v
                    +-----------------+     +-----------+
  Queries --------> | Read Model (DB) | <-- | Projector |
                    +-----------------+     +-----------+

Why Separate Models?

  • Write model: Enforces business rules, validates invariants, optimized for consistency
  • Read model: Denormalized, optimized for query performance, can have multiple projections for different use cases

Eventual Consistency

The read model is updated asynchronously after events are written. The delay (typically 10-100ms) is acceptable for most use cases. When strong consistency is needed, read directly from the event store.

// Command handler: validates and produces events
function handlePlaceOrder(command: PlaceOrderCommand): OrderCreated {
  if (command.items.length === 0) {
    throw new Error("Order must have at least one item");
  }
  return {
    type: "OrderCreated",
    orderId: generateId(),
    customerId: command.customerId,
    items: command.items,
    timestamp: new Date().toISOString(),
  };
}

// Projection: materializes a read-optimized view by folding over events.
// (sumItems is an assumed helper that totals item prices.)
function projectOrder(events: OrderEvent[]): OrderReadModel {
  let order: OrderReadModel = { status: "unknown", items: [], total: 0 };
  for (const event of events) {
    switch (event.type) {
      case "OrderCreated":
        order = { status: "pending", items: event.items, total: sumItems(event.items) };
        break;
      case "PaymentProcessed":
        order = { ...order, status: "paid" };
        break;
      case "OrderShipped":
        order = { ...order, status: "shipped" };
        break;
      case "OrderCancelled":
        order = { ...order, status: "cancelled" };
        break;
    }
  }
  return order;
}

Distributed Transactions: The Saga Pattern

In microservices, a single business operation (e.g., placing an order) spans multiple services. You cannot use a traditional database transaction across services. The Saga pattern breaks the operation into a sequence of local transactions with compensating actions on failure.

Orchestration vs. Choreography

| Aspect | Orchestration | Choreography |
| --- | --- | --- |
| Coordination | Central orchestrator | Services react to events |
| Coupling | Low (services don't know each other) | Very low (no central coordinator) |
| Visibility | Easy to track saga state | Harder to trace flow |
| Failure handling | Orchestrator manages compensations | Each service handles its own |
| Best for | Complex workflows (5+ steps) | Simple workflows (2-3 steps) |

Orchestration Saga: Place Order
================================
[Orchestrator] ---> [Payment Service] : charge()
      |                    |
      | <--- success ------+
      |
      +---> [Inventory Service] : reserve()
      |                    |
      | <--- success ------+
      |
      +---> [Shipping Service] : schedule()
      |                    |
      | <--- FAILURE ------+
      |
      | COMPENSATE:
      +---> [Inventory Service] : release()    (undo reserve)
      +---> [Payment Service]   : refund()     (undo charge)
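The flow above can be sketched as a small orchestrator: each step pairs an action with its compensation, and on the first failure the compensations for every completed step run in reverse order. This is an illustrative skeleton, not a production saga framework:

```python
def run_saga(steps):
    """steps: list of (action, compensation) pairs, each a zero-arg callable.
    Returns True if every step succeeds, False if the saga rolled back."""
    completed = []
    for action, compensate in steps:
        try:
            action()
            completed.append(compensate)
        except Exception:
            # Undo completed steps in reverse order (LIFO), mirroring the
            # release-then-refund sequence in the diagram above.
            for undo in reversed(completed):
                undo()
            return False
    return True
```

For the place-order saga, the steps would be (charge, refund), (reserve, release), (schedule, cancel); a shipping failure triggers release() then refund().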

The Outbox Pattern

A common problem: you update your database and publish an event to a message broker. If the publish fails, your system is inconsistent. The Outbox pattern solves this by writing the event to an outbox table in the same database transaction, then a separate process reads the outbox and publishes to the broker:

[Service] ---> BEGIN TRANSACTION
                 UPDATE orders SET status = 'paid'
                 INSERT INTO outbox (event_type, payload) VALUES ('PaymentProcessed', '...')
               COMMIT

[Outbox Relay] ---> SELECT * FROM outbox WHERE published = false
                    PUBLISH to message broker
                    UPDATE outbox SET published = true

This guarantees at-least-once delivery: if the transaction commits, the event will eventually be published. Because the relay can crash after publishing but before marking a row as published, the same event may be delivered twice, so consumers must process events idempotently.
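The pattern fits in a few lines with an embedded database. This sketch uses SQLite for illustration; a real service would use its own database plus a broker client in place of the `deliver` callback:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                         event_type TEXT, payload TEXT,
                         published INTEGER DEFAULT 0);
    INSERT INTO orders VALUES ('o1', 'pending');
""")

# 1) The state change and the outbox row commit in ONE transaction.
with conn:
    conn.execute("UPDATE orders SET status = 'paid' WHERE id = 'o1'")
    conn.execute("INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
                 ("PaymentProcessed", json.dumps({"orderId": "o1"})))

# 2) A separate relay reads unpublished rows, publishes, then marks them.
def relay(conn, deliver):
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox WHERE published = 0"
    ).fetchall()
    for row_id, event_type, payload in rows:
        deliver(event_type, json.loads(payload))  # publish to the broker
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
```

If `deliver` fails, the row stays unpublished and is retried on the next relay pass, which is exactly where the at-least-once (and possible duplicate) delivery comes from.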

Database Selection Framework

Interviews often ask: "Why did you choose this database?" Here is a decision framework:

| Database Type | Examples | Best For | Avoid When |
| --- | --- | --- | --- |
| Relational (SQL) | PostgreSQL, MySQL | ACID transactions, complex joins, structured data | Massive horizontal scale needed |
| Document (NoSQL) | MongoDB, DynamoDB | Flexible schemas, high write throughput | Complex relationships, joins |
| Wide-Column | Cassandra, HBase | Time-series, high write volume, known query patterns | Ad-hoc queries, joins |
| NewSQL | CockroachDB, TiDB | SQL semantics + horizontal scale | Cost-sensitive, simple workloads |
| Time-Series | InfluxDB, TimescaleDB | Metrics, IoT data, time-stamped data | General-purpose queries |
| Graph | Neo4j, Amazon Neptune | Relationship traversals, social networks | Simple CRUD, tabular data |

Interview tip: Always justify your choice with the specific requirements. "I chose PostgreSQL because we need ACID guarantees for payment transactions and the data is highly relational" is stronger than "I chose PostgreSQL because it's popular."

Data Partitioning Deep Dive

Consistent Hashing with Virtual Nodes

Standard hashing (hash(key) % N) breaks when nodes are added or removed, causing massive data reshuffling. Consistent hashing maps both keys and nodes onto a ring, so adding a node only moves a fraction of keys:

Hash Ring with Virtual Nodes:
         Node A (v1)
           |
    Node C (v2) --- Node B (v1)
           |            |
    Node A (v2) --- Node C (v1)
           |
        Node B (v2)

Each physical node gets multiple virtual nodes (e.g., 150-200)
on the ring. This ensures even data distribution.
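A compact ring can be built with the standard library: each node contributes many virtual positions (hashes of "node#replica"), and a key is owned by the first virtual node clockwise from the key's hash. This is an illustrative sketch, with MD5 chosen arbitrarily as a stable hash:

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes=150):
        # Each physical node gets `vnodes` positions on the ring.
        self.ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes for i in range(vnodes)
        )
        self.positions = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s):
        # Stable 64-bit position on the ring (any uniform hash works).
        return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

    def node_for(self, key):
        # First virtual node clockwise from the key, wrapping past the end.
        idx = bisect.bisect(self.positions, self._hash(key)) % len(self.positions)
        return self.ring[idx][1]
```

Adding a fourth node to a three-node ring should move roughly a quarter of the keys, rather than reshuffling nearly everything as `hash(key) % N` would.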

Hash-Based vs. Range-Based Sharding

| Aspect | Hash-Based | Range-Based |
| --- | --- | --- |
| Distribution | Even | Can be uneven |
| Range queries | Not supported | Efficient |
| Hot partitions | Unlikely | Possible (time-based data) |
| Example | User ID % N | Date ranges, alphabetical |

Hot Partition Mitigation

When a single partition receives disproportionate traffic (e.g., a viral product):

  1. Add a random suffix: Spread hot keys across partitions by appending a random number (0-9), then scatter-gather on reads
  2. Dedicated partition: Move known hot keys to their own partition with more resources
  3. Caching layer: Cache hot data in front of the partition
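Technique 1 can be sketched with a pair of hypothetical helpers over a key-value store: writes land on one of N suffix shards chosen at random, and reads scatter-gather across all of them (the store, key format, and fan-out of 10 are illustrative assumptions):

```python
import random

FANOUT = 10  # number of suffix shards per hot key

def write_counter(store, key, amount):
    # Spread writes for a hot key across FANOUT sub-keys, e.g. "product:42#7".
    shard_key = f"{key}#{random.randrange(FANOUT)}"
    store[shard_key] = store.get(shard_key, 0) + amount

def read_counter(store, key):
    # Scatter-gather: sum every suffix shard of the hot key.
    return sum(store.get(f"{key}#{i}", 0) for i in range(FANOUT))
```

The trade-off: writes scale across FANOUT partitions, but every read now costs FANOUT lookups, which is why this is reserved for genuinely hot keys.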

Interview Application: E-Commerce Order Service

"Design an e-commerce order service that handles 100K orders/minute with full audit trail."

Architecture answer using today's patterns:

  1. Event Sourcing for the order aggregate: every state change (created, paid, shipped, cancelled) is an immutable event. This gives you the full audit trail for free.
  2. CQRS with separate projections: one for customer-facing order status (optimized for single-order lookups), one for analytics (aggregating revenue and order counts).
  3. Saga orchestrator for the order workflow: coordinates payment, inventory, and shipping with compensation on failure.
  4. Database choices: Event store on PostgreSQL (ACID for event ordering), read models on DynamoDB (fast key-value lookups at scale), analytics on ClickHouse (columnar for aggregation queries).
  5. Partitioning: Hash-based sharding on orderId across the event store, with consistent hashing for easy rebalancing.
  6. Throughput math: 100K orders/min = ~1,700/sec. With 5 events per order average, that is ~8,500 event writes/sec. A single PostgreSQL instance handles 10K-50K writes/sec, so start with one primary with read replicas, and plan sharding when approaching limits.
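The capacity estimate in step 6 can be reproduced directly (exact division gives ~1,667 orders/sec and ~8,333 writes/sec; the figures in the text round the order rate up to 1,700 first):

```python
orders_per_minute = 100_000
orders_per_second = orders_per_minute / 60             # ~1,667
events_per_order = 5                                   # assumed average
event_writes_per_second = orders_per_second * events_per_order  # ~8,333
```

Either way the result sits comfortably inside a single PostgreSQL primary's 10K-50K writes/sec range, which is what justifies deferring sharding.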

Next: Test your understanding in the module quiz, then apply these patterns in the hands-on lab.

Quiz

Module 2: Data Architecture Patterns Quiz
