Real-Time Systems & Collaborative Architecture
Real-Time Systems, CRDTs & Stream Processing
Real-time features are everywhere: collaborative documents, live dashboards, chat, multiplayer games, presence indicators. This lesson covers the core protocols, data structures, and processing patterns that power these systems at scale.
Real-Time Transport Protocols
Three primary approaches exist for server-to-client communication. The right choice depends on your latency requirements, message direction, and infrastructure constraints.
Long Polling SSE WebSocket
┌────────┐ ┌────────┐ ┌────────┐
│ Client │ │ Client │ │ Client │
└───┬────┘ └───┬────┘ └───┬────┘
│ GET /updates │ GET /stream │ Upgrade
│────────────> │────────────> │────────────>
│ (held open) │ (held open) │ (101 Switch)
│ │ │
│<──── JSON ────── │<── data: {...} ── │<──── frame ────
│ │<── data: {...} ── │──── frame ───>
│ GET /updates │<── data: {...} ── │<──── frame ────
│────────────> │ │
│ (repeat) │ (one direction) │ (bidirectional)
| Feature | Long Polling | SSE (Server-Sent Events) | WebSocket |
|---|---|---|---|
| Direction | Server to client (simulated) | Server to client only | Bidirectional |
| Protocol | HTTP/1.1 repeated requests | HTTP/1.1 with text/event-stream | Upgraded TCP via ws:// or wss:// |
| Connection overhead | New TCP handshake per poll | Single persistent connection | Single persistent connection |
| Binary data | No (JSON/text only) | No (UTF-8 text only) | Yes (binary frames supported) |
| Auto-reconnect | Manual implementation | Built-in (EventSource API) | Manual implementation |
| Proxy/CDN friendly | Yes (standard HTTP) | Yes (standard HTTP) | Often requires configuration |
| Scalability concern | High request volume | One connection per client | One connection per client |
| Best for | Legacy systems, simple notifications | Live feeds, dashboards, stock tickers | Chat, collaboration, gaming |
Interview rule of thumb: Use SSE when the server pushes data one-way (dashboards, notifications). Use WebSocket when the client also sends frequent data (collaborative editing, chat). Avoid long polling in new designs unless constrained by infrastructure that blocks persistent connections.
WebSocket Connection Management at Scale
A single server can hold tens of thousands of WebSocket connections, but at scale you face routing, failover, and state management challenges.
Connection Routing Strategies
Load Balancer
┌───────────┐
Client A ────────>│ │──────> Server 1 (holds A, B)
Client B ────────>│ L7 LB │──────> Server 2 (holds C, D)
Client C ────────>│ │──────> Server 3 (holds E, F)
Client D ────────>│ │
Client E ────────>│ │
Client F ────────>└───────────┘
Sticky sessions pin a client to the same server using a cookie or a hash of the client ID. Simple to implement, but connections drop during rolling deploys, and load can become unevenly distributed as clients churn.
Connection migration stores connection state externally (in Redis) so any server can resume a session. When a server goes down, clients reconnect to any available server, which loads their state from Redis. More complex but eliminates sticky session drawbacks.
Heartbeats and Reconnection
import time
from dataclasses import dataclass, field

@dataclass
class ConnectionState:
    client_id: str
    last_heartbeat: float = field(default_factory=time.time)
    missed_heartbeats: int = 0

class HeartbeatManager:
    def __init__(self, interval_sec: float = 30.0, max_missed: int = 3):
        self.interval = interval_sec
        self.max_missed = max_missed
        self.connections: dict[str, ConnectionState] = {}

    def register(self, client_id: str) -> None:
        self.connections[client_id] = ConnectionState(client_id=client_id)

    def heartbeat_received(self, client_id: str) -> None:
        if client_id in self.connections:
            conn = self.connections[client_id]
            conn.last_heartbeat = time.time()
            conn.missed_heartbeats = 0

    def check_connections(self) -> list[str]:
        """Return client_ids that should be disconnected, and deregister them
        so they are not reported again on the next check."""
        now = time.time()
        dead: list[str] = []
        for client_id, conn in self.connections.items():
            if now - conn.last_heartbeat > self.interval:
                conn.missed_heartbeats += 1
                if conn.missed_heartbeats >= self.max_missed:
                    dead.append(client_id)
        for client_id in dead:
            del self.connections[client_id]
        return dead
Clients should implement exponential backoff with jitter on reconnection to avoid a thundering herd when a server restarts:
import random

def reconnect_delay(attempt: int, base_ms: int = 1000, max_ms: int = 30000) -> float:
    """Exponential backoff: 1s, 2s, 4s, 8s... capped at 30s, plus jitter."""
    delay = min(base_ms * (2 ** attempt), max_ms)
    jitter = random.uniform(0, delay * 0.3)  # 0-30% jitter
    return (delay + jitter) / 1000.0
Pub/Sub Architecture
Pub/sub decouples producers from consumers, enabling fan-out to multiple subscribers without point-to-point connections.
Kafka vs Redis Streams
| Aspect | Kafka | Redis Streams |
|---|---|---|
| Storage model | Partitioned append-only log on disk | In-memory stream with optional persistence |
| Retention | Configurable (days/weeks/forever) | Memory-bounded or capped by max length |
| Consumer groups | Yes (partition-based assignment) | Yes (XREADGROUP with acknowledgment) |
| Ordering guarantee | Per-partition FIFO | Per-stream FIFO |
| Throughput | Millions of messages/sec (batched I/O) | Hundreds of thousands/sec (single-threaded) |
| Latency | Low ms (batching adds small delay) | Sub-millisecond |
| Best for | Event sourcing, analytics pipelines, durable logs | Real-time notifications, lightweight pub/sub, caching layer events |
Fan-Out Patterns
Fan-out on write (push model): When a message is published, immediately deliver it to all subscribers. Used by Twitter for users with few followers. Fast reads, expensive writes.
Fan-out on read (pull model): Subscribers query for new messages on demand. Used by Twitter for celebrity accounts (millions of followers). Cheap writes, slower reads.
Fan-out on Write Fan-out on Read
Producer Producer
│ │
▼ ▼
Publish ──> Sub A inbox Topic Log
──> Sub B inbox │
──> Sub C inbox ┌──┼──┐
▼ ▼ ▼
(N writes per message) A B C (read on demand)
(N reads per consumer)
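The cost asymmetry in the diagram can be made concrete with two in-memory sketches (class names are illustrative): the push model pays N inbox writes per publish and reads are a single lookup; the pull model pays one log append and merges at read time.

```python
from collections import defaultdict

class FanoutOnWrite:
    """Push model: one inbox write per subscriber at publish time."""
    def __init__(self) -> None:
        self.subscribers: dict[str, set[str]] = defaultdict(set)  # topic -> subs
        self.inboxes: dict[str, list[str]] = defaultdict(list)    # sub -> messages

    def subscribe(self, topic: str, sub: str) -> None:
        self.subscribers[topic].add(sub)

    def publish(self, topic: str, msg: str) -> None:
        for sub in self.subscribers[topic]:
            self.inboxes[sub].append(msg)  # N writes per message

    def read(self, sub: str) -> list[str]:
        return self.inboxes[sub]           # O(1) lookup per reader

class FanoutOnRead:
    """Pull model: one log append per message; readers merge on demand."""
    def __init__(self) -> None:
        self.logs: dict[str, list[str]] = defaultdict(list)        # topic -> log
        self.subscriptions: dict[str, set[str]] = defaultdict(set) # sub -> topics

    def subscribe(self, topic: str, sub: str) -> None:
        self.subscriptions[sub].add(topic)

    def publish(self, topic: str, msg: str) -> None:
        self.logs[topic].append(msg)       # 1 write per message

    def read(self, sub: str) -> list[str]:
        # N reads per consumer: scan every subscribed topic's log
        return [m for topic in self.subscriptions[sub] for m in self.logs[topic]]
```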
Interview insight: Most systems use a hybrid. Twitter's fanout service reportedly pushes tweets from low-follower accounts directly into follower timelines, while tweets from very high-follower (celebrity) accounts are fetched and merged into timelines at read time.
CRDTs for Collaborative Editing
When multiple users edit the same document simultaneously, you need a strategy to merge concurrent changes without conflicts.
OT vs CRDTs
Operational Transformation (OT) transforms operations against each other to maintain consistency. Google Docs uses OT with a central server that orders operations.
Conflict-free Replicated Data Types (CRDTs) are data structures designed so that concurrent updates always converge without coordination. Figma's multiplayer system uses CRDT-inspired techniques, publicly documented in a 2019 engineering blog post, though it still routes updates through a central server.
| Aspect | OT (Google Docs) | CRDTs |
|---|---|---|
| Central server required | Yes (orders operations) | No (peer-to-peer possible) |
| Complexity | O(n^2) transformation functions (one per pair of operation types) | O(1) merge per operation |
| Offline support | Limited (needs server) | Full (merge on reconnect) |
| Correctness proof | Difficult (many edge cases) | Mathematical guarantee |
| Memory overhead | Low (operations only) | Higher (metadata per element) |
Counter CRDTs
G-Counter (Grow-only Counter): Each node maintains its own counter. The value is the sum of all nodes. Merge takes the max of each node's counter.
from typing import Dict

class GCounter:
    """Grow-only counter CRDT. Supports increment and merge."""

    def __init__(self, node_id: str):
        self.node_id = node_id
        self.counts: Dict[str, int] = {node_id: 0}

    def increment(self, amount: int = 1) -> None:
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        for node_id, count in other.counts.items():
            self.counts[node_id] = max(self.counts.get(node_id, 0), count)
PN-Counter (Positive-Negative Counter): Two G-Counters — one for increments, one for decrements. Value = P.value() - N.value().
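A PN-Counter can be sketched directly with two per-node grow-only maps, one playing the role of each G-Counter (this inlines the G-Counter logic so the snippet stands alone):

```python
class PNCounter:
    """PN-Counter: two grow-only maps, p for increments and n for decrements.
    Value is sum(p) - sum(n); merge takes the per-node max of each map."""

    def __init__(self, node_id: str):
        self.node_id = node_id
        self.p: dict[str, int] = {}  # per-node increment totals (a G-Counter)
        self.n: dict[str, int] = {}  # per-node decrement totals (a G-Counter)

    def increment(self, amount: int = 1) -> None:
        self.p[self.node_id] = self.p.get(self.node_id, 0) + amount

    def decrement(self, amount: int = 1) -> None:
        self.n[self.node_id] = self.n.get(self.node_id, 0) + amount

    def value(self) -> int:
        return sum(self.p.values()) - sum(self.n.values())

    def merge(self, other: "PNCounter") -> None:
        for node, count in other.p.items():
            self.p[node] = max(self.p.get(node, 0), count)
        for node, count in other.n.items():
            self.n[node] = max(self.n.get(node, 0), count)
```

Because each map only grows and merge is a per-node max, merges are commutative, associative, and idempotent, so replicas converge regardless of delivery order.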
LWW-Register (Last-Writer-Wins Register)
Stores a single value with a timestamp. On conflict, the highest timestamp wins.
from dataclasses import dataclass
from typing import Any

@dataclass
class LWWRegister:
    """Last-Writer-Wins Register. Higher timestamp always wins."""
    value: Any = None
    timestamp: float = 0.0
    node_id: str = ""

    def set(self, value: Any, timestamp: float, node_id: str) -> None:
        if timestamp > self.timestamp or (
            timestamp == self.timestamp and node_id > self.node_id
        ):
            self.value = value
            self.timestamp = timestamp
            self.node_id = node_id

    def merge(self, other: "LWWRegister") -> None:
        if other.timestamp > self.timestamp or (
            other.timestamp == self.timestamp and other.node_id > self.node_id
        ):
            self.value = other.value
            self.timestamp = other.timestamp
            self.node_id = other.node_id
OR-Set (Observed-Remove Set)
Supports both add and remove. Each element is tagged with a unique ID on add. Remove only removes the specific tags that were observed, so a concurrent add of the same element is preserved.
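A minimal OR-Set sketch following that description: every add creates a fresh tag, remove tombstones only the tags it has observed, and an element is present while it has at least one un-removed tag.

```python
import uuid

class ORSet:
    """Observed-Remove Set. Concurrent add wins over remove, because the
    remove can only tombstone tags it actually observed."""

    def __init__(self) -> None:
        self.adds: dict[str, set[str]] = {}     # element -> unique add tags
        self.removes: dict[str, set[str]] = {}  # element -> tombstoned tags

    def add(self, element: str) -> None:
        # Each add gets a globally unique tag, so re-adding is a new event.
        self.adds.setdefault(element, set()).add(uuid.uuid4().hex)

    def remove(self, element: str) -> None:
        # Tombstone only the tags observed locally at remove time.
        observed = self.adds.get(element, set())
        self.removes.setdefault(element, set()).update(observed)

    def contains(self, element: str) -> bool:
        live = self.adds.get(element, set()) - self.removes.get(element, set())
        return bool(live)

    def merge(self, other: "ORSet") -> None:
        # Union of both tag sets; order of merges does not matter.
        for el, tags in other.adds.items():
            self.adds.setdefault(el, set()).update(tags)
        for el, tags in other.removes.items():
            self.removes.setdefault(el, set()).update(tags)
```

The key behavior: if replica B removes "x" while replica A concurrently re-adds it, the new add tag is not in B's tombstone set, so after merging both replicas agree "x" is present.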
Text CRDTs: RGA and LSEQ
For collaborative text editing, you need CRDTs that model sequences of characters:
- RGA (Replicated Growable Array): Each character has a unique ID (node_id, sequence). Insert places a character after a reference position. Delete marks characters as tombstones. Concurrent inserts at the same position are ordered deterministically by node ID.
- LSEQ: Assigns each character a position from a dense identifier space. Positions are allocated between existing neighbors, avoiding rebalancing. More space-efficient than RGA for large documents.
RGA Insert Example (concurrent edits):
Initial: H - E - L - L - O
User A inserts 'X' after 'E': H - E - X - L - L - O
User B inserts 'Y' after 'E': H - E - Y - L - L - O
After merge (A.id > B.id): H - E - X - Y - L - L - O
Both insertions preserved, deterministic ordering by node ID.
Stream Processing Patterns
Real-time analytics and event processing require windowing strategies to aggregate unbounded data streams.
Window Types
Tumbling Window (fixed, non-overlapping):
|----5min----|----5min----|----5min----|
| events | events | events |
Sliding Window (fixed, overlapping):
|----5min----|
|----5min----|
|----5min----|
(advances every 1 minute)
Session Window (gap-based):
|--events--| |--events------events--| |--events--|
^gap^ ^gap^
(window closes after inactivity gap, e.g., 30 min)
| Window Type | Use Case | Example |
|---|---|---|
| Tumbling | Periodic aggregation | Page views per 5-minute bucket |
| Sliding | Rolling metrics | Average response time over last 10 minutes, updated every minute |
| Session | User behavior | Total actions per user session (ends after 30 min idle) |
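Tumbling windows are the simplest of the three to implement: each event timestamp maps to exactly one bucket keyed by the window's start time. A minimal sketch (timestamps in seconds, counting events per bucket):

```python
from collections import defaultdict

def tumbling_counts(event_times: list[int], window_sec: int = 300) -> dict[int, int]:
    """Count events per fixed, non-overlapping window.

    Each event belongs to exactly one bucket; the bucket key is the
    window's start timestamp (floor-divide, then scale back up).
    """
    buckets: dict[int, int] = defaultdict(int)
    for ts in event_times:
        window_start = (ts // window_sec) * window_sec
        buckets[window_start] += 1
    return dict(buckets)
```

A sliding window differs only in that each event lands in every window whose range covers it (window length / slide interval buckets per event), and a session window requires tracking the gap since the previous event per key.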
Exactly-Once Semantics
In distributed stream processing, three delivery guarantees exist:
- At-most-once: Fire and forget. Messages may be lost. Fastest.
- At-least-once: Retry on failure. Messages may be duplicated. Requires idempotent consumers.
- Exactly-once: Each message processed exactly once. Achieved via idempotent writes + transactional offsets (Kafka transactions) or deduplication with unique event IDs.
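The deduplication approach from the last bullet can be sketched as an idempotent consumer: the transport delivers at-least-once (so duplicates arrive), and the consumer skips event IDs it has already processed, giving exactly-once *effects*.

```python
class IdempotentConsumer:
    """At-least-once delivery plus dedup on unique event ids yields
    exactly-once processing effects."""

    def __init__(self) -> None:
        # In production this set would live in a durable store with a TTL,
        # not unbounded process memory.
        self.seen: set[str] = set()
        self.total = 0

    def process(self, event_id: str, amount: int) -> bool:
        """Apply the event once; return False for duplicate redeliveries."""
        if event_id in self.seen:
            return False
        self.seen.add(event_id)
        self.total += amount
        return True
```

Note the subtlety: marking the ID as seen and applying the effect must be atomic (e.g., one database transaction), or a crash between the two steps reintroduces loss or duplication.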
Watermarks
A watermark is a timestamp assertion: "All events with timestamp <= W have arrived." Watermarks let the system know when a window can be closed and results emitted, even with out-of-order events.
Event time: 10:00 10:01 10:03 10:02 10:04 10:05
│ │ │ │ │ │
Watermark: ──────────────────────────────────>
W=10:00 W=10:02 W=10:04
Late event (10:02 arriving after W=10:02) can be:
- Dropped (simplest)
- Placed in a side output for reprocessing
- Trigger window update (allowed lateness)
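A compact sketch of watermark-driven window closing, under simple assumptions: the watermark is the maximum event time seen minus a fixed allowed lateness, windows are tumbling, and events later than the allowed lateness are dropped (the first bullet above).

```python
def close_windows(
    event_times: list[int], window_sec: int, max_lateness: int
) -> tuple[list[tuple[int, int]], dict[int, int]]:
    """Count events per tumbling window; emit (window_start, count) once
    the watermark passes the window's end. Returns (emitted, still_open)."""
    open_windows: dict[int, int] = {}
    emitted: list[tuple[int, int]] = []
    watermark = float("-inf")
    for ts in event_times:
        window_start = (ts // window_sec) * window_sec
        if window_start + window_sec <= watermark:
            continue  # too late even with allowed lateness: dropped
        open_windows[window_start] = open_windows.get(window_start, 0) + 1
        # Watermark asserts: all events <= (max seen - lateness) have arrived.
        watermark = max(watermark, ts - max_lateness)
        for start in sorted(list(open_windows)):
            if start + window_sec <= watermark:
                emitted.append((start, open_windows.pop(start)))
    return emitted, open_windows
```

With `window_sec=10` and `max_lateness=5`, the out-of-order event at t=8 below still lands in its correct window because the watermark lags real arrivals by the allowed lateness.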
Presence Detection at Scale
Showing "online" status for millions of users requires careful design to avoid overwhelming the system.
Architecture
Client ──heartbeat every 30s──> Presence Service ──> Redis
│
▼
Set TTL on key "user:{id}"
to 60s (2x heartbeat interval)
│
▼
Key expires automatically
if no heartbeat received
(user is offline)
Status query:
GET user:{id} → exists? ONLINE : OFFLINE
For showing presence to friends/channels, avoid querying per-friend. Instead:
- Pub/sub for status changes: When a user's status changes (online/offline), publish to a channel. Subscribers (friends currently online) receive the update.
- Lazy loading: Only fetch presence for users visible on screen.
- Batch queries: MGET user:101 user:102 user:103 in a single Redis call.
Interview Application: Design a Real-Time Document Editor
When asked to design Google Docs or a collaborative editor in an interview, structure your answer:
- Transport: WebSocket for bidirectional real-time edits. Fallback to SSE + HTTP POST for restricted networks.
- Conflict resolution: CRDTs (RGA for text) for automatic merge without a central ordering server. Mention OT as the alternative (Google Docs approach) and trade-offs.
- Document state: Each character has a unique CRDT ID. Clients apply local operations immediately (optimistic) and broadcast to the server. Server merges and rebroadcasts.
- Presence: Cursor positions and user status tracked via heartbeats with TTL keys in Redis.
- Persistence: Operations appended to a log (event sourcing). Periodic snapshots for fast document loading.
- Scale: Partition documents across servers. Each document lives on one server (sticky routing by document ID). Cross-document operations are rare.
Next: Module 03 Quiz and Lab — build a collaborative document backend with CRDTs, WebSocket management, and presence tracking. :::