Real-Time Systems & Collaborative Architecture

Real-Time Systems, CRDTs & Stream Processing


Real-time features are everywhere: collaborative documents, live dashboards, chat, multiplayer games, presence indicators. This lesson covers the core protocols, data structures, and processing patterns that power these systems at scale.

Real-Time Transport Protocols

Three primary approaches exist for server-to-client communication. The right choice depends on your latency requirements, message direction, and infrastructure constraints.

Long Polling              SSE                    WebSocket
┌────────┐            ┌────────┐            ┌────────┐
│ Client │            │ Client │            │ Client │
└───┬────┘            └───┬────┘            └───┬────┘
    │ GET /updates        │ GET /stream         │ Upgrade
    │────────────>        │────────────>        │────────────>
    │  (held open)        │  (held open)        │ (101 Switch)
    │                     │                     │
    │<──── JSON ──────    │<── data: {...} ──   │<──── frame ────
    │                     │<── data: {...} ──   │──── frame ───>
    │ GET /updates        │<── data: {...} ──   │<──── frame ────
    │────────────>        │                     │
    │  (repeat)           │  (one direction)    │ (bidirectional)
Feature             | Long Polling                         | SSE (Server-Sent Events)               | WebSocket
--------------------|--------------------------------------|----------------------------------------|------------------------------
Direction           | Server to client (simulated)         | Server to client only                  | Bidirectional
Protocol            | HTTP/1.1 repeated requests           | HTTP/1.1 with text/event-stream        | HTTP Upgrade to ws:// or wss://
Connection overhead | New TCP handshake per poll           | Single persistent connection           | Single persistent connection
Binary data         | No (JSON/text only)                  | No (UTF-8 text only)                   | Yes (binary frames supported)
Auto-reconnect      | Manual implementation                | Built-in (EventSource API)             | Manual implementation
Proxy/CDN friendly  | Yes (standard HTTP)                  | Yes (standard HTTP)                    | Often requires configuration
Scalability concern | High request volume                  | One connection per client              | One connection per client
Best for            | Legacy systems, simple notifications | Live feeds, dashboards, stock tickers  | Chat, collaboration, gaming

Interview rule of thumb: Use SSE when the server pushes data one-way (dashboards, notifications). Use WebSocket when the client also sends frequent data (collaborative editing, chat). Avoid long polling in new designs unless constrained by infrastructure that blocks persistent connections.

WebSocket Connection Management at Scale

A single server can hold tens of thousands of WebSocket connections, but at scale you face routing, failover, and state management challenges.

Connection Routing Strategies

                    Load Balancer
                    ┌───────────┐
  Client A ────────>│           │──────> Server 1 (holds A, B)
  Client B ────────>│  L7 LB    │──────> Server 2 (holds C, D)
  Client C ────────>│           │──────> Server 3 (holds E, F)
  Client D ────────>│           │
  Client E ────────>│           │
  Client F ────────>└───────────┘

Sticky sessions pin a client to the same server using a cookie or client ID hash. Simple to implement, but creates problems during rolling deploys (connections drop) and uneven load distribution.

Connection migration stores connection state externally (in Redis) so any server can resume a session. When a server goes down, clients reconnect to any available server, which loads their state from Redis. More complex but eliminates sticky session drawbacks.

Heartbeats and Reconnection

import asyncio
import time
from dataclasses import dataclass, field

@dataclass
class ConnectionState:
    client_id: str
    last_heartbeat: float = field(default_factory=time.time)
    missed_heartbeats: int = 0

class HeartbeatManager:
    def __init__(self, interval_sec: float = 30.0, max_missed: int = 3):
        self.interval = interval_sec
        self.max_missed = max_missed
        self.connections: dict[str, ConnectionState] = {}

    def register(self, client_id: str) -> None:
        self.connections[client_id] = ConnectionState(client_id=client_id)

    def heartbeat_received(self, client_id: str) -> None:
        if client_id in self.connections:
            conn = self.connections[client_id]
            conn.last_heartbeat = time.time()
            conn.missed_heartbeats = 0

    def check_connections(self) -> list[str]:
        """Return client_ids whose heartbeats have lapsed and should be disconnected.

        Derives the missed count from elapsed time so the result does not
        depend on how frequently this method is called."""
        now = time.time()
        dead: list[str] = []
        for client_id, conn in self.connections.items():
            conn.missed_heartbeats = int((now - conn.last_heartbeat) // self.interval)
            if conn.missed_heartbeats >= self.max_missed:
                dead.append(client_id)
        return dead

Clients should implement exponential backoff with jitter on reconnection to avoid a thundering herd when a server restarts:

import random

def reconnect_delay(attempt: int, base_ms: int = 1000, max_ms: int = 30000) -> float:
    """Exponential backoff: 1s, 2s, 4s, 8s... capped at 30s, with jitter."""
    delay = min(base_ms * (2 ** attempt), max_ms)
    jitter = random.uniform(0, delay * 0.3)  # 0-30% jitter
    return (delay + jitter) / 1000.0

Pub/Sub Architecture

Pub/sub decouples producers from consumers, enabling fan-out to multiple subscribers without point-to-point connections.

Kafka vs Redis Streams

Aspect             | Kafka                                             | Redis Streams
-------------------|---------------------------------------------------|---------------------------------------------------------------
Storage model      | Partitioned append-only log on disk               | In-memory stream with optional persistence
Retention          | Configurable (days/weeks/forever)                 | Memory-bounded or capped by max length
Consumer groups    | Yes (partition-based assignment)                  | Yes (XREADGROUP with acknowledgment)
Ordering guarantee | Per-partition FIFO                                | Per-stream FIFO
Throughput         | Millions of messages/sec (batched I/O)            | Hundreds of thousands/sec (single-threaded)
Latency            | Low ms (batching adds small delay)                | Sub-millisecond
Best for           | Event sourcing, analytics pipelines, durable logs | Real-time notifications, lightweight pub/sub, caching-layer events

Fan-Out Patterns

Fan-out on write (push model): When a message is published, immediately deliver it to all subscribers. Used by Twitter for users with few followers. Fast reads, expensive writes.

Fan-out on read (pull model): Subscribers query for new messages on demand. Used by Twitter for celebrity accounts (millions of followers). Cheap writes, slower reads.

Fan-out on Write                    Fan-out on Read

Producer                            Producer
   │                                   │
   ▼                                   ▼
 Publish ──> Sub A inbox             Topic Log
           ──> Sub B inbox              │
           ──> Sub C inbox           ┌──┼──┐
                                     ▼  ▼  ▼
  (N writes per message)           A   B   C  (read on demand)
                                   (N reads per consumer)
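The two patterns above can be contrasted in a few lines. A sketch, assuming an in-memory inbox per subscriber for fan-out on write versus a shared log with per-consumer offsets for fan-out on read:

```python
from collections import defaultdict

# Fan-out on write: one delivery per subscriber at publish time.
inboxes: dict[str, list[str]] = defaultdict(list)
subscribers = ["A", "B", "C"]

def publish_push(msg: str) -> None:
    for sub in subscribers:          # N writes per message
        inboxes[sub].append(msg)

# Fan-out on read: a single append; each consumer tracks its own offset.
log: list[str] = []
offsets: dict[str, int] = defaultdict(int)

def publish_pull(msg: str) -> None:
    log.append(msg)                  # 1 write per message

def poll(consumer: str) -> list[str]:
    new = log[offsets[consumer]:]    # each consumer reads on demand
    offsets[consumer] = len(log)
    return new
```

The push side pays at publish time; the pull side pays at read time, which is why high-fan-out producers favor the pull model.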

Interview insight: Most systems use a hybrid. Twitter's fanout service pushes tweets from typical accounts into follower timelines on write, but excludes high-follower (celebrity) accounts, whose tweets are merged in at read time.

CRDTs for Collaborative Editing

When multiple users edit the same document simultaneously, you need a strategy to merge concurrent changes without conflicts.

OT vs CRDTs

Operational Transformation (OT) transforms operations against each other to maintain consistency. Google Docs uses OT with a central server that orders operations.

Conflict-free Replicated Data Types (CRDTs) are data structures designed so that concurrent updates always converge without coordination. Figma built its multiplayer sync on a CRDT-inspired approach rather than OT, publicly documenting the design and its trade-offs in 2019.

Aspect                  | OT (Google Docs)                | CRDTs (Figma)
------------------------|---------------------------------|-------------------------------
Central server required | Yes (orders operations)         | No (peer-to-peer possible)
Complexity              | O(n^2) transformation functions | O(1) merge per operation
Offline support         | Limited (needs server)          | Full (merge on reconnect)
Correctness proof       | Difficult (many edge cases)     | Mathematical guarantee
Memory overhead         | Low (operations only)           | Higher (metadata per element)

Counter CRDTs

G-Counter (Grow-only Counter): Each node maintains its own counter. The value is the sum of all nodes. Merge takes the max of each node's counter.

from typing import Dict

class GCounter:
    """Grow-only counter CRDT. Supports increment and merge."""
    def __init__(self, node_id: str):
        self.node_id = node_id
        self.counts: Dict[str, int] = {node_id: 0}

    def increment(self, amount: int = 1) -> None:
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        for node_id, count in other.counts.items():
            self.counts[node_id] = max(self.counts.get(node_id, 0), count)

PN-Counter (Positive-Negative Counter): Two G-Counters — one for increments, one for decrements. Value = P.value() - N.value().
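A PN-Counter is just the two grow-only maps composed; a minimal sketch following the G-Counter above:

```python
class PNCounter:
    """PN-Counter: per-node increment and decrement maps, merged element-wise by max."""
    def __init__(self, node_id: str):
        self.node_id = node_id
        self.p: dict[str, int] = {}   # per-node increments (the P counter)
        self.n: dict[str, int] = {}   # per-node decrements (the N counter)

    def increment(self, amount: int = 1) -> None:
        self.p[self.node_id] = self.p.get(self.node_id, 0) + amount

    def decrement(self, amount: int = 1) -> None:
        self.n[self.node_id] = self.n.get(self.node_id, 0) + amount

    def value(self) -> int:
        return sum(self.p.values()) - sum(self.n.values())  # P.value() - N.value()

    def merge(self, other: "PNCounter") -> None:
        for node, c in other.p.items():
            self.p[node] = max(self.p.get(node, 0), c)
        for node, c in other.n.items():
            self.n[node] = max(self.n.get(node, 0), c)
```

Merging is commutative, associative, and idempotent, so replicas converge regardless of delivery order.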

LWW-Register (Last-Writer-Wins Register)

Stores a single value with a timestamp. On conflict, the highest timestamp wins.

from dataclasses import dataclass
from typing import Any

@dataclass
class LWWRegister:
    """Last-Writer-Wins Register. Higher timestamp always wins."""
    value: Any = None
    timestamp: float = 0.0
    node_id: str = ""

    def set(self, value: Any, timestamp: float, node_id: str) -> None:
        if timestamp > self.timestamp or (
            timestamp == self.timestamp and node_id > self.node_id
        ):
            self.value = value
            self.timestamp = timestamp
            self.node_id = node_id

    def merge(self, other: "LWWRegister") -> None:
        if other.timestamp > self.timestamp or (
            other.timestamp == self.timestamp and other.node_id > self.node_id
        ):
            self.value = other.value
            self.timestamp = other.timestamp
            self.node_id = other.node_id

OR-Set (Observed-Remove Set)

Supports both add and remove. Each element is tagged with a unique ID on add. Remove only removes the specific tags that were observed, so a concurrent add of the same element is preserved.
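The tag bookkeeping described above fits in a short sketch; `ORSet` below stores (element, tag) pairs and tombstones only observed tags:

```python
import uuid

class ORSet:
    """OR-Set sketch: adds are uniquely tagged; remove tombstones only observed tags."""
    def __init__(self):
        self.adds: set[tuple] = set()      # (element, unique_tag) pairs
        self.removes: set[tuple] = set()   # tombstoned (element, tag) pairs

    def add(self, element) -> None:
        self.adds.add((element, uuid.uuid4().hex))  # fresh tag per add

    def remove(self, element) -> None:
        # Tombstone only the tags observed locally; a concurrent add survives.
        self.removes |= {(e, t) for (e, t) in self.adds if e == element}

    def contains(self, element) -> bool:
        return any(e == element and (e, t) not in self.removes
                   for (e, t) in self.adds)

    def merge(self, other: "ORSet") -> None:
        self.adds |= other.adds
        self.removes |= other.removes
```

This is why OR-Sets have "add wins" semantics: a remove can only delete adds it has seen.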

Text CRDTs: RGA and LSEQ

For collaborative text editing, you need CRDTs that model sequences of characters:

  • RGA (Replicated Growable Array): Each character has a unique ID (node_id, sequence). Insert places a character after a reference position. Delete marks characters as tombstones. Concurrent inserts at the same position are ordered deterministically by node ID.

  • LSEQ: Assigns each character a position from a dense identifier space. Positions are allocated between existing neighbors, avoiding rebalancing. More space-efficient than RGA for large documents.

  RGA Insert Example (concurrent edits):

  Initial:     H - E - L - L - O

  User A inserts 'X' after 'E':   H - E - X - L - L - O
  User B inserts 'Y' after 'E':   H - E - Y - L - L - O

  After merge (A.id > B.id):      H - E - X - Y - L - L - O
  Both insertions preserved, deterministic ordering by node ID.

Stream Processing Patterns

Real-time analytics and event processing require windowing strategies to aggregate unbounded data streams.

Window Types

Tumbling Window (fixed, non-overlapping):
|----5min----|----5min----|----5min----|
|  events    |  events    |  events    |

Sliding Window (fixed, overlapping):
|----5min----|
   |----5min----|
      |----5min----|
(advances every 1 minute)

Session Window (gap-based):
|--events--|     |--events------events--|    |--events--|
           ^gap^                        ^gap^
(window closes after inactivity gap, e.g., 30 min)

Window Type | Use Case             | Example
------------|----------------------|------------------------------------------------------------------
Tumbling    | Periodic aggregation | Page views per 5-minute bucket
Sliding     | Rolling metrics      | Average response time over last 10 minutes, updated every minute
Session     | User behavior        | Total actions per user session (ends after 30 min idle)
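Tumbling windows reduce to flooring each event's timestamp to a bucket boundary; a sketch of the page-views-per-bucket example (the `tumbling_counts` helper is illustrative):

```python
from collections import defaultdict

def tumbling_counts(events: list[tuple[float, str]],
                    window_sec: float = 300.0) -> dict[float, int]:
    """Count events per fixed, non-overlapping window (e.g. views per 5-minute bucket)."""
    counts: dict[float, int] = defaultdict(int)
    for ts, _payload in events:
        window_start = ts - (ts % window_sec)  # floor the timestamp to the window boundary
        counts[window_start] += 1
    return dict(counts)
```

A sliding window is the same idea with each event assigned to every window it overlaps; a session window instead closes a bucket when the gap since the last event exceeds the idle timeout.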

Exactly-Once Semantics

In distributed stream processing, three delivery guarantees exist:

  • At-most-once: Fire and forget. Messages may be lost. Fastest.
  • At-least-once: Retry on failure. Messages may be duplicated. Requires idempotent consumers.
  • Exactly-once: Each message processed exactly once. Achieved via idempotent writes + transactional offsets (Kafka transactions) or deduplication with unique event IDs.
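At-least-once delivery plus deduplication by event ID is the usual building block for effective exactly-once processing; a sketch (in production the seen-set lives in durable storage keyed alongside the output, not in memory):

```python
class IdempotentConsumer:
    """Drop duplicate redeliveries by remembering processed event IDs."""
    def __init__(self, handler):
        self.handler = handler
        self.seen: set[str] = set()   # durable and transactional in a real system

    def process(self, event_id: str, payload) -> bool:
        if event_id in self.seen:
            return False              # duplicate redelivery: skip side effects
        self.handler(payload)
        self.seen.add(event_id)       # commit atomically with the output in practice
        return True
```

The key property: retrying a failed delivery is now safe, because reprocessing the same event ID is a no-op.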

Watermarks

A watermark is a timestamp assertion: "All events with timestamp <= W have arrived." Watermarks let the system know when a window can be closed and results emitted, even with out-of-order events.

Event time:    10:00  10:01  10:03  10:02  10:04  10:05
               │      │      │      │      │      │
Watermark:     ──────────────────────────────────>
               W=10:00       W=10:02       W=10:04

Late event (10:02 arriving after W=10:02) can be:
- Dropped (simplest)
- Placed in a side output for reprocessing
- Trigger window update (allowed lateness)
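A common way to generate the watermark shown above is bounded out-of-orderness: track the maximum event time seen and subtract an allowed-lateness slack. A sketch (the `WatermarkTracker` name is illustrative):

```python
class WatermarkTracker:
    """Bounded-out-of-orderness watermark: W = max event time seen - allowed lateness.
    A window [start, end) may be closed once W >= end."""
    def __init__(self, allowed_lateness: float = 0.0):
        self.allowed_lateness = allowed_lateness
        self.max_event_time = float("-inf")

    def observe(self, event_time: float) -> float:
        self.max_event_time = max(self.max_event_time, event_time)
        return self.watermark()

    def watermark(self) -> float:
        return self.max_event_time - self.allowed_lateness

    def is_late(self, event_time: float) -> bool:
        # The watermark asserts all events with timestamp <= W have arrived,
        # so anything at or below W is late.
        return event_time <= self.watermark()
```

Larger allowed lateness tolerates more reordering at the cost of holding windows open longer.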

Presence Detection at Scale

Showing "online" status for millions of users requires careful design to avoid overwhelming the system.

Architecture

Client ──heartbeat every 30s──> Presence Service ──> Redis
                          Set TTL on key "user:{id}"
                          to 60s (2x heartbeat interval)
                          Key expires automatically
                          if no heartbeat received
                          (user is offline)

Status query:
  GET user:{id}  →  exists? ONLINE : OFFLINE

For showing presence to friends/channels, avoid querying per-friend. Instead:

  1. Pub/sub for status changes: When a user's status changes (online/offline), publish to a channel. Subscribers (friends currently online) receive the update.
  2. Lazy loading: Only fetch presence for users visible on screen.
  3. Batch queries: MGET user:101 user:102 user:103 in a single Redis call.

Interview Application: Design a Real-Time Document Editor

When asked to design Google Docs or a collaborative editor in an interview, structure your answer:

  1. Transport: WebSocket for bidirectional real-time edits. Fallback to SSE + HTTP POST for restricted networks.
  2. Conflict resolution: CRDTs (RGA for text) for automatic merge without a central ordering server. Mention OT as the alternative (Google Docs approach) and trade-offs.
  3. Document state: Each character has a unique CRDT ID. Clients apply local operations immediately (optimistic) and broadcast to the server. Server merges and rebroadcasts.
  4. Presence: Cursor positions and user status tracked via heartbeats with TTL keys in Redis.
  5. Persistence: Operations appended to a log (event sourcing). Periodic snapshots for fast document loading.
  6. Scale: Partition documents across servers. Each document lives on one server (sticky routing by document ID). Cross-document operations are rare.

Next: Module 03 Quiz and Lab — build a collaborative document backend with CRDTs, WebSocket management, and presence tracking.
