Real-Time Systems & Collaborative Architecture
Real-Time Systems, CRDTs & Stream Processing
Real-time features are everywhere: collaborative documents, live dashboards, chat, multiplayer games, presence indicators. This lesson covers the core protocols, data structures, and processing patterns that power these systems at scale.
Real-Time Transport Protocols
Three primary approaches exist for server-to-client communication. The right choice depends on your latency requirements, message direction, and infrastructure constraints.
Long Polling SSE WebSocket
┌────────┐ ┌────────┐ ┌────────┐
│ Client │ │ Client │ │ Client │
└───┬────┘ └───┬────┘ └───┬────┘
│ GET /updates │ GET /stream │ Upgrade
│────────────> │────────────> │────────────>
│ (held open) │ (held open) │ (101 Switch)
│ │ │
│<──── JSON ────── │<── data: {...} ── │<──── frame ────
│ │<── data: {...} ── │──── frame ───>
│ GET /updates │<── data: {...} ── │<──── frame ────
│────────────> │ │
│ (repeat) │ (one direction) │ (bidirectional)
| Feature | Long Polling | SSE (Server-Sent Events) | WebSocket |
|---|---|---|---|
| Direction | Server to client (simulated) | Server to client only | Bidirectional |
| Protocol | HTTP/1.1 repeated requests | HTTP/1.1 with text/event-stream | Upgraded TCP via ws:// or wss:// |
| Connection overhead | New TCP handshake per poll | Single persistent connection | Single persistent connection |
| Binary data | No (JSON/text only) | No (UTF-8 text only) | Yes (binary frames supported) |
| Auto-reconnect | Manual implementation | Built-in (EventSource API) | Manual implementation |
| Proxy/CDN friendly | Yes (standard HTTP) | Yes (standard HTTP) | Often requires configuration |
| Scalability concern | High request volume | One connection per client | One connection per client |
| Best for | Legacy systems, simple notifications | Live feeds, dashboards, stock tickers | Chat, collaboration, gaming |
Interview rule of thumb: Use SSE when the server pushes data one-way (dashboards, notifications). Use WebSocket when the client also sends frequent data (collaborative editing, chat). Avoid long polling in new designs unless constrained by infrastructure that blocks persistent connections.
WebSocket Connection Management at Scale
A single server can hold tens of thousands of WebSocket connections, but at scale you face routing, failover, and state management challenges.
Connection Routing Strategies
Load Balancer
┌───────────┐
Client A ────────>│ │──────> Server 1 (holds A, B)
Client B ────────>│ L7 LB │──────> Server 2 (holds C, D)
Client C ────────>│ │──────> Server 3 (holds E, F)
Client D ────────>│ │
Client E ────────>│ │
Client F ────────>└───────────┘
Sticky sessions pin a client to the same server using a cookie or a hash of the client ID. Simple to implement, but connections drop during rolling deploys, and load can become unevenly distributed as clients churn.
Connection migration stores connection state externally (in Redis) so any server can resume a session. When a server goes down, clients reconnect to any available server, which loads their state from Redis. More complex but eliminates sticky session drawbacks.
Heartbeats and Reconnection
import time
from dataclasses import dataclass, field

@dataclass
class ConnectionState:
    client_id: str
    last_heartbeat: float = field(default_factory=time.time)
    missed_heartbeats: int = 0

class HeartbeatManager:
    def __init__(self, interval_sec: float = 30.0, max_missed: int = 3):
        self.interval = interval_sec
        self.max_missed = max_missed
        self.connections: dict[str, ConnectionState] = {}

    def register(self, client_id: str) -> None:
        self.connections[client_id] = ConnectionState(client_id=client_id)

    def heartbeat_received(self, client_id: str) -> None:
        if client_id in self.connections:
            conn = self.connections[client_id]
            conn.last_heartbeat = time.time()
            conn.missed_heartbeats = 0

    def check_connections(self) -> list[str]:
        """Return client_ids that should be disconnected, and deregister them
        so they are not reported again on the next check."""
        now = time.time()
        dead: list[str] = []
        for client_id, conn in self.connections.items():
            if now - conn.last_heartbeat > self.interval:
                conn.missed_heartbeats += 1
                if conn.missed_heartbeats >= self.max_missed:
                    dead.append(client_id)
        for client_id in dead:
            del self.connections[client_id]
        return dead
Clients should implement exponential backoff with jitter on reconnection to avoid a thundering herd when a server restarts:
import random

def reconnect_delay(attempt: int, base_ms: int = 1000, max_ms: int = 30000) -> float:
    """Exponential backoff: 1s, 2s, 4s, 8s... capped at 30s, plus jitter."""
    delay = min(base_ms * (2 ** attempt), max_ms)
    jitter = random.uniform(0, delay * 0.3)  # 0-30% jitter
    return (delay + jitter) / 1000.0
Pub/Sub Architecture
Pub/sub decouples producers from consumers, enabling fan-out to multiple subscribers without point-to-point connections.
Kafka vs Redis Streams
| Aspect | Kafka | Redis Streams |
|---|---|---|
| Storage model | Partitioned append-only log on disk | In-memory stream with optional persistence |
| Retention | Configurable (days/weeks/forever) | Memory-bounded or capped by max length |
| Consumer groups | Yes (partition-based assignment) | Yes (XREADGROUP with acknowledgment) |
| Ordering guarantee | Per-partition FIFO | Per-stream FIFO |
| Throughput | Millions of messages/sec (batched I/O) | Hundreds of thousands/sec (single-threaded) |
| Latency | Low ms (batching adds small delay) | Sub-millisecond |
| Best for | Event sourcing, analytics pipelines, durable logs | Real-time notifications, lightweight pub/sub, caching layer events |
Fan-Out Patterns
Fan-out on write (push model): When a message is published, immediately deliver it to all subscribers. Used by Twitter for users with few followers. Fast reads, expensive writes.
Fan-out on read (pull model): Subscribers query for new messages on demand. Used by Twitter for celebrity accounts (millions of followers). Cheap writes, slower reads.
Fan-out on Write Fan-out on Read
Producer Producer
│ │
▼ ▼
Publish ──> Sub A inbox Topic Log
──> Sub B inbox │
──> Sub C inbox ┌──┼──┐
▼ ▼ ▼
(N writes per message) A B C (read on demand)
(N reads per consumer)
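The cost asymmetry in the diagram can be made concrete with two in-memory sketches (class names are illustrative): the push model pays N inbox writes per publish and reads are a single lookup; the pull model pays one log append and merges at read time.

```python
from collections import defaultdict

class FanoutOnWrite:
    """Push model: one inbox write per subscriber at publish time."""
    def __init__(self) -> None:
        self.subscribers: dict[str, set[str]] = defaultdict(set)  # topic -> subs
        self.inboxes: dict[str, list[str]] = defaultdict(list)    # sub -> messages

    def subscribe(self, topic: str, sub: str) -> None:
        self.subscribers[topic].add(sub)

    def publish(self, topic: str, msg: str) -> None:
        for sub in self.subscribers[topic]:
            self.inboxes[sub].append(msg)  # N writes per message

    def read(self, sub: str) -> list[str]:
        return self.inboxes[sub]           # O(1) lookup per reader

class FanoutOnRead:
    """Pull model: one log append per message; readers merge on demand."""
    def __init__(self) -> None:
        self.logs: dict[str, list[str]] = defaultdict(list)        # topic -> log
        self.subscriptions: dict[str, set[str]] = defaultdict(set) # sub -> topics

    def subscribe(self, topic: str, sub: str) -> None:
        self.subscriptions[sub].add(topic)

    def publish(self, topic: str, msg: str) -> None:
        self.logs[topic].append(msg)       # 1 write per message

    def read(self, sub: str) -> list[str]:
        # N reads per consumer: scan every subscribed topic's log
        return [m for topic in self.subscriptions[sub] for m in self.logs[topic]]
```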
Interview insight: Most systems use a hybrid. Twitter's fanout service reportedly pushes tweets from low-follower accounts directly into follower timelines, while tweets from very high-follower (celebrity) accounts are fetched and merged into timelines at read time.
CRDTs for Collaborative Editing
When multiple users edit the same document simultaneously, you need a strategy to merge concurrent changes without conflicts.
OT vs CRDTs
Operational Transformation (OT) transforms operations against each other to maintain consistency. Google Docs uses OT with a central server that orders operations.
Conflict-free Replicated Data Types (CRDTs) are data structures designed so that concurrent updates always converge without coordination. Figma's multiplayer system uses CRDT-inspired techniques, publicly documented in a 2019 engineering blog post, though it still routes updates through a central server.
| Aspect | OT (Google Docs) | CRDTs |
|---|---|---|
| Central server required | Yes (orders operations) | No (peer-to-peer possible) |
| Complexity | O(n^2) transformation functions (one per pair of operation types) | O(1) merge per operation |
| Offline support | Limited (needs server) | Full (merge on reconnect) |
| Correctness proof | Difficult (many edge cases) | Mathematical guarantee |
| Memory overhead | Low (operations only) | Higher (metadata per element) |
Counter CRDTs
G-Counter (Grow-only Counter): Each node maintains its own counter. The value is the sum of all nodes. Merge takes the max of each node's counter.
from typing import Dict

class GCounter:
    """Grow-only counter CRDT. Supports increment and merge."""

    def __init__(self, node_id: str):
        self.node_id = node_id
        self.counts: Dict[str, int] = {node_id: 0}

    def increment(self, amount: int = 1) -> None:
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        for node_id, count in other.counts.items():
            self.counts[node_id] = max(self.counts.get(node_id, 0), count)
PN-Counter (Positive-Negative Counter): Two G-Counters — one for increments, one for decrements. Value = P.value() - N.value().
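A PN-Counter can be sketched directly with two per-node grow-only maps, one playing the role of each G-Counter (this inlines the G-Counter logic so the snippet stands alone):

```python
class PNCounter:
    """PN-Counter: two grow-only maps, p for increments and n for decrements.
    Value is sum(p) - sum(n); merge takes the per-node max of each map."""

    def __init__(self, node_id: str):
        self.node_id = node_id
        self.p: dict[str, int] = {}  # per-node increment totals (a G-Counter)
        self.n: dict[str, int] = {}  # per-node decrement totals (a G-Counter)

    def increment(self, amount: int = 1) -> None:
        self.p[self.node_id] = self.p.get(self.node_id, 0) + amount

    def decrement(self, amount: int = 1) -> None:
        self.n[self.node_id] = self.n.get(self.node_id, 0) + amount

    def value(self) -> int:
        return sum(self.p.values()) - sum(self.n.values())

    def merge(self, other: "PNCounter") -> None:
        for node, count in other.p.items():
            self.p[node] = max(self.p.get(node, 0), count)
        for node, count in other.n.items():
            self.n[node] = max(self.n.get(node, 0), count)
```

Because each map only grows and merge is a per-node max, merges are commutative, associative, and idempotent, so replicas converge regardless of delivery order.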
LWW-Register (Last-Writer-Wins Register)
Stores a single value with a timestamp. On conflict, the highest timestamp wins.
from dataclasses import dataclass
from typing import Any

@dataclass
class LWWRegister:
    """Last-Writer-Wins Register. Higher timestamp always wins."""
    value: Any = None
    timestamp: float = 0.0
    node_id: str = ""

    def set(self, value: Any, timestamp: float, node_id: str) -> None:
        if timestamp > self.timestamp or (
            timestamp == self.timestamp and node_id > self.node_id
        ):
            self.value = value
            self.timestamp = timestamp
            self.node_id = node_id

    def merge(self, other: "LWWRegister") -> None:
        if other.timestamp > self.timestamp or (
            other.timestamp == self.timestamp and other.node_id > self.node_id
        ):
            self.value = other.value
            self.timestamp = other.timestamp
            self.node_id = other.node_id
OR-Set (Observed-Remove Set)
Supports both add and remove. Each element is tagged with a unique ID on add. Remove only removes the specific tags that were observed, so a concurrent add of the same element is preserved.
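A minimal OR-Set sketch following that description: every add creates a fresh tag, remove tombstones only the tags it has observed, and an element is present while it has at least one un-removed tag.

```python
import uuid

class ORSet:
    """Observed-Remove Set. Concurrent add wins over remove, because the
    remove can only tombstone tags it actually observed."""

    def __init__(self) -> None:
        self.adds: dict[str, set[str]] = {}     # element -> unique add tags
        self.removes: dict[str, set[str]] = {}  # element -> tombstoned tags

    def add(self, element: str) -> None:
        # Each add gets a globally unique tag, so re-adding is a new event.
        self.adds.setdefault(element, set()).add(uuid.uuid4().hex)

    def remove(self, element: str) -> None:
        # Tombstone only the tags observed locally at remove time.
        observed = self.adds.get(element, set())
        self.removes.setdefault(element, set()).update(observed)

    def contains(self, element: str) -> bool:
        live = self.adds.get(element, set()) - self.removes.get(element, set())
        return bool(live)

    def merge(self, other: "ORSet") -> None:
        # Union of both tag sets; order of merges does not matter.
        for el, tags in other.adds.items():
            self.adds.setdefault(el, set()).update(tags)
        for el, tags in other.removes.items():
            self.removes.setdefault(el, set()).update(tags)
```

The key behavior: if replica B removes "x" while replica A concurrently re-adds it, the new add tag is not in B's tombstone set, so after merging both replicas agree "x" is present.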
Text CRDTs: RGA and LSEQ
For collaborative text editing, you need CRDTs that model sequences of characters:
- RGA (Replicated Growable Array): Each character has a unique ID (node_id, sequence). Insert places a character after a reference position. Delete marks characters as tombstones. Concurrent inserts at the same position are ordered deterministically by node ID.
- LSEQ: Assigns each character a position from a dense identifier space. Positions are allocated between existing neighbors, avoiding rebalancing. More space-efficient than RGA for large documents.
RGA Insert Example (concurrent edits):
Initial: H - E - L - L - O
User A inserts 'X' after 'E': H - E - X - L - L - O
User B inserts 'Y' after 'E': H - E - Y - L - L - O
After merge (A.id > B.id): H - E - X - Y - L - L - O
Both insertions preserved, deterministic ordering by node ID.
Stream Processing Patterns
Real-time analytics and event processing require windowing strategies to aggregate unbounded data streams.
Window Types
Tumbling Window (fixed, non-overlapping):
|----5min----|----5min----|----5min----|
| events | events | events |
Sliding Window (fixed, overlapping):
|----5min----|
|----5min----|
|----5min----|
(advances every 1 minute)
Session Window (gap-based):
|--events--| |--events------events--| |--events--|
^gap^ ^gap^
(window closes after inactivity gap, e.g., 30 min)
| Window Type | Use Case | Example |
|---|---|---|
| Tumbling | Periodic aggregation | Page views per 5-minute bucket |
| Sliding | Rolling metrics | Average response time over last 10 minutes, updated every minute |
| Session | User behavior | Total actions per user session (ends after 30 min idle) |
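Tumbling windows are the simplest of the three to implement: each event timestamp maps to exactly one bucket keyed by the window's start time. A minimal sketch (timestamps in seconds, counting events per bucket):

```python
from collections import defaultdict

def tumbling_counts(event_times: list[int], window_sec: int = 300) -> dict[int, int]:
    """Count events per fixed, non-overlapping window.

    Each event belongs to exactly one bucket; the bucket key is the
    window's start timestamp (floor-divide, then scale back up).
    """
    buckets: dict[int, int] = defaultdict(int)
    for ts in event_times:
        window_start = (ts // window_sec) * window_sec
        buckets[window_start] += 1
    return dict(buckets)
```

A sliding window differs only in that each event lands in every window whose range covers it (window length / slide interval buckets per event), and a session window requires tracking the gap since the previous event per key.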
Exactly-Once Semantics
In distributed stream processing, three delivery guarantees exist:
- At-most-once: Fire and forget. Messages may be lost. Fastest.
- At-least-once: Retry on failure. Messages may be duplicated. Requires idempotent consumers.
- Exactly-once: Each message processed exactly once. Achieved via idempotent writes + transactional offsets (Kafka transactions) or deduplication with unique event IDs.
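The deduplication approach from the last bullet can be sketched as an idempotent consumer: the transport delivers at-least-once (so duplicates arrive), and the consumer skips event IDs it has already processed, giving exactly-once *effects*.

```python
class IdempotentConsumer:
    """At-least-once delivery plus dedup on unique event ids yields
    exactly-once processing effects."""

    def __init__(self) -> None:
        # In production this set would live in a durable store with a TTL,
        # not unbounded process memory.
        self.seen: set[str] = set()
        self.total = 0

    def process(self, event_id: str, amount: int) -> bool:
        """Apply the event once; return False for duplicate redeliveries."""
        if event_id in self.seen:
            return False
        self.seen.add(event_id)
        self.total += amount
        return True
```

Note the subtlety: marking the ID as seen and applying the effect must be atomic (e.g., one database transaction), or a crash between the two steps reintroduces loss or duplication.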
Watermarks
A watermark is a timestamp assertion: "All events with timestamp <= W have arrived." Watermarks let the system know when a window can be closed and results emitted, even with out-of-order events.
Event time: 10:00 10:01 10:03 10:02 10:04 10:05
│ │ │ │ │ │
Watermark: ──────────────────────────────────>
W=10:00 W=10:02 W=10:04
Late event (10:02 arriving after W=10:02) can be:
- Dropped (simplest)
- Placed in a side output for reprocessing
- Trigger window update (allowed lateness)
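A compact sketch of watermark-driven window closing, under simple assumptions: the watermark is the maximum event time seen minus a fixed allowed lateness, windows are tumbling, and events later than the allowed lateness are dropped (the first bullet above).

```python
def close_windows(
    event_times: list[int], window_sec: int, max_lateness: int
) -> tuple[list[tuple[int, int]], dict[int, int]]:
    """Count events per tumbling window; emit (window_start, count) once
    the watermark passes the window's end. Returns (emitted, still_open)."""
    open_windows: dict[int, int] = {}
    emitted: list[tuple[int, int]] = []
    watermark = float("-inf")
    for ts in event_times:
        window_start = (ts // window_sec) * window_sec
        if window_start + window_sec <= watermark:
            continue  # too late even with allowed lateness: dropped
        open_windows[window_start] = open_windows.get(window_start, 0) + 1
        # Watermark asserts: all events <= (max seen - lateness) have arrived.
        watermark = max(watermark, ts - max_lateness)
        for start in sorted(list(open_windows)):
            if start + window_sec <= watermark:
                emitted.append((start, open_windows.pop(start)))
    return emitted, open_windows
```

With `window_sec=10` and `max_lateness=5`, the out-of-order event at t=8 below still lands in its correct window because the watermark lags real arrivals by the allowed lateness.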
Presence Detection at Scale
Showing "online" status for millions of users requires careful design to avoid overwhelming the system.
Architecture
Client ──heartbeat every 30s──> Presence Service ──> Redis
│
▼
Set TTL on key "user:{id}"
to 60s (2x heartbeat interval)
│
▼
Key expires automatically
if no heartbeat received
(user is offline)
Status query:
GET user:{id} → exists? ONLINE : OFFLINE
For showing presence to friends/channels, avoid querying per-friend. Instead:
- Pub/sub for status changes: When a user's status changes (online/offline), publish to a channel. Subscribers (friends currently online) receive the update.
- Lazy loading: Only fetch presence for users visible on screen.
- Batch queries: MGET user:101 user:102 user:103 in a single Redis call.
Interview Application: Design a Real-Time Document Editor
When asked to design Google Docs or a collaborative editor in an interview, structure your answer:
- Transport: WebSocket for bidirectional real-time edits. Fallback to SSE + HTTP POST for restricted networks.
- Conflict resolution: CRDTs (RGA for text) for automatic merge without a central ordering server. Mention OT as the alternative (Google Docs approach) and trade-offs.
- Document state: Each character has a unique CRDT ID. Clients apply local operations immediately (optimistic) and broadcast to the server. Server merges and rebroadcasts.
- Presence: Cursor positions and user status tracked via heartbeats with TTL keys in Redis.
- Persistence: Operations appended to a log (event sourcing). Periodic snapshots for fast document loading.
- Scale: Partition documents across servers. Each document lives on one server (sticky routing by document ID). Cross-document operations are rare.
Next: Module 03 Quiz and Lab — build a collaborative document backend with CRDTs, WebSocket management, and presence tracking. :::