Back to Course|System Design Interview Mastery: Scalable Architecture from Estimation to Production
Lab

Build a Collaborative Document Backend

45 min
Advanced
Unlimited free attempts

Instructions

In this lab, you will build a complete collaborative document editing backend in Python. You will implement CRDTs for conflict-free data merging, a WebSocket connection manager for real-time communication, a document manager for state coordination, and a presence system for tracking active users and cursor positions.

This is the architecture behind systems like Google Docs, Figma, and Notion.

Architecture Overview

Client A ──WebSocket──> Connection Manager ──> Message Handler
Client B ──WebSocket──>      |                      |
Client C ──WebSocket──>      |                      |
                             |                      v
                        Heartbeat &           Document Manager
                        Reconnection               |
                             |              ┌──────┼──────┐
                             v              v      v      v
                        Presence         G-Counter  LWW   Text CRDT
                        Tracker          PN-Counter  Reg   (RGA)

What You Will Build

You will implement 8 files that together form a working collaborative editing backend:

  1. server/crdt/counter.py — G-Counter and PN-Counter CRDTs with merge operations
  2. server/crdt/lww_register.py — Last-Writer-Wins Register with timestamp-based conflict resolution
  3. server/crdt/text_crdt.py — Text CRDT (RGA-based) supporting insert, delete, and concurrent editing
  4. server/document/document_manager.py — Document state management: apply operations, merge replicas, generate diffs
  5. server/websocket/connection_manager.py — WebSocket connection pool: connect, disconnect, heartbeat, broadcast
  6. server/websocket/message_handler.py — Message routing: parse incoming ops, apply to document, broadcast updates
  7. server/presence/presence_tracker.py — Presence system: track active users per document, cursor positions, idle detection
  8. server/main.py — Application entry: start server, handle connections, manage documents

Step-by-Step Guide

Step 1: Counter CRDTs (FILE 1)

Implement GCounter and PNCounter. A G-Counter tracks a single counter per node. Its value is the sum of all counters. Merge takes the element-wise max. A PN-Counter uses two G-Counters — one for increments (p), one for decrements (n) — so the value is p.value() - n.value().

Key properties to maintain:

  • merge() is commutative: a.merge(b) produces the same state as b.merge(a)
  • merge() is idempotent: a.merge(b); a.merge(b) produces the same state as a single merge
  • The counter value is always eventually consistent after merge

Step 2: LWW-Register (FILE 2)

Implement a Last-Writer-Wins Register that stores a single value. When two writes conflict, the one with the higher timestamp wins. If timestamps are equal, use node ID as a deterministic tiebreaker (lexicographic comparison).

Step 3: Text CRDT (FILE 3)

Implement an RGA-based text CRDT. Each character is represented as a node with:

  • A unique ID: (node_id, sequence_number)
  • The character value
  • A reference to the node it was inserted after
  • A tombstone flag for deleted characters

Insert places a new character after a reference position. If two concurrent inserts target the same position, the one with the higher node ID comes first (deterministic ordering). Delete marks a character as a tombstone (it remains in the structure but is not visible).

The to_string() method returns only non-tombstoned characters in order.

Step 4: Document Manager (FILE 4)

The document manager tracks the state of a document using the CRDTs from steps 1-3. It supports:

  • Applying individual operations (insert, delete, set_title, increment_counter)
  • Merging an entire replica state from another node
  • Generating a diff between the current state and a previous version
  • Returning a snapshot of the full document state

Step 5: WebSocket Connection Manager (FILE 5)

Build a connection manager that tracks WebSocket connections for each document. It should:

  • Register and unregister connections (mapping client IDs to WebSocket objects)
  • Track which document each client is editing
  • Send heartbeats and detect dead connections (missed heartbeats > threshold)
  • Broadcast messages to all clients editing the same document (excluding the sender)

Step 6: Message Handler (FILE 6)

The message handler receives raw WebSocket messages (JSON), parses them into typed operations, applies them to the document via the document manager, and broadcasts the result to other connected clients. It should handle malformed messages gracefully.

Supported message types: insert, delete, set_title, cursor_move, presence_update.

Step 7: Presence Tracker (FILE 7)

Track active users per document with:

  • Cursor position (line, column) per user
  • Last activity timestamp for idle detection
  • An idle threshold (e.g., 5 minutes) — users who exceed it are marked idle
  • Methods to get all active (non-idle) users for a document

Step 8: Main Entry Point (FILE 8)

Wire all components together. The CollaborativeServer class should:

  • Initialize the document manager, connection manager, message handler, and presence tracker
  • Handle new client connections (register, send current document state)
  • Handle disconnections (unregister, remove presence)
  • Route incoming messages to the message handler
  • Provide a method to get server status (connected clients, active documents)

Hints

  • For the text CRDT, store characters in a list and use linear search for position lookups. This is O(n) but correct and simple — real implementations use trees for O(log n).
  • Use time.time() for timestamps throughout. In production, you would use vector clocks or hybrid logical clocks.
  • The WebSocket objects in this lab are simulated — you do not need an actual WebSocket library. Use the provided MockWebSocket interface in the starter code.
  • All CRDT merge() methods must be commutative and idempotent. Test by merging A into B and B into A — both should produce identical states.
  • Use Python type hints on all public methods.

Grading Rubric

G-Counter and PN-Counter with correct merge semantics: GCounter tracks per-node counts, increment only increases the local node's counter, value returns the sum, and merge takes element-wise max. PNCounter uses two GCounters where value = p - n, and merge delegates to both inner counters. Both merge operations are commutative and idempotent.10 points
LWW-Register with timestamp conflict resolution: set() only updates when the new timestamp is strictly higher, or when timestamps are equal and node_id serves as a deterministic tiebreaker (lexicographic comparison). merge() correctly adopts the other register's state when it wins. get_state() returns a consistent tuple of (value, timestamp, writer_node_id).10 points
Text CRDT supporting concurrent insert/delete with convergence: RGA-based implementation where each character has a unique (node_id, seq) ID. insert() places characters after a reference position. Concurrent inserts at the same position are ordered deterministically by node_id. delete() uses tombstones. to_string() returns visible text only. merge() integrates remote nodes and applies tombstones, producing identical state regardless of merge order.20 points
Document manager applying operations and merging replica states: apply_operation() correctly routes insert, delete, set_title, and increment_counter to the appropriate CRDTs. Version is incremented on each operation. merge_replica() accepts serialized CRDT states and integrates them. get_snapshot() returns a consistent DocumentSnapshot. get_diff() returns operations and content changes between versions.15 points
WebSocket connection manager with heartbeat and reconnection: connect() registers a client with a document and stores the websocket. disconnect() removes the client, cleans up document associations, and closes the websocket. heartbeat() updates the last_heartbeat timestamp. broadcast() sends JSON messages to all document peers except the sender. cleanup_dead_connections() detects and removes clients that missed too many heartbeats.15 points
Message handler routing operations and broadcasting: parse_message() safely parses JSON, validates required fields (type, user_id, doc_id), and rejects malformed messages gracefully. handle_message() routes to the correct handler based on message type (insert/delete/set_title go to document manager, cursor_move/presence_update go to presence tracker). Results are broadcast to other connected clients. Error responses include descriptive messages.10 points
Presence tracker with per-document user tracking and cursor sync: join() registers a user with a document and assigns a unique cursor color. leave() removes the user and cleans up. update_cursor() stores line and column position and resets the idle timer. check_idle() detects users exceeding the idle timeout and returns newly idle user IDs. get_active_users() returns only non-idle users. get_cursor_positions() returns a dict of active user cursors.10 points
Main entry wiring all components: CollaborativeServer initializes ConnectionManager, PresenceTracker, MessageHandler, and doc_managers dict. handle_connect() registers the websocket, creates/gets the document manager, joins presence, sends current state, and broadcasts user_joined. handle_disconnect() removes the connection, leaves presence, and broadcasts user_left. handle_message() records heartbeat, routes to MessageHandler, and increments operation counter. get_status() returns connected clients, active documents, total operations, and uptime.10 points

Checklist

0/9

Your Solution

Unlimited free attempts
FREE WEEKLY NEWSLETTER

Stay on the Nerd Track

One email per week — courses, deep dives, tools, and AI experiments.

No spam. Unsubscribe anytime.