Concurrency & Multithreading

Concurrency bugs are among the hardest to reproduce and debug. Interviewers love them because they reveal whether you truly understand shared state, coordination, and the guarantees your language runtime provides. This lesson covers the three categories of concurrency problems, locking strategies, and language-specific patterns for Go, Java, and Python.

Three Categories of Concurrency Problems

Category      Problem                   Example
Correctness   Shared state corruption   Two threads incrementing a counter — final value is wrong
Coordination  Hand-off and waiting      Producer creates work faster than consumer processes it
Scarcity      Resource limits           Database connection pool exhausted, new requests time out

Every concurrency interview question falls into one (or more) of these categories. Identifying the category immediately narrows your solution space.

Mutexes and Locks

A mutex (mutual exclusion) ensures only one thread accesses a critical section at a time.

// Go: sync.Mutex
var mu sync.Mutex
var balance int

func Deposit(amount int) {
    mu.Lock()
    defer mu.Unlock()  // always unlock, even if panic occurs
    balance += amount
}

func Balance() int {
    mu.Lock()
    defer mu.Unlock()
    return balance
}

// Java: ReentrantLock (preferred over synchronized for tryLock, fairness)
private final ReentrantLock lock = new ReentrantLock();
private int balance = 0;

public void deposit(int amount) {
    lock.lock();
    try {
        balance += amount;
    } finally {
        lock.unlock();  // always in finally block
    }
}

# Python: threading.Lock
import threading

lock = threading.Lock()
balance = 0

def deposit(amount: int) -> None:
    global balance  # rebinds the module-level variable
    with lock:      # context manager handles acquire/release
        balance += amount

Read-Write Locks

When reads vastly outnumber writes, a read-write lock allows concurrent reads but exclusive writes:

// Go: sync.RWMutex
var rwmu sync.RWMutex
var config map[string]string

func GetConfig(key string) string {
    rwmu.RLock()         // multiple goroutines can hold RLock
    defer rwmu.RUnlock()
    return config[key]
}

func SetConfig(key, value string) {
    rwmu.Lock()          // exclusive — blocks all readers and writers
    defer rwmu.Unlock()
    config[key] = value
}

When to use: Read-write locks help when the read-to-write ratio is roughly 10:1 or higher. Below that, the extra bookkeeping of a read-write lock is not worth it — use a simple mutex.

Race Condition Patterns

Check-Then-Act (TOCTOU)

The Time-Of-Check-To-Time-Of-Use bug: you check a condition, then act on it, but the condition changes between check and act.

# BROKEN: race condition between check and act
def withdraw(amount: int) -> bool:
    global balance           # needed to rebind the module-level variable
    if balance >= amount:    # Thread A checks: balance=100, amount=80
        # Thread B runs withdraw(50) here — balance becomes 50
        balance -= amount    # Thread A subtracts 80 from 50 = -30!
        return True
    return False

# FIX: hold lock across both check and act
def withdraw(amount: int) -> bool:
    global balance
    with lock:
        if balance >= amount:
            balance -= amount
            return True
        return False

Read-Modify-Write

Multiple threads read a value, modify it, then write it back — lost updates.

// BROKEN: counter++ is actually three operations (read, increment, write)
private int counter = 0;
public void increment() {
    counter++;  // NOT atomic! Thread A reads 5, Thread B reads 5,
                // both write 6 — one increment lost
}

// FIX: use AtomicInteger with CAS (Compare-And-Swap)
private final AtomicInteger counter = new AtomicInteger(0);
public void increment() {
    counter.incrementAndGet();  // atomic CAS operation in a loop
}

Compare-And-Swap (CAS)

CAS is the hardware primitive behind lock-free programming:

CAS(memory_location, expected_value, new_value):
    atomically:
        if *memory_location == expected_value:
            *memory_location = new_value
            return true
        else:
            return false  // another thread modified it — retry
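
The "CAS operation in a loop" behind Java's incrementAndGet is just this primitive plus a retry. A minimal Go sketch of the same loop using sync/atomic (the counter variable is illustrative):

// Go: building an atomic increment from a CAS retry loop
import "sync/atomic"

var counter int64

func Increment() {
    for {
        old := atomic.LoadInt64(&counter) // snapshot the current value
        if atomic.CompareAndSwapInt64(&counter, old, old+1) {
            return // our CAS won; the increment is visible to all threads
        }
        // CAS failed: another thread wrote first; re-read and retry
    }
}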

ABA Problem: Thread reads value A, another thread changes it to B then back to A, and the first thread's CAS succeeds even though the value was modified. Solution: tagged pointers (append a version counter to the value).
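
A sketch of that tagged fix in Go, packing a 32-bit version above a 32-bit value in a single word (the packing scheme is illustrative; Java ships AtomicStampedReference for the same idea):

// Go: tagged CAS, with a version in the high 32 bits to defeat ABA
import "sync/atomic"

var tagged uint64 // high 32 bits: version, low 32 bits: value

func SetValue(newValue uint32) bool {
    old := atomic.LoadUint64(&tagged)
    version := uint32(old >> 32)
    packed := uint64(version+1)<<32 | uint64(newValue)
    // Fails if either value or version changed since the load,
    // so an A-to-B-back-to-A change is caught via the bumped version.
    return atomic.CompareAndSwapUint64(&tagged, old, packed)
}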

Deadlock

A deadlock occurs when two or more threads are waiting for each other to release locks, and none can proceed.

Four Necessary Conditions (Coffman, 1971)

All four must hold simultaneously for deadlock to occur:

  1. Mutual exclusion: Resources cannot be shared
  2. Hold and wait: A thread holds one lock while waiting for another
  3. No preemption: Locks cannot be forcibly taken from a thread
  4. Circular wait: Thread A waits for B, B waits for A

  The Dining Philosophers Problem:

  Philosopher 0       Philosopher 1
       │                    │
       ▼                    ▼
    [Fork 0] ◄──────── [Fork 1]
       │                    │
       │    Philosopher 2   │
       │         │          │
       └────► [Fork 2] ◄───┘

  Each philosopher picks up left fork, then tries right fork.
  If all pick up left fork simultaneously → deadlock.
  Nobody can get their right fork because their neighbor holds it.

Prevention Strategies

Strategy       Breaks Which Condition   Implementation
Lock ordering  Circular wait            Always acquire locks in a consistent global order (e.g., by resource ID)
Timeout        Hold and wait            Use tryLock(timeout) — release all locks if timeout expires
Try-lock       Hold and wait            tryLock() returns false if lock unavailable — back off and retry
Single lock    Hold and wait            Use one coarse-grained lock instead of multiple fine-grained locks

// Go: lock ordering to prevent deadlock
func Transfer(from, to *Account, amount int) {
    // Always lock the account with the lower ID first.
    // (Assumes from != to; locking the same mutex twice would deadlock.)
    first, second := from, to
    if from.ID > to.ID {
        first, second = to, from
    }
    first.mu.Lock()
    defer first.mu.Unlock()
    second.mu.Lock()
    defer second.mu.Unlock()

    from.Balance -= amount
    to.Balance += amount
}
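
The timeout and try-lock rows can be sketched with sync.Mutex.TryLock (Go 1.18+). A minimal sketch: if the second lock is unavailable, release the first and retry, so the thread never holds one lock while waiting on another (the back-off detail is illustrative):

// Go: try-lock with back-off to break hold-and-wait
import "runtime"

func TransferTryLock(from, to *Account, amount int) {
    for {
        from.mu.Lock()
        if to.mu.TryLock() { // non-blocking attempt on the second lock
            from.Balance -= amount
            to.Balance += amount
            to.mu.Unlock()
            from.mu.Unlock()
            return
        }
        // Second lock unavailable: release the first so we never hold-and-wait
        from.mu.Unlock()
        runtime.Gosched() // yield; real code would add jittered back-off
    }
}

Without randomized back-off this pattern can livelock under contention, which is why lock ordering is usually the first choice.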

Deadlock Detection

Build a wait-for graph: nodes are threads, edges point from a waiting thread to the thread holding the desired lock. A cycle in this graph means deadlock.

  Wait-for graph:

  Thread A ──waits-for──► Thread B
     ▲                       │
     │                       │
     └────waits-for──────────┘

  Cycle detected → deadlock!
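
Detection is then just cycle-finding on that graph. A minimal DFS sketch in Go; the integer thread IDs and the waitsFor adjacency map are illustrative:

// Go: deadlock detection as cycle detection on the wait-for graph
// waitsFor[t] lists the threads holding locks that thread t is waiting on.
func hasCycle(waitsFor map[int][]int) bool {
    const (
        unvisited = 0
        inStack   = 1 // on the current DFS path
        done      = 2
    )
    state := make(map[int]int)
    var dfs func(t int) bool
    dfs = func(t int) bool {
        state[t] = inStack
        for _, next := range waitsFor[t] {
            if state[next] == inStack {
                return true // back edge: cycle found, deadlock
            }
            if state[next] == unvisited && dfs(next) {
                return true
            }
        }
        state[t] = done
        return false
    }
    for t := range waitsFor {
        if state[t] == unvisited && dfs(t) {
            return true
        }
    }
    return false
}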

Optimistic vs. Pessimistic Locking

Pessimistic Locking

Lock the resource before reading. No other transaction can modify it until you are done.

-- Pessimistic: lock the row before reading
BEGIN;
SELECT balance FROM accounts WHERE id = 42 FOR UPDATE;
-- Row is now locked — other transactions block here
UPDATE accounts SET balance = balance - 100 WHERE id = 42;
COMMIT;

Optimistic Locking

Read without locking, check for conflicts on write using a version column.

-- Optimistic: read version, check on write
SELECT balance, version FROM accounts WHERE id = 42;
-- balance=500, version=3

-- Later, attempt the update:
UPDATE accounts
SET balance = 400, version = 4
WHERE id = 42 AND version = 3;
-- If 0 rows affected → version changed → conflict! Retry.
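
The application wraps that update in a retry loop, re-reading the row after each conflict. A sketch with Go's database/sql and Postgres-style placeholders; the three-attempt cap and withdrawal logic are illustrative:

// Go: optimistic update with retry on version conflict
import (
    "database/sql"
    "errors"
)

func withdraw(db *sql.DB, id, amount int) error {
    for attempt := 0; attempt < 3; attempt++ {
        var balance, version int
        err := db.QueryRow(
            "SELECT balance, version FROM accounts WHERE id = $1", id,
        ).Scan(&balance, &version)
        if err != nil {
            return err
        }
        res, err := db.Exec(
            "UPDATE accounts SET balance = $1, version = $2 WHERE id = $3 AND version = $4",
            balance-amount, version+1, id, version,
        )
        if err != nil {
            return err
        }
        if n, _ := res.RowsAffected(); n == 1 {
            return nil // version matched; update applied
        }
        // 0 rows affected: another transaction bumped the version; retry
    }
    return errors.New("withdraw: too many conflicts")
}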

When to Use Each

Scenario                                    Best Choice  Why
High contention (many writers on same row)  Pessimistic  Retries are expensive when conflicts are frequent
Low contention (rare conflicts)             Optimistic   No lock overhead; conflicts are rare so retries are cheap
Long transactions                           Pessimistic  Holding an optimistic version for minutes makes a conflict likely
Short, fast transactions                    Optimistic   Minimal window for conflicts
Read-heavy workload                         Optimistic   No locks on reads — maximum throughput

Lock-Free Data Structures

Lock-free algorithms guarantee that at least one thread makes progress in a finite number of steps (even if other threads are stalled). They use CAS operations instead of locks.

Common examples:

  • Atomic counters: AtomicInteger.incrementAndGet() in Java, atomic.AddInt64() in Go
  • Lock-free queues: Michael-Scott queue (used in Java's ConcurrentLinkedQueue); a simpler cousin, the Treiber stack, is sketched after this list
  • Lock-free hash maps: split the table into independently CAS-updated bins (Java's ConcurrentHashMap combines CAS with fine-grained locks)
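
A Treiber-style lock-free stack in Go, as a feel for the pattern; it uses atomic.Pointer (Go 1.19+) and is a sketch, not production code:

// Go: Treiber stack; push and pop retry a CAS on the head pointer
import "sync/atomic"

type node struct {
    value int
    next  *node
}

type Stack struct {
    head atomic.Pointer[node]
}

func (s *Stack) Push(v int) {
    n := &node{value: v}
    for {
        old := s.head.Load()
        n.next = old
        if s.head.CompareAndSwap(old, n) {
            return // CAS won: n is the new head
        }
        // CAS lost: head changed underneath us; retry
    }
}

func (s *Stack) Pop() (int, bool) {
    for {
        old := s.head.Load()
        if old == nil {
            return 0, false // stack is empty
        }
        if s.head.CompareAndSwap(old, old.next) {
            return old.value, true
        }
    }
}

Go's garbage collector sidesteps the ABA problem here: a popped node's address cannot be reused while another goroutine still holds it.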

Language-Specific Concurrency

Go: Goroutines and Channels

Go's concurrency model is based on CSP (Communicating Sequential Processes): share memory by communicating rather than communicating by sharing memory.

func FanOut(urls []string) []Result {
    ch := make(chan Result, len(urls)) // buffered channel
    var wg sync.WaitGroup

    for _, url := range urls {
        wg.Add(1)
        go func(u string) {   // goroutine: lightweight (~2KB initial stack)
            defer wg.Done()
            ch <- fetch(u)
        }(url)
    }

    // Close channel when all goroutines complete
    go func() {
        wg.Wait()
        close(ch)
    }()

    var results []Result
    for r := range ch {   // range over channel until closed
        results = append(results, r)
    }
    return results
}

Key primitives:

  • go func(): Spawns a goroutine (~2KB initial stack, grows as needed, millions possible)
  • chan T: Typed channel for communication between goroutines
  • select: Multiplexes across multiple channels (like epoll for channels); see the sketch after this list
  • sync.WaitGroup: Waits for a collection of goroutines to finish
  • errgroup.Group: Like WaitGroup but propagates the first error
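
For instance, select can race a result channel against a timeout. A sketch reusing the Result type from the FanOut example above (the two-second deadline is illustrative):

// Go: select races a result channel against a timeout
import (
    "errors"
    "time"
)

func fetchWithTimeout(results <-chan Result) (Result, error) {
    select {
    case r := <-results:
        return r, nil // a result arrived first
    case <-time.After(2 * time.Second): // fires if no result arrives in time
        return Result{}, errors.New("fetch timed out")
    }
}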

Java: Virtual Threads (Java 21+)

Java 21 introduced virtual threads — lightweight threads managed by the JVM, not the OS.

// Virtual threads: millions possible (unlike platform threads ~1MB each)
try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
    List<Future<String>> futures = urls.stream()
        .map(url -> executor.submit(() -> fetch(url)))
        .toList();

    List<String> results = new ArrayList<>();
    for (var future : futures) {
        results.add(future.get());  // blocks the virtual thread, not OS thread
    }
}

// CompletableFuture: composable async operations
CompletableFuture<String> future = CompletableFuture
    .supplyAsync(() -> fetchUser(userId))
    .thenApply(user -> enrichProfile(user))
    .thenApply(profile -> serialize(profile))
    .exceptionally(ex -> fallbackResponse(ex));

Key primitives:

  • Virtual threads: Thread.ofVirtual().start(runnable) or Executors.newVirtualThreadPerTaskExecutor()
  • CompletableFuture: Composable async with thenApply, thenCompose, allOf
  • ConcurrentHashMap: Thread-safe hash map using per-bin CAS and locking (segment-based before Java 8)
  • synchronized vs ReentrantLock: ReentrantLock adds tryLock(), fairness, and Condition variables

Python: The GIL and Workarounds

The Global Interpreter Lock (GIL) allows only one thread to execute Python bytecode at a time. This means threads do not provide parallelism for CPU-bound work.

import asyncio
import concurrent.futures

import aiohttp  # third-party: pip install aiohttp

# I/O-bound: use asyncio (cooperative multitasking, single thread)
async def fetch_all(urls: list[str]) -> list[str]:
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, url) for url in urls]
        return await asyncio.gather(*tasks)  # concurrent I/O

# CPU-bound: use multiprocessing (separate processes, no GIL)
def process_all(items: list[dict]) -> list[dict]:
    with concurrent.futures.ProcessPoolExecutor() as executor:
        return list(executor.map(heavy_computation, items))

# I/O-bound: threading works (GIL is released during I/O waits)
def download_all(urls: list[str]) -> list[bytes]:
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        return list(executor.map(download, urls))

Workload                 Best Approach                   Why
I/O-bound (HTTP, DB)     asyncio or threading            GIL released during I/O waits
CPU-bound (computation)  multiprocessing                 Separate processes bypass GIL
Mixed                    asyncio + ProcessPoolExecutor   Async I/O with a process pool for CPU work

Connection Pooling

Opening a database connection is expensive (TCP handshake, TLS, authentication). Connection pools maintain a set of reusable connections.

Sizing Formula

connections = (CPU_cores * 2) + effective_spindle_count

  • For SSDs, effective spindle count is typically 1
  • A 4-core server: (4 * 2) + 1 = 9 connections
  • Going beyond this causes context-switching overhead and contention on the database

Common mistake: Setting pool size to 100+ "for safety." This actually degrades performance because the database spends more time context-switching between connections than executing queries. A common PostgreSQL practice is to front the database with PgBouncer and keep the pool small.

Pool Configuration

# Typical pool configuration (HikariCP-style settings, names simplified)
pool:
  minimum_idle: 5         # keep 5 idle connections warm
  maximum_pool_size: 10   # hard cap
  connection_timeout: 30s # wait for a free connection
  idle_timeout: 600s      # close idle connections after 10 min
  max_lifetime: 1800s     # replace connections every 30 min

Key decisions:

  • min idle: Set to your baseline load to avoid cold-start latency
  • max pool: Set using the formula above — resist the urge to increase
  • connection timeout: Fail fast (30s) rather than queue indefinitely
  • max lifetime: Rotate connections to prevent stale state and respect database-side timeouts
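
For comparison, Go's database/sql exposes the same knobs; a sketch mirroring the YAML above (driver choice and DSN are placeholders):

// Go: pool settings on database/sql, mirroring the YAML above
import (
    "database/sql"
    "time"

    _ "github.com/lib/pq" // placeholder choice of Postgres driver
)

func openPool() (*sql.DB, error) {
    db, err := sql.Open("postgres", "postgres://user:pass@localhost/app") // placeholder DSN
    if err != nil {
        return nil, err
    }
    db.SetMaxIdleConns(5)                   // ~ minimum_idle
    db.SetMaxOpenConns(10)                  // ~ maximum_pool_size
    db.SetConnMaxIdleTime(10 * time.Minute) // ~ idle_timeout
    db.SetConnMaxLifetime(30 * time.Minute) // ~ max_lifetime
    return db, nil
}

There is no direct connection_timeout knob in database/sql; per-query context deadlines (QueryRowContext and friends) play that role.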

Next: Reliability & Observability — SLOs, circuit breakers, distributed tracing, and chaos engineering.
