Architecture Patterns & System Design

Distributed Systems Fundamentals

Cloud architects must understand distributed systems theory to design resilient, scalable architectures. These concepts appear frequently in system design interviews.

CAP Theorem

The foundation of distributed systems decision-making.

Definition

A distributed system cannot guarantee all three of the following properties at once; because network partitions cannot be ruled out, the practical choice during a partition is between consistency and availability:

  • Consistency: All nodes see the same data at the same time
  • Availability: Every request receives a response (success or failure)
  • Partition Tolerance: System continues operating despite network partitions

CAP in Practice

| Database | CAP Choice | Trade-off |
| --- | --- | --- |
| Traditional RDBMS | CA | Cannot handle partitions well |
| MongoDB | CP | May reject writes during partition |
| Cassandra | AP | May return stale data |
| DynamoDB | Configurable | Choose per-operation |
| Spanner | CP (with high A) | Google's global network minimizes partitions |

Interview Question: CAP in Real Systems

Q: "How does AWS DynamoDB handle CAP theorem trade-offs?"

A: DynamoDB offers configurable consistency:

  • Eventually Consistent Reads (default): AP behavior, higher availability, lower latency
  • Strongly Consistent Reads: CP behavior, guaranteed latest data, higher latency
  • Transactions: Strong consistency with ACID guarantees, highest latency

Design Implication: Use eventually consistent reads for user-facing, read-heavy operations and strongly consistent reads for critical operations (payments, inventory); a minimal boto3 sketch follows.
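
As a concrete illustration, here is a minimal boto3 sketch, assuming a hypothetical Orders table keyed on order_id, that issues the same read with and without strong consistency:

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# Eventually consistent read (default): lower latency, may return stale data.
eventual = dynamodb.get_item(
    TableName="Orders",                    # hypothetical table name
    Key={"order_id": {"S": "order-123"}},  # hypothetical key
)

# Strongly consistent read: reflects all prior successful writes, at higher latency.
strong = dynamodb.get_item(
    TableName="Orders",
    Key={"order_id": {"S": "order-123"}},
    ConsistentRead=True,
)
```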

PACELC Theorem

An extension of CAP that also covers the trade-off during normal operation, when there is no partition.

Definition

If there is a Partition (P), the system must choose between Availability (A) and Consistency (C); Else (E), during normal operation, it must choose between Latency (L) and Consistency (C).

PACELC Classification

| System | Partition (A/C) | Else (L/C) | Behavior |
| --- | --- | --- | --- |
| DynamoDB | A | L | Fast, eventual consistency |
| Cassandra | A | L | Tunable consistency |
| MongoDB | C | C | Strong consistency |
| Spanner | C | C | Strong consistency, global |
| CockroachDB | C | L | Serializable, low latency |

Consistency Models

Eventual Consistency

  • Updates propagate asynchronously
  • Reads may return stale data temporarily
  • System converges to consistent state

Use Cases: Social media feeds, product catalogs, session data
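
To make "converges to a consistent state" concrete, here is a toy last-write-wins sketch, one simple convergence rule among many (real systems may use vector clocks or CRDTs instead):

```python
from dataclasses import dataclass

@dataclass
class Versioned:
    value: str
    timestamp: float  # wall-clock or logical timestamp of the last update

def merge(local: Versioned, remote: Versioned) -> Versioned:
    # Last-write-wins: keep whichever update carries the newer timestamp.
    return remote if remote.timestamp > local.timestamp else local

# Two replicas accepted different updates while out of sync...
replica_a = Versioned("profile-v1", timestamp=100.0)
replica_b = Versioned("profile-v2", timestamp=105.0)

# ...and converge to the same value once they exchange state (anti-entropy).
replica_a = merge(replica_a, replica_b)
replica_b = merge(replica_b, replica_a)
assert replica_a == replica_b  # both now hold "profile-v2"
```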

Strong Consistency

  • Reads always return the latest write
  • Higher latency, lower throughput
  • Required for financial transactions

Use Cases: Banking, inventory management, user authentication

Causal Consistency

  • Related operations maintain order
  • Independent operations may be out of order
  • Balance between eventual and strong

Use Cases: Collaborative editing, messaging systems

Distributed Consensus

Raft Algorithm

Used by: etcd, Consul, CockroachDB

How Raft Works (a simplified sketch of the election step follows this list):

  1. Leader Election: One node becomes leader
  2. Log Replication: Leader replicates entries to followers
  3. Safety: Only leader with complete log can be elected
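
The sketch below is heavily simplified: RPC, persistence, heartbeats, and the log up-to-dateness check from the real Raft protocol are omitted, leaving only the term/vote bookkeeping of an election.

```python
import random

class RaftNode:
    """Simplified sketch of Raft leader election (illustration only)."""

    def __init__(self, node_id: str, peers: list["RaftNode"]):
        self.node_id = node_id
        self.peers = peers
        self.current_term = 0
        self.voted_for = None
        self.state = "follower"
        # Randomized timeout (ms) so nodes rarely start elections simultaneously.
        self.election_timeout = random.uniform(150, 300)

    def on_election_timeout(self) -> None:
        # No heartbeat from a leader within the timeout: become a candidate.
        self.state = "candidate"
        self.current_term += 1
        self.voted_for = self.node_id
        votes = 1  # vote for self
        for peer in self.peers:
            if peer.request_vote(self.current_term, self.node_id):
                votes += 1
        # A strict majority of the whole cluster is required to become leader.
        if votes > (len(self.peers) + 1) // 2:
            self.state = "leader"

    def request_vote(self, term: int, candidate_id: str) -> bool:
        # Grant at most one vote per term (real Raft also rejects candidates
        # whose log is less up to date than the voter's).
        if term > self.current_term:
            self.current_term = term
            self.voted_for = None
            self.state = "follower"
        if term == self.current_term and self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False
```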

Paxos Algorithm

Used by: Google Spanner, Chubby

Paxos predates Raft and is harder to understand and implement correctly, but it is formally proven safe and has been battle-tested at massive scale.

Interview Question: Leader Election

Q: "How would you implement leader election for a distributed job scheduler?"

A: Options by complexity:

| Approach | Complexity | Use Case |
| --- | --- | --- |
| Cloud-native (DynamoDB locks) | Low | AWS-only, simple |
| etcd/Consul | Medium | Kubernetes-native |
| ZooKeeper | High | Legacy, Kafka |
| Custom Raft | Very High | Research/custom needs |

Recommended: Use managed services (DynamoDB, Consul) unless you have specific requirements.
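
For the cloud-native option, a common pattern is a lease item acquired with a DynamoDB conditional write. Below is a minimal boto3 sketch assuming a hypothetical leader_lock table with resource, owner, and expires_at attributes:

```python
import time
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")
LEASE_SECONDS = 30

def try_acquire_leadership(table: str, resource: str, owner: str) -> bool:
    """Acquire (or renew) the lease only if no unexpired lease is held by someone else."""
    now = int(time.time())
    try:
        dynamodb.put_item(
            TableName=table,
            Item={
                "resource": {"S": resource},
                "owner": {"S": owner},
                "expires_at": {"N": str(now + LEASE_SECONDS)},
            },
            # Succeed only if no lock exists, we already hold it, or it has expired.
            ConditionExpression="attribute_not_exists(#r) OR #o = :me OR #exp < :now",
            ExpressionAttributeNames={"#r": "resource", "#o": "owner", "#exp": "expires_at"},
            ExpressionAttributeValues={":me": {"S": owner}, ":now": {"N": str(now)}},
        )
        return True
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # another instance currently holds the lease
        raise

# Each scheduler instance calls this periodically; only the current leader runs jobs.
is_leader = try_acquire_leadership("leader_lock", "job-scheduler", "instance-42")
```

The leader renews its lease before it expires; if the leader dies, the lease lapses and another instance acquires it on its next attempt.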

Data Replication Patterns

Synchronous Replication

Client → Primary → Replica1 → Replica2 → ACK → Client
  • Strong consistency
  • Higher latency
  • Lower availability during failures

Asynchronous Replication

Client → Primary → ACK → Client
         ↓ (async)
     Replica1, Replica2
  • Lower latency
  • Eventual consistency
  • Potential data loss on primary failure

Semi-Synchronous Replication

Client → Primary → Replica1 → ACK → Client
         ↓ (async)
      Replica2
  • Balance of consistency and availability
  • At least one replica has latest data
  • Common in production databases
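
To illustrate the difference in acknowledgment behavior described above, here is a toy Python sketch with in-memory stand-ins for replica calls (not how a real database implements replication):

```python
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

pool = ThreadPoolExecutor(max_workers=4)

def send_to_replica(replica: str, record: dict) -> str:
    # Stand-in for a network call that applies the write on a replica.
    return f"{replica}: applied {record}"

def write_synchronous(record: dict, replicas: list[str]) -> None:
    # Client is ACKed only after *every* replica has applied the write.
    futures = [pool.submit(send_to_replica, r, record) for r in replicas]
    wait(futures)                               # block until all replicas respond
    print("ACK to client (all replicas durable)")

def write_semi_synchronous(record: dict, replicas: list[str]) -> None:
    # Client is ACKed once *at least one* replica has the write; others catch up later.
    futures = [pool.submit(send_to_replica, r, record) for r in replicas]
    wait(futures, return_when=FIRST_COMPLETED)  # block only for the first ACK
    print("ACK to client (one replica durable, rest catching up)")

write_synchronous({"id": 1, "amount": 100}, ["replica-1", "replica-2"])
write_semi_synchronous({"id": 2, "amount": 50}, ["replica-1", "replica-2"])
```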

Sharding Strategies

Range-Based Sharding

Users A-M → Shard 1
Users N-Z → Shard 2
  • Good for range queries
  • Risk of hot spots (popular letters)

Hash-Based Sharding

hash(user_id) % num_shards → Shard
  • Even distribution
  • Range queries require scatter-gather

Consistent Hashing

hash ring with virtual nodes
  • Minimal redistribution on scale
  • Used by DynamoDB, Cassandra
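
A minimal sketch of a consistent-hash ring with virtual nodes, using MD5 and an arbitrary vnode count purely for illustration:

```python
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, nodes: list[str], vnodes: int = 100):
        self.vnodes = vnodes
        self.ring: list[int] = []          # sorted hash positions on the ring
        self.owners: dict[int, str] = {}   # hash position -> physical node
        for node in nodes:
            self.add_node(node)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        # Each physical node owns many points on the ring to even out load.
        for i in range(self.vnodes):
            pos = self._hash(f"{node}#{i}")
            bisect.insort(self.ring, pos)
            self.owners[pos] = node

    def get_node(self, key: str) -> str:
        # A key belongs to the first virtual node clockwise from its hash position.
        pos = self._hash(key)
        idx = bisect.bisect(self.ring, pos) % len(self.ring)
        return self.owners[self.ring[idx]]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
print(ring.get_node("user-12345"))  # adding/removing a node remaps only ~1/N of keys
```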

Interview Question: Sharding Strategy

Q: "Design a sharding strategy for a social media platform's posts."

A: Consider access patterns:

  1. Primary access: Get posts by user (user timeline)
  2. Secondary: Get posts by time (global feed)

Strategy: Compound shard key (user_id, created_at)

  • Posts for a user are co-located
  • Time-based queries still efficient within user
  • Add secondary index for global time-based queries
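
One way to express that routing in code, assuming hash-based placement on user_id with created_at as the in-shard sort key (all names are illustrative):

```python
import hashlib
from datetime import datetime, timezone

NUM_SHARDS = 16

def shard_for_user(user_id: str) -> int:
    # All of a user's posts land on the same shard, keeping their timeline local.
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def post_key(user_id: str, created_at: datetime) -> tuple[str, str]:
    # Compound key: partition on user_id, sort by created_at within the shard,
    # so "latest N posts for this user" is a single-shard range scan.
    return (user_id, created_at.isoformat())

shard = shard_for_user("user-42")
key = post_key("user-42", datetime.now(timezone.utc))
print(f"route write to shard {shard} with key {key}")
```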

Key Insight: Distributed systems design is about trade-offs. Always articulate what you're optimizing for and what you're sacrificing.

Next, we'll explore microservices architecture patterns.
