Architecture Patterns & System Design
Distributed Systems Fundamentals
Cloud architects must understand distributed systems theory to design resilient, scalable architectures. These concepts appear frequently in system design interviews.
CAP Theorem
The foundation of distributed systems decision-making.
Definition
A distributed system can guarantee at most two of the following three properties at the same time:
- Consistency: Every read sees the most recent write (all nodes agree on the data)
- Availability: Every request to a non-failing node receives a response, though not necessarily the latest data
- Partition Tolerance: The system continues operating despite network partitions
Because network partitions cannot be ruled out in practice, the real decision is which of consistency or availability to sacrifice while a partition is happening.
CAP in Practice
| Database | CAP Choice | Trade-off |
|---|---|---|
| Traditional RDBMS | CA | Cannot handle partitions well |
| MongoDB | CP | May reject writes during partition |
| Cassandra | AP | May return stale data |
| DynamoDB | Configurable | Choose per-operation |
| Spanner | CP (with high A) | Google's global network minimizes partitions |
Interview Question: CAP in Real Systems
Q: "How does AWS DynamoDB handle CAP theorem trade-offs?"
A: DynamoDB offers configurable consistency:
- Eventually Consistent Reads (default): AP behavior, higher availability, lower latency
- Strongly Consistent Reads: CP behavior, guaranteed latest data, higher latency
- Transactions: Strong consistency with ACID guarantees, highest latency
Design Implication: Use eventually consistent reads for user-facing, read-heavy operations, and strongly consistent reads for critical operations (payments, inventory).
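To make the trade-off concrete, here is a minimal boto3 sketch, assuming a hypothetical `orders` table keyed by `order_id` (table and key names are illustrative, not from any real system):

```python
import boto3

dynamodb = boto3.resource("dynamodb")
orders = dynamodb.Table("orders")  # hypothetical table, partition key "order_id"

# Eventually consistent read (default): cheapest and fastest, but may not
# yet reflect a write that completed a few milliseconds ago.
feed_view = orders.get_item(Key={"order_id": "o-123"})

# Strongly consistent read: returns the latest committed write, at roughly
# double the read-capacity cost and with higher latency.
payment_view = orders.get_item(
    Key={"order_id": "o-123"},
    ConsistentRead=True,
)
```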
PACELC Theorem
Extends CAP to cover the trade-off a system makes during normal operation, when there is no partition.
Definition
If there's a Partition, choose Availability or Consistency. Else (normal operation), choose Latency or Consistency.
PACELC Classification
| System | During partition (A or C) | Else, normal operation (L or C) | Behavior |
|---|---|---|---|
| DynamoDB | A | L | Fast, eventual consistency |
| Cassandra | A | L | Tunable consistency |
| MongoDB | C | C | Strong consistency |
| Spanner | C | C | Strong consistency, global |
| CockroachDB | C | C | Serializable isolation; accepts extra latency for consistency |
Consistency Models
Eventual Consistency
- Updates propagate asynchronously
- Reads may return stale data temporarily
- System converges to consistent state
Use Cases: Social media feeds, product catalogs, session data
Strong Consistency
- Reads always return the latest write
- Higher latency, lower throughput
- Required for financial transactions
Use Cases: Banking, inventory management, user authentication
Causal Consistency
- Related operations maintain order
- Independent operations may be out of order
- Balance between eventual and strong
Use Cases: Collaborative editing, messaging systems
Distributed Consensus
Raft Algorithm
Used by: etcd, Consul, CockroachDB
How Raft Works:
- Leader Election: Nodes vote to elect a single leader for each term
- Log Replication: The leader appends client commands to its log and replicates them to followers
- Safety: A candidate can win election only if its log is at least as up to date as each voter's, so an elected leader already holds every committed entry (sketched below)
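The safety property boils down to a concrete check in the vote-granting rule: a follower votes for a candidate only if the candidate's log is at least as up to date as its own. A minimal sketch of that comparison (variable names are illustrative):

```python
def candidate_log_is_up_to_date(cand_last_term: int, cand_last_index: int,
                                my_last_term: int, my_last_index: int) -> bool:
    """Raft's vote-granting check: compare the term of the last log entry
    first, then the log length. A voter grants its vote only when this is
    True, which ensures an elected leader holds every committed entry."""
    if cand_last_term != my_last_term:
        return cand_last_term > my_last_term
    return cand_last_index >= my_last_index
```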
Paxos Algorithm
Used by: Google Spanner, Chubby
Paxos predates Raft and provides equivalent safety guarantees, but it is notoriously hard to understand and implement correctly; Raft was designed specifically for understandability.
Interview Question: Leader Election
Q: "How would you implement leader election for a distributed job scheduler?"
A: Options by complexity:
| Approach | Complexity | Use Case |
|---|---|---|
| Cloud-native (DynamoDB locks) | Low | AWS-only, simple |
| etcd/Consul | Medium | Kubernetes-native |
| ZooKeeper | High | Legacy systems, older Kafka deployments |
| Custom Raft | Very High | Research/custom needs |
Recommended: Use managed services (DynamoDB, Consul) unless you have specific requirements.
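For the cloud-native option, a leader can be elected with a DynamoDB conditional write acting as an expiring lease. A minimal sketch, assuming a hypothetical `leader-lock` table with `lock_id` as its partition key (table, attribute names, and lease length are illustrative):

```python
import time
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource("dynamodb")
lock_table = dynamodb.Table("leader-lock")  # hypothetical table, PK = lock_id

LEASE_SECONDS = 30  # illustrative lease length

def try_acquire_leadership(node_id: str) -> bool:
    """Attempt to become leader by writing a lease item.

    The conditional write succeeds only if no lease exists or the previous
    lease has expired, so at most one node holds the lease at a time.
    """
    now = int(time.time())
    try:
        lock_table.put_item(
            Item={"lock_id": "job-scheduler", "owner": node_id,
                  "expires_at": now + LEASE_SECONDS},
            ConditionExpression="attribute_not_exists(lock_id) OR expires_at < :now",
            ExpressionAttributeValues={":now": now},
        )
        return True   # leader until the lease expires; renew before then
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # another node holds a valid lease
        raise
```

The elected node must renew the lease well before it expires and step down if renewal fails; clock skew between nodes bounds how aggressive the lease timing can be.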
Data Replication Patterns
Synchronous Replication
Client → Primary → Replica1 → Replica2 → ACK → Client
- Strong consistency
- Higher latency
- Lower availability during failures
Asynchronous Replication
Client → Primary → ACK → Client
↓ (async)
Replica1, Replica2
- Lower latency
- Eventual consistency
- Potential data loss on primary failure
Semi-Synchronous Replication
Client → Primary → Replica1 → ACK → Client
↓ (async)
Replica2
- Balance of consistency and availability
- At least one replica has latest data
- Common in production databases (a minimal acknowledgment sketch follows)
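The three modes differ mainly in when the primary acknowledges the client. A toy sketch of the semi-synchronous rule, waiting for at least one replica before acking, with hypothetical replica objects exposing an `append` method:

```python
import concurrent.futures

def replicate_semi_sync(entry, replicas, min_acks=1):
    """Return True (safe to ACK the client) once `min_acks` replicas confirm.

    The remaining replicas keep replicating in the background, which is why
    at least one replica always holds the latest acknowledged write.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=max(len(replicas), 1))
    futures = [pool.submit(r.append, entry) for r in replicas]  # r.append is hypothetical
    acks = 0
    for future in concurrent.futures.as_completed(futures):
        if future.exception() is None:
            acks += 1
        if acks >= min_acks:
            return True   # ACK the client; slower replicas finish asynchronously
    return False          # not enough replicas confirmed
```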
Sharding Strategies
Range-Based Sharding
Users A-M → Shard 1
Users N-Z → Shard 2
- Good for range queries
- Risk of hot spots (e.g., shards covering common name prefixes receive disproportionate traffic)
Hash-Based Sharding
hash(user_id) % num_shards → Shard
- Even distribution
- Range queries require scatter-gather across shards (a minimal routing sketch follows)
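A minimal routing function for the modulo scheme above. A stable hash (MD5 here) is used instead of Python's built-in `hash`, which is randomized per process:

```python
import hashlib

def shard_for(user_id: str, num_shards: int) -> int:
    """Map a key to a shard deterministically: same key, same shard."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards
```

Note that changing `num_shards` remaps almost every key, which is exactly the problem consistent hashing addresses.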
Consistent Hashing
hash ring with virtual nodes
- Minimal key redistribution when nodes are added or removed
- Used by DynamoDB and Cassandra (a minimal ring implementation is sketched below)
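A compact sketch of a hash ring with virtual nodes; adding or removing a physical node only remaps the keys that fall in its segments of the ring:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Hash ring with virtual nodes (vnodes) for smoother key distribution."""

    def __init__(self, nodes, vnodes=100):
        self._ring = []  # sorted list of (hash, node) points on the ring
        for node in nodes:
            self.add_node(node, vnodes)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node: str, vnodes: int = 100) -> None:
        # Each physical node is placed on the ring at many virtual positions.
        for i in range(vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get_node(self, key: str) -> str:
        """Walk clockwise from the key's hash to the next virtual node."""
        idx = bisect.bisect(self._ring, (self._hash(key), ""))
        if idx == len(self._ring):
            idx = 0  # wrap around the ring
        return self._ring[idx][1]
```

For example, `ConsistentHashRing(["node-a", "node-b", "node-c"]).get_node("user-42")` returns the same node on every call until ring membership changes.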
Interview Question: Sharding Strategy
Q: "Design a sharding strategy for a social media platform's posts."
A: Consider access patterns:
- Primary access: Get posts by user (user timeline)
- Secondary: Get posts by time (global feed)
Strategy: Compound shard key (user_id, created_at)
- Posts for a user are co-located
- Time-based queries still efficient within user
- Add a secondary index for global time-based queries (query sketch below)
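In a DynamoDB-style model, the compound key maps to a partition key of `user_id` and a sort key of `created_at`. A minimal query sketch, assuming a hypothetical `posts` table and a secondary index named `by-time-bucket` partitioned by a coarse time bucket (all names are illustrative):

```python
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
posts = dynamodb.Table("posts")  # hypothetical: PK = user_id, SK = created_at

# User timeline: a single-partition query, newest posts first.
user_timeline = posts.query(
    KeyConditionExpression=Key("user_id").eq("u-42"),
    ScanIndexForward=False,  # descending by created_at
    Limit=50,
)

# Global feed: served from a secondary index instead of scatter-gather,
# here a GSI partitioned by an hourly time bucket (illustrative design).
global_feed = posts.query(
    IndexName="by-time-bucket",
    KeyConditionExpression=Key("time_bucket").eq("2024-06-01T10"),
    ScanIndexForward=False,
    Limit=50,
)
```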
Key Insight: Distributed systems design is about trade-offs. Always articulate what you're optimizing for and what you're sacrificing.
Next, we'll explore microservices architecture patterns.