Database Design & SQL Mastery
NoSQL: When to Choose What
Not every problem is best solved with a relational database. This lesson covers the four major NoSQL categories, when each one is the right choice, the CAP theorem in practice, and the Redis data structures that interviewers love to ask about.
Document Stores: MongoDB
Document databases store data as flexible JSON-like documents (BSON in MongoDB). Each document can have a different structure, which is useful when schema requirements evolve frequently.
Embedding vs. Referencing
The key design decision in MongoDB is whether to embed related data inside a document or reference it with an ID.
{
"_id": "order_123",
"customer_name": "Alice Chen",
"items": [
{ "product": "Wireless Mouse", "price": 29.99, "qty": 2 },
{ "product": "USB-C Hub", "price": 49.99, "qty": 1 }
],
"total": 109.97,
"created_at": "2026-01-15T10:30:00Z"
}
| Strategy | When to Use | Trade-off |
|---|---|---|
| Embed | Data is read together, child has no independent life | Faster reads, document size limit (16MB) |
| Reference | Data is shared across documents, many-to-many relationships | Requires multiple queries, more flexible |
Best use cases for MongoDB:
- Content management systems (articles with varying metadata)
- Product catalogs with heterogeneous attributes
- User profiles with flexible fields
- Event logging with varying event schemas
Key-Value Stores: Redis and DynamoDB
Key-value stores map a unique key to a value. They offer the fastest read/write performance but with limited query capabilities.
Redis Data Structures
Redis is far more than a simple key-value cache. Its rich data structures make it a powerful tool for specific patterns that interviewers frequently ask about.
Sorted Sets -- Leaderboards:
# Add player scores to a leaderboard
ZADD leaderboard 1500 "player:alice"
ZADD leaderboard 2200 "player:bob"
ZADD leaderboard 1800 "player:charlie"
# Top 3 players with scores (highest first)
ZREVRANGE leaderboard 0 2 WITHSCORES
# Result: ["player:bob", "2200", "player:charlie", "1800", "player:alice", "1500"]
# Player rank (0-indexed, highest score = rank 0)
ZREVRANK leaderboard "player:alice"
# Result: 2
HyperLogLog -- Unique Counts:
# Count unique visitors (uses ~12KB regardless of count)
PFADD daily_visitors:2026-01-15 "user:101" "user:102" "user:103"
PFADD daily_visitors:2026-01-15 "user:101" # duplicate, not counted
PFCOUNT daily_visitors:2026-01-15
# Result: 3 (approximate, 0.81% standard error)
# Merge multiple days for weekly count
PFMERGE weekly_visitors daily_visitors:2026-01-15 daily_visitors:2026-01-16
Pub/Sub -- Real-time Messaging:
# Subscriber listens on a channel
SUBSCRIBE chat:room:42
# Publisher sends a message
PUBLISH chat:room:42 "Hello everyone!"
DynamoDB: Partition Key Design
DynamoDB (AWS) is a managed key-value and document store designed for massive scale. The partition key determines how data is distributed across partitions.
Good partition key: High cardinality, even distribution (e.g., user_id, order_id)
Bad partition key: Low cardinality, hot partitions (e.g., status, country -- most writes go to a few values)
{
"PK": "USER#alice123",
"SK": "ORDER#2026-01-15#order_789",
"total": 109.97,
"status": "shipped"
}
The single-table design pattern in DynamoDB uses a composite primary key (partition key + sort key) to store multiple entity types in one table, enabling efficient access patterns with a single query.
Column-Family Stores: Cassandra
Apache Cassandra is designed for write-heavy workloads across multiple data centers. Data is organized by partition key and clustering key.
Partition key determines which node stores the data. Clustering key determines the sort order within a partition.
CREATE TABLE sensor_readings (
sensor_id UUID,
reading_time TIMESTAMP,
temperature DOUBLE,
humidity DOUBLE,
PRIMARY KEY (sensor_id, reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC);
Best use cases for Cassandra:
- Time-series data (IoT sensors, metrics, logs)
- Write-heavy workloads (event tracking, messaging)
- Multi-region deployments needing low-latency writes
- Data that is naturally partitioned (per user, per device)
Key design rule: In Cassandra, you design tables around your query patterns, not around entities. Each query pattern typically gets its own table.
Graph Databases: Neo4j
Graph databases model data as nodes (entities) and edges (relationships). They excel when relationships are the primary focus of your queries.
(:Person {name: "Alice"})-[:FOLLOWS]->(:Person {name: "Bob"})
(:Person {name: "Alice"})-[:PURCHASED]->(:Product {name: "Wireless Mouse"})
Best use cases:
- Social networks (friends-of-friends, mutual connections)
- Recommendation engines (people who bought X also bought Y)
- Fraud detection (circular money transfers)
- Knowledge graphs (entity relationships)
When NOT to use a graph database: Simple CRUD operations, time-series data, or when relationships between entities are not central to your query patterns.
SQL vs. NoSQL Decision Matrix
| Criteria | Choose SQL | Choose NoSQL |
|---|---|---|
| Data structure | Well-defined, relational | Flexible, evolving |
| Consistency needs | Strong consistency required (banking) | Eventual consistency acceptable |
| Query patterns | Complex joins, aggregations, ad-hoc queries | Known access patterns, key-based lookups |
| Scale model | Vertical scaling sufficient | Horizontal scaling needed |
| Transaction needs | Multi-row ACID transactions | Single-document or eventual consistency |
| Team expertise | Strong SQL skills | Experience with specific NoSQL system |
| Read/write ratio | Balanced or write-heavy with complex reads | Read-heavy with simple patterns |
Interview tip: Never say "NoSQL is better than SQL" or vice versa. Always frame your answer as: "For this specific access pattern and consistency requirement, I would choose X because..." The best answer demonstrates that you understand the trade-offs, not that you have a favorite.
CAP Theorem in Practice
The CAP theorem states that a distributed system can only guarantee two of three properties simultaneously:
- Consistency (C): Every read returns the most recent write
- Availability (A): Every request receives a response (not an error)
- Partition Tolerance (P): The system continues to operate despite network partitions
Since network partitions are unavoidable in distributed systems, the real choice is between CP (consistency during partition) and AP (availability during partition).
| Database | CAP Classification | Behavior During Partition |
|---|---|---|
| PostgreSQL (single node) | CA (not distributed) | No partition tolerance |
| Google Spanner | CP | Rejects writes if quorum unavailable |
| MongoDB (default config) | CP | Primary election, brief unavailability |
| DynamoDB | AP | Returns potentially stale data, resolves later |
| Cassandra | AP (tunable) | Accepts writes, resolves conflicts via last-write-wins |
Eventual Consistency Patterns
When using AP systems, these patterns help manage consistency:
Read-your-writes: After a user writes data, their subsequent reads are guaranteed to see that write (even if other users don't yet). Implemented by routing reads to the same node that handled the write, or by including a version token.
Monotonic reads: A user will never see older data after seeing newer data. Implemented by pinning a user's reads to a consistent replica.
Consistent prefix: If write A happened before write B, any reader who sees B will also see A. Implemented by ordering writes through a single partition or using vector clocks.
Interview tip: When discussing CAP, be precise. Saying "MongoDB is CP" without context is incomplete. Say: "MongoDB with the default write concern and read preference behaves as CP -- during a primary election, writes are briefly unavailable. However, with read preference set to
secondaryPreferredand write concernw:1, it can behave more AP, accepting reads from secondaries that may be stale." This level of nuance demonstrates deep understanding.
Congratulations -- you have completed the Database Design & SQL Mastery module. Test your knowledge with the module quiz. :::