Database Design & SQL Mastery

NoSQL: When to Choose What

4 min read

Not every problem is best solved with a relational database. This lesson covers the four major NoSQL categories, when each one is the right choice, the CAP theorem in practice, and the Redis data structures that interviewers love to ask about.

Document Stores: MongoDB

Document databases store data as flexible JSON-like documents (BSON in MongoDB). Each document can have a different structure, which is useful when schema requirements evolve frequently.

Embedding vs. Referencing

The key design decision in MongoDB is whether to embed related data inside a document or reference it with an ID.

{
  "_id": "order_123",
  "customer_name": "Alice Chen",
  "items": [
    { "product": "Wireless Mouse", "price": 29.99, "qty": 2 },
    { "product": "USB-C Hub", "price": 49.99, "qty": 1 }
  ],
  "total": 109.97,
  "created_at": "2026-01-15T10:30:00Z"
}
Strategy When to Use Trade-off
Embed Data is read together, child has no independent life Faster reads, document size limit (16MB)
Reference Data is shared across documents, many-to-many relationships Requires multiple queries, more flexible

Best use cases for MongoDB:

  • Content management systems (articles with varying metadata)
  • Product catalogs with heterogeneous attributes
  • User profiles with flexible fields
  • Event logging with varying event schemas

Key-Value Stores: Redis and DynamoDB

Key-value stores map a unique key to a value. They offer the fastest read/write performance but with limited query capabilities.

Redis Data Structures

Redis is far more than a simple key-value cache. Its rich data structures make it a powerful tool for specific patterns that interviewers frequently ask about.

Sorted Sets -- Leaderboards:

# Add player scores to a leaderboard
ZADD leaderboard 1500 "player:alice"
ZADD leaderboard 2200 "player:bob"
ZADD leaderboard 1800 "player:charlie"

# Top 3 players with scores (highest first)
ZREVRANGE leaderboard 0 2 WITHSCORES
# Result: ["player:bob", "2200", "player:charlie", "1800", "player:alice", "1500"]

# Player rank (0-indexed, highest score = rank 0)
ZREVRANK leaderboard "player:alice"
# Result: 2

HyperLogLog -- Unique Counts:

# Count unique visitors (uses ~12KB regardless of count)
PFADD daily_visitors:2026-01-15 "user:101" "user:102" "user:103"
PFADD daily_visitors:2026-01-15 "user:101"  # duplicate, not counted

PFCOUNT daily_visitors:2026-01-15
# Result: 3 (approximate, 0.81% standard error)

# Merge multiple days for weekly count
PFMERGE weekly_visitors daily_visitors:2026-01-15 daily_visitors:2026-01-16

Pub/Sub -- Real-time Messaging:

# Subscriber listens on a channel
SUBSCRIBE chat:room:42

# Publisher sends a message
PUBLISH chat:room:42 "Hello everyone!"

DynamoDB: Partition Key Design

DynamoDB (AWS) is a managed key-value and document store designed for massive scale. The partition key determines how data is distributed across partitions.

Good partition key: High cardinality, even distribution (e.g., user_id, order_id)

Bad partition key: Low cardinality, hot partitions (e.g., status, country -- most writes go to a few values)

{
  "PK": "USER#alice123",
  "SK": "ORDER#2026-01-15#order_789",
  "total": 109.97,
  "status": "shipped"
}

The single-table design pattern in DynamoDB uses a composite primary key (partition key + sort key) to store multiple entity types in one table, enabling efficient access patterns with a single query.

Column-Family Stores: Cassandra

Apache Cassandra is designed for write-heavy workloads across multiple data centers. Data is organized by partition key and clustering key.

Partition key determines which node stores the data. Clustering key determines the sort order within a partition.

CREATE TABLE sensor_readings (
    sensor_id UUID,
    reading_time TIMESTAMP,
    temperature DOUBLE,
    humidity DOUBLE,
    PRIMARY KEY (sensor_id, reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC);

Best use cases for Cassandra:

  • Time-series data (IoT sensors, metrics, logs)
  • Write-heavy workloads (event tracking, messaging)
  • Multi-region deployments needing low-latency writes
  • Data that is naturally partitioned (per user, per device)

Key design rule: In Cassandra, you design tables around your query patterns, not around entities. Each query pattern typically gets its own table.

Graph Databases: Neo4j

Graph databases model data as nodes (entities) and edges (relationships). They excel when relationships are the primary focus of your queries.

(:Person {name: "Alice"})-[:FOLLOWS]->(:Person {name: "Bob"})
(:Person {name: "Alice"})-[:PURCHASED]->(:Product {name: "Wireless Mouse"})

Best use cases:

  • Social networks (friends-of-friends, mutual connections)
  • Recommendation engines (people who bought X also bought Y)
  • Fraud detection (circular money transfers)
  • Knowledge graphs (entity relationships)

When NOT to use a graph database: Simple CRUD operations, time-series data, or when relationships between entities are not central to your query patterns.

SQL vs. NoSQL Decision Matrix

Criteria Choose SQL Choose NoSQL
Data structure Well-defined, relational Flexible, evolving
Consistency needs Strong consistency required (banking) Eventual consistency acceptable
Query patterns Complex joins, aggregations, ad-hoc queries Known access patterns, key-based lookups
Scale model Vertical scaling sufficient Horizontal scaling needed
Transaction needs Multi-row ACID transactions Single-document or eventual consistency
Team expertise Strong SQL skills Experience with specific NoSQL system
Read/write ratio Balanced or write-heavy with complex reads Read-heavy with simple patterns

Interview tip: Never say "NoSQL is better than SQL" or vice versa. Always frame your answer as: "For this specific access pattern and consistency requirement, I would choose X because..." The best answer demonstrates that you understand the trade-offs, not that you have a favorite.

CAP Theorem in Practice

The CAP theorem states that a distributed system can only guarantee two of three properties simultaneously:

  • Consistency (C): Every read returns the most recent write
  • Availability (A): Every request receives a response (not an error)
  • Partition Tolerance (P): The system continues to operate despite network partitions

Since network partitions are unavoidable in distributed systems, the real choice is between CP (consistency during partition) and AP (availability during partition).

Database CAP Classification Behavior During Partition
PostgreSQL (single node) CA (not distributed) No partition tolerance
Google Spanner CP Rejects writes if quorum unavailable
MongoDB (default config) CP Primary election, brief unavailability
DynamoDB AP Returns potentially stale data, resolves later
Cassandra AP (tunable) Accepts writes, resolves conflicts via last-write-wins

Eventual Consistency Patterns

When using AP systems, these patterns help manage consistency:

Read-your-writes: After a user writes data, their subsequent reads are guaranteed to see that write (even if other users don't yet). Implemented by routing reads to the same node that handled the write, or by including a version token.

Monotonic reads: A user will never see older data after seeing newer data. Implemented by pinning a user's reads to a consistent replica.

Consistent prefix: If write A happened before write B, any reader who sees B will also see A. Implemented by ordering writes through a single partition or using vector clocks.

Interview tip: When discussing CAP, be precise. Saying "MongoDB is CP" without context is incomplete. Say: "MongoDB with the default write concern and read preference behaves as CP -- during a primary election, writes are briefly unavailable. However, with read preference set to secondaryPreferred and write concern w:1, it can behave more AP, accepting reads from secondaries that may be stale." This level of nuance demonstrates deep understanding.

Congratulations -- you have completed the Database Design & SQL Mastery module. Test your knowledge with the module quiz. :::

Quiz

Module 2 Quiz: Database Design & SQL Mastery

Take Quiz