API Design & Microservices Patterns

Microservices Architecture Patterns

Microservices questions dominate senior backend interviews. You need to explain patterns, draw architecture diagrams, and reason about trade-offs. This lesson covers the patterns interviewers ask about most.

Monolith to Microservices: The Strangler Fig Pattern

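Avoid rewriting a monolith from scratch: a big-bang rewrite must reach feature parity before it delivers any value, and the old system keeps changing in the meantime. The Strangler Fig pattern instead lets you extract services incrementally while the monolith keeps running.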

                    ┌─────────────┐
                    │   API       │
                    │   Gateway   │
                    └──────┬──────┘
              ┌────────────┼────────────┐
              │            │            │
              v            v            v
        ┌──────────┐ ┌──────────┐ ┌──────────┐
        │  New      │ │  New      │ │ Monolith │
        │  Auth     │ │  Payment  │ │ (still   │
        │  Service  │ │  Service  │ │ handles  │
        └──────────┘ └──────────┘ │ orders,  │
                                   │ users,   │
                                   │ etc.)    │
                                   └──────────┘

How it works:

  1. Put an API Gateway in front of the monolith
  2. Identify a bounded context to extract (start with the least coupled)
  3. Build the new service alongside the monolith
  4. Route traffic to the new service via the gateway
  5. Remove the old code from the monolith once the new service is stable
  6. Repeat for the next bounded context
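
Step 4 above comes down to a routing decision at the gateway. The following is a minimal sketch of that decision, assuming prefix-based routing; the hostnames and the `resolve_backend` helper are hypothetical and not tied to any particular gateway product.

```python
# Hypothetical backends; real deployments would use service discovery or gateway config.
MONOLITH = "http://monolith.internal"

# Path prefixes that have already been extracted into their own services.
EXTRACTED_ROUTES = {
    "/auth": "http://auth-service.internal",
    "/payments": "http://payment-service.internal",
}


def resolve_backend(path: str) -> str:
    """Return the backend that should handle a request path.

    Extracted prefixes go to the new service; everything else
    falls through to the monolith.
    """
    for prefix, backend in EXTRACTED_ROUTES.items():
        # Match the prefix itself or a sub-path, but not "/authors" for "/auth".
        if path == prefix or path.startswith(prefix + "/"):
            return backend
    return MONOLITH


print(resolve_backend("/auth/login"))  # routed to the new auth service
print(resolve_backend("/orders/42"))   # still handled by the monolith
```

Extracting the next bounded context then means adding one entry to the routing table, and rolling back means removing it.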

Service Boundaries: Bounded Contexts from DDD

Each microservice should own a single bounded context — a cohesive domain with its own data, rules, and language.

┌──────────────────────────────────────────────────────────┐
│                    E-Commerce Platform                    │
├──────────────┬──────────────┬──────────────┬─────────────┤
│   Order      │   Payment    │  Inventory   │  Shipping   │
│   Context    │   Context    │  Context     │  Context    │
│              │              │              │             │
│  - Order     │  - Payment   │  - Product   │  - Shipment │
│  - OrderItem │  - Refund    │  - Stock     │  - Tracking │
│  - Cart      │  - Invoice   │  - Warehouse │  - Carrier  │
│              │              │              │             │
│  Own DB      │  Own DB      │  Own DB      │  Own DB     │
└──────────────┴──────────────┴──────────────┴─────────────┘

Key principle: Database per service. Each service owns its data. No shared databases. Communication happens through APIs or events, never through direct database access.

Inter-Service Communication

Synchronous vs Asynchronous

| Aspect | Synchronous (REST/gRPC) | Asynchronous (Message Queue) |
|---|---|---|
| Coupling | Temporal coupling (caller waits) | Decoupled (fire and forget) |
| Latency | Higher for chains (A -> B -> C) | Lower perceived latency |
| Availability | Cascading failures if downstream is down | Resilient, messages wait in queue |
| Debugging | Easier (request/response trace) | Harder (trace across queues) |
| Data consistency | Immediate (within request) | Eventual consistency |
| Best for | Reads, queries needing immediate response | Writes, events, long-running tasks |

Asynchronous Messaging

┌─────────┐     ┌─────────────────┐     ┌───────────┐
│  Order   │────>│  Message Broker │────>│  Payment  │
│  Service │     │  (Kafka/SQS/    │     │  Service  │
└─────────┘     │   RabbitMQ)     │     └───────────┘
                └────────┬────────┘
                         ├────────────>┌───────────┐
                         │             │ Inventory  │
                         │             │ Service    │
                         │             └───────────┘
                         └────────────>┌───────────┐
                                       │ Notification│
                                       │ Service    │
                                       └───────────┘

| Broker | Strengths | Best For |
|---|---|---|
| Apache Kafka | High throughput, persistent log, replay | Event sourcing, streaming, audit logs |
| RabbitMQ | Flexible routing, priority queues, low latency | Task queues, RPC patterns |
| AWS SQS | Fully managed, auto-scaling, dead letter queues | Serverless architectures, AWS-native |
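
The fan-out in the diagram can be illustrated with a toy in-memory broker. This is only a sketch of the idea, assuming one queue per subscriber; `InMemoryBroker` and the `order.created` topic name are hypothetical, and a real broker adds persistence, acknowledgements, and delivery guarantees that are not modeled here.

```python
from collections import defaultdict, deque


class InMemoryBroker:
    """Toy stand-in for a message broker: one queue per (topic, subscriber)."""

    def __init__(self):
        self._queues = defaultdict(dict)  # topic -> {subscriber name: deque}

    def subscribe(self, topic, subscriber):
        self._queues[topic][subscriber] = deque()

    def publish(self, topic, message):
        # Fan out: each subscriber gets its own copy, and the publisher
        # returns immediately without waiting for any consumer.
        for queue in self._queues[topic].values():
            queue.append(dict(message))

    def poll(self, topic, subscriber):
        """Return the next pending message for a subscriber, or None."""
        queue = self._queues[topic][subscriber]
        return queue.popleft() if queue else None


broker = InMemoryBroker()
for name in ("payment", "inventory", "notification"):
    broker.subscribe("order.created", name)

# The order service publishes once; it does not know who is listening.
broker.publish("order.created", {"order_id": "o-1", "total": 40})

# Each consumer picks up its copy whenever it is ready.
print(broker.poll("order.created", "payment"))
print(broker.poll("order.created", "payment"))  # nothing left for this consumer
```

Messages for a consumer that is slow or temporarily down simply wait in its queue, which is the availability benefit listed in the comparison above.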

Saga Pattern: Distributed Transactions

In microservices, you cannot use a single database transaction across services. The Saga pattern coordinates a sequence of local transactions, with compensating actions if any step fails.

E-Commerce Order Flow Example

Happy Path:
  Order Service      Payment Service     Inventory Service    Shipping Service
       │                    │                    │                    │
  1. Create Order ──────────>                    │                    │
       │              2. Charge Card              │                    │
       │              ────────────────>           │                    │
       │                    │           3. Reserve Stock              │
       │                    │           ─────────────────>            │
       │                    │                    │          4. Create Shipment
       │                    │                    │          ────────────────>
       │                    │                    │                    │
       ◄────────────────────────────── Success ──────────────────────┘

Failure at Step 3 (out of stock):
  Order Service      Payment Service     Inventory Service
       │                    │                    │
  1. Create Order ──────────>                    │
       │              2. Charge Card              │
       │              ────────────────>           │
       │                    │           3. Reserve Stock ──> FAILS!
       │                    │                    │
       │              4. COMPENSATE:              │
       │                 Refund Card              │
       │              <──────────────             │
  5. COMPENSATE:            │                    │
     Cancel Order           │                    │
       │                    │                    │
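
The failure flow above can be sketched as a loop that remembers which steps succeeded and undoes them in reverse order. This is a minimal illustration, not a production saga engine: `run_saga`, `SagaFailed`, and the no-op step functions are hypothetical, and the compensations are assumed never to fail.

```python
class SagaFailed(Exception):
    """Raised after a step fails and earlier steps have been compensated."""


def run_saga(steps, log):
    """Run (name, action, compensation) steps in order.

    If a step raises, compensate the completed steps in reverse order.
    """
    completed = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception as exc:
            log.append(f"failed: {name}")
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"compensated: {done_name}")
            raise SagaFailed(name) from exc
        log.append(f"done: {name}")
        completed.append((name, compensate))


def reserve_stock():
    raise RuntimeError("out of stock")


def noop():
    """Placeholder for a real local transaction or compensation."""


log = []
steps = [
    ("create order", noop, noop),      # compensation: cancel order
    ("charge card", noop, noop),       # compensation: refund card
    ("reserve stock", reserve_stock, noop),
    ("create shipment", noop, noop),
]

try:
    run_saga(steps, log)
except SagaFailed as failed:
    log.append(f"saga aborted at: {failed}")

print("\n".join(log))
```

In a real system each compensation is itself a remote call that can fail, so compensations are usually made idempotent and retried until they succeed.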

Orchestration vs Choreography

Orchestration: A central coordinator (saga orchestrator) tells each service what to do.

                    ┌──────────────────┐
                    │  Saga            │
                    │  Orchestrator    │
                    └───────┬──────────┘
              ┌─────────────┼─────────────┐
              │             │             │
              v             v             v
        ┌──────────┐ ┌──────────┐ ┌──────────┐
        │  Payment │ │ Inventory│ │ Shipping │
        │  Service │ │  Service │ │  Service │
        └──────────┘ └──────────┘ └──────────┘

Choreography: Each service reacts to events and publishes its own events. No central coordinator.

  Order Created ──> Payment Service ──> Payment Completed ──> Inventory Service
                                                                      │
                                                               Stock Reserved
                                                                      │
                                                                      v
                                                               Shipping Service

| Aspect | Orchestration | Choreography |
|---|---|---|
| Coordination | Central orchestrator | Distributed (each service listens) |
| Coupling | Services coupled to orchestrator | Services coupled to events |
| Visibility | Easy to see full flow in one place | Flow distributed across services |
| Complexity | Orchestrator can become complex | Harder to track full saga |
| Failure handling | Orchestrator manages compensations | Each service handles its own |
| Best for | Complex multi-step workflows | Simple flows, high autonomy |
| Risk | Single point of failure | Circular dependencies, event storms |
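
To make the choreography style concrete, here is a small sketch in which each service only subscribes to an event and publishes the next one. The in-process event bus (`on`, `publish`, `run`) is a hypothetical stand-in for a broker, and the `ShipmentCreated` event name is an assumption added for illustration.

```python
from collections import defaultdict, deque

handlers = defaultdict(list)  # event type -> handler functions
pending = deque()             # events waiting to be delivered
history = []                  # event types in delivery order


def on(event_type):
    """Decorator that subscribes a handler to an event type."""
    def register(func):
        handlers[event_type].append(func)
        return func
    return register


def publish(event_type, payload):
    pending.append((event_type, payload))


def run():
    """Deliver events until none are pending."""
    while pending:
        event_type, payload = pending.popleft()
        history.append(event_type)
        for handler in handlers[event_type]:
            handler(payload)


@on("OrderCreated")
def payment_service(order):
    publish("PaymentCompleted", order)


@on("PaymentCompleted")
def inventory_service(order):
    publish("StockReserved", order)


@on("StockReserved")
def shipping_service(order):
    publish("ShipmentCreated", order)


publish("OrderCreated", {"order_id": "o-1"})
run()
print(" -> ".join(history))
```

No function in this sketch describes the whole flow; it only emerges from the subscriptions, which is why choreographed sagas are harder to trace than orchestrated ones.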

Interview answer: "For a payment flow with 4+ steps and strict ordering, I'd use orchestration — the centralized view makes it easier to handle compensating transactions and debug failures. For simpler event-driven flows like sending notifications after an order, choreography works well because it keeps services autonomous."

Circuit Breaker Pattern

When a downstream service is failing, the Circuit Breaker stops calling it to prevent cascading failures. It has three states:

                          ┌──────────┐
                          │  CLOSED  │  (normal operation)
                          │          │
                          │ Requests │
                          │ pass     │
                          │ through  │
                          └────┬─────┘
                    Failure threshold reached
                               v
                          ┌──────────┐
                          │   OPEN   │  (all requests fail fast)
                          │          │
                          │ Returns  │
                          │ fallback │
                          │ or error │
                          └────┬─────┘
                      Timeout expires
                               v
                          ┌──────────┐
                          │HALF-OPEN │  (test with limited requests)
                          │          │
                          │ Allows   │
                          │ few test │
                          │ requests │
                          └────┬─────┘
                    ┌──────────┴──────────┐
                    │                     │
              Tests pass            Tests fail
                    │                     │
                    v                     v
               ┌──────────┐         ┌──────────┐
               │  CLOSED  │         │   OPEN   │
               └──────────┘         └──────────┘

import time
from enum import Enum


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without invoking the function."""


class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, timeout: float = 30.0, half_open_max: int = 3):
        self.failure_threshold = failure_threshold  # failures before the circuit opens
        self.timeout = timeout                      # seconds to stay open before probing
        self.half_open_max = half_open_max          # max probe calls while half-open

        self.state = CircuitState.CLOSED
        self.failure_count = 0
        self.last_failure_time = 0.0
        self.half_open_calls = 0

    def call(self, func, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            # Use a monotonic clock so wall-clock adjustments cannot skew the timeout.
            if time.monotonic() - self.last_failure_time > self.timeout:
                self.state = CircuitState.HALF_OPEN
                self.half_open_calls = 0
            else:
                raise CircuitOpenError("Circuit is open: request blocked")

        if self.state == CircuitState.HALF_OPEN:
            if self.half_open_calls >= self.half_open_max:
                raise CircuitOpenError("Half-open limit reached")
            self.half_open_calls += 1

        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise  # re-raise with the original traceback
        self._on_success()
        return result

    def _on_success(self):
        # A successful probe closes the circuit; any success resets the count.
        if self.state == CircuitState.HALF_OPEN:
            self.state = CircuitState.CLOSED
        self.failure_count = 0

    def _on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.monotonic()
        # A failed probe reopens immediately; otherwise open at the threshold.
        if self.state == CircuitState.HALF_OPEN or self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN

API Gateway

The API Gateway is the single entry point for all client requests. It handles cross-cutting concerns so individual services do not have to.

                          ┌──────────────────────┐
    Mobile App ──────────>│                      │──> User Service
    Web App ─────────────>│     API Gateway      │──> Order Service
    Third-party ─────────>│                      │──> Payment Service
                          │  - Routing           │──> Inventory Service
                          │  - Authentication    │
                          │  - Rate Limiting     │
                          │  - Request Aggregation│
                          │  - SSL Termination   │
                          │  - Load Balancing    │
                          └──────────────────────┘

Popular choices: Kong (plugin-based), Envoy (high-performance proxy), AWS API Gateway (serverless), NGINX (lightweight).
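
As an illustration of one cross-cutting concern from the diagram, rate limiting, here is a minimal token bucket sketch. Token bucket is one common algorithm, used here as an assumption; the gateways listed above each have their own rate-limiting implementations and configuration, and the `TokenBucket` class and its parameters are hypothetical.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the example below is deterministic
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        """Consume one token if available; return whether the request may pass."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill for the elapsed time, never exceeding the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Deterministic demo with a fake clock instead of real time.
fake_now = [0.0]
bucket = TokenBucket(rate=1.0, capacity=2, clock=lambda: fake_now[0])

results = [bucket.allow() for _ in range(3)]  # burst of three requests at t=0
fake_now[0] = 1.0                             # one second later
results.append(bucket.allow())
print(results)  # [True, True, False, True]
```

A gateway would typically keep one bucket per client or API key and answer rejected requests with HTTP 429.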

Service Mesh

For complex microservice deployments, a service mesh adds observability, security, and traffic management without changing application code. It uses a sidecar proxy pattern.

  ┌─────────────────────┐     ┌─────────────────────┐
  │  Pod A              │     │  Pod B              │
  │  ┌───────────────┐  │     │  ┌───────────────┐  │
  │  │  Order Service │  │     │  │ Payment Service│  │
  │  └───────┬───────┘  │     │  └───────┬───────┘  │
  │          │          │     │          │          │
  │  ┌───────▼───────┐  │     │  ┌───────▼───────┐  │
  │  │  Envoy Proxy  │◄─┼─mTLS┼─►│  Envoy Proxy  │  │
  │  │  (sidecar)    │  │     │  │  (sidecar)    │  │
  │  └───────────────┘  │     │  └───────────────┘  │
  └─────────────────────┘     └─────────────────────┘
            │                           │
            └───────────┬───────────────┘
                ┌───────▼────────┐
                │  Control Plane │
                │  (Istio/Linkerd)│
                │  - mTLS certs  │
                │  - Traffic rules│
                │  - Observability│
                └────────────────┘

What a service mesh provides:

  • mTLS: Automatic mutual authentication and encryption between services, a building block for zero-trust networking
  • Traffic management: Canary deployments, A/B testing, fault injection
  • Observability: Distributed tracing, metrics, access logs — without code changes
  • Retries and timeouts: Configurable retry policies at the proxy level
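
In a mesh, retry policies are declared in proxy configuration rather than written in application code. The sketch below only illustrates what such a policy does, assuming retries on connection errors with exponential backoff; `call_with_retries` and its parameters are hypothetical and do not correspond to any mesh's API.

```python
import time


def call_with_retries(func, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call func, retrying on ConnectionError with exponential backoff.

    The delay doubles after each failed attempt; the last failure is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except ConnectionError:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


# Demo: an upstream that fails twice, then recovers.
calls = {"count": 0}
delays = []  # records the backoff delays instead of actually sleeping


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("upstream unavailable")
    return "ok"


result = call_with_retries(flaky, attempts=3, base_delay=0.1, sleep=delays.append)
print(result, delays)
```

Automatic retries are only safe for idempotent requests; retrying a non-idempotent call such as a card charge can apply it twice.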

CQRS (Command Query Responsibility Segregation)

Separate the read and write models of your application. The write side optimizes for consistency, while the read side optimizes for query performance.

                    ┌──────────────────────────────────────┐
                    │           API Layer                   │
                    └────────────┬─────────────────────────┘
                    ┌────────────┴────────────┐
                    │                         │
              Commands (writes)         Queries (reads)
                    │                         │
                    v                         v
            ┌──────────────┐         ┌──────────────┐
            │ Write Model  │         │  Read Model  │
            │ (normalized, │   Sync  │ (denormalized│
            │  consistent) │ ──────> │  fast reads) │
            │              │ events  │              │
            │  PostgreSQL  │         │ Elasticsearch│
            │              │         │  or Redis    │
            └──────────────┘         └──────────────┘
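
The split in the diagram can be sketched with two in-memory stores and a projector that copies events from the write side to the read side. This is a toy illustration: the dictionaries stand in for the databases in the diagram, and `place_order`, `project_pending_events`, and `get_customer_summary` are hypothetical names.

```python
write_db = {}    # write model: order_id -> order record
read_view = {}   # read model: customer -> denormalized summary
event_log = []   # events not yet applied to the read model


def place_order(order_id, customer, total):
    """Command: validate, update the write model, and emit an event."""
    if order_id in write_db:
        raise ValueError("duplicate order")
    write_db[order_id] = {"customer": customer, "total": total}
    event_log.append({"type": "OrderPlaced", "customer": customer, "total": total})


def project_pending_events():
    """Projector: apply pending events to the read model."""
    while event_log:
        event = event_log.pop(0)
        if event["type"] == "OrderPlaced":
            summary = read_view.setdefault(event["customer"], {"orders": 0, "spent": 0})
            summary["orders"] += 1
            summary["spent"] += event["total"]


def get_customer_summary(customer):
    """Query: served only from the read model, never from the write model."""
    return read_view.get(customer, {"orders": 0, "spent": 0})


place_order("o-1", "alice", 30)
place_order("o-2", "alice", 20)
print(get_customer_summary("alice"))  # stale: the projector has not run yet
project_pending_events()
print(get_customer_summary("alice"))  # now reflects both orders
```

The first query returns stale data because the projector has not run yet, which is the eventual consistency trade-off CQRS introduces.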

When to use CQRS:

  • Read-heavy workloads (for example, a 100:1 read/write ratio)
  • Complex queries that would slow down the write database
  • Different scaling needs for reads vs writes
  • Need for different data representations (e.g., search index)

When NOT to use CQRS:

  • Simple CRUD applications
  • Low traffic where a single model suffices
  • Teams unfamiliar with eventual consistency

Event Sourcing

Instead of storing the current state, store a sequence of events that led to that state. You can rebuild any state by replaying events.

Traditional (state-based):
  Account: { id: "acc-1", balance: 150 }

Event Sourcing (event-based):
  Event 1: AccountCreated  { id: "acc-1", owner: "Alice" }
  Event 2: MoneyDeposited  { amount: 200 }
  Event 3: MoneyWithdrawn  { amount: 50 }
  ─────────────────────────────────────────
  Current state: balance = 0 + 200 - 50 = 150

// Event sourcing with an event store
interface DomainEvent {
  eventId: string;
  aggregateId: string;
  eventType: string;
  payload: Record<string, unknown>;
  timestamp: string;
  version: number;
}

// Current state derived from the events (shape assumed for this example)
interface Account {
  id: string;
  owner: string;
  balance: number;
}

// Rebuild account state by folding over the events, assumed to be ordered by version
function rebuildAccount(events: DomainEvent[]): Account {
  return events.reduce((account, event) => {
    switch (event.eventType) {
      case 'AccountCreated':
        return { id: event.aggregateId, balance: 0, owner: event.payload.owner as string };
      case 'MoneyDeposited':
        return { ...account, balance: account.balance + (event.payload.amount as number) };
      case 'MoneyWithdrawn':
        return { ...account, balance: account.balance - (event.payload.amount as number) };
      default:
        // Ignore unknown event types so older readers tolerate newer events
        return account;
    }
  }, {} as Account);
}

Benefits of Event Sourcing:

  • Complete audit trail — every change is recorded
  • Time travel — rebuild state at any point in time
  • Event replay — fix bugs by replaying events through corrected logic
  • Natural fit with CQRS — events update the read model

Challenges:

  • Eventual consistency between event store and read model
  • Event schema evolution (versioning events)
  • Storage growth — need snapshotting for aggregates with many events
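
Time travel and snapshotting can both be expressed as replaying a slice of the event list. The sketch below reuses the account events from the example above in Python; `apply_event` and `replay` are hypothetical helpers, and positions in the list stand in for event versions.

```python
def apply_event(state, event):
    """Return the new account state after applying one event."""
    kind = event["type"]
    if kind == "AccountCreated":
        return {"id": event["id"], "owner": event["owner"], "balance": 0}
    if kind == "MoneyDeposited":
        return {**state, "balance": state["balance"] + event["amount"]}
    if kind == "MoneyWithdrawn":
        return {**state, "balance": state["balance"] - event["amount"]}
    return state  # ignore unknown event types


def replay(events, snapshot=None, from_version=0, up_to=None):
    """Fold events[from_version:up_to] onto an optional snapshot."""
    state = dict(snapshot) if snapshot else {}
    end = len(events) if up_to is None else up_to
    for event in events[from_version:end]:
        state = apply_event(state, event)
    return state


events = [
    {"type": "AccountCreated", "id": "acc-1", "owner": "Alice"},
    {"type": "MoneyDeposited", "amount": 200},
    {"type": "MoneyWithdrawn", "amount": 50},
]

current = replay(events)          # full replay: balance 150
as_of_v2 = replay(events, up_to=2)  # time travel: state after the first two events

# Snapshotting: store the state at version 2, then replay only later events.
from_snapshot = replay(events, snapshot=as_of_v2, from_version=2)

print(current["balance"], as_of_v2["balance"], from_snapshot == current)
```

With a snapshot, rebuilding an aggregate costs one read plus the events recorded since the snapshot, rather than a replay of its full history.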

Interview tip: CQRS and Event Sourcing are often mentioned together but are independent patterns. You can use CQRS without Event Sourcing (sync read models from the write DB) and Event Sourcing without CQRS (single model rebuilt from events).

This completes the API Design & Microservices module. Test your knowledge with the module quiz, then apply these patterns in the Payment API Design lab.
