Microservices Architecture Patterns

Microservices questions dominate senior backend interviews. You need to explain patterns, draw architecture diagrams, and reason about trade-offs. This lesson covers the patterns interviewers ask about most.

Monolith to Microservices: The Strangler Fig Pattern

Avoid rewriting a monolith from scratch in a single big-bang release. The Strangler Fig pattern lets you extract services incrementally while the monolith keeps serving traffic.

                    ┌─────────────┐
                    │   API       │
                    │   Gateway   │
                    └──────┬──────┘
                           │
              ┌────────────┼────────────┐
              │            │            │
              v            v            v
        ┌──────────┐ ┌──────────┐ ┌──────────┐
        │  New     │ │  New     │ │ Monolith │
        │  Auth    │ │  Payment │ │ (still   │
        │  Service │ │  Service │ │ handles  │
        └──────────┘ └──────────┘ │ orders,  │
                                  │ users,   │
                                  │ etc.)    │
                                  └──────────┘

How it works:

1. Put an API Gateway in front of the monolith
2. Identify a bounded context to extract (start with the least coupled)
3. Build the new service alongside the monolith
4. Route traffic to the new service via the gateway
5. Remove the old code from the monolith once the new service is stable
6. Repeat for the next bounded context
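
The routing step is what makes the extraction incremental. A minimal sketch of a gateway route table follows, assuming simple path-prefix routing. The prefixes, the internal URLs, and the `resolve_upstream` helper are hypothetical names chosen for illustration, not the API of any particular gateway product.

```python
# Hypothetical route table: path prefixes already extracted from the monolith.
# Anything not listed here still falls through to the monolith.
EXTRACTED_ROUTES = {
    "/auth": "http://auth-service.internal",
    "/payments": "http://payment-service.internal",
}
MONOLITH_URL = "http://monolith.internal"


def resolve_upstream(path: str) -> str:
    """Return the upstream base URL that should handle this request path."""
    for prefix, upstream in EXTRACTED_ROUTES.items():
        # Match the prefix itself or any sub-path, but not "/authors" for "/auth".
        if path == prefix or path.startswith(prefix + "/"):
            return upstream
    return MONOLITH_URL
```

Under these assumptions, extracting the next bounded context is one more entry in the route table, and deleting an entry sends traffic back to the monolith as long as the old code path has not been removed yet.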

Service Boundaries: Bounded Contexts from DDD

Each microservice should own a single bounded context — a cohesive domain with its own data, rules, and language.

┌──────────────────────────────────────────────────────────┐
│                    E-Commerce Platform                    │
├──────────────┬──────────────┬──────────────┬─────────────┤
│   Order      │   Payment    │  Inventory   │  Shipping   │
│   Context    │   Context    │  Context     │  Context    │
│              │              │              │             │
│  - Order     │  - Payment   │  - Product   │  - Shipment │
│  - OrderItem │  - Refund    │  - Stock     │  - Tracking │
│  - Cart      │  - Invoice   │  - Warehouse │  - Carrier  │
│              │              │              │             │
│  Own DB      │  Own DB      │  Own DB      │  Own DB     │
└──────────────┴──────────────┴──────────────┴─────────────┘

Key principle: Database per service. Each service owns its data. No shared databases. Communication happens through APIs or events, never through direct database access.

Inter-Service Communication

Synchronous vs Asynchronous

| Aspect           | Synchronous (REST/gRPC)                   | Asynchronous (Message Queue)       |
|------------------|-------------------------------------------|------------------------------------|
| Coupling         | Temporal coupling (caller waits)          | Decoupled (fire and forget)        |
| Latency          | Higher for chains (A -> B -> C)           | Lower perceived latency            |
| Availability     | Cascading failures if downstream is down  | Resilient, messages wait in queue  |
| Debugging        | Easier (request/response trace)           | Harder (trace across queues)       |
| Data consistency | Immediate (within request)                | Eventual consistency               |
| Best for         | Reads, queries needing immediate response | Writes, events, long-running tasks |

Asynchronous Messaging

┌──────────┐     ┌─────────────────┐     ┌───────────┐
│  Order   │────>│  Message Broker │────>│  Payment  │
│  Service │     │  (Kafka/SQS/    │     │  Service  │
└──────────┘     │   RabbitMQ)     │     └───────────┘
                 └────────┬────────┘
                          │
                          ├────────────>┌───────────┐
                          │             │ Inventory │
                          │             │ Service   │
                          │             └───────────┘
                          │
                          └────────────>┌──────────────┐
                                        │ Notification │
                                        │ Service      │
                                        └──────────────┘

| Broker       | Strengths                                       | Best For                              |
|--------------|-------------------------------------------------|---------------------------------------|
| Apache Kafka | High throughput, persistent log, replay         | Event sourcing, streaming, audit logs |
| RabbitMQ     | Flexible routing, priority queues, low latency  | Task queues, RPC patterns             |
| AWS SQS      | Fully managed, auto-scaling, dead letter queues | Serverless architectures, AWS-native  |
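
To make the "messages wait in queue" property concrete, here is a small sketch that uses Python's standard `queue.Queue` as an in-process stand-in for a broker. The function names `publish_order_created` and `drain` are hypothetical, and a real broker would persist messages and deliver them over the network.

```python
import queue

# In-process stand-in for a broker queue; a real broker would persist messages.
order_events = queue.Queue()


def publish_order_created(order_id: str) -> None:
    """Producer side: enqueue the event and return without waiting for consumers."""
    order_events.put({"type": "OrderCreated", "order_id": order_id})


def drain(handler) -> int:
    """Consumer side: process everything currently waiting; return the count."""
    processed = 0
    while True:
        try:
            event = order_events.get_nowait()
        except queue.Empty:
            return processed
        handler(event)
        processed += 1


# The consumer is "down" while these are published; the events simply wait.
publish_order_created("order-1")
publish_order_created("order-2")

# When the consumer comes back, it catches up on the backlog in order.
received = []
count = drain(received.append)
```

The producer never blocks on the consumer here, which is the decoupling the comparison table describes. The price is that the consumer sees the events later, so the two sides are only eventually consistent.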

Saga Pattern: Distributed Transactions

In microservices, you cannot use a single database transaction across services. The Saga pattern coordinates a sequence of local transactions, with compensating actions if any step fails.

E-Commerce Order Flow Example

Happy Path:
  Order Service      Payment Service     Inventory Service    Shipping Service
       │                    │                    │                    │
  1. Create Order ──────────>                    │                    │
       │              2. Charge Card              │                    │
       │              ────────────────>           │                    │
       │                    │           3. Reserve Stock              │
       │                    │           ─────────────────>            │
       │                    │                    │          4. Create Shipment
       │                    │                    │          ────────────────>
       │                    │                    │                    │
       ◄────────────────────────────── Success ──────────────────────┘

Failure at Step 3 (out of stock):
  Order Service      Payment Service     Inventory Service
       │                    │                    │
  1. Create Order ──────────>                    │
       │              2. Charge Card              │
       │              ────────────────>           │
       │                    │           3. Reserve Stock ──> FAILS!
       │                    │                    │
       │              4. COMPENSATE:              │
       │                 Refund Card              │
       │              <──────────────             │
  5. COMPENSATE:            │                    │
     Cancel Order           │                    │
       │                    │                    │
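
The failure flow above can be sketched as a list of steps, each paired with the action that undoes it. This is an illustrative in-process sketch, not a production saga engine: `run_saga`, the step names, and the no-op lambdas are assumptions standing in for real service calls.

```python
from typing import Callable, List, Tuple

# Each step pairs an action with the compensation that undoes it.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step], log: List[str]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        name, action, _compensate = step
        try:
            action()
        except Exception as exc:
            log.append(f"{name} failed: {exc}")
            for done_name, _action, compensate in reversed(completed):
                compensate()
                log.append(f"compensated {done_name}")
            return False
        log.append(f"{name} ok")
        completed.append(step)
    return True


def reserve_stock() -> None:
    # Simulates the out-of-stock failure at step 3.
    raise RuntimeError("out of stock")


log: List[str] = []
succeeded = run_saga(
    [
        ("create_order", lambda: None, lambda: None),   # compensation: cancel order
        ("charge_card", lambda: None, lambda: None),    # compensation: refund card
        ("reserve_stock", reserve_stock, lambda: None),
    ],
    log,
)
```

Compensations run in reverse order, so the card is refunded before the order is cancelled, matching the diagram. The sketch does not handle a compensation that itself fails; in practice compensations are usually made idempotent so they can be retried.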

Orchestration vs Choreography

Orchestration: A central coordinator (saga orchestrator) tells each service what to do.

                    ┌──────────────────┐
                    │  Saga            │
                    │  Orchestrator    │
                    └───────┬──────────┘
                            │
              ┌─────────────┼─────────────┐
              │             │             │
              v             v             v
        ┌──────────┐ ┌──────────┐ ┌──────────┐
        │  Payment │ │ Inventory│ │ Shipping │
        │  Service │ │  Service │ │  Service │
        └──────────┘ └──────────┘ └──────────┘

Choreography: Each service reacts to events and publishes its own events. No central coordinator.

  Order Created ──> Payment Service ──> Payment Completed ──> Inventory Service
                                                                     │
                                                             Stock Reserved
                                                                     │
                                                                     v
                                                             Shipping Service
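
A choreographed version of the same chain might look like the sketch below. The in-process `subscribe`/`publish` bus is a hypothetical stand-in for a real broker, and each handler stands in for a separate service.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# Minimal in-process event bus standing in for a broker.
_subscribers: DefaultDict[str, List[Callable[[dict], None]]] = defaultdict(list)
published: List[str] = []  # order in which events were published, for inspection


def subscribe(event_type: str, handler: Callable[[dict], None]) -> None:
    _subscribers[event_type].append(handler)


def publish(event_type: str, payload: dict) -> None:
    published.append(event_type)
    for handler in _subscribers[event_type]:
        handler(payload)


# Each "service" only knows which event it consumes and which it emits.
def payment_service(event: dict) -> None:
    publish("PaymentCompleted", event)


def inventory_service(event: dict) -> None:
    publish("StockReserved", event)


shipments: List[str] = []


def shipping_service(event: dict) -> None:
    shipments.append(event["order_id"])


subscribe("OrderCreated", payment_service)
subscribe("PaymentCompleted", inventory_service)
subscribe("StockReserved", shipping_service)

publish("OrderCreated", {"order_id": "order-1"})
```

No function in this sketch describes the whole flow; it only emerges from the three `subscribe` calls. That is the autonomy benefit and the visibility cost of choreography in one picture.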

| Aspect           | Orchestration                      | Choreography                        |
|------------------|------------------------------------|-------------------------------------|
| Coordination     | Central orchestrator               | Distributed (each service listens)  |
| Coupling         | Services coupled to orchestrator   | Services coupled to events          |
| Visibility       | Easy to see full flow in one place | Flow distributed across services    |
| Complexity       | Orchestrator can become complex    | Harder to track full saga           |
| Failure handling | Orchestrator manages compensations | Each service handles its own        |
| Best for         | Complex multi-step workflows       | Simple flows, high autonomy         |
| Risk             | Single point of failure            | Circular dependencies, event storms |

Interview answer: "For a payment flow with 4+ steps and strict ordering, I'd use orchestration — the centralized view makes it easier to handle compensating transactions and debug failures. For simpler event-driven flows like sending notifications after an order, choreography works well because it keeps services autonomous."

Circuit Breaker Pattern

When a downstream service is failing, the Circuit Breaker stops calling it to prevent cascading failures. It has three states:

                          ┌──────────┐
                          │  CLOSED  │  (normal operation)
                          │          │
                          │ Requests │
                          │ pass     │
                          │ through  │
                          └────┬─────┘
                               │
                    Failure threshold reached
                               │
                               v
                          ┌──────────┐
                          │   OPEN   │  (all requests fail fast)
                          │          │
                          │ Returns  │
                          │ fallback │
                          │ or error │
                          └────┬─────┘
                               │
                      Timeout expires
                               │
                               v
                          ┌──────────┐
                          │HALF-OPEN │  (test with limited requests)
                          │          │
                          │ Allows   │
                          │ few test │
                          │ requests │
                          └────┬─────┘
                               │
                    ┌──────────┴──────────┐
                    │                     │
              Tests pass            Tests fail
                    │                     │
                    v                     v
               ┌──────────┐         ┌──────────┐
               │  CLOSED  │         │   OPEN   │
               └──────────┘         └──────────┘

import time
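from enum import Enum


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without invoking the wrapped function."""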

class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, timeout: float = 30.0, half_open_max: int = 3):
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.half_open_max = half_open_max

        self.state = CircuitState.CLOSED
        self.failure_count = 0
        self.last_failure_time = 0
        self.half_open_calls = 0

    def call(self, func, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            if time.time() - self.last_failure_time > self.timeout:
                self.state = CircuitState.HALF_OPEN
                self.half_open_calls = 0
            else:
                raise CircuitOpenError("Circuit is open — request blocked")

        if self.state == CircuitState.HALF_OPEN:
            if self.half_open_calls >= self.half_open_max:
                raise CircuitOpenError("Half-open limit reached")
            self.half_open_calls += 1

        try:
            result = func(*args, **kwargs)
            self._on_success()
            return result
        except Exception:
            self._on_failure()
            raise  # re-raise the original exception with its traceback intact

    def _on_success(self):
        if self.state == CircuitState.HALF_OPEN:
            self.state = CircuitState.CLOSED
        self.failure_count = 0

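    def _on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        # A failed probe in HALF_OPEN reopens the circuit immediately;
        # in CLOSED, the circuit opens once the failure threshold is reached.
        if self.state == CircuitState.HALF_OPEN or self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN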

API Gateway

The API Gateway is the single entry point for all client requests. It handles cross-cutting concerns so individual services do not have to.

                          ┌───────────────────────┐
    Mobile App ──────────>│                       │──> User Service
    Web App ─────────────>│      API Gateway      │──> Order Service
    Third-party ─────────>│                       │──> Payment Service
                          │  - Routing            │──> Inventory Service
                          │  - Authentication     │
                          │  - Rate Limiting      │
                          │  - Request Aggregation│
                          │  - SSL Termination    │
                          │  - Load Balancing     │
                          └───────────────────────┘

Popular choices: Kong (plugin-based), Envoy (high-performance proxy), AWS API Gateway (serverless), NGINX (lightweight).
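
As one example of a cross-cutting concern, rate limiting at the gateway is often described in terms of a token bucket. The sketch below is an illustrative implementation under that assumption. The `TokenBucket` class and its parameters are hypothetical and not taken from any of the products listed above.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the behavior can be tested deterministically
        self.tokens = float(capacity)
        self.updated_at = clock()

    def allow(self) -> bool:
        """Return True if the request may proceed, consuming one token."""
        now = self.clock()
        elapsed = now - self.updated_at
        self.updated_at = now
        # Refill based on elapsed time, never exceeding the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A gateway would typically keep one bucket per client key (API key or IP address) and answer with HTTP 429 when `allow()` returns False.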

Service Mesh

For complex microservice deployments, a service mesh adds observability, security, and traffic management without changing application code. It uses a sidecar proxy pattern.

  ┌─────────────────────┐     ┌─────────────────────┐
  │  Pod A              │     │  Pod B              │
  │  ┌───────────────┐  │     │  ┌───────────────┐  │
  │  │ Order Service │  │     │  │Payment Service│  │
  │  └───────┬───────┘  │     │  └───────┬───────┘  │
  │          │          │     │          │          │
  │  ┌───────▼───────┐  │     │  ┌───────▼───────┐  │
  │  │  Envoy Proxy  │◄─┼─mTLS┼─►│  Envoy Proxy  │  │
  │  │  (sidecar)    │  │     │  │  (sidecar)    │  │
  │  └───────────────┘  │     │  └───────────────┘  │
  └─────────────────────┘     └─────────────────────┘
            │                           │
            └───────────┬───────────────┘
                        │
                ┌───────▼────────┐
                │ Control Plane  │
                │ (Istio/Linkerd)│
                │ - mTLS certs   │
                │ - Traffic rules│
                │ - Observability│
                └────────────────┘

What a service mesh provides:

mTLS: Automatic mutual authentication and encryption between services, a foundation for zero-trust networking
Traffic management: Canary deployments, A/B testing, fault injection
Observability: Distributed tracing, metrics, access logs — without code changes
Retries and timeouts: Configurable retry policies at the proxy level

CQRS (Command Query Responsibility Segregation)

Separate the read and write models of your application. The write side optimizes for consistency, while the read side optimizes for query performance.

                    ┌──────────────────────────────────────┐
                    │           API Layer                   │
                    └────────────┬─────────────────────────┘
                                 │
                    ┌────────────┴────────────┐
                    │                         │
              Commands (writes)         Queries (reads)
                    │                         │
                    v                         v
            ┌──────────────┐         ┌──────────────┐
            │ Write Model  │         │  Read Model  │
            │ (normalized, │   Sync  │ (denormalized│
            │  consistent) │ ──────> │  fast reads) │
            │              │ events  │              │
            │  PostgreSQL  │         │ Elasticsearch│
            │              │         │  or Redis    │
            └──────────────┘         └──────────────┘

When to use CQRS:

Read-heavy workloads (for example, a 100:1 read/write ratio)
Complex queries that would slow down the write database
Different scaling needs for reads vs writes
Need for different data representations (e.g., search index)

When NOT to use CQRS:

Simple CRUD applications
Low traffic where a single model suffices
Teams unfamiliar with eventual consistency
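
The split between the two models can be sketched in a few dozen lines. This is an in-memory illustration under stated assumptions: `OrderWriteModel`, `CustomerSpendReadModel`, and `sync` are hypothetical names, and the dictionaries stand in for the separate write and read stores in the diagram.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class OrderPlaced:
    order_id: str
    customer_id: str
    total_cents: int


class OrderWriteModel:
    """Command side: validates and records orders, then emits events."""

    def __init__(self) -> None:
        self.orders: Dict[str, OrderPlaced] = {}
        self.outbox: List[OrderPlaced] = []  # events waiting to reach the read side

    def place_order(self, order_id: str, customer_id: str, total_cents: int) -> None:
        if order_id in self.orders:
            raise ValueError(f"duplicate order {order_id}")
        if total_cents <= 0:
            raise ValueError("total must be positive")
        event = OrderPlaced(order_id, customer_id, total_cents)
        self.orders[order_id] = event
        self.outbox.append(event)


class CustomerSpendReadModel:
    """Query side: denormalized totals per customer, updated from events."""

    def __init__(self) -> None:
        self.total_by_customer: Dict[str, int] = {}

    def apply(self, event: OrderPlaced) -> None:
        current = self.total_by_customer.get(event.customer_id, 0)
        self.total_by_customer[event.customer_id] = current + event.total_cents

    def total_spent(self, customer_id: str) -> int:
        return self.total_by_customer.get(customer_id, 0)


def sync(write: OrderWriteModel, read: CustomerSpendReadModel) -> None:
    """Deliver pending events to the read model (asynchronous in a real system)."""
    while write.outbox:
        read.apply(write.outbox.pop(0))
```

Until `sync` runs, the read model still reports the old total. That window is the eventual consistency that the "when NOT to use" list warns about.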

Event Sourcing

Instead of storing the current state, store a sequence of events that led to that state. You can rebuild any state by replaying events.

Traditional (state-based):
  Account: { id: "acc-1", balance: 150 }

Event Sourcing (event-based):
  Event 1: AccountCreated  { id: "acc-1", owner: "Alice" }
  Event 2: MoneyDeposited  { amount: 200 }
  Event 3: MoneyWithdrawn  { amount: 50 }
  ─────────────────────────────────────────
  Current state: balance = 0 + 200 - 50 = 150

// Event sourcing with an event store
interface DomainEvent {
  eventId: string;
  aggregateId: string;
  eventType: string;
  payload: Record<string, unknown>;
  timestamp: string;
  version: number;
}

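// Current-state projection of an account aggregate
interface Account {
  id: string;
  balance: number;
  owner: string;
}

// Rebuild account state from events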
function rebuildAccount(events: DomainEvent[]): Account {
  return events.reduce((account, event) => {
    switch (event.eventType) {
      case 'AccountCreated':
        return { id: event.aggregateId, balance: 0, owner: event.payload.owner as string };
      case 'MoneyDeposited':
        return { ...account, balance: account.balance + (event.payload.amount as number) };
      case 'MoneyWithdrawn':
        return { ...account, balance: account.balance - (event.payload.amount as number) };
      default:
        return account;
    }
  }, {} as Account);
}

Benefits of Event Sourcing:

Complete audit trail — every change is recorded
Time travel — rebuild state at any point in time
Event replay — fix bugs by replaying events through corrected logic
Natural fit with CQRS — events update the read model

Challenges:

Eventual consistency between event store and read model
Event schema evolution (versioning events)
Storage growth — need snapshotting for aggregates with many events
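
Snapshotting and time travel both come down to choosing where the replay starts and stops. The Python sketch below illustrates the idea with the account example. The `Event` and `Snapshot` types and the `rebuild_balance` function are hypothetical, and the events are assumed to arrive sorted by version.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    version: int      # position in the aggregate's event stream, starting at 1
    event_type: str   # only deposits and withdrawals change the balance here
    amount: int


@dataclass(frozen=True)
class Snapshot:
    version: int      # version of the last event folded into this snapshot
    balance: int


def apply(balance: int, event: Event) -> int:
    if event.event_type == "MoneyDeposited":
        return balance + event.amount
    if event.event_type == "MoneyWithdrawn":
        return balance - event.amount
    return balance  # other event types do not affect the balance


def rebuild_balance(
    events: List[Event],
    snapshot: Optional[Snapshot] = None,
    up_to_version: Optional[int] = None,
) -> int:
    """Fold events into a balance, starting from a snapshot when one is given."""
    if snapshot and up_to_version is not None and up_to_version < snapshot.version:
        raise ValueError("snapshot is newer than the requested version")
    balance = snapshot.balance if snapshot else 0
    start = snapshot.version if snapshot else 0
    for event in events:
        if event.version <= start:
            continue  # already included in the snapshot
        if up_to_version is not None and event.version > up_to_version:
            break     # "time travel": stop at the requested point
        balance = apply(balance, event)
    return balance
```

With a snapshot at version 2, only the withdrawal has to be replayed to reach the current balance of 150, which is what keeps rebuilds cheap for aggregates with long histories.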

Interview tip: CQRS and Event Sourcing are often mentioned together but are independent patterns. You can use CQRS without Event Sourcing (sync read models from the write DB) and Event Sourcing without CQRS (single model rebuilt from events).

This completes the API Design & Microservices module. Test your knowledge with the module quiz, then apply these patterns in the Payment API Design lab.