CI/CD & Infrastructure as Code
Deployment Strategies: Blue-Green, Canary, and Rolling
Deployment strategy questions test your understanding of production reliability. Let's master each approach.
Deployment Strategy Comparison
| Strategy | Risk | Rollback Speed | Resource Cost | Complexity |
|---|---|---|---|---|
| Rolling | Medium | Slow | Low | Low |
| Blue-Green | Low | Instant | High (2x) | Medium |
| Canary | Lowest | Fast | Medium | High |
| Shadow | Lowest | N/A | High | Highest |
Rolling Deployment
Updates instances incrementally:
```
Time 0: [v1] [v1] [v1] [v1]
Time 1: [v2] [v1] [v1] [v1]
Time 2: [v2] [v2] [v1] [v1]
Time 3: [v2] [v2] [v2] [v1]
Time 4: [v2] [v2] [v2] [v2]
```
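The timeline above can be sketched as a tiny simulation. This is a hypothetical helper for illustration only (not a Kubernetes API); it replaces one instance per step, as in the diagram:

```python
# Sketch: simulate a rolling update that replaces one instance per step,
# yielding the fleet state after each step.
def rolling_update(replicas: int):
    pods = ["v1"] * replicas
    yield pods.copy()                 # Time 0: all old version
    for i in range(replicas):
        pods[i] = "v2"                # replace exactly one instance
        yield pods.copy()

for step, state in enumerate(rolling_update(4)):
    print(f"Time {step}: {state}")
```

Note that at every intermediate step both v1 and v2 are serving traffic at once, which is why rolling deployments require backward-compatible changes.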
Kubernetes Rolling Update
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # max extra pods during the update
      maxUnavailable: 1    # max pods unavailable during the update
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web           # must match spec.selector.matchLabels
    spec:
      containers:
        - name: web
          image: app:v2
          readinessProbe:  # gates traffic until the new pod is healthy
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
```
Pros: simple, low resource overhead.
Cons: slow rollback; old and new versions serve traffic simultaneously during the rollout.
Blue-Green Deployment
Maintain two identical environments:
```
        ┌─────────────┐
        │    Load     │
        │  Balancer   │
        └──────┬──────┘
               │
   ┌───────────┴───────────┐
   │                       │
┌────▼────┐           ┌────▼────┐
│  Blue   │           │  Green  │
│  (v1)   │           │  (v2)   │
│ ACTIVE  │           │ STANDBY │
└─────────┘           └─────────┘
```
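The key property of blue-green is that cutover (and rollback) is a single pointer flip at the load balancer, not a redeploy. A minimal sketch with hypothetical names, illustrating the idea only:

```python
# Sketch: blue-green routing as an atomic pointer flip.
# "envs" stands in for two fully provisioned environments.
class BlueGreenRouter:
    def __init__(self):
        self.envs = {"blue": "v1", "green": "v2"}
        self.active = "blue"                      # blue serves production

    def route(self) -> str:
        """All traffic goes to whichever environment is active."""
        return self.envs[self.active]

    def switch(self) -> None:
        """Cutover or rollback: one flip, no redeploy needed."""
        self.active = "green" if self.active == "blue" else "blue"

router = BlueGreenRouter()
print(router.route())   # v1 (blue active)
router.switch()         # cut over to green
print(router.route())   # v2
router.switch()         # instant rollback
print(router.route())   # v1 again
```

Because the standby environment is fully provisioned at all times, rollback is instant, which is exactly what the 2x resource cost buys you.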
AWS Blue-Green with ALB
```hcl
# Target groups for blue and green
resource "aws_lb_target_group" "blue" {
  name     = "blue-tg"
  port     = 80
  protocol = "HTTP"
  vpc_id   = var.vpc_id
}

resource "aws_lb_target_group" "green" {
  name     = "green-tg"
  port     = 80
  protocol = "HTTP"
  vpc_id   = var.vpc_id
}

# Listener rule: switch between blue and green by changing var.active_color
resource "aws_lb_listener_rule" "main" {
  listener_arn = aws_lb_listener.main.arn

  action {
    type             = "forward"
    target_group_arn = var.active_color == "blue" ? aws_lb_target_group.blue.arn : aws_lb_target_group.green.arn
  }

  condition {
    path_pattern {
      values = ["/*"]
    }
  }
}
```
Pros: instant rollback, zero-downtime deployment.
Cons: double the infrastructure cost; database migrations become complex (both environments must work against the same schema).
Canary Deployment
Gradually shift traffic to new version:
```
Stage 1: [v1: 95%] ────► [v2:  5%]   # test with 5% of traffic
Stage 2: [v1: 80%] ────► [v2: 20%]   # increase if healthy
Stage 3: [v1: 50%] ────► [v2: 50%]   # half and half
Stage 4: [v1:  0%] ────► [v2: 100%]  # full rollout
```
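In production the traffic split is enforced by the load balancer or service mesh, but the mechanism is just weighted random routing. A minimal sketch (hypothetical helper, illustration only):

```python
import random

# Sketch: route each request to "canary" with probability canary_weight,
# mirroring what a weighted load-balancer rule does per request.
def pick_version(canary_weight: float) -> str:
    return "canary" if random.random() < canary_weight else "stable"

random.seed(42)  # deterministic for the demo
counts = {"stable": 0, "canary": 0}
for _ in range(10_000):
    counts[pick_version(0.05)] += 1   # Stage 1: 5% canary

print(counts)   # canary count lands near 5% of 10,000
```

Raising the rollout stage is just raising `canary_weight`; the mesh config below expresses the same 90/10 split declaratively.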
Kubernetes Canary with Istio
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: web-app
spec:
  hosts:
    - web-app
  http:
    - route:
        - destination:
            host: web-app
            subset: stable
          weight: 90
        - destination:
            host: web-app
            subset: canary
          weight: 10
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: web-app
spec:
  host: web-app
  subsets:
    - name: stable
      labels:
        version: v1
    - name: canary
      labels:
        version: v2
```
Canary Analysis
```yaml
# Argo Rollouts canary with automated analysis
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: web-app
spec:
  strategy:
    canary:
      steps:
        - setWeight: 5
        - pause: {duration: 5m}
        - analysis:
            templates:
              - templateName: success-rate
        - setWeight: 25
        - pause: {duration: 5m}
        - setWeight: 50
        - pause: {duration: 5m}
        - setWeight: 100
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  metrics:
    - name: success-rate
      interval: 1m
      successCondition: result[0] >= 0.95
      provider:
        prometheus:
          address: http://prometheus:9090
          query: |
            sum(rate(http_requests_total{status=~"2.."}[5m])) /
            sum(rate(http_requests_total[5m]))
```
Pros: lowest risk, data-driven promotion decisions.
Cons: complex setup; requires good observability to be meaningful.
Interview Questions
Q: "Your canary shows 2% error rate vs 0.5% for stable. What do you do?"
Answer:
- Don't panic: collect more data before acting
- Check statistical significance: at 5% of traffic, the canary sample may be too small to trust
- Examine the error types: are they new errors introduced by v2, or pre-existing ones?
- Compare resource metrics: latency, CPU, and memory of the canary pods versus stable
- If the regression is confirmed, roll back (automatically via analysis gates, or manually)
- Root-cause the failure before attempting the rollout again
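The significance check in the steps above can be made concrete with a two-proportion z-test. A sketch, using hypothetical request counts (a canary at 5% of traffic sees far fewer requests than stable, so the raw rates alone can mislead):

```python
from math import sqrt

# Sketch: two-proportion z-test comparing canary vs stable error rates.
# Sample sizes below are hypothetical, chosen to match a 5% canary split.
def two_proportion_z(errors_a: int, total_a: int,
                     errors_b: int, total_b: int) -> float:
    p_a, p_b = errors_a / total_a, errors_b / total_b
    p = (errors_a + errors_b) / (total_a + total_b)    # pooled error rate
    se = sqrt(p * (1 - p) * (1 / total_a + 1 / total_b))
    return (p_a - p_b) / se

# Canary: 1,000 requests, 20 errors (2%).
# Stable: 19,000 requests, 95 errors (0.5%).
z = two_proportion_z(20, 1_000, 95, 19_000)
print(f"z = {z:.2f}")   # |z| > 1.96 means significant at the 95% level
```

With these numbers the difference is well past the 1.96 threshold, so the 2% rate is a real regression, not sampling noise; with much smaller samples the same rates could fail the test.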
Q: "How do you handle database migrations in blue-green?"
Answer:
| Approach | Description |
|---|---|
| Expand-Contract | Add new schema alongside old, migrate, then remove old |
| Feature flags | Deploy code that handles both schemas |
| Read replicas | Blue reads from primary, green from replica during migration |
| Backward compatible | Ensure v2 schema works with v1 code |
```sql
-- Expand phase (schema works with both versions)
ALTER TABLE users ADD COLUMN email_verified BOOLEAN DEFAULT false;

-- Contract phase (only after v1 is fully retired)
ALTER TABLE users DROP COLUMN old_column;
```
Q: "A rolling deployment is stuck at 50%. How do you troubleshoot?"
```bash
# Check deployment status
kubectl rollout status deployment/web-app

# Check pod status
kubectl get pods -l app=web-app
kubectl describe pod <stuck-pod>

# Check events
kubectl get events --sort-by='.lastTimestamp'

# Common issues:
# - Readiness probe failing
# - Image pull errors
# - Resource limits (pods stuck Pending)
# - PodDisruptionBudget blocking eviction
```
You've mastered CI/CD and IaC. Next module: Kubernetes and container orchestration, the heart of modern infrastructure.