CI/CD & Infrastructure as Code

Deployment Strategies: Blue-Green, Canary, and Rolling

Deployment strategy questions test your understanding of production reliability. Let's master each approach.

Deployment Strategy Comparison

Strategy     Risk     Rollback Speed   Resource Cost   Complexity
Rolling      Medium   Slow             Low             Low
Blue-Green   Low      Instant          High (2x)       Medium
Canary       Lowest   Fast             Medium          High
Shadow       Lowest   N/A              High            Highest

Rolling Deployment

Updates instances incrementally:

Time 0: [v1] [v1] [v1] [v1]
Time 1: [v2] [v1] [v1] [v1]
Time 2: [v2] [v2] [v1] [v1]
Time 3: [v2] [v2] [v2] [v1]
Time 4: [v2] [v2] [v2] [v2]
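
The timeline above can be reproduced with a few lines of Python. This is a minimal sketch, not how any orchestrator implements it: the `is_healthy` callback is a hypothetical health check, and the loop replaces exactly one instance per step. It also shows why a failed rollout leaves the fleet on mixed versions.

```python
from typing import Callable, Iterator, List


def rolling_update(
    instances: List[str],
    new_version: str,
    is_healthy: Callable[[int], bool] = lambda index: True,
) -> Iterator[List[str]]:
    """Yield the fleet state after each step of a one-at-a-time rolling update.

    `is_healthy` is a hypothetical health check for the instance at `index`.
    The rollout halts at the first replaced instance that fails it.
    """
    fleet = list(instances)
    yield list(fleet)  # Time 0: nothing replaced yet
    for index in range(len(fleet)):
        fleet[index] = new_version
        yield list(fleet)
        if not is_healthy(index):
            return  # stop here; the fleet stays in a mixed-version state


if __name__ == "__main__":
    for step, state in enumerate(rolling_update(["v1"] * 4, "v2")):
        print(f"Time {step}: " + " ".join(f"[{version}]" for version in state))
```

Passing `is_healthy=lambda i: i != 1` halts the rollout after the second instance, leaving two instances on v2 and two on v1 until someone rolls forward or back.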

Kubernetes Rolling Update

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # Max extra pods during update
      maxUnavailable: 1  # Max pods unavailable during update
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web         # Must match spec.selector.matchLabels
    spec:
      containers:
      - name: web
        image: app:v2
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
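
The maxSurge and maxUnavailable settings in the manifest above bound how far the pod count can swing during an update. The Python sketch below works through that arithmetic. The helper names are illustrative and not part of any Kubernetes client. It assumes the documented rounding for percentage values: maxSurge rounds up, maxUnavailable rounds down.

```python
from typing import Tuple, Union

IntOrPercent = Union[int, str]


def resolve(value: IntOrPercent, replicas: int, round_up: bool) -> int:
    """Turn an absolute count or a percentage string such as "25%" into a pod count."""
    if isinstance(value, str):
        if not value.endswith("%"):
            raise ValueError(f"expected an int or a percentage string, got {value!r}")
        scaled = replicas * int(value[:-1])
        # Integer arithmetic avoids floating-point surprises when rounding.
        return (scaled + 99) // 100 if round_up else scaled // 100
    return value


def rolling_bounds(
    replicas: int, max_surge: IntOrPercent, max_unavailable: IntOrPercent
) -> Tuple[int, int]:
    """Return (minimum available pods, maximum total pods) during the update."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    return replicas - unavailable, replicas + surge


if __name__ == "__main__":
    print(rolling_bounds(4, 1, 1))           # the manifest above
    print(rolling_bounds(10, "25%", "25%"))  # percentage form
```

For the manifest above (4 replicas, both values set to 1), at least 3 pods stay available and at most 5 pods exist at any moment.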

Pros: Simple, low resource overhead
Cons: Slower rollback, mixed versions during deployment

Blue-Green Deployment

Maintain two identical environments:

              ┌─────────────┐
              │  Load       │
              │  Balancer   │
              └──────┬──────┘
         ┌───────────┴───────────┐
         │                       │
    ┌────▼────┐             ┌────▼────┐
    │  Blue   │             │  Green  │
    │  (v1)   │             │  (v2)   │
    │ ACTIVE  │             │ STANDBY │
    └─────────┘             └─────────┘

AWS Blue-Green with ALB

# Target groups for blue and green
resource "aws_lb_target_group" "blue" {
  name     = "blue-tg"
  port     = 80
  protocol = "HTTP"
  vpc_id   = var.vpc_id
}

resource "aws_lb_target_group" "green" {
  name     = "green-tg"
  port     = 80
  protocol = "HTTP"
  vpc_id   = var.vpc_id
}

# Listener rule - switch between blue/green
resource "aws_lb_listener_rule" "main" {
  listener_arn = aws_lb_listener.main.arn

  action {
    type             = "forward"
    target_group_arn = var.active_color == "blue" ? aws_lb_target_group.blue.arn : aws_lb_target_group.green.arn
  }

  condition {
    path_pattern {
      values = ["/*"]
    }
  }
}
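
The switch itself is a single pointer flip, which is why rollback is instant. The Python sketch below models that idea only. `BlueGreenRouter` is a hypothetical stand-in for the load balancer, not an AWS API, and `active_color` mirrors the Terraform variable above.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class BlueGreenRouter:
    """Hypothetical stand-in for a load balancer with a single forward target."""

    targets: Dict[str, str]  # color -> version deployed in that environment
    active_color: str = "blue"

    def standby_color(self) -> str:
        return "green" if self.active_color == "blue" else "blue"

    def route(self) -> str:
        """Return the version that currently receives all traffic."""
        return self.targets[self.active_color]

    def deploy_to_standby(self, version: str) -> None:
        """Install a new version on the idle environment; live traffic is untouched."""
        self.targets[self.standby_color()] = version

    def switch(self) -> None:
        """Cut traffic over to the standby environment. Calling it again rolls back."""
        self.active_color = self.standby_color()


if __name__ == "__main__":
    router = BlueGreenRouter(targets={"blue": "v1", "green": "v1"})
    router.deploy_to_standby("v2")
    router.switch()
    print(router.active_color, router.route())
    router.switch()  # rollback
    print(router.active_color, router.route())
```

The old environment is left running after the cutover, so rolling back needs no redeploy. That is also the source of the doubled infrastructure cost.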

Pros: Instant rollback, zero-downtime deployment
Cons: Double infrastructure cost, complex database migrations

Canary Deployment

Gradually shift traffic to new version:

Stage 1:  [v1: 95%] ────► [v2: 5%]   # Test with 5%
Stage 2:  [v1: 80%] ────► [v2: 20%]  # Increase if healthy
Stage 3:  [v1: 50%] ────► [v2: 50%]  # Half-half
Stage 4:  [v1: 0%]  ────► [v2: 100%] # Full rollout

Kubernetes Canary with Istio

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: web-app
spec:
  hosts:
  - web-app
  http:
  - route:
    - destination:
        host: web-app
        subset: stable
      weight: 90
    - destination:
        host: web-app
        subset: canary
      weight: 10

---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: web-app
spec:
  host: web-app
  subsets:
  - name: stable
    labels:
      version: v1
  - name: canary
    labels:
      version: v2
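
To make the weights concrete, here is a minimal Python sketch of percentage-based traffic splitting. It is not how Istio implements weighted routes. It uses hash-based bucketing, a different and simpler technique that keeps a given ID on the same side for as long as the weight is unchanged. The function name and the choice of `request_id` as the key are assumptions for illustration.

```python
import hashlib


def pick_subset(request_id: str, canary_weight: int) -> str:
    """Route an ID to "canary" or "stable" using a stable hash bucket (0-99).

    `canary_weight` is the percentage of traffic for the canary, matching the
    `weight` fields in the VirtualService above.
    """
    if not 0 <= canary_weight <= 100:
        raise ValueError("canary_weight must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "canary" if bucket < canary_weight else "stable"


if __name__ == "__main__":
    sample = [pick_subset(f"user-{n}", 10) for n in range(10_000)]
    share = sample.count("canary") / len(sample)
    print(f"canary share: {share:.1%}")
```

Because an ID's bucket is fixed, raising the weight only moves additional IDs to the canary; IDs already on the canary stay there.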

Canary Analysis

# Argo Rollouts canary with analysis
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: web-app
spec:
  strategy:
    canary:
      steps:
      - setWeight: 5
      - pause: {duration: 5m}
      - analysis:
          templates:
          - templateName: success-rate
      - setWeight: 25
      - pause: {duration: 5m}
      - setWeight: 50
      - pause: {duration: 5m}
      - setWeight: 100

---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  metrics:
  - name: success-rate
    interval: 1m
    successCondition: result[0] >= 0.95
    provider:
      prometheus:
        address: http://prometheus:9090
        query: |
          sum(rate(http_requests_total{status=~"2.."}[5m])) /
          sum(rate(http_requests_total[5m]))
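
The promote-or-abort logic behind these manifests can be sketched in Python. This is a simplified model, not Argo Rollouts code. `measure` is a hypothetical callback returning (successful, total) request counts for the canary. Unlike the Rollout above, which runs the analysis once after the 5% step, this loop checks at every step.

```python
from typing import Callable, Sequence, Tuple

SUCCESS_THRESHOLD = 0.95  # mirrors successCondition: result[0] >= 0.95


def run_canary(
    weights: Sequence[int],
    measure: Callable[[int], Tuple[int, int]],
) -> Tuple[int, bool]:
    """Walk through the canary weights, aborting on the first failed check.

    Returns (final canary weight, promoted). On abort the weight drops to 0,
    meaning all traffic goes back to the stable version.
    """
    current = 0
    for weight in weights:
        current = weight
        ok, total = measure(current)
        # No traffic means no evidence, so treat it as a failed check.
        if total == 0 or ok / total < SUCCESS_THRESHOLD:
            return 0, False
    return current, True


if __name__ == "__main__":
    steps = [5, 25, 50, 100]
    print(run_canary(steps, lambda weight: (99, 100)))
    print(run_canary(steps, lambda weight: (99, 100) if weight < 25 else (90, 100)))
```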

Pros: Lowest risk, data-driven decisions
Cons: Complex setup, requires good observability

Interview Questions

Q: "Your canary shows 2% error rate vs 0.5% for stable. What do you do?"

Answer:

  1. Don't panic - collect more data first
  2. Check whether the difference is statistically significant - a 5% traffic slice can be noisy
  3. Examine error types - are they new errors or existing?
  4. Check metrics - latency, CPU, memory of canary pods
  5. If confirmed bad - roll back, automatically or manually
  6. Root cause - investigate before next attempt
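
Step 2 can be made concrete with a two-proportion z-test. The sketch below is one reasonable way to check significance, not the only one. The request counts are made-up illustrations, and the normal approximation it relies on is rough when error counts are very small.

```python
import math


def two_proportion_p_value(
    errors_a: int, total_a: int, errors_b: int, total_b: int
) -> float:
    """Two-sided p-value for the difference between two error rates (pooled z-test)."""
    rate_a = errors_a / total_a
    rate_b = errors_b / total_b
    pooled = (errors_a + errors_b) / (total_a + total_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_error == 0:
        return 1.0  # both groups are all-success or all-failure: no evidence of a difference
    z = (rate_a - rate_b) / std_error
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Illustrative counts only: 2% vs 0.5% error rate at two traffic volumes.
    early = two_proportion_p_value(1, 50, 5, 1_000)
    later = two_proportion_p_value(100, 5_000, 475, 95_000)
    print(f"early: p = {early:.3f}")
    print(f"later: p = {later:.3g}")
```

With only 50 canary requests, the same 2% vs 0.5% gap is not significant at the usual 0.05 level. With 5,000 canary requests it clearly is, which is the point of collecting more data before deciding.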

Q: "How do you handle database migrations in blue-green?"

Answer:

Approach              Description
Expand-Contract       Add new schema alongside old, migrate, then remove old
Feature flags         Deploy code that handles both schemas
Read replicas         Blue reads from primary, green from replica during migration
Backward compatible   Ensure v2 schema works with v1 code

-- Expand phase (works with both versions)
ALTER TABLE users ADD COLUMN email_verified BOOLEAN DEFAULT false;

-- Contract phase (after v1 is gone)
ALTER TABLE users DROP COLUMN old_column;
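
The expand phase is safe because old code never names the new column. The Python sketch below demonstrates that with an in-memory SQLite database. The table layout and the `insert_v1` / `insert_v2` helpers are assumptions for illustration, and the default is written as 0 rather than false for portability across SQLite versions.

```python
import sqlite3
from typing import List, Tuple


def insert_v1(conn: sqlite3.Connection, name: str) -> None:
    """v1 code path: knows nothing about email_verified."""
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))


def insert_v2(conn: sqlite3.Connection, name: str, verified: bool) -> None:
    """v2 code path: writes the new column explicitly."""
    conn.execute(
        "INSERT INTO users (name, email_verified) VALUES (?, ?)",
        (name, int(verified)),
    )


def expand_phase_demo() -> List[Tuple[str, int]]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    insert_v1(conn, "alice")  # written before the migration

    # Expand: additive change with a default, so v1 keeps working.
    conn.execute("ALTER TABLE users ADD COLUMN email_verified BOOLEAN DEFAULT 0")

    insert_v1(conn, "bob")          # old code after the migration
    insert_v2(conn, "carol", True)  # new code after the migration
    rows = conn.execute(
        "SELECT name, email_verified FROM users ORDER BY id"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in expand_phase_demo():
        print(row)
```

The contract phase is left out on purpose: dropping a column is only safe once no running version reads or writes it.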

Q: "A rolling deployment is stuck at 50%. How do you troubleshoot?"

# Check deployment status
kubectl rollout status deployment/web-app

# Check pod status
kubectl get pods -l app=web
kubectl describe pod <stuck-pod>

# Check events
kubectl get events --sort-by='.lastTimestamp'

# Common issues:
# - Readiness probe failing
# - Image pull errors
# - Insufficient CPU/memory or exhausted quota (pods stuck Pending)
# - Nodes being drained (a PodDisruptionBudget limits evictions, not the rolling update itself)
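
The first pass over pod status can be scripted. The Python sketch below maps a pod object, in the shape returned by `kubectl get pods -o json`, to one of the common causes of readiness, image pull, and pending failures. The `diagnose` function and its return strings are illustrative, and it covers only a few well-known status fields.

```python
from typing import Any, Dict

IMAGE_PULL_REASONS = ("ErrImagePull", "ImagePullBackOff")


def diagnose(pod: Dict[str, Any]) -> str:
    """Return a best-guess reason why a pod is holding up a rollout."""
    status = pod.get("status", {})
    conditions = status.get("conditions", [])

    # A pod that was never scheduled usually means insufficient resources.
    for condition in conditions:
        if condition.get("type") == "PodScheduled" and condition.get("status") == "False":
            return f"pending: not scheduled ({condition.get('reason', 'unknown')})"

    # Containers stuck in a waiting state carry a machine-readable reason.
    for container in status.get("containerStatuses", []):
        waiting = container.get("state", {}).get("waiting") or {}
        if waiting.get("reason") in IMAGE_PULL_REASONS:
            return "image pull error"
        if waiting.get("reason") == "CrashLoopBackOff":
            return "container crash-looping"

    # Scheduled and started but not Ready points at the readiness probe.
    for condition in conditions:
        if condition.get("type") == "Ready" and condition.get("status") == "False":
            return "not ready: check the readiness probe"

    return "no obvious problem"
```

One way to use it: save the output of `kubectl get pods -l app=web -o json`, load it with `json.load`, and call `diagnose` on each entry in its `items` list.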

You've mastered CI/CD and IaC. Next module: Kubernetes and container orchestration, the heart of modern infrastructure.
