CI/CD for ML Systems

ML Deployment Strategies


Deployment strategies for ML go beyond standard software deployments. Interviewers test your knowledge of canary, blue-green, shadow, and A/B testing for models.

Deployment Strategy Comparison

| Strategy    | Risk Level | Use Case                     | Rollback Time |
|-------------|------------|------------------------------|---------------|
| Blue-Green  | Low        | Full cutover, quick rollback | Seconds       |
| Canary      | Low        | Gradual rollout with metrics | Minutes       |
| Shadow      | Very Low   | Testing without user impact  | N/A           |
| A/B Testing | Medium     | Business metric comparison   | Hours         |

Interview Question: Choose a Deployment Strategy

Question: "You're deploying a new recommendation model. Which strategy would you use and why?"

Framework Answer:

def choose_deployment_strategy(context):
    """Pick a rollout strategy based on the deployment context."""
    # Shadow deployment first for all new models
    if context.get("model_is_new"):
        return {
            "phase_1": "Shadow deployment (1 week)",
            "phase_2": "Canary (5% → 25% → 50% → 100%)",
            "rationale": "Shadow validates accuracy without risk, canary validates scale"
        }

    # Canary for iterative improvements
    if context.get("incremental_improvement"):
        return {
            "strategy": "Canary",
            "rollout": "10% → 25% → 50% → 100%",
            "rationale": "Known model family, just need to validate the improvement"
        }

    # A/B test for business decisions
    if context.get("need_business_metrics"):
        return {
            "strategy": "A/B Test",
            "duration": "2-4 weeks",
            "rationale": "Need statistical significance on revenue/engagement"
        }

    # Safest default when the context is unclear
    return {"strategy": "Canary with auto-rollback"}

Canary Deployment Implementation

# Kubernetes canary with Istio
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: model-serving
spec:
  hosts:
    - model-serving
  http:
    # Route 1: header override - force a request to the canary for testing
    - match:
        - headers:
            x-canary-override:
              exact: "true"
      route:
        - destination:
            host: model-serving-canary
            port:
              number: 8080
    # Route 2: default weighted split - 90% stable / 10% canary
    - route:
        - destination:
            host: model-serving-stable
            port:
              number: 8080
          weight: 90
        - destination:
            host: model-serving-canary
            port:
              number: 8080
          weight: 10
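
The VirtualService only declares the split; promotion means patching the weights. Below is a minimal sketch of the update_traffic_weight helper referenced in the promotion script that follows, assuming the VirtualService is named after the model, lives in the default namespace, and keeps the weighted route as the last http entry (as in the manifest above):

# Sketch of update_traffic_weight using the Kubernetes Python client.
# Assumptions: VirtualService name == model_name, namespace "default",
# and the weighted (default) route is the last entry in spec.http.
from kubernetes import client, config


def update_traffic_weight(model_name: str, canary_weight: int, namespace: str = "default"):
    """Patch the Istio VirtualService so the canary receives canary_weight % of traffic."""
    config.load_kube_config()  # use config.load_incluster_config() when running in-cluster
    api = client.CustomObjectsApi()

    vs = api.get_namespaced_custom_object(
        group="networking.istio.io", version="v1beta1",
        namespace=namespace, plural="virtualservices", name=model_name,
    )

    # Adjust the weights on the default (weighted) route
    for destination in vs["spec"]["http"][-1]["route"]:
        if destination["destination"]["host"].endswith("-canary"):
            destination["weight"] = canary_weight
        else:
            destination["weight"] = 100 - canary_weight

    api.replace_namespaced_custom_object(
        group="networking.istio.io", version="v1beta1",
        namespace=namespace, plural="virtualservices", name=model_name, body=vs,
    )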

Canary Promotion Script:

def promote_canary(model_name: str, current_weight: int, target_weight: int):
    """Gradually increase canary traffic if health checks pass."""

    # Get current metrics
    canary_metrics = get_canary_metrics(model_name)
    stable_metrics = get_stable_metrics(model_name)

    # Validate canary health against the stable baseline
    checks = {
        "error_rate": canary_metrics["error_rate"] <= stable_metrics["error_rate"] * 1.1,
        "latency_p99": canary_metrics["latency_p99"] <= stable_metrics["latency_p99"] * 1.2,
        "accuracy": canary_metrics["accuracy"] >= stable_metrics["accuracy"] * 0.95
    }

    if all(checks.values()):
        update_traffic_weight(model_name, target_weight)
        log.info(f"Promoted {model_name} canary from {current_weight}% to {target_weight}%")
    else:
        failed_checks = [k for k, v in checks.items() if not v]
        log.error(f"Canary promotion blocked at {current_weight}%: {failed_checks}")
        trigger_rollback(model_name)
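
The metric helpers are left abstract above. One possible implementation of get_canary_metrics (a sketch, assuming a Prometheus server scraping the serving pods; the URL, metric names, and labels are illustrative, not prescriptive) queries the Prometheus HTTP API directly:

# Sketch of get_canary_metrics backed by Prometheus instant queries.
# PROMETHEUS_URL, metric names, and labels below are assumptions.
import requests

PROMETHEUS_URL = "http://prometheus.monitoring:9090"  # hypothetical address


def query_prometheus(promql: str) -> float:
    """Run an instant PromQL query and return the first scalar result."""
    resp = requests.get(
        f"{PROMETHEUS_URL}/api/v1/query", params={"query": promql}, timeout=10
    )
    resp.raise_for_status()
    results = resp.json()["data"]["result"]
    return float(results[0]["value"][1]) if results else 0.0


def get_canary_metrics(model_name: str) -> dict:
    deployment = f"{model_name}-canary"
    return {
        "error_rate": query_prometheus(
            f'sum(rate(http_requests_total{{deployment="{deployment}",status=~"5.."}}[5m]))'
            f' / sum(rate(http_requests_total{{deployment="{deployment}"}}[5m]))'
        ),
        "latency_p99": query_prometheus(
            f'histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket'
            f'{{deployment="{deployment}"}}[5m])) by (le))'
        ),
        "accuracy": query_prometheus(
            f'avg(model_accuracy{{deployment="{deployment}"}})'
        ),
    }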

Shadow Deployment

# Shadow deployment: run the new model in parallel without affecting users
import asyncio
from datetime import datetime


async def predict_with_shadow(request):
    # Primary model (serves the user-facing response)
    primary_response = await primary_model.predict(request.features)

    # Shadow model (async, no impact on response)
    asyncio.create_task(
        shadow_prediction(request.features, primary_response)
    )

    return primary_response

async def shadow_prediction(features, primary_response):
    """Compare shadow model to production"""
    try:
        shadow_response = await shadow_model.predict(features)

        # Log comparison (don't block)
        comparison = {
            "primary_prediction": primary_response,
            "shadow_prediction": shadow_response,
            "match": primary_response == shadow_response,
            "timestamp": datetime.utcnow()
        }

        await metrics.log_shadow_comparison(comparison)

    except Exception as e:
        # Shadow errors never affect production
        log.warning(f"Shadow model error: {e}")

Rollback Strategies

# Automated rollback criteria
rollback_triggers:
  immediate:
    - error_rate > 5%
    - latency_p99 > 2x baseline
    - model returns null predictions

  gradual:
    - error_rate > 2% for 5 minutes
    - accuracy drop > 5%
    - prediction distribution shift > 20%

rollback_procedure:
  - Halt canary traffic (weight = 0)
  - Scale up stable deployment
  - Alert on-call engineer
  - Create incident ticket
  - Preserve canary logs for analysis
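
As a sketch of how the "immediate" triggers could be wired into a canary controller (reusing the assumed metric helpers from the promotion script; thresholds mirror the config above):

# Evaluate the immediate rollback triggers on each metrics poll.
# get_canary_metrics, log, and trigger_rollback are the assumed helpers
# from the promotion script; threshold values mirror the config above.
def should_rollback_immediately(model_name: str, baseline: dict) -> bool:
    """Return True if any immediate rollback trigger fires for the canary."""
    metrics = get_canary_metrics(model_name)

    triggers = {
        "error_rate > 5%": metrics["error_rate"] > 0.05,
        "latency_p99 > 2x baseline": metrics["latency_p99"] > 2 * baseline["latency_p99"],
        "null predictions": metrics.get("null_prediction_rate", 0.0) > 0,
    }

    fired = [name for name, hit in triggers.items() if hit]
    if fired:
        log.error(f"Immediate rollback triggers fired for {model_name}: {fired}")
        trigger_rollback(model_name)
        return True
    return False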

Interview Talking Points

key_points:
  shadow_first: "Always shadow deploy new model families to catch issues before any user impact"

  canary_metrics: "We watch error rate, latency, and model-specific metrics like prediction confidence distribution"

  auto_rollback: "Automated rollback on error rate spike, but manual intervention for accuracy issues since they need investigation"

  a_b_testing: "Reserve A/B tests for business metrics - they require statistical significance which takes 2-4 weeks"

Expert Insight: "The key difference from traditional software: ML models can silently degrade. A model returning wrong predictions with 200 OK is worse than an error - that's why shadow deployment is so valuable."
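
One common way to make silent degradation and the prediction distribution shift trigger measurable is the population stability index (PSI) over prediction scores. The sketch below assumes batch access to baseline and current prediction arrays; the bin count and the 0.2 alert threshold are conventional defaults, not requirements:

# Quantify prediction distribution shift with PSI.
# Bin count and the 0.2 rule of thumb are conventional, adjustable choices.
import numpy as np


def population_stability_index(baseline: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """PSI between baseline and current prediction distributions."""
    edges = np.histogram_bin_edges(baseline, bins=bins)
    base_counts, _ = np.histogram(baseline, bins=edges)
    curr_counts, _ = np.histogram(current, bins=edges)

    # Convert to proportions, with a small epsilon to avoid log(0)
    eps = 1e-6
    base_pct = base_counts / max(base_counts.sum(), 1) + eps
    curr_pct = curr_counts / max(curr_counts.sum(), 1) + eps

    return float(np.sum((curr_pct - base_pct) * np.log(curr_pct / base_pct)))


# Rule of thumb: PSI > 0.2 indicates a shift worth investigating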

Next, we'll cover GitOps and infrastructure as code for ML.
