CI/CD for ML Systems
ML Deployment Strategies
4 min read
Deployment strategies for ML go beyond standard software deployments. Interviewers test your knowledge of canary, blue-green, shadow, and A/B testing for models.
Deployment Strategy Comparison
| Strategy | Risk Level | Use Case | Rollback Time |
|---|---|---|---|
| Blue-Green | Low | Full cutover, quick rollback | Seconds |
| Canary | Low | Gradual rollout with metrics | Minutes |
| Shadow | Very Low | Testing without user impact | N/A |
| A/B Testing | Medium | Business metric comparison | Hours |
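Blue-green is the only strategy in the table with no partial-traffic phase: two identical environments run side by side and the router flips all traffic at once, which is why rollback takes seconds. A minimal sketch of that cutover, assuming Kubernetes Services selected by a `version` label and the official `kubernetes` Python client; the service and namespace names are illustrative:
# Blue-green cutover sketch (assumes the official `kubernetes` client and a
# Service whose selector includes a `version` label -- names are illustrative).
from kubernetes import client, config

def switch_blue_green(service_name: str = "model-serving",
                      namespace: str = "ml-serving",
                      target_version: str = "green"):
    """Point the Service at the new (green) deployment; rollback is the same call with 'blue'."""
    config.load_kube_config()  # or config.load_incluster_config() inside the cluster
    core = client.CoreV1Api()
    # Patching only the selector switches 100% of traffic in a single step.
    core.patch_namespaced_service(
        name=service_name,
        namespace=namespace,
        body={"spec": {"selector": {"app": service_name, "version": target_version}}},
    )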
Interview Question: Choose a Deployment Strategy
Question: "You're deploying a new recommendation model. Which strategy would you use and why?"
Framework Answer:
def choose_deployment_strategy(context):
    """Framework answer: pick a strategy based on what is new and what needs proving."""
    # Shadow deployment first for all new models
    if context.get("model_is_new"):
        return {
            "phase_1": "Shadow deployment (1 week)",
            "phase_2": "Canary (5% → 25% → 50% → 100%)",
            "rationale": "Shadow validates accuracy without risk, canary validates scale"
        }
    # Canary for iterative improvements
    if context.get("incremental_improvement"):
        return {
            "strategy": "Canary",
            "rollout": "10% → 25% → 50% → 100%",
            "rationale": "Known model family, just need to validate improvement"
        }
    # A/B test for business decisions
    if context.get("need_business_metrics"):
        return {
            "strategy": "A/B Test",
            "duration": "2-4 weeks",
            "rationale": "Need statistical significance on revenue/engagement"
        }
    # Safe default
    return {"strategy": "Canary with auto-rollback"}
Canary Deployment Implementation
# Kubernetes canary with Istio: 90/10 weighted split plus a header override
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: model-serving
spec:
  hosts:
    - model-serving
  http:
    # Force requests to the canary via header (useful for manual testing)
    - match:
        - headers:
            x-canary-override:
              exact: "true"
      route:
        - destination:
            host: model-serving-canary
            port:
              number: 8080
    # Default: 90% stable / 10% canary
    - route:
        - destination:
            host: model-serving-stable
            port:
              number: 8080
          weight: 90
        - destination:
            host: model-serving-canary
            port:
              number: 8080
          weight: 10
Canary Promotion Script:
def promote_canary(model_name: str, current_weight: int, target_weight: int):
    """Gradually increase canary traffic if the canary is healthy; otherwise roll back."""
    # Get current metrics
    canary_metrics = get_canary_metrics(model_name)
    stable_metrics = get_stable_metrics(model_name)

    # Validate canary health against the stable baseline
    checks = {
        "error_rate": canary_metrics["error_rate"] <= stable_metrics["error_rate"] * 1.1,
        "latency_p99": canary_metrics["latency_p99"] <= stable_metrics["latency_p99"] * 1.2,
        "accuracy": canary_metrics["accuracy"] >= stable_metrics["accuracy"] * 0.95,
    }

    if all(checks.values()):
        update_traffic_weight(model_name, target_weight)
        log.info(f"Promoted {model_name} canary from {current_weight}% to {target_weight}%")
    else:
        failed_checks = [k for k, v in checks.items() if not v]
        log.error(f"Canary promotion blocked at {current_weight}%: {failed_checks}")
        trigger_rollback(model_name)
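The script leans on an update_traffic_weight helper that isn't shown. One way it could look, assuming the Istio VirtualService from the manifest above and the official `kubernetes` Python client; the namespace and host naming convention are illustrative:
from kubernetes import client, config

def update_traffic_weight(model_name: str, canary_weight: int,
                          namespace: str = "ml-serving"):
    """Patch the Istio VirtualService so the canary receives `canary_weight`% of traffic."""
    config.load_kube_config()
    custom = client.CustomObjectsApi()
    # Note: a merge patch replaces the whole `http` list, so this body keeps only the
    # weighted route; re-include the header-override rule if you want to preserve it.
    patch = {
        "spec": {
            "http": [{
                "route": [
                    {"destination": {"host": f"{model_name}-stable", "port": {"number": 8080}},
                     "weight": 100 - canary_weight},
                    {"destination": {"host": f"{model_name}-canary", "port": {"number": 8080}},
                     "weight": canary_weight},
                ]
            }]
        }
    }
    custom.patch_namespaced_custom_object(
        group="networking.istio.io",
        version="v1beta1",
        namespace=namespace,
        plural="virtualservices",
        name=model_name,
        body=patch,
    )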
Shadow Deployment
# Shadow deployment: run the new model in parallel without affecting users
import asyncio
from datetime import datetime

async def predict_with_shadow(request):
    # Primary model (serves the response)
    primary_response = await primary_model.predict(request.features)

    # Shadow model (fire-and-forget; no impact on the response path)
    asyncio.create_task(
        shadow_prediction(request.features, primary_response)
    )
    return primary_response

async def shadow_prediction(features, primary_response):
    """Compare the shadow model to production and log the result."""
    try:
        shadow_response = await shadow_model.predict(features)
        # Log the comparison asynchronously (never block serving)
        comparison = {
            "primary_prediction": primary_response,
            "shadow_prediction": shadow_response,
            "match": primary_response == shadow_response,
            "timestamp": datetime.utcnow(),
        }
        await metrics.log_shadow_comparison(comparison)
    except Exception as e:
        # Shadow errors never affect production
        log.warning(f"Shadow model error: {e}")
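The value of shadow mode comes from analyzing those logged comparisons offline. A minimal sketch of the kind of summary you might build from them; the record shape matches shadow_prediction above, and the 95% threshold is illustrative:
from typing import Iterable

def summarize_shadow_run(comparisons: Iterable[dict], min_match_rate: float = 0.95) -> dict:
    """Aggregate logged shadow comparisons into a go/no-go summary for promotion."""
    records = list(comparisons)
    if not records:
        return {"matches": 0, "total": 0, "match_rate": None, "promote_to_canary": False}
    matches = sum(1 for r in records if r["match"])
    match_rate = matches / len(records)
    return {
        "matches": matches,
        "total": len(records),
        "match_rate": round(match_rate, 4),
        # Illustrative gate: graduate to canary only if the shadow model agrees
        # with production on at least `min_match_rate` of requests.
        "promote_to_canary": match_rate >= min_match_rate,
    }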
Rollback Strategies
# Automated rollback criteria
rollback_triggers:
  immediate:
    - error_rate > 5%
    - latency_p99 > 2x baseline
    - model returns null predictions
  gradual:
    - error_rate > 2% for 5 minutes
    - accuracy drop > 5%
    - prediction distribution shift > 20%

rollback_procedure:
  1. Halt canary traffic (weight = 0)
  2. Scale up stable deployment
  3. Alert on-call engineer
  4. Create incident ticket
  5. Preserve canary logs for analysis
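A sketch of how the "immediate" triggers above could be evaluated in code; the thresholds mirror the config, while the shape of the metrics dictionaries is an assumption:
def should_rollback_immediately(canary: dict, baseline: dict) -> list[str]:
    """Return the list of 'immediate' rollback triggers (from the config above) that fired."""
    reasons = []
    if canary["error_rate"] > 0.05:                          # error_rate > 5%
        reasons.append("error_rate > 5%")
    if canary["latency_p99"] > 2 * baseline["latency_p99"]:  # latency_p99 > 2x baseline
        reasons.append("latency_p99 > 2x baseline")
    if canary.get("null_prediction_count", 0) > 0:           # model returns null predictions
        reasons.append("null predictions returned")
    return reasons

# Usage: if this returns anything, set the canary weight to 0, scale up stable,
# alert the on-call engineer, open an incident ticket, and preserve the canary logs.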
Interview Talking Points
key_points:
  shadow_first: "Always shadow deploy new model families to catch issues before any user impact"
  canary_metrics: "We watch error rate, latency, and model-specific metrics like prediction confidence distribution"
  auto_rollback: "Automated rollback on error rate spike, but manual intervention for accuracy issues since they need investigation"
  a_b_testing: "Reserve A/B tests for business metrics - they require statistical significance which takes 2-4 weeks"
Expert Insight: "The key difference from traditional software: ML models can silently degrade. A model returning wrong predictions with 200 OK is worse than an error - that's why shadow deployment is so valuable."
Next, we'll cover GitOps and infrastructure as code for ML.