Production Operations & GitOps
CI/CD for ML Model Deployment
3 min read
ML-specific CI/CD pipelines extend traditional software delivery with model validation, performance testing, and automated canary deployments. This lesson covers GitHub Actions workflows, Tekton pipelines, and Argo Rollouts-based automated rollback for ML workflows.
ML CI/CD Pipeline Architecture
┌─────────────────────────────────────────────────────────────────────┐
│ ML CI/CD Pipeline │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Code │──→│ Build │──→│ Test │──→│ Scan │ │
│ │ Push │ │ Image │ │ Model │ │ Security │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ │ │
│ ↓ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Prod │←──│ Canary │←──│ Stage │←──│ Registry │ │
│ │ Deploy │ │ Deploy │ │ Test │ │ Push │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ │ │ │
│ └──────────────┼───────────────────────────────────→ │
│ │ Monitor & Rollback │
│ ↓ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Observability (Metrics, Logs, Traces) │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
GitHub Actions for ML
# .github/workflows/ml-deploy.yaml
name: ML Model Deployment
on:
push:
branches: [main]
paths:
- 'models/**'
- 'inference/**'
pull_request:
branches: [main]
env:
REGISTRY: gcr.io
PROJECT_ID: ml-production
CLUSTER_NAME: ml-cluster
CLUSTER_ZONE: us-central1-a
jobs:
test-model:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.11'
- name: Install dependencies
run: |
pip install -r requirements.txt
pip install pytest pytest-benchmark
- name: Run model unit tests
run: pytest tests/unit/ -v
- name: Run model performance tests
run: |
pytest tests/performance/ --benchmark-json=benchmark.json
- name: Check performance regression
run: |
python scripts/check_performance.py benchmark.json \
--baseline benchmarks/baseline.json \
--threshold 0.1 # Max 10% regression
build-and-push:
needs: test-model
runs-on: ubuntu-latest
outputs:
      # Expose a single image tag for downstream jobs (the raw tags output can be multi-line)
      image-tag: ${{ fromJSON(steps.meta.outputs.json).tags[0] }}
steps:
- uses: actions/checkout@v4
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Login to GCR
uses: docker/login-action@v3
with:
registry: gcr.io
username: _json_key
password: ${{ secrets.GCP_SA_KEY }}
- name: Extract metadata
id: meta
uses: docker/metadata-action@v5
with:
images: ${{ env.REGISTRY }}/${{ env.PROJECT_ID }}/inference
tags: |
type=sha,prefix=
type=ref,event=branch
- name: Build and push
uses: docker/build-push-action@v5
with:
context: .
push: true
tags: ${{ steps.meta.outputs.tags }}
cache-from: type=gha
cache-to: type=gha,mode=max
security-scan:
needs: build-and-push
runs-on: ubuntu-latest
steps:
- name: Scan image for vulnerabilities
uses: aquasecurity/trivy-action@master
with:
image-ref: ${{ needs.build-and-push.outputs.image-tag }}
format: 'sarif'
output: 'trivy-results.sarif'
severity: 'HIGH,CRITICAL'
- name: Upload scan results
uses: github/codeql-action/upload-sarif@v3
with:
sarif_file: 'trivy-results.sarif'
deploy-staging:
needs: [build-and-push, security-scan]
runs-on: ubuntu-latest
environment: staging
steps:
- uses: actions/checkout@v4
      - name: Authenticate to Google Cloud
        uses: google-github-actions/auth@v2
        with:
          credentials_json: ${{ secrets.GCP_SA_KEY }}
      - name: Set up gcloud
        uses: google-github-actions/setup-gcloud@v2
        with:
          install_components: gke-gcloud-auth-plugin
- name: Get GKE credentials
run: |
gcloud container clusters get-credentials ${{ env.CLUSTER_NAME }} \
--zone ${{ env.CLUSTER_ZONE }}
- name: Deploy to staging
run: |
kubectl set image deployment/inference-staging \
inference=${{ needs.build-and-push.outputs.image-tag }} \
-n ml-staging
- name: Wait for rollout
run: |
kubectl rollout status deployment/inference-staging \
-n ml-staging --timeout=300s
- name: Run integration tests
run: |
python scripts/integration_tests.py \
--endpoint https://staging.inference.example.com \
--test-data tests/fixtures/integration.json
deploy-canary:
    needs: [build-and-push, deploy-staging]
runs-on: ubuntu-latest
environment: production
steps:
      - uses: actions/checkout@v4
      - name: Authenticate to Google Cloud
        uses: google-github-actions/auth@v2
        with:
          credentials_json: ${{ secrets.GCP_SA_KEY }}
      - name: Set up gcloud
        uses: google-github-actions/setup-gcloud@v2
        with:
          install_components: gke-gcloud-auth-plugin
      - name: Get GKE credentials
        run: |
          gcloud container clusters get-credentials ${{ env.CLUSTER_NAME }} \
            --zone ${{ env.CLUSTER_ZONE }}
      - name: Deploy canary (10% traffic)
run: |
# Update canary deployment
kubectl set image deployment/inference-canary \
inference=${{ needs.build-and-push.outputs.image-tag }} \
-n ml-serving
          # Shift 10% of traffic to the canary (Istio route weights must sum to 100)
          kubectl patch virtualservice inference-vs -n ml-serving \
            --type=json \
            -p='[{"op": "replace", "path": "/spec/http/0/route/0/weight", "value": 90}, {"op": "replace", "path": "/spec/http/0/route/1/weight", "value": 10}]'
- name: Monitor canary metrics
run: |
python scripts/canary_monitor.py \
--duration 600 \
--error-threshold 0.01 \
--latency-threshold-p99 2.0
      - name: Promote or rollback
        # CANARY_SUCCESS is assumed to be written to $GITHUB_ENV by canary_monitor.py
        run: |
          if [ "$CANARY_SUCCESS" == "true" ]; then
            # Promote: send all traffic to the canary route
            kubectl patch virtualservice inference-vs -n ml-serving \
              --type=json \
              -p='[{"op": "replace", "path": "/spec/http/0/route/0/weight", "value": 0}, {"op": "replace", "path": "/spec/http/0/route/1/weight", "value": 100}]'
          else
            # Rollback: remove canary traffic and revert the canary deployment
            kubectl patch virtualservice inference-vs -n ml-serving \
              --type=json \
              -p='[{"op": "replace", "path": "/spec/http/0/route/0/weight", "value": 100}, {"op": "replace", "path": "/spec/http/0/route/1/weight", "value": 0}]'
            kubectl rollout undo deployment/inference-canary -n ml-serving
          fi
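The test-model job gates merges on scripts/check_performance.py, which the lesson does not show. Below is a minimal sketch of what such a script might look like, assuming both inputs are pytest-benchmark JSON files and that a relative slowdown in mean runtime beyond --threshold should fail the build; the field names follow pytest-benchmark's JSON output, and the rest is an assumption rather than the lesson's actual script.
# scripts/check_performance.py (illustrative sketch, not from the lesson)
import argparse
import json
import sys

def mean_times(path: str) -> dict:
    """Map benchmark name -> mean runtime (seconds) from a pytest-benchmark JSON file."""
    with open(path) as f:
        data = json.load(f)
    return {b["name"]: b["stats"]["mean"] for b in data["benchmarks"]}

def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("benchmark", help="pytest-benchmark JSON from this run")
    parser.add_argument("--baseline", required=True, help="pytest-benchmark JSON to compare against")
    parser.add_argument("--threshold", type=float, default=0.1, help="max allowed relative regression")
    args = parser.parse_args()

    current, baseline = mean_times(args.benchmark), mean_times(args.baseline)
    failed = False
    for name, base in baseline.items():
        if name not in current:
            continue
        regression = (current[name] - base) / base
        print(f"{name}: baseline={base:.6f}s current={current[name]:.6f}s regression={regression:+.1%}")
        if regression > args.threshold:
            failed = True

    if failed:
        sys.exit("performance regression exceeds threshold")

if __name__ == "__main__":
    main()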
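Similarly, scripts/canary_monitor.py is referenced but not shown. A hedged sketch is below: it assumes Prometheus is reachable at PROMETHEUS_URL, that the canary exposes inference_errors_total, inference_requests_total, and inference_latency_seconds_bucket metrics (these metric names are assumptions), and that the verdict is handed to the next step by appending CANARY_SUCCESS to $GITHUB_ENV.
# scripts/canary_monitor.py (illustrative sketch, not from the lesson)
import argparse
import json
import os
import time
import urllib.parse
import urllib.request

PROM_URL = os.environ.get("PROMETHEUS_URL", "http://prometheus:9090")

def query(promql: str) -> float:
    """Run an instant PromQL query and return the first sample value (0.0 if empty)."""
    url = f"{PROM_URL}/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url, timeout=10) as resp:
        result = json.load(resp)["data"]["result"]
    return float(result[0]["value"][1]) if result else 0.0

def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--duration", type=int, default=600, help="observation window in seconds")
    parser.add_argument("--interval", type=int, default=60, help="seconds between checks")
    parser.add_argument("--error-threshold", type=float, default=0.01)
    parser.add_argument("--latency-threshold-p99", type=float, default=2.0)
    args = parser.parse_args()

    healthy = True
    deadline = time.time() + args.duration
    while healthy and time.time() < deadline:
        errors = query('sum(rate(inference_errors_total{deployment="canary"}[5m]))')
        requests = query('sum(rate(inference_requests_total{deployment="canary"}[5m]))')
        p99 = query('histogram_quantile(0.99, sum(rate(inference_latency_seconds_bucket{deployment="canary"}[5m])) by (le))')
        error_rate = errors / requests if requests else 0.0
        print(f"canary error_rate={error_rate:.4f} p99={p99:.3f}s")
        if error_rate > args.error_threshold or p99 > args.latency_threshold_p99:
            healthy = False
        else:
            time.sleep(args.interval)

    # Hand the verdict to the "Promote or rollback" step via GITHUB_ENV
    github_env = os.environ.get("GITHUB_ENV")
    if github_env:
        with open(github_env, "a") as f:
            f.write(f"CANARY_SUCCESS={'true' if healthy else 'false'}\n")

if __name__ == "__main__":
    main()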
Tekton Pipeline for ML
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
name: ml-deployment-pipeline
spec:
params:
- name: git-url
type: string
- name: git-revision
type: string
default: main
- name: image-name
type: string
workspaces:
- name: shared-workspace
- name: docker-credentials
tasks:
- name: fetch-source
taskRef:
name: git-clone
params:
- name: url
value: $(params.git-url)
- name: revision
value: $(params.git-revision)
workspaces:
- name: output
workspace: shared-workspace
- name: run-tests
runAfter: [fetch-source]
taskSpec:
workspaces:
- name: source
steps:
- name: test
image: python:3.11
script: |
cd $(workspaces.source.path)
pip install -r requirements.txt
pytest tests/ -v --junitxml=test-results.xml
workspaces:
- name: source
workspace: shared-workspace
- name: validate-model
runAfter: [run-tests]
taskSpec:
workspaces:
- name: source
steps:
- name: validate
image: python:3.11
script: |
cd $(workspaces.source.path)
python scripts/validate_model.py \
--model-path models/latest \
--validation-data data/validation.csv \
--min-accuracy 0.95
workspaces:
- name: source
workspace: shared-workspace
- name: build-image
runAfter: [validate-model]
taskRef:
name: kaniko
params:
- name: IMAGE
value: $(params.image-name)
workspaces:
- name: source
workspace: shared-workspace
- name: dockerconfig
workspace: docker-credentials
- name: deploy-canary
runAfter: [build-image]
taskRef:
name: kubernetes-actions
params:
- name: script
value: |
kubectl set image deployment/inference-canary \
inference=$(params.image-name) -n ml-serving
kubectl rollout status deployment/inference-canary \
-n ml-serving --timeout=300s
- name: run-canary-analysis
runAfter: [deploy-canary]
taskSpec:
steps:
- name: analyze
image: curlimages/curl
script: |
                # Query Prometheus for the canary and stable error rates and extract the sample value
                CANARY_ERROR_RATE=$(curl -sG "http://prometheus:9090/api/v1/query" \
                  --data-urlencode "query=sum(rate(inference_errors_total{deployment='canary'}[10m]))" \
                  | sed -n 's/.*"value":\[[^,]*,"\([^"]*\)".*/\1/p')
                STABLE_ERROR_RATE=$(curl -sG "http://prometheus:9090/api/v1/query" \
                  --data-urlencode "query=sum(rate(inference_errors_total{deployment='stable'}[10m]))" \
                  | sed -n 's/.*"value":\[[^,]*,"\([^"]*\)".*/\1/p')
                # Compare the floating-point rates; a non-zero exit fails the pipeline
                if awk -v c="$CANARY_ERROR_RATE" -v s="$STABLE_ERROR_RATE" 'BEGIN { exit !(c > s) }'; then
                  echo "Canary has higher error rate, rolling back"
                  exit 1
fi
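The validate-model task shells out to scripts/validate_model.py, which is also not shown in the lesson. A minimal sketch follows; the pickled scikit-learn model, the CSV layout with a label column, and the behavior on failure (non-zero exit so the Tekton task fails) are assumptions that mirror the task's invocation.
# scripts/validate_model.py (illustrative sketch, not from the lesson)
import argparse
import pickle
import sys

import pandas as pd
from sklearn.metrics import accuracy_score

def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-path", required=True)        # e.g. models/latest
    parser.add_argument("--validation-data", required=True)   # CSV with a "label" column (assumed)
    parser.add_argument("--min-accuracy", type=float, default=0.95)
    args = parser.parse_args()

    # Load the candidate model (assumed to be a pickled scikit-learn estimator)
    with open(args.model_path, "rb") as f:
        model = pickle.load(f)

    # Score it on held-out validation data
    df = pd.read_csv(args.validation_data)
    X_val, y_val = df.drop(columns=["label"]), df["label"]
    accuracy = accuracy_score(y_val, model.predict(X_val))
    print(f"validation accuracy: {accuracy:.4f}")

    # Gate the pipeline: non-zero exit fails the Tekton task
    if accuracy < args.min_accuracy:
        sys.exit(f"accuracy {accuracy:.4f} below threshold {args.min_accuracy}")

if __name__ == "__main__":
    main()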
Model Validation Gate
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
name: model-validation-gate
spec:
params:
- name: model-uri
type: string
- name: min-accuracy
type: string
default: "0.95"
- name: max-latency-ms
type: string
default: "100"
steps:
- name: download-model
image: amazon/aws-cli
script: |
aws s3 cp $(params.model-uri) /workspace/model
- name: validate-accuracy
image: python:3.11
script: |
pip install scikit-learn numpy
python << 'EOF'
import pickle
from sklearn.metrics import accuracy_score
with open('/workspace/model', 'rb') as f:
model = pickle.load(f)
# Load validation data
X_val, y_val = load_validation_data()
predictions = model.predict(X_val)
accuracy = accuracy_score(y_val, predictions)
if accuracy < float("$(params.min-accuracy)"):
print(f"Model accuracy {accuracy} below threshold")
exit(1)
EOF
- name: validate-latency
image: python:3.11
script: |
python << 'EOF'
import time
import pickle
with open('/workspace/model', 'rb') as f:
model = pickle.load(f)
# Measure inference latency
latencies = []
for _ in range(100):
start = time.time()
model.predict([[1, 2, 3, 4]])
latencies.append((time.time() - start) * 1000)
        p99_latency = sorted(latencies)[98]  # index 98 = 99th percentile of 100 samples
if p99_latency > float("$(params.max-latency-ms)"):
print(f"P99 latency {p99_latency}ms exceeds threshold")
exit(1)
EOF
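Note that the validate-accuracy step above calls load_validation_data(), which is never defined. A hypothetical helper is sketched below, assuming the validation set is available to the step as a CSV with a label column; if used as-is, pandas would also need to be added to that step's pip install line.
# Hypothetical helper assumed by the validate-accuracy step (not part of the lesson)
import pandas as pd

def load_validation_data(path: str = "/workspace/validation.csv"):
    """Return (features, labels); the path and the 'label' column name are assumptions."""
    df = pd.read_csv(path)
    return df.drop(columns=["label"]), df["label"]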
Automated Rollback
# Argo Rollouts with automatic rollback
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
name: inference-rollout
spec:
strategy:
canary:
steps:
- setWeight: 10
- pause: {duration: 5m}
        - analysis:
            templates:
              - templateName: success-rate
              - templateName: latency-check
            args:
              - name: canary-hash
                valueFrom:
                  podTemplateHashValue: Latest
- setWeight: 50
- pause: {duration: 10m}
        - analysis:
            templates:
              - templateName: success-rate
            args:
              - name: canary-hash
                valueFrom:
                  podTemplateHashValue: Latest
- setWeight: 100
canaryService: inference-canary
stableService: inference-stable
# Automatic rollback on failure
abortScaleDownDelaySeconds: 30
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
name: success-rate
spec:
  args:
    - name: canary-hash
  metrics:
- name: success-rate
successCondition: result[0] >= 0.99
failureCondition: result[0] < 0.95
failureLimit: 3
provider:
prometheus:
address: http://prometheus:9090
query: |
sum(rate(inference_success_total{rollouts_pod_template_hash="{{args.canary-hash}}"}[5m])) /
sum(rate(inference_requests_total{rollouts_pod_template_hash="{{args.canary-hash}}"}[5m]))
Congratulations! You've completed the Kubernetes for AI/ML course. You now have the knowledge to deploy, scale, and operate production ML workloads on Kubernetes.