CI/CD Pipeline Anatomy for ML
An ML CI/CD pipeline has more stages than a traditional software pipeline: besides testing code, it must validate data, train and validate a model, and monitor it in production. Let's break down each stage and see what happens at every step.
The Full ML CI/CD Pipeline
```
┌──────────────────────────────────────────────────────────────────┐
│                        ML CI/CD Pipeline                         │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐           │
│  │  Code   │──▶│  Data   │──▶│  Train  │──▶│  Model  │           │
│  │  Test   │   │Validate │   │         │   │Validate │           │
│  └─────────┘   └─────────┘   └─────────┘   └─────────┘           │
│                                                 │                │
│                                                 ▼                │
│  ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐           │
│  │ Monitor │◀──│ Deploy  │◀──│ Package │◀──│Register │           │
│  └─────────┘   └─────────┘   └─────────┘   └─────────┘           │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
```
Stage-by-Stage Breakdown
Stage 1: Code Test
Standard software tests plus ML-specific checks:
```yaml
# GitHub Actions example
code-test:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Run unit tests
      run: pytest tests/unit/
    - name: Run integration tests
      run: pytest tests/integration/
    - name: Lint ML code
      run: |
        ruff check src/
        mypy src/
```
| Test Type | What It Checks |
|---|---|
| Unit tests | Individual functions work correctly |
| Integration tests | Pipeline components work together |
| Linting | Code style and type hints |
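ML-specific unit tests typically target feature-engineering code, where silent breakage is easiest. A minimal sketch (the `build_features` function and its feature names are illustrative, not from this repo):

```python
# Hypothetical ML-specific unit tests for a feature-engineering step.

def build_features(record: dict) -> dict:
    """Toy feature builder: derives model inputs from a raw transaction."""
    return {
        "amount_log_bucket": min(int(record["amount"]).bit_length(), 20),
        "is_foreign": int(record["country"] != "US"),
    }

def test_features_are_deterministic():
    record = {"amount": 120, "country": "DE"}
    assert build_features(record) == build_features(record)

def test_feature_ranges():
    feats = build_features({"amount": 5, "country": "US"})
    assert 0 <= feats["amount_log_bucket"] <= 20
    assert feats["is_foreign"] in (0, 1)
```

Tests like these run in the `Run unit tests` step and catch feature drift introduced by refactors before any training compute is spent.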
Stage 2: Data Validate
Check data quality before training:
```yaml
data-validate:
  needs: code-test
  steps:
    - name: Pull data
      run: dvc pull data/
    - name: Validate schema
      run: python scripts/validate_schema.py
    - name: Check data quality
      run: great_expectations checkpoint run data_quality
    - name: Detect drift
      run: python scripts/detect_drift.py --baseline data/baseline.parquet
```
| Validation | Purpose |
|---|---|
| Schema check | Column names, types, constraints |
| Quality rules | No nulls in required fields, valid ranges |
| Drift detection | Distribution hasn't changed significantly |
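A schema check can be as simple as comparing each row against an expected column contract and failing the job on any violation. A plain-Python sketch of what a script like `validate_schema.py` might do (the schema and column names are illustrative; a real script would more likely use pandera or Great Expectations):

```python
# Sketch of a schema/quality check; exits non-zero so CI fails the stage.
import sys

# Illustrative contract: column name -> expected Python type
EXPECTED_SCHEMA = {
    "transaction_id": str,
    "amount": float,
    "country": str,
    "label": int,
}

def validate_rows(rows: list[dict]) -> list[str]:
    """Return a list of schema violations (empty means the data passes)."""
    errors = []
    for i, row in enumerate(rows):
        missing = EXPECTED_SCHEMA.keys() - row.keys()
        if missing:
            errors.append(f"row {i}: missing columns {sorted(missing)}")
            continue
        for col, typ in EXPECTED_SCHEMA.items():
            if not isinstance(row[col], typ):
                errors.append(
                    f"row {i}: {col} is {type(row[col]).__name__}, "
                    f"expected {typ.__name__}"
                )
        if row["amount"] < 0:  # example range rule
            errors.append(f"row {i}: amount must be non-negative")
    return errors

if __name__ == "__main__":
    rows = [{"transaction_id": "t1", "amount": 9.99, "country": "US", "label": 0}]
    errors = validate_rows(rows)
    if errors:
        print("\n".join(errors))
        sys.exit(1)  # non-zero exit fails the CI step
```

The key design point is the exit code: the stage acts as a gate precisely because a failed check stops the pipeline before any training runs.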
Stage 3: Train
Run the training job:
```yaml
train:
  needs: data-validate
  runs-on: [self-hosted, gpu]
  steps:
    - name: Set up environment
      run: pip install -r requirements.txt
    - name: Run training
      run: |
        python train.py \
          --data data/train.parquet \
          --config configs/model.yaml \
          --output models/
      env:
        MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_URI }}
    - name: Upload model artifact
      uses: actions/upload-artifact@v4
      with:
        name: model
        path: models/
```
Stage 4: Model Validate
Ensure model meets quality standards:
```yaml
model-validate:
  needs: train
  steps:
    - name: Download model
      uses: actions/download-artifact@v4
      with:
        name: model
    - name: Run model tests
      run: pytest tests/model/ --model-path models/
    - name: Check accuracy threshold
      run: |
        python scripts/evaluate.py \
          --model models/model.pkl \
          --test-data data/test.parquet \
          --min-accuracy 0.85
    - name: Check for bias
      run: python scripts/fairness_check.py --model models/model.pkl
```
| Validation | Gate Criteria |
|---|---|
| Accuracy | Must exceed baseline (e.g., > 0.85) |
| Latency | Inference time < 100ms |
| Fairness | No demographic bias beyond threshold |
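The gating logic itself is simple: compute metrics, compare each against a direction and a limit, and exit non-zero on any violation. A sketch of what the comparison inside a script like `evaluate.py` might look like (the threshold values mirror the table above; the metric names are illustrative):

```python
# Sketch of metric gating; exits non-zero so CI fails the stage.
import sys

# metric -> (direction, limit); illustrative thresholds
THRESHOLDS = {
    "accuracy": ("min", 0.85),     # must exceed baseline
    "latency_ms": ("max", 100.0),  # inference time budget
}

def gate(metrics: dict) -> list[str]:
    """Return failure messages for any metric that violates its threshold."""
    failures = []
    for name, (direction, limit) in THRESHOLDS.items():
        value = metrics[name]
        ok = value >= limit if direction == "min" else value <= limit
        if not ok:
            failures.append(f"{name}={value} violates {direction} threshold {limit}")
    return failures

if __name__ == "__main__":
    # In CI these numbers would come from evaluating the model artifact
    # against the held-out test set.
    failures = gate({"accuracy": 0.87, "latency_ms": 42.0})
    if failures:
        print("\n".join(failures))
        sys.exit(1)  # block promotion to the Register stage
```

Keeping thresholds in one declarative table makes the gate easy to review in a PR and easy to tighten over time.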
Stage 5: Register
Version and store the model:
```yaml
register:
  needs: model-validate
  steps:
    - name: Register with MLflow
      run: |
        python scripts/register_model.py \
          --model-path models/model.pkl \
          --name "fraud-detector" \
          --stage "staging"
```
Stage 6: Package
Create deployable artifact:
```yaml
package:
  needs: register
  steps:
    - name: Build container
      run: |
        docker build -t $REGISTRY/model-service:${{ github.sha }} .
        docker push $REGISTRY/model-service:${{ github.sha }}
```
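The image that step builds typically bakes the validated model artifact in alongside the serving code, so each deploy is an immutable unit. A minimal Dockerfile sketch (file names and the serving entrypoint are illustrative):

```dockerfile
# Minimal image for the model service
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Bake the validated model artifact into the image so the
# deployed unit is immutable and reproducible
COPY models/model.pkl models/model.pkl
COPY src/ src/

EXPOSE 8080
CMD ["python", "-m", "src.serve"]
```

The alternative, loading the model from a registry at startup, trades image size for looser coupling; either way, the image tag (`github.sha` here) ties the deployment back to the exact commit that produced it.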
Stage 7: Deploy
Roll out to production:
```yaml
deploy:
  needs: package
  steps:
    - name: Deploy to staging
      run: kubectl apply -f k8s/staging/
    - name: Run smoke tests
      run: pytest tests/smoke/ --endpoint $STAGING_URL
    - name: Deploy canary (10%)
      run: |
        kubectl set image deployment/model-service \
          model=$REGISTRY/model-service:${{ github.sha }}
```
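Smoke tests verify the deployed service honors its response contract before traffic shifts. A sketch of what a helper in `tests/smoke/` might assert (the expected keys and value ranges are illustrative):

```python
# Sketch of a smoke-test contract check for a prediction endpoint.

def check_prediction_response(payload: dict) -> None:
    """Raise AssertionError if the response doesn't match the contract."""
    assert set(payload) >= {"model_version", "score"}, "missing keys"
    assert isinstance(payload["score"], float), "score must be a float"
    assert 0.0 <= payload["score"] <= 1.0, "score out of range"

def test_predict_endpoint_contract():
    # In CI this payload would come from POSTing a sample request to
    # $STAGING_URL/predict; here we use a canned example.
    payload = {"model_version": "abc123", "score": 0.07}
    check_prediction_response(payload)
```

Checking shape and ranges rather than exact scores keeps smoke tests stable across retrains while still catching broken serialization or a mis-wired model version.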
Stage 8: Monitor
Continuous production monitoring:
```yaml
# This runs on a schedule, not per-commit
on:
  schedule:
    - cron: "0 * * * *" # Every hour

jobs:
  monitor:
    runs-on: ubuntu-latest
    steps:
      - name: Check model metrics
        run: python scripts/monitor.py --alert-on-drift
```
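One common drift metric such a monitor could compute is the Population Stability Index (PSI) between a baseline sample and recent production inputs. A stdlib-only sketch (the bucket edges and the 0.2 alert threshold are common conventions, not values from this project):

```python
# Sketch of a PSI drift check a monitoring script could run.
import math

def psi(baseline: list[float], current: list[float], edges: list[float]) -> float:
    """Population Stability Index over fixed bucket edges; higher = more drift."""
    def fractions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[sum(v > e for e in edges)] += 1  # bucket index
        total = len(values)
        # Smooth empty buckets so the log stays defined
        return [max(c / total, 1e-6) for c in counts]

    b, c = fractions(baseline), fractions(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))

if __name__ == "__main__":
    baseline = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
    current = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
    # Identical samples -> PSI near 0; alert when PSI exceeds ~0.2
    assert psi(baseline, current, edges=[0.25, 0.5]) < 0.2
```

When the PSI crosses the alert threshold, the monitor can open an alert or fire the retraining trigger described in the next section.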
Pipeline Triggers
Different events trigger different pipeline paths:
| Trigger | Pipeline Path |
|---|---|
| Code push | Full pipeline |
| Data update | Data validate → Train → remaining stages |
| Schedule | Train → remaining stages (periodic retraining) |
| Manual | Any stage, on demand |
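In GitHub Actions, these paths could be wired up with a trigger block like the following (the branch name, dispatch event type, and cron are illustrative):

```yaml
on:
  push:
    branches: [main]        # code push -> full pipeline
  repository_dispatch:
    types: [data-updated]   # fired by the data platform on new data
  schedule:
    - cron: "0 3 * * 1"     # weekly scheduled retraining
  workflow_dispatch:        # manual runs
    inputs:
      stage:
        description: "Stage to start from"
        required: false
```

Inside the jobs, `if:` conditions on `github.event_name` can then skip stages that a given trigger doesn't need.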
Key Takeaways
- Gates at every stage: Each stage must pass before proceeding
- Artifact handoffs: Each stage produces artifacts for the next
- Versioning throughout: Code, data, and model versions are tracked
- Automation: Manual intervention only when gates fail
Next, we'll explore continuous training (CT) and how it extends beyond CI/CD.