CI/CD Fundamentals for ML

CI/CD Pipeline Anatomy for ML


An ML CI/CD pipeline has more stages than a traditional software pipeline. Let's break down what happens at each stage.

The Full ML CI/CD Pipeline

┌──────────────────────────────────────────────────────────────────┐
│                     ML CI/CD Pipeline                            │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐           │
│  │  Code   │──▶│  Data   │──▶│  Train  │──▶│  Model  │           │
│  │  Test   │   │Validate │   │         │   │Validate │           │
│  └─────────┘   └─────────┘   └─────────┘   └─────────┘           │
│                                                │                 │
│                                                ▼                 │
│  ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐           │
│  │ Monitor │◀──│ Deploy  │◀──│ Package │◀──│Register │           │
│  └─────────┘   └─────────┘   └─────────┘   └─────────┘           │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

Stage-by-Stage Breakdown

Stage 1: Code Test

Standard software tests plus ML-specific checks:

# GitHub Actions example
code-test:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Run unit tests
      run: pytest tests/unit/

    - name: Run integration tests
      run: pytest tests/integration/

    - name: Lint ML code
      run: |
        ruff check src/
        mypy src/
Test Type           What It Checks
-----------------   -----------------------------------
Unit tests          Individual functions work correctly
Integration tests   Pipeline components work together
Linting             Code style and type hints
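
The first row of that table can be made concrete. Below is a minimal sketch of an ML-flavored unit test, built around a hypothetical `normalize` feature-engineering function (not from this lesson's codebase):

```python
# Hypothetical feature-engineering function plus unit tests for it.
# In a real repo these would live in src/ and tests/unit/ respectively.

def normalize(values: list[float]) -> list[float]:
    """Scale values to the [0, 1] range; constant inputs map to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def test_normalize_range():
    out = normalize([3.0, 7.0, 5.0])
    assert min(out) == 0.0 and max(out) == 1.0

def test_normalize_constant_input():
    # Edge case that "happy path" tests often miss in ML code
    assert normalize([2.0, 2.0]) == [0.0, 0.0]
```

The edge-case test is what makes this "ML-specific": transformations that silently misbehave on degenerate inputs are a common source of training/serving bugs.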

Stage 2: Data Validate

Check data quality before training:

data-validate:
  needs: code-test
  steps:
    - name: Pull data
      run: dvc pull data/

    - name: Validate schema
      run: python scripts/validate_schema.py

    - name: Check data quality
      run: great_expectations checkpoint run data_quality

    - name: Detect drift
      run: python scripts/detect_drift.py --baseline data/baseline.parquet
Validation        Purpose
---------------   ------------------------------------------
Schema check      Column names, types, constraints
Quality rules     No nulls in required fields, valid ranges
Drift detection   Distribution hasn't changed significantly
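
A schema check like the one `scripts/validate_schema.py` performs can be sketched in pure Python. The column names and types below are assumptions, not taken from this lesson's dataset:

```python
# Minimal sketch of a schema check; EXPECTED_SCHEMA is a placeholder.

EXPECTED_SCHEMA = {"user_id": int, "amount": float, "country": str}

def validate_schema(rows: list[dict]) -> list[str]:
    """Return a list of violations; an empty list means the data passes."""
    errors = []
    for i, row in enumerate(rows):
        missing = EXPECTED_SCHEMA.keys() - row.keys()
        if missing:
            errors.append(f"row {i}: missing columns {sorted(missing)}")
        for col, expected in EXPECTED_SCHEMA.items():
            if col in row and not isinstance(row[col], expected):
                errors.append(f"row {i}: {col} is not {expected.__name__}")
    return errors

# In CI the script would sys.exit(1) when violations are found, failing
# the step and stopping the pipeline before training ever starts.
```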

Stage 3: Train

Run the training job:

train:
  needs: data-validate
  runs-on: [self-hosted, gpu]
  steps:
    - name: Set up environment
      run: pip install -r requirements.txt

    - name: Run training
      run: |
        python train.py \
          --data data/train.parquet \
          --config configs/model.yaml \
          --output models/
      env:
        MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_URI }}

    - name: Upload model artifact
      uses: actions/upload-artifact@v4
      with:
        name: model
        path: models/
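
The workflow invokes `train.py` with `--data`, `--config`, and `--output` flags, but the script itself isn't shown in this lesson. A skeleton of what it might look like, with the config handling and the "model" as placeholders:

```python
# Skeleton of a train.py entry point matching the flags used in the
# workflow above; everything besides the flag names is an assumption.
import argparse
import pathlib
import pickle

def train(data_path: str, config: dict) -> dict:
    # Placeholder "model": a real script would fit an actual estimator here
    return {"trained_on": data_path, "params": config}

def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--data", required=True)
    parser.add_argument("--config", required=True)
    parser.add_argument("--output", required=True)
    args = parser.parse_args(argv)

    config = {"lr": 0.01}  # a real script would parse the YAML config file
    model = train(args.data, config)

    out_dir = pathlib.Path(args.output)
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(out_dir / "model.pkl", "wb") as f:
        pickle.dump(model, f)  # this artifact is what later stages validate

# Invoked in CI as:
#   python train.py --data data/train.parquet --config configs/model.yaml --output models/
```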

Stage 4: Model Validate

Ensure the model meets quality standards:

model-validate:
  needs: train
  steps:
    - name: Download model
      uses: actions/download-artifact@v4
      with:
        name: model

    - name: Run model tests
      run: pytest tests/model/ --model-path models/

    - name: Check accuracy threshold
      run: |
        python scripts/evaluate.py \
          --model models/model.pkl \
          --test-data data/test.parquet \
          --min-accuracy 0.85

    - name: Check for bias
      run: python scripts/fairness_check.py --model models/model.pkl
Validation Gate   Criteria
---------------   -------------------------------------
Accuracy          Must exceed baseline (e.g., > 0.85)
Latency           Inference time < 100 ms
Fairness          No demographic bias beyond threshold
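
The accuracy gate enforced by the `--min-accuracy` flag above comes down to a few lines. `scripts/evaluate.py` itself isn't shown in this lesson, so this is an assumed implementation:

```python
# Sketch of an accuracy gate. Exiting non-zero on failure is what makes
# the gate actually block the pipeline.

def accuracy(predictions: list[int], labels: list[int]) -> float:
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

def passes_gate(predictions, labels, min_accuracy: float = 0.85) -> bool:
    score = accuracy(predictions, labels)
    print(f"accuracy={score:.3f} (threshold {min_accuracy})")
    return score >= min_accuracy

# In CI the script would end with:
#   sys.exit(0 if passes_gate(preds, labels, args.min_accuracy) else 1)
```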

Stage 5: Register

Version and store the model:

register:
  needs: model-validate
  steps:
    - name: Register with MLflow
      run: |
        python scripts/register_model.py \
          --model-path models/model.pkl \
          --name "fraud-detector" \
          --stage "staging"

Stage 6: Package

Create a deployable artifact:

package:
  needs: register
  steps:
    - name: Build container
      run: |
        docker build -t model-service:${{ github.sha }} .
        docker push $REGISTRY/model-service:${{ github.sha }}

Stage 7: Deploy

Roll out to production:

deploy:
  needs: package
  steps:
    - name: Deploy to staging
      run: kubectl apply -f k8s/staging/

    - name: Run smoke tests
      run: pytest tests/smoke/ --endpoint $STAGING_URL

    - name: Deploy canary (10%)
      run: |
        # Assumes a separate canary Deployment sized to take ~10% of traffic;
        # `kubectl set image` on the main Deployment would roll out to 100%.
        kubectl set image deployment/model-service-canary \
          model=$REGISTRY/model-service:${{ github.sha }}
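
The smoke tests in `tests/smoke/` aren't shown in this lesson. A sketch of what one might check, with the HTTP client injected so the logic can run without a live `$STAGING_URL` (a real test would use e.g. `requests` against that endpoint):

```python
# Sketch of a smoke check: the service is "up" if /health answers 200
# and /predict returns a score for a trivial payload. Endpoint paths and
# the payload shape are assumptions.

def smoke_check(client, endpoint: str) -> bool:
    status, _ = client.get(f"{endpoint}/health")
    if status != 200:
        return False
    status, body = client.post(f"{endpoint}/predict", {"amount": 12.5})
    return status == 200 and "score" in body

class FakeClient:
    """Stand-in for a healthy service, used here to demo the check."""
    def get(self, url):
        return 200, {}
    def post(self, url, payload):
        return 200, {"score": 0.12}
```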

Stage 8: Monitor

Continuous production monitoring:

# This workflow runs on a schedule, not per-commit; in GitHub Actions the
# schedule lives in the workflow-level `on:` block, not on the job itself
on:
  schedule:
    - cron: "0 * * * *"  # Every hour

monitor:
  steps:
    - name: Check model metrics
      run: python scripts/monitor.py --alert-on-drift
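
One common drift signal a script like `scripts/monitor.py` could compute is the population stability index (PSI) between baseline and current feature distributions. The implementation and the 0.2 alert threshold below are assumptions, not taken from this lesson:

```python
# PSI over matching histogram buckets, each expressed as a proportion.
import math

def psi(baseline: list[float], current: list[float]) -> float:
    score = 0.0
    for b, c in zip(baseline, current):
        b = max(b, 1e-6)  # avoid log(0) for empty buckets
        c = max(c, 1e-6)
        score += (c - b) * math.log(c / b)
    return score

def should_alert(baseline, current, threshold: float = 0.2) -> bool:
    # Common rule of thumb: PSI < 0.1 stable, 0.1-0.2 moderate shift,
    # > 0.2 significant drift worth alerting on
    return psi(baseline, current) > threshold
```

Identical distributions give a PSI of 0, and the score grows as probability mass moves between buckets, which is exactly the "distribution hasn't changed significantly" check from Stage 2 applied continuously in production.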

Pipeline Triggers

Different events trigger different pipeline paths:

Trigger       Pipeline Path
-----------   ------------------------------
Code push     Full pipeline
Data update   Data validate → Train → Full
Schedule      Train → Full (retraining)
Manual        Any stage

Key Takeaways

  1. Gates at every stage: Each stage must pass before proceeding
  2. Artifact handoffs: Each stage produces artifacts for the next
  3. Versioning throughout: Code, data, and model versions are tracked
  4. Automation: Manual intervention only when gates fail

Next, we'll explore continuous training (CT) and how it extends beyond CI/CD.
