GitHub Actions for ML Workflows

Reusable Workflows & Best Practices

As your ML CI/CD grows, you'll want to avoid duplicating workflow code. Let's explore reusable workflows and production best practices.

Reusable Workflows

Create workflows that can be called from other workflows:

Defining a Reusable Workflow

# .github/workflows/reusable-train.yml
name: Reusable Training Workflow

on:
  workflow_call:
    inputs:
      model_name:
        required: true
        type: string
      python_version:
        required: false
        type: string
        default: '3.11'
      run_tests:
        required: false
        type: boolean
        default: true
    secrets:
      MLFLOW_URI:
        required: true
    outputs:
      model_path:
        description: "Path to trained model"
        value: ${{ jobs.train.outputs.model_path }}
      accuracy:
        description: "Model accuracy"
        value: ${{ jobs.train.outputs.accuracy }}

jobs:
  train:
    runs-on: ubuntu-latest
    outputs:
      model_path: ${{ steps.train.outputs.path }}
      accuracy: ${{ steps.train.outputs.accuracy }}

    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: ${{ inputs.python_version }}
          cache: 'pip'

      - run: pip install -r requirements.txt

      - name: Run tests
        if: ${{ inputs.run_tests }}
        run: pytest tests/

      - name: Train model
        id: train
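        run: |
          python train.py --model "$MODEL_NAME"
          echo "path=models/${MODEL_NAME}.pkl" >> "$GITHUB_OUTPUT"
          echo "accuracy=$(jq -r .accuracy metrics.json)" >> "$GITHUB_OUTPUT"
        env:
          # Pass the input through an environment variable instead of
          # interpolating it into the script, which avoids shell injection.
          MODEL_NAME: ${{ inputs.model_name }}
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_URI }}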

      - uses: actions/upload-artifact@v4
        with:
          name: model-${{ inputs.model_name }}
          path: models/
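
The Train model step publishes its outputs by appending key=value lines to the file named by GITHUB_OUTPUT. If the metrics need more processing than a jq one-liner, the same thing can be done from Python. A minimal sketch, assuming train.py writes a metrics.json file with an accuracy key as the workflow above implies; write_step_outputs and export_training_outputs are hypothetical helper names, not part of any library:

```python
import json
from pathlib import Path


def write_step_outputs(outputs: dict, output_file: str) -> None:
    """Append key=value lines to the file GitHub Actions reads step outputs from."""
    with open(output_file, "a", encoding="utf-8") as fh:
        for key, value in outputs.items():
            fh.write(f"{key}={value}\n")


def export_training_outputs(model_name: str, metrics_path: str, output_file: str) -> dict:
    """Read accuracy from the metrics file and publish it with the model path."""
    metrics = json.loads(Path(metrics_path).read_text(encoding="utf-8"))
    outputs = {
        "path": f"models/{model_name}.pkl",  # same layout the workflow assumes
        "accuracy": metrics["accuracy"],
    }
    write_step_outputs(outputs, output_file)
    return outputs
```

Inside a step, output_file would be the value of the GITHUB_OUTPUT environment variable, for example export_training_outputs("baseline", "metrics.json", os.environ["GITHUB_OUTPUT"]).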

Calling the Reusable Workflow

# .github/workflows/main.yml
name: Main Pipeline

on:
  push:
    branches: [main]

jobs:
  train-baseline:
    uses: ./.github/workflows/reusable-train.yml
    with:
      model_name: baseline
      python_version: '3.11'
    secrets:
      MLFLOW_URI: ${{ secrets.MLFLOW_URI }}

  train-improved:
    uses: ./.github/workflows/reusable-train.yml
    with:
      model_name: improved
      run_tests: false
    secrets:
      MLFLOW_URI: ${{ secrets.MLFLOW_URI }}

  compare:
    needs: [train-baseline, train-improved]
    runs-on: ubuntu-latest
    steps:
      - run: |
          echo "Baseline: ${{ needs.train-baseline.outputs.accuracy }}"
          echo "Improved: ${{ needs.train-improved.outputs.accuracy }}"
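
The compare job above only prints the two accuracies. To turn it into a gate, the values can be handed to a small script that decides whether the improved model actually wins. A minimal sketch; is_improvement and its min_gain threshold are hypothetical and not part of the original pipeline:

```python
def is_improvement(baseline: str, candidate: str, min_gain: float = 0.0) -> bool:
    """Return True when the candidate beats the baseline by more than min_gain.

    Job outputs arrive as strings, so both values are parsed to float first.
    """
    return float(candidate) - float(baseline) > min_gain
```

A step could call this with the two needs.*.outputs.accuracy values and exit with a non-zero status when it returns False, which fails the job and blocks anything that depends on it.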

Composite Actions

Bundle multiple steps into a single reusable action:

# .github/actions/setup-ml-env/action.yml
name: 'Setup ML Environment'
description: 'Sets up Python and installs ML dependencies'

inputs:
  python-version:
    description: 'Python version'
    required: false
    default: '3.11'
  install-gpu:
    description: 'Install GPU dependencies'
    required: false
    default: 'false'

runs:
  using: "composite"
  steps:
    - uses: actions/setup-python@v5
      with:
        python-version: ${{ inputs.python-version }}
        cache: 'pip'

    - name: Install base dependencies
      shell: bash
      run: |
        python -m pip install --upgrade pip
        pip install -r requirements.txt

    - name: Install GPU dependencies
      if: ${{ inputs.install-gpu == 'true' }}
      shell: bash
      run: pip install -r requirements-gpu.txt

Using the Composite Action

jobs:
  train:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: ./.github/actions/setup-ml-env
        with:
          python-version: '3.11'
          install-gpu: 'true'

      - run: python train.py

Best Practices

1. Environment Separation

jobs:
  train:
    runs-on: ubuntu-latest
    environment: ${{ github.ref == 'refs/heads/main' && 'production' || 'staging' }}
    steps:
      - run: echo "Deploying to ${{ github.ref == 'refs/heads/main' && 'production' || 'staging' }}"
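
The expression github.ref == 'refs/heads/main' && 'production' || 'staging' works like a ternary: it yields 'production' on main and 'staging' everywhere else. The idiom is only safe when the middle value is truthy, which a non-empty string such as 'production' is. The same decision written in Python, as an illustrative sketch with a hypothetical function name:

```python
def target_environment(ref: str, production_ref: str = "refs/heads/main") -> str:
    """Mirror the workflow expression: production for main, staging otherwise."""
    return "production" if ref == production_ref else "staging"
```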

2. Secrets Management

# Use environment-specific secrets
jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production
    steps:
      - name: Deploy
        run: ./deploy.sh
        env:
          # Secrets scoped to the 'production' environment are available only to jobs that target it
          API_KEY: ${{ secrets.PROD_API_KEY }}
          DB_URL: ${{ secrets.PROD_DB_URL }}

3. Concurrency Control

# Allow one training run per branch; a newer run cancels the one in progress
concurrency:
  group: training-${{ github.ref }}
  cancel-in-progress: true

jobs:
  train:
    runs-on: ubuntu-latest
    steps:
      - run: python train.py

4. Error Handling

jobs:
  train:
    runs-on: ubuntu-latest
    steps:
      - name: Train with retry
        uses: nick-fields/retry@v2
        with:
          timeout_minutes: 60
          max_attempts: 3
          command: python train.py

      - name: Cleanup on failure
        if: failure()
        run: |
          # Clean up resources
          python scripts/cleanup.py

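      - name: Notify on completion
        if: always()
        run: |
          curl -X POST "$SLACK_WEBHOOK" \
            -H 'Content-Type: application/json' \
            -d '{"text": "Training ${{ job.status }}"}'
        env:
          # Assumes a repository secret named SLACK_WEBHOOK
          SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}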
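
The retry step above re-runs a shell command when it fails. Where adding a third-party action is not an option, similar behaviour can be approximated inside a script. A minimal sketch; run_with_retry is a hypothetical helper and does not reflect how the action itself is implemented:

```python
import subprocess
import sys
import time


def run_with_retry(cmd, max_attempts=3, delay_seconds=0.0, timeout_seconds=None):
    """Run cmd until it exits 0 or max_attempts is reached.

    Returns the 1-based number of the attempt that succeeded and raises
    RuntimeError if every attempt fails or times out.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            result = subprocess.run(cmd, timeout=timeout_seconds)
            if result.returncode == 0:
                return attempt
        except subprocess.TimeoutExpired:
            pass  # treat a timeout like any other failed attempt
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    raise RuntimeError(f"command failed after {max_attempts} attempts: {cmd}")
```

For example, run_with_retry([sys.executable, "train.py"], max_attempts=3, timeout_seconds=3600) would roughly correspond to the settings used in the step above.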

5. Cost Optimization

jobs:
  # Run expensive GPU jobs only when needed
  quick-check:
    runs-on: ubuntu-latest
    outputs:
      needs_training: ${{ steps.check.outputs.result }}
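    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 2  # fetch the parent commit so HEAD~1 exists
      - id: check
        run: |
          # Check whether any file under data/ changed in the last commit
          if ! git diff --quiet HEAD~1 -- data/; then
            echo "result=true" >> "$GITHUB_OUTPUT"
          else
            echo "result=false" >> "$GITHUB_OUTPUT"
          fi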

  train:
    needs: quick-check
    if: needs.quick-check.outputs.needs_training == 'true'
    runs-on: [self-hosted, gpu]
    steps:
      - run: python train.py
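
The decision made by quick-check comes down to one question: did any changed file live under a directory that affects training? A minimal sketch of that check as a pure function, which is easy to unit test outside CI; needs_training and watched_prefixes are hypothetical names:

```python
def needs_training(changed_files, watched_prefixes=("data/",)):
    """Return True if any changed path lives under a watched directory."""
    # str.startswith accepts a tuple, so several directories can be watched.
    return any(path.startswith(watched_prefixes) for path in changed_files)
```

Matching on the path prefix rather than a substring avoids false positives such as docs/data/notes.md triggering an expensive GPU run.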

6. Workflow Documentation

name: ML Training Pipeline

# This workflow:
# 1. Runs tests on every push
# 2. Trains model on main branch only
# 3. Validates and deploys if all gates pass
#
# Required secrets:
#   - MLFLOW_URI: MLflow tracking server URL
#   - AWS_KEY: AWS access key for S3
#
# Triggers:
#   - Push to any branch (tests only)
#   - Push to main (full pipeline)
#   - Weekly schedule (retraining)

on:
  push:
    branches: ['**']  # '*' alone would skip branch names containing '/'
  schedule:
    - cron: '0 2 * * 0'  # Sundays at 02:00 UTC

Workflow Organization

.github/
├── workflows/
│   ├── main.yml              # Primary entry point
│   ├── reusable-train.yml    # Reusable training
│   ├── reusable-validate.yml # Reusable validation
│   └── scheduled-retrain.yml # Scheduled retraining
├── actions/
│   ├── setup-ml-env/         # Composite action
│   │   └── action.yml
│   └── report-metrics/       # Another composite
│       └── action.yml
└── CODEOWNERS                # Workflow owners

Summary Checklist

| Practice | Implementation |
|---|---|
| Reuse workflows | workflow_call for shared logic |
| Bundle steps | Composite actions for setup |
| Control concurrency | concurrency to prevent conflicts |
| Handle errors | if: failure() and retries |
| Optimize costs | Conditional GPU usage |
| Document | Comments in workflow files |
| Organize | Consistent folder structure |

Key Insight: A well-organized CI/CD codebase is as important as your ML codebase. Treat your workflows as production code.

In the next module, we'll explore GitLab CI/CD and compare it with GitHub Actions.
