GitHub Actions for ML Workflows
Reusable Workflows & Best Practices
As your ML CI/CD grows, you'll want to avoid duplicating workflow code. Let's explore reusable workflows and production best practices.
Reusable Workflows
Create workflows that can be called from other workflows:
Defining a Reusable Workflow
# .github/workflows/reusable-train.yml
name: Reusable Training Workflow
on:
  workflow_call:
    inputs:
      model_name:
        required: true
        type: string
      python_version:
        required: false
        type: string
        default: '3.11'
      run_tests:
        required: false
        type: boolean
        default: true
    secrets:
      MLFLOW_URI:
        required: true
    outputs:
      model_path:
        description: "Path to trained model"
        value: ${{ jobs.train.outputs.model_path }}
      accuracy:
        description: "Model accuracy"
        value: ${{ jobs.train.outputs.accuracy }}
jobs:
  train:
    runs-on: ubuntu-latest
    outputs:
      model_path: ${{ steps.train.outputs.path }}
      accuracy: ${{ steps.train.outputs.accuracy }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ inputs.python_version }}
          cache: 'pip'
      - run: pip install -r requirements.txt
      - name: Run tests
        if: ${{ inputs.run_tests }}
        run: pytest tests/
      - name: Train model
        id: train
        run: |
          python train.py --model ${{ inputs.model_name }}
          echo "path=models/${{ inputs.model_name }}.pkl" >> "$GITHUB_OUTPUT"
          echo "accuracy=$(jq .accuracy metrics.json)" >> "$GITHUB_OUTPUT"
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_URI }}
      - uses: actions/upload-artifact@v4
        with:
          name: model-${{ inputs.model_name }}
          path: models/
Calling the Reusable Workflow
# .github/workflows/main.yml
name: Main Pipeline
on:
  push:
    branches: [main]
jobs:
  train-baseline:
    uses: ./.github/workflows/reusable-train.yml
    with:
      model_name: baseline
      python_version: '3.11'
    secrets:
      MLFLOW_URI: ${{ secrets.MLFLOW_URI }}
  train-improved:
    uses: ./.github/workflows/reusable-train.yml
    with:
      model_name: improved
      run_tests: false
    secrets:
      MLFLOW_URI: ${{ secrets.MLFLOW_URI }}
  compare:
    needs: [train-baseline, train-improved]
    runs-on: ubuntu-latest
    steps:
      - run: |
          echo "Baseline: ${{ needs.train-baseline.outputs.accuracy }}"
          echo "Improved: ${{ needs.train-improved.outputs.accuracy }}"
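When the caller and the reusable workflow live in the same repository or organization, listing each secret by name can be replaced with `secrets: inherit`. The following is a minimal sketch of that variation, reusing the `train-baseline` job and `reusable-train.yml` from this lesson:

```yaml
# Sketch: forward the caller's secrets instead of mapping them one by one
jobs:
  train-baseline:
    uses: ./.github/workflows/reusable-train.yml
    with:
      model_name: baseline
    # Passes every secret the caller can access, including MLFLOW_URI
    secrets: inherit
```

Explicit mapping keeps the contract visible in the caller; `secrets: inherit` is shorter but hides which secrets the called workflow actually relies on.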
Composite Actions
Bundle multiple steps into a single reusable action:
# .github/actions/setup-ml-env/action.yml
name: 'Setup ML Environment'
description: 'Sets up Python and installs ML dependencies'
inputs:
  python-version:
    description: 'Python version'
    required: false
    default: '3.11'
  install-gpu:
    description: 'Install GPU dependencies'
    required: false
    default: 'false'
runs:
  using: "composite"
  steps:
    - uses: actions/setup-python@v5
      with:
        python-version: ${{ inputs.python-version }}
        cache: 'pip'
    - name: Install base dependencies
      shell: bash
      run: |
        python -m pip install --upgrade pip
        pip install -r requirements.txt
    - name: Install GPU dependencies
      if: ${{ inputs.install-gpu == 'true' }}
      shell: bash
      run: pip install -r requirements-gpu.txt
Using the Composite Action
jobs:
  train:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: ./.github/actions/setup-ml-env
        with:
          python-version: '3.11'
          install-gpu: 'true'
      - run: python train.py
Best Practices
1. Environment Separation
jobs:
  train:
    runs-on: ubuntu-latest
    environment: ${{ github.ref == 'refs/heads/main' && 'production' || 'staging' }}
    steps:
      - run: echo "Deploying to ${{ github.ref == 'refs/heads/main' && 'production' || 'staging' }}"
2. Secrets Management
# Use environment-specific secrets
jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production
    steps:
      - name: Deploy
        run: ./deploy.sh
        env:
          # These secrets are only available in the 'production' environment
          API_KEY: ${{ secrets.PROD_API_KEY }}
          DB_URL: ${{ secrets.PROD_DB_URL }}
3. Concurrency Control
# Allow one training run per branch; a newer run cancels the one in progress
concurrency:
  group: training-${{ github.ref }}
  cancel-in-progress: true
jobs:
  train:
    runs-on: ubuntu-latest
    steps:
      - run: python train.py
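Cancelling in-progress runs is usually what you want on feature branches, but it can discard a nearly finished training run on `main`. Since `cancel-in-progress` accepts an expression, one possible variation (a sketch, assuming `main` is the branch you want to protect) is:

```yaml
concurrency:
  group: training-${{ github.ref }}
  # Cancel superseded runs on other branches, but let runs on main finish;
  # a newer run on main waits as pending instead
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
```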
4. Error Handling
jobs:
  train:
    runs-on: ubuntu-latest
    steps:
      - name: Train with retry
        uses: nick-fields/retry@v2
        with:
          timeout_minutes: 60
          max_attempts: 3
          command: python train.py
      - name: Cleanup on failure
        if: failure()
        run: |
          # Clean up resources
          python scripts/cleanup.py
      - name: Notify on completion
        if: always()
        run: |
          curl -X POST "$SLACK_WEBHOOK" \
            -H 'Content-Type: application/json' \
            -d '{"text": "Training ${{ job.status }}"}'
        env:
          # Assumes the webhook URL is stored as a secret named SLACK_WEBHOOK
          SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
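Retries bound each attempt, not the job as a whole, so a hung run can still hold a runner for a long time. A hedged sketch of capping both levels with the built-in `timeout-minutes` key follows; the specific values are assumptions, not recommendations:

```yaml
jobs:
  train:
    runs-on: ubuntu-latest
    # Hard cap for the whole job (illustrative value)
    timeout-minutes: 180
    steps:
      - uses: actions/checkout@v4
      - name: Train
        run: python train.py
        # Cap for this single step (illustrative value)
        timeout-minutes: 60
```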
5. Cost Optimization
jobs:
  # Run expensive GPU jobs only when needed
  quick-check:
    runs-on: ubuntu-latest
    outputs:
      needs_training: ${{ steps.check.outputs.result }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 2  # HEAD~1 must be present for the diff below
      - id: check
        run: |
          # Check if anything under data/ changed in the last commit
          if [ -n "$(git diff --name-only HEAD~1 -- data/)" ]; then
            echo "result=true" >> "$GITHUB_OUTPUT"
          else
            echo "result=false" >> "$GITHUB_OUTPUT"
          fi
  train:
    needs: quick-check
    if: needs.quick-check.outputs.needs_training == 'true'
    runs-on: [self-hosted, gpu]
    steps:
      - uses: actions/checkout@v4
      - run: python train.py
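If the entire workflow (not just one job) should be skipped when data is unchanged, a `paths` filter on the trigger is a simpler alternative to a check job. A minimal sketch, assuming training data lives under a top-level `data/` directory:

```yaml
on:
  push:
    branches: [main]
    # Trigger only when a pushed commit touches files under data/
    paths:
      - 'data/**'
jobs:
  train:
    runs-on: [self-hosted, gpu]
    steps:
      - uses: actions/checkout@v4
      - run: python train.py
```

The trade-off is granularity: a `paths` filter applies to the whole workflow, while the check-job pattern lets cheap jobs run on every push and gates only the GPU job.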
6. Workflow Documentation
name: ML Training Pipeline
# This workflow:
# 1. Runs tests on every push
# 2. Trains model on main branch only
# 3. Validates and deploys if all gates pass
#
# Required secrets:
# - MLFLOW_URI: MLflow tracking server URL
# - AWS_KEY: AWS access key for S3
#
# Triggers:
# - Push to any branch (tests only)
# - Push to main (full pipeline)
# - Weekly schedule (retraining)
on:
  push:
    # '**' also matches branch names containing '/', which '*' does not
    branches: ['**']
  schedule:
    - cron: '0 2 * * 0'
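The header comments promise tests on every push and training on `main` only. One way to express that split is a job-level `if` condition; the sketch below uses hypothetical job names `test` and `train` and assumes `main` is the default branch, so scheduled runs also pass the check:

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install -r requirements.txt
      - run: pytest tests/
  train:
    needs: test
    # Skipped on other branches; scheduled runs use the default branch
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install -r requirements.txt
      - run: python train.py
```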
Workflow Organization
.github/
├── workflows/
│   ├── main.yml                 # Primary entry point
│   ├── reusable-train.yml       # Reusable training
│   ├── reusable-validate.yml    # Reusable validation
│   └── scheduled-retrain.yml    # Scheduled retraining
├── actions/
│   ├── setup-ml-env/            # Composite action
│   │   └── action.yml
│   └── report-metrics/          # Another composite
│       └── action.yml
└── CODEOWNERS                   # Workflow owners
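The layout pays off when thin entry-point workflows delegate to the reusable ones. As an illustration, `scheduled-retrain.yml` could be little more than a trigger plus a call to `reusable-train.yml`. Its contents are not shown in this lesson, so the following is a hypothetical sketch:

```yaml
# .github/workflows/scheduled-retrain.yml (hypothetical contents)
name: Scheduled Retraining
on:
  schedule:
    - cron: '0 2 * * 0'   # Sundays at 02:00 UTC
  workflow_dispatch:       # also allow manual runs from the Actions tab
jobs:
  retrain:
    uses: ./.github/workflows/reusable-train.yml
    with:
      model_name: baseline
    secrets:
      MLFLOW_URI: ${{ secrets.MLFLOW_URI }}
```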
Summary Checklist
| Practice | Implementation |
|---|---|
| Reuse workflows | workflow_call for shared logic |
| Bundle steps | Composite actions for setup |
| Control concurrency | concurrency to prevent conflicts |
| Handle errors | if: failure() and retries |
| Optimize costs | Conditional GPU usage |
| Document | Comments in workflow files |
| Organize | Consistent folder structure |
Key Insight: A well-organized CI/CD codebase is as important as your ML codebase. Treat your workflows as production code.
In the next module, we'll explore GitLab CI/CD and compare it with GitHub Actions.