DVC + CML for ML Automation

CML (Continuous ML) Setup

What is CML?

CML (Continuous Machine Learning) is an open-source tool from Iterative.ai (the makers of DVC) that brings DevOps practices to ML. It runs inside your existing CI/CD pipelines and posts experiment results, such as metrics and plots, back to your Git platform as pull/merge request comments.

CML capabilities:

  • Post experiment reports as PR/MR comments
  • Compare model metrics across branches
  • Display plots and visualizations in PRs
  • Provision cloud runners for training
  • Integrate with GitHub Actions and GitLab CI

Installing CML

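# Install via npm (CML is published on npm as @dvcorg/cml)
npm install -g @dvcorg/cml

# CML is not distributed through pip; in GitHub Actions you can
# alternatively install it with the official iterative/setup-cml action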

# Verify installation
cml --version
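When these tools are installed inside a CI job, it can help to confirm up front that every CLI the job needs is on PATH. A missing install then fails with one clear message instead of partway through training. The following is a minimal POSIX shell sketch. `check_tools` is a hypothetical helper, not a CML or DVC command, and the tool list in the usage comment is an assumption.

```shell
# Sketch: check that every required CLI is on PATH, report all missing ones
# at once, and return non-zero if any are absent.
# check_tools is a hypothetical helper, not part of CML or DVC.
check_tools() {
  local missing="" tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="$missing $tool"
    fi
  done
  if [ -n "$missing" ]; then
    echo "Missing required tools:$missing" >&2
    return 1
  fi
  return 0
}

# Example (tool list is an assumption; adjust to your job):
#   check_tools cml dvc python || exit 1
```

Reporting every missing tool together avoids the fix-one-rerun-find-the-next cycle in CI.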

Basic CML Workflow

# .github/workflows/cml.yml
name: CML Report
on:
  pull_request:
    branches: [main]

jobs:
  train-and-report:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write   # needed for GITHUB_TOKEN to post the PR comment
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - uses: actions/setup-node@v4
        with:
          node-version: '18'

      - name: Install dependencies
        run: |
          pip install pandas scikit-learn matplotlib
          npm install -g @dvcorg/cml

      - name: Train model
        run: python train.py

      - name: Create CML report
        env:
          REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          # Add metrics to report (fenced so the JSON renders as code)
          echo "## Model Metrics" >> report.md
          echo '```json' >> report.md
          cat metrics.json >> report.md
          echo '```' >> report.md

          # Add plot to report
          echo "## Training Curves" >> report.md
          cml asset publish training_curve.png --md >> report.md

          # Post comment to PR
          cml comment create report.md
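The report step above builds report.md from individual echo and cat calls. The same pattern can be wrapped in a small reusable function. The sketch below rests on a few assumptions. `append_json_section` is a hypothetical name. The JSON file may or may not end in a newline. The fence is written with tildes (~~~json), which GitHub and GitLab Markdown render the same way as a backtick fence, so the shell needs no backtick quoting.

```shell
# Sketch: append a heading plus a JSON file wrapped in a ~~~json fence so the
# metrics render as a code block in the PR comment.
# append_json_section is a hypothetical helper, not part of CML.
append_json_section() {
  local title="$1" json_file="$2" report="$3"
  {
    echo "## ${title}"
    echo '~~~json'
    cat "$json_file"
    # Make sure the closing fence starts on its own line even if the file has
    # no trailing newline ($(...) strips a trailing newline, leaving "")
    if [ -n "$(tail -c 1 "$json_file")" ]; then echo; fi
    echo '~~~'
    echo
  } >> "$report"
}

# Usage in the workflow step:
#   append_json_section "Model Metrics" metrics.json report.md
#   cml asset publish training_curve.png --md >> report.md
#   cml comment create report.md
```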

CML Commands

cml comment create: Post a comment to a PR/MR

# Create a comment with markdown content
cml comment create report.md

# Create a comment from a one-line message (write it to a file first)
echo "Training complete! Accuracy: 94%" > report.md
cml comment create report.md

cml asset publish: Upload a file (such as an image) and print a link to it; with --md the link is printed as Markdown

# Publish an image and get markdown
cml asset publish confusion_matrix.png --md >> report.md

# Publish multiple images
for img in plots/*.png; do
  cml asset publish "$img" --md >> report.md
done
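The loop above has one sharp edge. If plots/ contains no PNG files, the shell leaves the literal pattern plots/*.png in $img and passes it to CML, which then fails. The following is a minimal POSIX shell sketch that skips that case and adds a heading per image. `publish_plots` and the `PUBLISH_CMD` override are hypothetical. `PUBLISH_CMD` exists only so the loop can be exercised without CML installed.

```shell
# Sketch: publish every PNG in a directory into the report, one heading each.
# If the glob matches nothing, $img holds the literal pattern, so the -e test
# skips it instead of passing a bogus path to CML.
# PUBLISH_CMD is a hypothetical override (default: cml asset publish).
publish_plots() {
  local plot_dir="$1" report="$2" img name
  for img in "$plot_dir"/*.png; do
    [ -e "$img" ] || continue
    name=$(basename "$img" .png)
    echo "### ${name}" >> "$report"
    ${PUBLISH_CMD:-cml asset publish} "$img" --md >> "$report"
    echo >> "$report"
  done
}

# Usage in CI (PUBLISH_CMD unset, so CML is called):
#   publish_plots plots report.md
```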

cml runner: Launch cloud runners (covered in lesson 04)

Comprehensive Report Example

# .github/workflows/ml-report.yml
name: ML Experiment Report
on:
  pull_request:
    branches: [main]

jobs:
  report:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - uses: actions/setup-node@v4
        with:
          node-version: '18'

      - name: Install dependencies
        run: |
          pip install dvc pandas scikit-learn matplotlib seaborn
          npm install -g @dvcorg/cml

      - name: Pull data and train
        run: |
          dvc pull
          python train.py

      - name: Generate report
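        env:
          REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          # Pass PR fields through env instead of inlining ${{ }} in the script:
          # branch names are user-controlled and could inject shell commands.
          # head.sha is the PR's latest commit (github.sha is the merge commit).
          HEAD_BRANCH: ${{ github.head_ref }}
          HEAD_SHA: ${{ github.event.pull_request.head.sha }}
        run: |
          # Start report
          echo "# ML Experiment Report" > report.md
          echo "" >> report.md

          # Add experiment summary
          echo "## Experiment Summary" >> report.md
          echo "- **Branch:** ${HEAD_BRANCH}" >> report.md
          echo "- **Commit:** ${HEAD_SHA}" >> report.md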
          echo "- **Date:** $(date)" >> report.md
          echo "" >> report.md

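          # Add metrics comparison against the base branch. actions/checkout
          # creates only remote-tracking refs, so compare with origin/main;
          # --md prints the diff as a Markdown table.
          echo "## Metrics Comparison" >> report.md
          dvc metrics diff origin/main --md >> report.md || echo "No previous metrics" >> report.md
          echo "" >> report.md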

          # Add current metrics
          echo "## Current Metrics" >> report.md
          echo '```json' >> report.md
          cat metrics/eval_metrics.json >> report.md
          echo '```' >> report.md
          echo "" >> report.md

          # Add plots
          echo "## Visualizations" >> report.md
          echo "### Confusion Matrix" >> report.md
          cml asset publish plots/confusion_matrix.png --md >> report.md
          echo "" >> report.md
          echo "### ROC Curve" >> report.md
          cml asset publish plots/roc_curve.png --md >> report.md
          echo "" >> report.md
          echo "### Feature Importance" >> report.md
          cml asset publish plots/feature_importance.png --md >> report.md

          # Post comment
          cml comment create report.md
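The `cmd >> report.md || echo "..." >> report.md` pattern used for the metrics diff can leave half-written output in the report if the command prints something and then fails. A minimal sketch of a safer variant captures the output first and appends it only on success. `append_or_fallback` is a hypothetical helper, and the usage line assumes the base branch is available as origin/main.

```shell
# Sketch: run a command and append its stdout to the report only if it
# succeeds; otherwise append a fallback note. Capturing first means a failing
# command never leaves partial output in the report.
# append_or_fallback is a hypothetical helper, not part of CML or DVC.
append_or_fallback() {
  local report="$1" fallback="$2" out
  shift 2
  if out=$("$@" 2>/dev/null); then
    printf '%s\n' "$out" >> "$report"
  else
    printf '%s\n' "$fallback" >> "$report"
  fi
}

# Usage (assumes the base branch is available as origin/main):
#   append_or_fallback report.md "No previous metrics" dvc metrics diff origin/main --md
```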

GitLab CI Integration

# .gitlab-ci.yml
stages:
  - train
  - report

train:
  stage: train
  image: python:3.11
  script:
    - pip install dvc pandas scikit-learn matplotlib
    - dvc pull
    - python train.py
  artifacts:
    paths:
      - metrics/
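      - plots/
  # Run in merge request pipelines too: jobs without rules only run in branch
  # pipelines, so the report job would otherwise never receive these artifacts
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"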

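report:
  stage: report
  image: node:18
  # CML posts the MR comment using REPO_TOKEN, a masked CI/CD variable holding
  # a personal or project access token with api scope
  script:
    - npm install -g @dvcorg/cml
    - |
      echo "# ML Experiment Report" > report.md
      echo "## Metrics" >> report.md
      echo '```json' >> report.md
      cat metrics/eval_metrics.json >> report.md
      echo '```' >> report.md
      echo "## Plots" >> report.md
      cml asset publish plots/confusion_matrix.png --md >> report.md
      cml comment create report.md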
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
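The GitLab job depends on a REPO_TOKEN variable that is easy to forget. A guard at the top of the script turns a confusing API failure into a clear message. This is a minimal sketch. `require_repo_token` is a hypothetical helper, and it assumes the token is stored as a masked GitLab CI/CD variable.

```shell
# Sketch: fail early, before any report work, if REPO_TOKEN is missing.
# Assumption: REPO_TOKEN is a masked GitLab CI/CD variable holding an access
# token that CML can use to comment on merge requests.
require_repo_token() {
  if [ -z "${REPO_TOKEN:-}" ]; then
    echo "REPO_TOKEN is not set; add it under Settings > CI/CD > Variables" >&2
    return 1
  fi
  return 0
}

# Usage as the first line of the report script:
#   require_repo_token || exit 1
```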

Key Takeaways

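| CML command / variable | Purpose |
|---|---|
| cml comment create | Post a Markdown report as a PR/MR comment |
| cml asset publish | Upload an image or file and print a link to it (--md for Markdown) |
| cml runner | Provision cloud compute for training |
| REPO_TOKEN | Token CML uses for the Git platform API (secrets.GITHUB_TOKEN on GitHub; an access token stored as a CI/CD variable on GitLab) |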
