DVC + CML for ML Automation
CML (Continuous ML) Setup
English Content
What is CML?
CML (Continuous Machine Learning) is an open-source tool from Iterative.ai (the makers of DVC) that brings DevOps practices to ML. It automates ML experiment reporting directly in your Git platform.
CML capabilities:
- Post experiment reports as PR/MR comments
- Compare model metrics across branches
- Display plots and visualizations in PRs
- Provision cloud runners for training
- Integrate with GitHub Actions and GitLab CI
Installing CML
# Install via npm (recommended)
npm install -g @dvcorg/cml
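# Or via pip (first confirm that the PyPI package named cml is Iterative's tool;
# CML is primarily distributed through npm)
pip install cml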
# Verify installation
cml --version
Basic CML Workflow
# .github/workflows/cml.yml
name: CML Report
on:
  pull_request:
    branches: [main]
jobs:
  train-and-report:
    runs-on: ubuntu-latest
    # Lets GITHUB_TOKEN post the PR comment when the default token is read-only
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - uses: actions/setup-node@v4
        with:
          node-version: '18'
      - name: Install dependencies
        run: |
          pip install pandas scikit-learn matplotlib
          npm install -g @dvcorg/cml
      - name: Train model
        run: python train.py
      - name: Create CML report
        env:
          REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          # Add metrics to report
          echo "## Model Metrics" >> report.md
          cat metrics.json >> report.md
          # Add plot to report
          echo "## Training Curves" >> report.md
          cml asset publish training_curve.png --md >> report.md
          # Post comment to PR
          cml comment create report.md
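The workflow above appends the raw contents of metrics.json to the report. As a rough sketch of an alternative, the snippet below renders the metrics as a Markdown table before they are appended. It assumes metrics.json is a flat JSON object of name/value pairs written by train.py; the helper name `metrics_to_markdown` is hypothetical and not part of CML or DVC.

```python
import json
from pathlib import Path


def metrics_to_markdown(metrics: dict) -> str:
    """Render a flat {name: value} metrics dict as a Markdown table."""
    lines = ["| Metric | Value |", "|---|---|"]
    for name, value in metrics.items():
        # Show floats with four decimals; leave other values as they are.
        shown = f"{value:.4f}" if isinstance(value, float) else str(value)
        lines.append(f"| {name} | {shown} |")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumption: train.py wrote a flat JSON object to metrics.json.
    metrics_path = Path("metrics.json")
    if metrics_path.exists():
        table = metrics_to_markdown(json.loads(metrics_path.read_text()))
        # Append to the same report.md that `cml comment create` posts.
        with open("report.md", "a", encoding="utf-8") as report:
            report.write("## Model Metrics\n\n" + table)
```

If saved under a name such as format_metrics.py (a hypothetical filename), the script could replace the `cat metrics.json >> report.md` line in the report step.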
CML Commands
cml comment create: Post a comment to a PR/MR
# Create a comment with markdown content
cml comment create report.md
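# Create a comment from a one-line message (write it to a file first,
# since the command takes a markdown file as its argument)
echo "Training complete! Accuracy: 94%" > message.md
cml comment create message.md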
cml asset publish: Upload and get markdown link for assets
# Publish an image and get markdown
cml asset publish confusion_matrix.png --md >> report.md
# Publish multiple images
for img in plots/*.png; do
  cml asset publish "$img" --md >> report.md
done
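The loop above publishes every plot but leaves the images unlabeled in the report. The following sketch does the same job from Python and adds a heading per image derived from its filename. It only shells out to the `cml asset publish <file> --md` command shown above; the helper names `title_from_path` and `publish_section` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def title_from_path(path: Path) -> str:
    """Turn 'plots/confusion_matrix.png' into 'Confusion Matrix'."""
    return path.stem.replace("_", " ").title()


def publish_section(path: Path) -> str:
    """Return a Markdown section: a heading plus the link printed by CML."""
    # Same command as the shell loop; its stdout is the Markdown image link.
    result = subprocess.run(
        ["cml", "asset", "publish", str(path), "--md"],
        check=True,
        capture_output=True,
        text=True,
    )
    return f"### {title_from_path(path)}\n{result.stdout.strip()}\n\n"


if __name__ == "__main__":
    # Only attempt to publish when the cml CLI is actually installed.
    if shutil.which("cml") is not None:
        with open("report.md", "a", encoding="utf-8") as report:
            for image in sorted(Path("plots").glob("*.png")):
                report.write(publish_section(image))
```

Sorting the paths keeps the section order stable between runs, which makes successive PR comments easier to compare.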
cml runner: Launch cloud runners (covered in lesson 04)
Comprehensive Report Example
# .github/workflows/ml-report.yml
name: ML Experiment Report
on:
  pull_request:
    branches: [main]
jobs:
  report:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - uses: actions/setup-node@v4
        with:
          node-version: '18'
      - name: Install dependencies
        run: |
          pip install dvc pandas scikit-learn matplotlib seaborn
          npm install -g @dvcorg/cml
      - name: Pull data and train
        run: |
          dvc pull
          python train.py
      - name: Generate report
        env:
          REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          # Pass PR context through env vars rather than inlining it in the script
          HEAD_REF: ${{ github.head_ref }}
          HEAD_SHA: ${{ github.event.pull_request.head.sha }}
        run: |
          # Start report
          echo "# ML Experiment Report" > report.md
          echo "" >> report.md
          # Add experiment summary
          echo "## Experiment Summary" >> report.md
          echo "- **Branch:** ${HEAD_REF}" >> report.md
          echo "- **Commit:** ${HEAD_SHA}" >> report.md
          echo "- **Date:** $(date)" >> report.md
          echo "" >> report.md
          # Add metrics comparison (the checkout has origin/main, not a local main branch)
          echo "## Metrics Comparison" >> report.md
          echo '```' >> report.md
          dvc metrics diff origin/main >> report.md || echo "No previous metrics" >> report.md
          echo '```' >> report.md
          echo "" >> report.md
          # Add current metrics
          echo "## Current Metrics" >> report.md
          echo '```json' >> report.md
          cat metrics/eval_metrics.json >> report.md
          echo '```' >> report.md
          echo "" >> report.md
          # Add plots
          echo "## Visualizations" >> report.md
          echo "### Confusion Matrix" >> report.md
          cml asset publish plots/confusion_matrix.png --md >> report.md
          echo "" >> report.md
          echo "### ROC Curve" >> report.md
          cml asset publish plots/roc_curve.png --md >> report.md
          echo "" >> report.md
          echo "### Feature Importance" >> report.md
          cml asset publish plots/feature_importance.png --md >> report.md
          # Post comment
          cml comment create report.md
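The metrics comparison in this report is the part reviewers tend to read first. To make the idea concrete, here is a small sketch of what such a comparison computes: the change in each metric between a baseline and the current run. This is an illustration only, not how DVC implements `dvc metrics diff`; the function names and the sample numbers are hypothetical.

```python
def metric_deltas(baseline: dict, current: dict) -> list:
    """Return (name, old, new, delta) rows for every metric in either dict."""
    rows = []
    for name in sorted(set(baseline) | set(current)):
        old, new = baseline.get(name), current.get(name)
        # A delta only makes sense when the metric exists on both sides.
        delta = new - old if old is not None and new is not None else None
        rows.append((name, old, new, delta))
    return rows


def _fmt(value, signed=False) -> str:
    """Format a number with four decimals, or '-' when it is missing."""
    if value is None:
        return "-"
    return f"{value:+.4f}" if signed else f"{value:.4f}"


def deltas_to_markdown(rows: list) -> str:
    """Render the rows from metric_deltas as a Markdown table."""
    lines = ["| Metric | Baseline | Current | Change |", "|---|---|---|---|"]
    for name, old, new, delta in rows:
        lines.append(f"| {name} | {_fmt(old)} | {_fmt(new)} | {_fmt(delta, signed=True)} |")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative values only; real numbers come from your metrics files.
    baseline = {"accuracy": 0.90}
    current = {"accuracy": 0.94, "f1": 0.91}
    print(deltas_to_markdown(metric_deltas(baseline, current)))
```

A metric that exists on only one side (a newly added or removed metric) is shown with `-` instead of a misleading zero change.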
GitLab CI Integration
# .gitlab-ci.yml
# REPO_TOKEN must be defined as a CI/CD variable (an access token with API scope)
stages:
  - train
  - report
train:
  stage: train
  image: python:3.11
  script:
    - pip install dvc pandas scikit-learn matplotlib
    - dvc pull
    - python train.py
  artifacts:
    paths:
      - metrics/
      - plots/
  # Run in merge request pipelines too, so the report job receives the artifacts
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
report:
  stage: report
  image: node:18
  script:
    - npm install -g @dvcorg/cml
    - |
      echo "# ML Experiment Report" > report.md
      echo "## Metrics" >> report.md
      cat metrics/eval_metrics.json >> report.md
      echo "## Plots" >> report.md
      cml asset publish plots/confusion_matrix.png --md >> report.md
      cml comment create report.md
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
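Because the same report is produced on both GitHub Actions and GitLab CI, a report script may want to read the branch and commit without hard-coding either platform. The sketch below does this using the predefined environment variables each platform sets (`GITHUB_ACTIONS`, `GITHUB_HEAD_REF`, `GITHUB_SHA` on GitHub; `GITLAB_CI`, `CI_MERGE_REQUEST_SOURCE_BRANCH_NAME`, `CI_COMMIT_SHA` on GitLab). The function name `ci_context` and the returned dictionary layout are assumptions made for illustration.

```python
import os


def ci_context(env=None) -> dict:
    """Return platform, branch, and commit from CI environment variables."""
    env = os.environ if env is None else env
    if env.get("GITHUB_ACTIONS") == "true":
        return {
            "platform": "github",
            # GITHUB_HEAD_REF is only populated for pull request events.
            "branch": env.get("GITHUB_HEAD_REF", ""),
            "commit": env.get("GITHUB_SHA", ""),
        }
    if env.get("GITLAB_CI") == "true":
        return {
            "platform": "gitlab",
            # Only populated in merge request pipelines.
            "branch": env.get("CI_MERGE_REQUEST_SOURCE_BRANCH_NAME", ""),
            "commit": env.get("CI_COMMIT_SHA", ""),
        }
    # Neither variable is set: assume a local run.
    return {"platform": "local", "branch": "", "commit": ""}


if __name__ == "__main__":
    context = ci_context()
    print(f"- **Platform:** {context['platform']}")
    print(f"- **Branch:** {context['branch']}")
    print(f"- **Commit:** {context['commit']}")
```

Passing the environment as a parameter keeps the function easy to test locally with a plain dictionary.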
Key Takeaways
| Command / Variable | Purpose |
|---|---|
| `cml comment create` | Post a markdown report to a PR/MR |
| `cml asset publish` | Upload images/files and get a markdown link |
| `cml runner` | Provision cloud compute |
| `REPO_TOKEN` | Token CML uses for GitHub/GitLab API access |