GitLab CI/CD for ML

English Content

Why GitLab for ML Pipelines?

GitLab provides a complete DevOps platform with built-in CI/CD, which makes it a good fit for ML workflows. Compared with GitHub Actions, GitLab CI/CD sits alongside ML-oriented platform features such as a built-in container registry, a model registry, and experiment tracking, so a pipeline can build training images, log runs, and register models without leaving the platform.

Key advantages for ML teams:

Integrated MLflow-compatible Model Registry
Built-in container registry for training images
Parent-child pipelines for complex ML workflows
GPU runner support (SaaS and self-hosted)
Artifact management with expiration policies

GitLab CI/CD Basics

GitLab CI/CD uses a .gitlab-ci.yml file in your repository root:

# .gitlab-ci.yml - Basic ML Pipeline Structure
stages:
  - validate
  - train
  - evaluate
  - deploy

variables:
  PYTHON_VERSION: "3.11"
  MODEL_NAME: "sentiment-classifier"

default:
  image: python:${PYTHON_VERSION}
  before_script:
    - pip install -r requirements.txt

validate-data:
  stage: validate
  script:
    - python scripts/validate_data.py
  artifacts:
    reports:
      dotenv: data_validation.env

train-model:
  stage: train
  script:
    - python scripts/train.py
    - echo "MODEL_VERSION=$(cat model_version.txt)" >> build.env
  artifacts:
    paths:
      - models/
      - metrics/
    reports:
      dotenv: build.env
  needs:
    - validate-data

evaluate-model:
  stage: evaluate
  script:
    - python scripts/evaluate.py
  artifacts:
    paths:
      - reports/
    reports:
      metrics: metrics.txt
  needs:
    - train-model

deploy-model:
  stage: deploy
  script:
    - python scripts/deploy.py
  environment:
    name: production
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: manual
  needs:
    - evaluate-model
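
The pipeline above expects its scripts to leave report files behind: a dotenv file of KEY=value lines and a metrics file with one "name value" pair per line. As a minimal sketch of how a script might produce them, the helpers below render both formats. The function names are hypothetical and not part of GitLab or of the scripts referenced above.

```python
from pathlib import Path


def render_dotenv(values: dict[str, str]) -> str:
    """Render KEY=value lines, the format read by artifacts:reports:dotenv."""
    return "".join(f"{key}={value}\n" for key, value in values.items())


def render_metrics(metrics: dict[str, float]) -> str:
    """Render one 'name value' pair per line for artifacts:reports:metrics."""
    return "".join(f"{name} {value}\n" for name, value in sorted(metrics.items()))


def write_report(content: str, path: str) -> None:
    """Write a rendered report to the path the job's artifacts section points at."""
    Path(path).write_text(content, encoding="utf-8")
```

A script such as scripts/evaluate.py could then call write_report(render_metrics(scores), "metrics.txt"), where scores is whatever dictionary of metric values the evaluation produced.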

Pipeline Triggers for ML

Three trigger types cover most ML retraining needs: changes to data or feature files, schedules, and manual runs with parameters:

# Trigger on data changes
train-on-data-change:
  stage: train
  script:
    - python scripts/train.py
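  rules:
    # Limit to push pipelines: rules:changes evaluates to true when there is
    # no push event (for example, in scheduled pipelines)
    - if: $CI_PIPELINE_SOURCE == "push"
      changes:
        - data/**/*
        - features/**/*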

# Scheduled retraining
scheduled-retrain:
  stage: train
  script:
    - python scripts/full_retrain.py
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"

# Manual trigger with parameters
manual-experiment:
  stage: train
  script:
    - python scripts/train.py --lr $LEARNING_RATE --epochs $EPOCHS
  rules:
    - if: $CI_PIPELINE_SOURCE == "web"
      when: manual
  variables:
    LEARNING_RATE: "0.001"
    EPOCHS: "100"
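
The manual job passes its CI variables to the training script as command-line flags. As a rough sketch of the receiving side, assuming scripts/train.py uses argparse (the source does not show the script, so this layout is a guess), the flags could be parsed like this:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the two flags the manual-experiment job passes on the command line."""
    parser = argparse.ArgumentParser(description="Training entry point (illustrative sketch).")
    # Defaults mirror the job-level variables in the pipeline definition.
    parser.add_argument("--lr", type=float, default=0.001, help="learning rate")
    parser.add_argument("--epochs", type=int, default=100, help="number of training epochs")
    return parser


def parse_args(argv: list[str]) -> argparse.Namespace:
    """Parse an explicit argument list so the function is easy to test."""
    return build_parser().parse_args(argv)
```

Converting the values with type=float and type=int means a mistyped variable fails at argument parsing rather than partway through training.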

Caching and Artifacts

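Caches speed up repeated work such as dependency installs and checkpoint reuse, while artifacts pass outputs between jobs and keep them for a set period: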

variables:
  PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"

cache:
  key: ${CI_COMMIT_REF_SLUG}
  paths:
    - .cache/pip
    - .venv/
    - data/processed/  # Cache processed datasets

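train-model:
  stage: train
  cache:
    # A job-level cache replaces the global one, so both are listed here
    - key: ${CI_COMMIT_REF_SLUG}
      paths:
        - .cache/pip
    - key: "model-cache-${CI_COMMIT_REF_SLUG}"
      paths:
        - models/checkpoints/
      policy: pull-push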
  artifacts:
    paths:
      - models/final/
    expire_in: 30 days
  script:
    - python scripts/train.py
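
Caching models/checkpoints/ only pays off if the training script resumes from what it finds there. The sketch below shows one way to locate the most recent checkpoint. It assumes files are named epoch_<N>.pt, which is a hypothetical naming scheme that the source does not specify.

```python
import re
from pathlib import Path
from typing import Iterable, Optional

# Hypothetical naming scheme: epoch_1.pt, epoch_2.pt, ...
CHECKPOINT_PATTERN = re.compile(r"^epoch_(\d+)\.pt$")


def pick_latest(names: Iterable[str]) -> Optional[str]:
    """Return the file name with the highest epoch number, or None if none match."""
    best_name = None
    best_epoch = -1
    for name in names:
        match = CHECKPOINT_PATTERN.match(name)
        if match and int(match.group(1)) > best_epoch:
            best_epoch = int(match.group(1))
            best_name = name
    return best_name


def latest_checkpoint(checkpoint_dir: str = "models/checkpoints") -> Optional[Path]:
    """Look in the cached directory; a cold cache (missing directory) yields None."""
    directory = Path(checkpoint_dir)
    if not directory.is_dir():
        return None
    name = pick_latest(entry.name for entry in directory.iterdir())
    return directory / name if name else None
```

Because a cache can be missing or stale on any given runner, the None branch matters: the script should fall back to training from scratch rather than fail.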

Parent-Child Pipelines for Complex ML

When a workflow fans out into several experiments, a parent pipeline can trigger a child pipeline defined in a separate file:

# .gitlab-ci.yml (parent)
stages:
  - prepare
  - experiments
  - select

prepare-data:
  stage: prepare
  script:
    - python scripts/prepare_data.py
  artifacts:
    paths:
      - data/prepared/

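run-experiments:
  stage: experiments
  trigger:
    include: .gitlab/experiments.yml
    strategy: depend
  variables:
    DATA_PATH: data/prepared/
    # Lets child jobs fetch artifacts from this parent pipeline
    PARENT_PIPELINE_ID: $CI_PIPELINE_ID

select-best:
  stage: select
  script:
    # needs on a trigger job only enforces ordering; child artifacts are not
    # downloaded here, so the script has to fetch them (for example, via the API)
    - python scripts/select_best_model.py
  needs:
    - run-experiments

# .gitlab/experiments.yml (child)
stages:
  - train

.train-template:
  stage: train
  needs:
    # Download data/prepared/ from the parent's prepare-data job
    - pipeline: $PARENT_PIPELINE_ID
      job: prepare-data
  script:
    - python scripts/train.py --config configs/${CONFIG_FILE}
  artifacts:
    paths:
      - models/${CONFIG_FILE}/
    reports:
      metrics: metrics_${CONFIG_FILE}.txt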

train-config-a:
  extends: .train-template
  variables:
    CONFIG_FILE: "config_a.yaml"

train-config-b:
  extends: .train-template
  variables:
    CONFIG_FILE: "config_b.yaml"

train-config-c:
  extends: .train-template
  variables:
    CONFIG_FILE: "config_c.yaml"
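
The select step has to compare the metrics each experiment reported. The sketch below shows one possible core for scripts/select_best_model.py, assuming the per-config metrics files have already been made available to the job and use the "name value" line format. The function names and the choice of metric are illustrative assumptions.

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Parse 'name value' lines, skipping blank lines and '#' comments."""
    metrics: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, value = line.split()[:2]
        metrics[name] = float(value)
    return metrics


def select_best(results: dict[str, dict[str, float]], metric: str) -> str:
    """Return the config whose run scored highest on the given metric."""
    scored = {
        config: values[metric]
        for config, values in results.items()
        if metric in values
    }
    if not scored:
        raise ValueError(f"no run reported metric {metric!r}")
    return max(scored, key=scored.get)
```

Runs that did not report the chosen metric are ignored rather than treated as zero, so a failed or partial experiment cannot win by accident.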

Using extends and includes

Keep your CI configuration DRY with templates:

# .gitlab/ci/templates.yml
.ml-job-template:
  image: python:3.11
  before_script:
    - pip install -r requirements.txt
  tags:
    - ml-runner
  retry:
    max: 2
    when:
      - runner_system_failure
      - stuck_or_timeout_failure

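.gpu-job-template:
  extends: .ml-job-template
  image: nvidia/cuda:12.0.1-runtime-ubuntu22.04
  before_script:
    # The CUDA runtime image ships without Python, so install it first
    - apt-get update && apt-get install -y python3-pip python-is-python3
    - pip install -r requirements.txt
  tags:
    - gpu
  variables:
    CUDA_VISIBLE_DEVICES: "0"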

# .gitlab-ci.yml
include:
  - local: '.gitlab/ci/templates.yml'

train-cpu:
  extends: .ml-job-template
  stage: train
  script:
    - python scripts/train.py --device cpu

train-gpu:
  extends: .gpu-job-template
  stage: train
  script:
    - python scripts/train.py --device cuda
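
It helps to know how extends combines a template with the job that extends it: nested mappings are merged key by key, while lists and scalar values from the extending job replace the template's values outright. The function below is a small emulation of that merge rule for illustration only. It is not GitLab's implementation, and merge_job is a hypothetical name.

```python
from copy import deepcopy
from typing import Any


def merge_job(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Merge override into base: mappings merge recursively, everything else is replaced."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_job(merged[key], value)
        else:
            # Lists (such as tags) and scalars (such as image) are not combined.
            merged[key] = deepcopy(value)
    return merged
```

Applied to the templates above, this rule means a GPU job ends up with only the gpu tag, not ml-runner plus gpu, while the retry settings from the base template carry over untouched.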

Key Takeaways

Feature	GitLab CI/CD Approach
Configuration	`.gitlab-ci.yml` in repo root
Stages	Sequential by default, parallel within stage
Dependencies	`needs` keyword for DAG execution
Caching	Key-based with policy control
Complex workflows	Parent-child pipelines
Templates	`extends` and `include` keywords
