GitLab CI/CD & Alternative Platforms

GitLab Model Registry Integration

GitLab Model Registry Overview

GitLab includes a built-in Model Registry that exposes an MLflow-compatible API for versioning models directly inside a project. For many teams this removes the need to run a separate registry, and the registry can be driven from GitLab CI/CD pipelines.

Key features:

  • MLflow-compatible API
  • Model versioning with semantic versions
  • Candidate and production model stages
  • Direct integration with CI/CD pipelines
  • Artifact storage with GitLab's infrastructure
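
To make the semantic-versioning feature concrete, here is a minimal sketch of how a MAJOR.MINOR.PATCH version string could be bumped before registering a new model version. The helper name bump_version is hypothetical and is not part of GitLab or MLflow; it only illustrates the numbering scheme.

```python
import re

# Hypothetical helper for illustration; not a GitLab or MLflow API.
_SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def bump_version(version: str, part: str = "patch") -> str:
    """Return the next MAJOR.MINOR.PATCH version after bumping one part."""
    match = _SEMVER.match(version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(group) for group in match.groups())
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


if __name__ == "__main__":
    print(bump_version("1.4.2", "minor"))
```

A retrained model with the same interface might take a patch or minor bump, while a change to the input schema would normally warrant a major bump.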

Enabling Model Registry

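The Model Registry is available at the project level under Deploy > Model registry. To use it programmatically, point the MLflow client at the project's MLflow-compatible endpoint (<GitLab host>/api/v4/projects/<project ID>/ml/mlflow) and authenticate with a token: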

# scripts/register_model.py
import mlflow
import os

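# Set authentication before the first API call (set_experiment below).
# If the job token is rejected, use a project access token with the api scope.
os.environ["MLFLOW_TRACKING_TOKEN"] = os.environ["CI_JOB_TOKEN"]

# Configure MLflow to use GitLab's Model Registry.
# The endpoint is project-scoped, not the server root URL.
mlflow.set_tracking_uri(
    f"{os.environ['CI_API_V4_URL']}/projects/{os.environ['CI_PROJECT_ID']}/ml/mlflow"
)
mlflow.set_experiment(os.environ["CI_PROJECT_PATH"])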

def register_model(model_path, model_name, metrics):
    """Register a model with GitLab Model Registry."""
    with mlflow.start_run() as run:
        # Log metrics
        for metric_name, value in metrics.items():
            mlflow.log_metric(metric_name, value)

        # Log model artifacts
        mlflow.log_artifacts(model_path, "model")

        # Register the model
        model_uri = f"runs:/{run.info.run_id}/model"
        mlflow.register_model(model_uri, model_name)

        return run.info.run_id

if __name__ == "__main__":
    metrics = {
        "accuracy": 0.94,
        "f1_score": 0.92,
        "latency_ms": 45
    }
    run_id = register_model("models/trained/", "sentiment-classifier", metrics)
    print(f"Registered model with run ID: {run_id}")
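
The MLflow client also reads its tracking URI and token from the MLFLOW_TRACKING_URI and MLFLOW_TRACKING_TOKEN environment variables, so the configuration can be derived once from GitLab's predefined CI variables. The sketch below assumes the project-scoped endpoint path /ml/mlflow under the API v4 project URL; the helper names gitlab_mlflow_uri and mlflow_env are hypothetical, and the values in the demo are made up.

```python
import os


def gitlab_mlflow_uri(api_v4_url: str, project_id: str) -> str:
    """Build the project-scoped MLflow endpoint from GitLab CI values."""
    return f"{api_v4_url.rstrip('/')}/projects/{project_id}/ml/mlflow"


def mlflow_env(environ=os.environ) -> dict:
    """Return the MLFLOW_* variables to export before importing mlflow."""
    return {
        "MLFLOW_TRACKING_URI": gitlab_mlflow_uri(
            environ["CI_API_V4_URL"], environ["CI_PROJECT_ID"]
        ),
        "MLFLOW_TRACKING_TOKEN": environ["CI_JOB_TOKEN"],
    }


if __name__ == "__main__":
    # Made-up values standing in for what a CI job would provide.
    fake_ci = {
        "CI_API_V4_URL": "https://gitlab.example.com/api/v4",
        "CI_PROJECT_ID": "42",
        "CI_JOB_TOKEN": "placeholder-token",
    }
    for key, value in mlflow_env(fake_ci).items():
        print(f"{key}={value}")
```

Keeping the URL construction in one function avoids repeating the endpoint path in every script that talks to the registry.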

CI/CD Integration for Model Registration

Integrate model registration into your GitLab CI/CD pipeline:

# .gitlab-ci.yml
stages:
  - train
  - register
  - deploy

variables:
  MODEL_NAME: "sentiment-classifier"

train-model:
  stage: train
  image: python:3.11
  script:
    - pip install -r requirements.txt
    - python scripts/train.py
    - python scripts/evaluate.py > metrics.json
  artifacts:
    paths:
      - models/
      - metrics.json
    expire_in: 7 days

register-model:
  stage: register
  image: python:3.11
  script:
    - pip install mlflow
    - |
      python << 'EOF'
      import mlflow
      import json
      import os

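      # Configure GitLab Model Registry: authenticate, then use the
      # project-scoped MLflow endpoint (not the server root URL)
      os.environ["MLFLOW_TRACKING_TOKEN"] = os.environ["CI_JOB_TOKEN"]
      mlflow.set_tracking_uri(
          f"{os.environ['CI_API_V4_URL']}/projects/{os.environ['CI_PROJECT_ID']}/ml/mlflow"
      )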

      with open("metrics.json") as f:
          metrics = json.load(f)

      with mlflow.start_run():
          for k, v in metrics.items():
              mlflow.log_metric(k, v)
          mlflow.log_artifacts("models/", "model")

          model_uri = f"runs:/{mlflow.active_run().info.run_id}/model"
          mlflow.register_model(model_uri, os.environ["MODEL_NAME"])
      EOF
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
  needs:
    - train-model

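deploy-model:
  stage: deploy
  image: python:3.11
  script:
    # The deploy script imports mlflow, so install it in this job too
    - pip install mlflow
    - echo "Deploying model $MODEL_NAME"
    - python scripts/deploy_from_registry.py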
  environment:
    name: production
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: manual
  needs:
    - register-model
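
The register job above passes every entry of metrics.json to mlflow.log_metric, which expects numeric values. If the evaluation script also writes labels or nested sections, a small filter can be applied first. This is a minimal sketch under that assumption; numeric_metrics is a hypothetical helper, and the sample data is invented.

```python
import math
from numbers import Real


def numeric_metrics(raw: dict, prefix: str = "") -> dict:
    """Flatten nested metrics and keep only finite numeric values."""
    flat = {}
    for key, value in raw.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested sections become dotted names, e.g. per_class.pos
            flat.update(numeric_metrics(value, prefix=f"{name}."))
        elif isinstance(value, bool):
            # bool counts as a number in Python; skip it explicitly
            continue
        elif isinstance(value, Real) and math.isfinite(value):
            flat[name] = float(value)
    return flat


if __name__ == "__main__":
    sample = {
        "accuracy": 0.94,
        "dataset": "reviews-v2",
        "per_class": {"pos": 0.95, "neg": 0.93},
    }
    print(numeric_metrics(sample))
```

The returned dictionary can then be looped over exactly as the pipeline does with the raw file.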

Model Versioning Workflow

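The pipeline below trains a candidate, gates it on an accuracy threshold, registers it in Staging, and promotes it to Production after manual approval. It relies on MLflow's classic stage API (transition_model_version_stage), which MLflow has deprecated since version 2.9, so confirm that your MLflow client and GitLab version support it: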

# .gitlab-ci.yml
# Every job below runs Python, so set a default image
default:
  image: python:3.11

variables:
  MODEL_NAME: "fraud-detector"
  ACCURACY_THRESHOLD: "0.90"

stages:
  - train
  - validate
  - register
  - promote

train-candidate:
  stage: train
  script:
    - python scripts/train.py
    - python scripts/evaluate.py --output metrics.json
  artifacts:
    paths:
      - models/candidate/
      - metrics.json

validate-candidate:
  stage: validate
  script:
    - |
      python << 'EOF'
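      import json
      import os
      import sys

      with open("metrics.json") as f:
          metrics = json.load(f)

      # The quoted heredoc ('EOF') disables shell expansion,
      # so read the threshold from the environment instead
      threshold = float(os.environ["ACCURACY_THRESHOLD"])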
      if metrics["accuracy"] < threshold:
          print(f"Model accuracy {metrics['accuracy']} below threshold {threshold}")
          sys.exit(1)

      print(f"Model passed validation with accuracy {metrics['accuracy']}")
      EOF
  needs:
    - train-candidate

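register-candidate:
  stage: register
  script:
    - pip install mlflow
    - |
      python << 'EOF'
      import mlflow
      import os

      # Authenticate and point MLflow at the project-scoped endpoint
      os.environ["MLFLOW_TRACKING_TOKEN"] = os.environ["CI_JOB_TOKEN"]
      mlflow.set_tracking_uri(
          f"{os.environ['CI_API_V4_URL']}/projects/{os.environ['CI_PROJECT_ID']}/ml/mlflow"
      )
      # The quoted heredoc blocks "$MODEL_NAME" expansion, so use the environment
      model_name = os.environ["MODEL_NAME"]

      with mlflow.start_run() as run:
          mlflow.log_artifacts("models/candidate/", "model")
          model_uri = f"runs:/{run.info.run_id}/model"

          # Register a new version, then move it to the "Staging" stage
          result = mlflow.register_model(model_uri, model_name)

          client = mlflow.tracking.MlflowClient()
          client.transition_model_version_stage(
              name=model_name,
              version=result.version,
              stage="Staging"
          )
          print(f"Registered version {result.version} in Staging")
      EOF
  needs:
    # Artifacts are downloaded only from jobs listed here,
    # so train-candidate is required to get models/candidate/
    - train-candidate
    - validate-candidate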

promote-to-production:
  stage: promote
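  script:
    - pip install mlflow
    - |
      python << 'EOF'
      import mlflow
      import os
      import sys

      # Authenticate and point MLflow at the project-scoped endpoint
      os.environ["MLFLOW_TRACKING_TOKEN"] = os.environ["CI_JOB_TOKEN"]
      mlflow.set_tracking_uri(
          f"{os.environ['CI_API_V4_URL']}/projects/{os.environ['CI_PROJECT_ID']}/ml/mlflow"
      )
      client = mlflow.tracking.MlflowClient()
      # The quoted heredoc blocks "$MODEL_NAME" expansion, so use the environment
      model_name = os.environ["MODEL_NAME"]

      # Get latest staging version; fail the job if there is nothing to promote
      staging_versions = client.get_latest_versions(model_name, stages=["Staging"])
      if not staging_versions:
          sys.exit(f"No Staging version of {model_name} to promote")

      version = staging_versions[0].version
      client.transition_model_version_stage(
          name=model_name,
          version=version,
          stage="Production",
          archive_existing_versions=True
      )
      print(f"Promoted version {version} to Production")
      EOF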
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: manual
  needs:
    - register-candidate
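
The validation job above gates on a single accuracy threshold. When several metrics matter, the same idea extends to a set of floors and ceilings. The following is a minimal sketch of such a gate; failed_checks and the threshold values are hypothetical and would need to match whatever evaluate.py actually writes.

```python
from typing import Dict, List, Optional


def failed_checks(
    metrics: Dict[str, float],
    minimums: Dict[str, float],
    maximums: Optional[Dict[str, float]] = None,
) -> List[str]:
    """Return one message per violated or missing threshold."""
    failures = []
    for name, floor in minimums.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: missing")
        elif value < floor:
            failures.append(f"{name}: {value} < {floor}")
    for name, ceiling in (maximums or {}).items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: missing")
        elif value > ceiling:
            failures.append(f"{name}: {value} > {ceiling}")
    return failures


if __name__ == "__main__":
    candidate = {"accuracy": 0.94, "f1_score": 0.92, "latency_ms": 45}
    problems = failed_checks(
        candidate,
        minimums={"accuracy": 0.90, "f1_score": 0.85},
        maximums={"latency_ms": 50},
    )
    # A CI job would exit non-zero when the list is not empty.
    print("passed" if not problems else problems)
```

Returning all failures at once, rather than exiting on the first, gives a more useful job log when a candidate misses several thresholds.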

Fetching Models from Registry

Deploy models by fetching from the registry:

# scripts/deploy_from_registry.py
import mlflow
import os
import sys

def configure_mlflow():
    """Authenticate and point MLflow at the project's MLflow-compatible endpoint."""
    os.environ["MLFLOW_TRACKING_TOKEN"] = os.environ["CI_JOB_TOKEN"]
    mlflow.set_tracking_uri(
        f"{os.environ['CI_API_V4_URL']}/projects/{os.environ['CI_PROJECT_ID']}/ml/mlflow"
    )

def load_production_model(model_name):
    """Load the current production model from GitLab Model Registry."""
    # Load model in production stage. This assumes the logged "model"
    # directory is in MLflow model format (it contains an MLmodel file).
    model_uri = f"models:/{model_name}/Production"
    model = mlflow.pyfunc.load_model(model_uri)

    return model

def get_model_metadata(model_name):
    """Get metadata for the production model, or None if there is none."""
    client = mlflow.tracking.MlflowClient()

    versions = client.get_latest_versions(model_name, stages=["Production"])
    if versions:
        version = versions[0]
        return {
            "version": version.version,
            "run_id": version.run_id,
            "created_at": version.creation_timestamp,
            "description": version.description
        }
    return None

if __name__ == "__main__":
    model_name = os.environ.get("MODEL_NAME", "sentiment-classifier")

    # Configure once, before any registry call
    configure_mlflow()

    metadata = get_model_metadata(model_name)
    if metadata is None:
        sys.exit(f"No Production version found for {model_name}")
    print(f"Deploying model version {metadata['version']}")

    model = load_production_model(model_name)
    # Deploy model to serving infrastructure...
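
Before handing a fetched model to serving infrastructure, it can help to run a quick smoke test. The sketch below assumes only that the loaded object exposes a predict method, as MLflow pyfunc models do; smoke_test, EchoModel, and the latency budget are hypothetical names and values used for illustration.

```python
import time


def smoke_test(model, samples, max_latency_ms: float) -> dict:
    """Call model.predict once and report output count and wall-clock latency."""
    start = time.perf_counter()
    predictions = model.predict(samples)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return {
        "n_predictions": len(predictions),
        "latency_ms": elapsed_ms,
        "ok": len(predictions) == len(samples) and elapsed_ms <= max_latency_ms,
    }


class EchoModel:
    """Hypothetical stand-in for a loaded model; returns one label per input."""

    def predict(self, samples):
        return ["positive" for _ in samples]


if __name__ == "__main__":
    report = smoke_test(EchoModel(), ["great product", "works fine"], max_latency_ms=1000.0)
    print(report["ok"], report["n_predictions"])
```

In the deploy script, the stub would be replaced by the object returned from load_production_model, and a failing report would stop the deployment.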

Key Takeaways

Aspect             GitLab Model Registry
API Compatibility  MLflow-compatible
Model Stages       None, Staging, Production, Archived
Authentication     CI_JOB_TOKEN in pipelines
Versioning         Automatic version incrementing
Integration        Native GitLab CI/CD support
