GitOps for ML Deployments

What is GitOps?

GitOps is an operational framework that applies DevOps best practices to infrastructure and deployment management. The core principle: Git is the single source of truth for declarative infrastructure and applications.

GitOps principles:

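Declarative: The desired state of the system is described declaratively and stored in Git
Versioned: Every change is committed to version control, giving a reviewable, revertible history
Automated: An agent applies approved changes to the target environment without manual steps
Self-healing: The agent continuously compares live state with the desired state and corrects any drift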

Why GitOps for ML?

ML deployments add challenges beyond those of traditional software, and each maps to a GitOps mechanism:

Challenge	GitOps Solution
Model version tracking	Git tags + model registry references
Rollback requirements	Git revert to previous state
Audit trail	Git history for all changes
Multi-environment	Branch/folder per environment
Configuration drift	Continuous reconciliation

GitOps Architecture for ML

┌─────────────────────────────────────────────────────────┐
│                     Git Repository                      │
│  ┌──────────┐  ┌──────────┐  ┌──────────────────────┐   │
│  │ /staging │  │ /prod    │  │ /model-configs       │   │
│  │ deploy/  │  │ deploy/  │  │ model-v1.2.yaml      │   │
│  └──────────┘  └──────────┘  └──────────────────────┘   │
└───────────────────────┬─────────────────────────────────┘
                        │ Watch
                        ▼
┌─────────────────────────────────────────────────────────┐
│                   GitOps Operator                       │
│                  (ArgoCD / Flux)                        │
│     ┌─────────────────────────────────────────────┐     │
│     │  Compare: Git State vs Cluster State        │     │
│     │  Action: Apply differences                  │     │
│     └─────────────────────────────────────────────┘     │
└───────────────────────┬─────────────────────────────────┘
                        │ Apply
                        ▼
┌─────────────────────────────────────────────────────────┐
│                  Kubernetes Cluster                     │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │
│  │ Model Server │  │ Feature      │  │ Monitoring   │   │
│  │ (v1.2)       │  │ Store        │  │ Stack        │   │
│  └──────────────┘  └──────────────┘  └──────────────┘   │
└─────────────────────────────────────────────────────────┘
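The Watch, Compare, and Apply steps in the diagram are configured on the operator rather than scripted in CI. As a sketch of that configuration with Flux (one of the two operators named above), the manifests below point Flux at the production overlay. The repository URL, branch, intervals, and object names are placeholders, and the API versions assume Flux v2.0 or later:

```yaml
# Hypothetical Flux setup; URL, names, and intervals are placeholders
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: ml-deployments
  namespace: flux-system
spec:
  interval: 1m            # how often Flux polls Git for new commits ("Watch")
  url: https://github.com/example-org/ml-deployments
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: model-server-production
  namespace: flux-system
spec:
  interval: 10m           # how often live state is re-compared and re-applied ("Compare" + "Apply")
  sourceRef:
    kind: GitRepository
    name: ml-deployments
  path: ./overlays/production
  prune: true             # delete cluster objects whose manifests were removed from Git
```

Because Flux re-applies the overlay on every interval, a manual edit to a field that is defined in Git (for example, the image tag) should be overwritten on the next reconciliation, which is the self-healing behaviour listed among the principles.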

Repository Structure for ML GitOps

ml-deployments/
├── base/                          # Shared configurations
│   ├── model-server/
│   │   ├── deployment.yaml
│   │   ├── service.yaml
│   │   └── kustomization.yaml
│   └── monitoring/
│       ├── prometheus-rules.yaml
│       └── grafana-dashboard.yaml
│
├── overlays/                      # Environment-specific
│   ├── staging/
│   │   ├── kustomization.yaml
│   │   ├── model-config.yaml      # Model version: v1.3-rc1
│   │   └── replicas-patch.yaml
│   └── production/
│       ├── kustomization.yaml
│       ├── model-config.yaml      # Model version: v1.2
│       └── resources-patch.yaml   # CPU/memory overrides
│
└── apps/                          # ArgoCD Applications
    ├── staging.yaml
    └── production.yaml
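The tree lists a model-config.yaml in each overlay but does not show its contents. One plausible shape, assuming it is a ConfigMap that records the model version alongside a registry reference (the key names and registry URI are hypothetical):

```yaml
# overlays/production/model-config.yaml (hypothetical contents)
apiVersion: v1
kind: ConfigMap
metadata:
  name: model-config
data:
  MODEL_NAME: "sentiment-classifier"
  MODEL_VERSION: "v1.2"
  # Placeholder registry reference; the real format depends on the model registry in use
  MODEL_URI: "registry.example.com/models/sentiment-classifier:v1.2"
```

For Kustomize to deploy it, the file would also need to be listed under resources in the overlay's kustomization.yaml, and the container could read it through an envFrom entry with configMapRef name model-config.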

Declarative Model Deployment

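Define the model server once in base/, then let each overlay override only what differs per environment (namespace, model image tag, replica count, resources):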

# base/model-server/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-server
  labels:
    app: model-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: model-server
  template:
    metadata:
      labels:
        app: model-server
    spec:
      containers:
        - name: model-server
          image: model-registry/sentiment-classifier  # tag is set per environment by the overlay's images field
          ports:
            - containerPort: 8080
          env:
            - name: MODEL_NAME
              value: "sentiment-classifier"
          resources:
            requests:
              memory: "2Gi"
              cpu: "1"
            limits:
              memory: "4Gi"
              cpu: "2"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 10
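The production overlay that follows lists ../../base/model-server as a resource, and Kustomize only accepts a directory there if it contains its own kustomization.yaml (listed in the tree above). A minimal version, assuming it simply bundles the two manifests in that directory:

```yaml
# base/model-server/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - deployment.yaml
  - service.yaml
```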

# overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: ml-production

resources:
  - ../../base/model-server

images:
  - name: model-registry/sentiment-classifier
    newTag: v1.2.0  # Model version

replicas:
  - name: model-server
    count: 5

patches:
  - path: resources-patch.yaml

GitOps Workflow for Model Updates

1. Train new model → Push versioned model image to the registry
                            ↓
2. Update Git → Change image tag in overlay
                            ↓
3. Create PR → Review model metrics & config
                            ↓
4. Merge PR → GitOps operator detects change
                            ↓
5. Auto-deploy → Operator applies to cluster
                            ↓
6. Monitor → Watch model performance metrics
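Step 2 is usually a one-line change. Assuming a new model image tagged v1.3.0 (a hypothetical version) has passed staging, the images entry in the production overlay would become:

```yaml
# overlays/production/kustomization.yaml (excerpt after step 2)
images:
  - name: model-registry/sentiment-classifier
    newTag: v1.3.0  # was v1.2.0; reverting this commit rolls the model back
```

The same edit should also be possible from the overlay directory with kustomize edit set image model-registry/sentiment-classifier:v1.3.0. Either way, the diff reviewed in step 3 is exactly the line that decides which model serves production traffic.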

Pull vs Push Deployment

Aspect	Push (Traditional CI/CD)	Pull (GitOps)
Trigger	CI pipeline pushes	Operator pulls from Git
Credentials	CI system holds cluster credentials	Only the in-cluster operator needs cluster access
Drift detection	Manual	Automatic and continuous
Rollback	Re-run pipeline	Git revert
Audit	Pipeline logs	Git history
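In the pull model, the automatic drift detection in the table comes from the operator's sync policy. A sketch of apps/production.yaml as an ArgoCD Application, assuming ArgoCD is installed in the argocd namespace; the repository URL and names are placeholders:

```yaml
# apps/production.yaml (repository URL and names are placeholders)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: model-server-production
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/ml-deployments.git
    targetRevision: main
    path: overlays/production
  destination:
    server: https://kubernetes.default.svc   # the cluster ArgoCD itself runs in
    namespace: ml-production
  syncPolicy:
    automated:
      prune: true       # remove resources whose manifests were deleted from Git
      selfHeal: true    # revert manual changes made directly in the cluster
    syncOptions:
      - CreateNamespace=true
```

Without selfHeal, ArgoCD still flags cluster drift as OutOfSync but only syncs automatically when Git changes, so manual edits would persist until the next commit.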

Key Takeaways

GitOps Concept	ML Application
Source of truth	Git repo with model configs
Declarative	Kubernetes manifests + model versions
Automated sync	ArgoCD/Flux applies changes
Self-healing	Operator reverts cluster drift to the Git state
Rollback	Git revert to previous version
