Model Registry & Serving

Model Registry Concepts


A model registry is a centralized hub for managing the lifecycle of ML models—from experimentation to production. It brings version control, governance, and collaboration to model management.

Why Model Registry?

Without a registry:

Models scattered across:
├── /home/alice/models/best_model_v2_final_FINAL.pkl
├── /home/bob/experiments/model_2025_01_15.h5
├── s3://bucket/models/classifier/
├── /mnt/shared/archived_models/
└── "I think the production model is in Slack somewhere..."

With a registry:

Model Registry
├── fraud-detector
│   ├── Version 1 (Staging)
│   ├── Version 2 (Production) ← Current
│   └── Version 3 (Development)
├── recommendation-engine
│   ├── Version 1 (Archived)
│   └── Version 2 (Production)
└── churn-predictor
    └── Version 1 (Production)

Core Concepts

Model

A trained ML model, packaged with the artifacts and metadata needed to deploy, compare, and audit it:

# What gets registered
model = {
    "name": "fraud-detector",
    "version": 3,
    "artifacts": {
        "model.pkl": "s3://bucket/models/fraud/v3/model.pkl",
        "preprocessor.pkl": "s3://bucket/models/fraud/v3/preprocessor.pkl"
    },
    "metrics": {
        "accuracy": 0.95,
        "f1_score": 0.93,
        "auc_roc": 0.98
    },
    "parameters": {
        "n_estimators": 100,
        "max_depth": 10
    },
    "tags": {
        "team": "risk",
        "use_case": "real-time fraud detection"
    }
}

Model Version

Each time a model is registered under an existing name, the registry assigns it the next version number:

fraud-detector
├── v1: accuracy=0.85, created=2025-01-01
├── v2: accuracy=0.90, created=2025-01-15
└── v3: accuracy=0.95, created=2025-01-20 ← Latest
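Here is a minimal sketch of how a registry might assign version numbers. It assumes a simple in-memory store, and the `ToyRegistry` and `ModelVersion` names are hypothetical, not any tool's API. Versions are numbered per model name. "Latest" means the highest version number, which is not necessarily the best-scoring version.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ModelVersion:
    name: str
    version: int
    accuracy: float
    created: date


@dataclass
class ToyRegistry:
    # Hypothetical in-memory registry: model name -> versions in registration order
    models: dict[str, list[ModelVersion]] = field(default_factory=dict)

    def register(self, name: str, accuracy: float, created: date) -> ModelVersion:
        versions = self.models.setdefault(name, [])
        # Numbering starts at 1 and increments per registration under the same name
        next_version = (versions[-1].version if versions else 0) + 1
        mv = ModelVersion(name, next_version, accuracy, created)
        versions.append(mv)
        return mv

    def latest(self, name: str) -> ModelVersion:
        # "Latest" is the highest version number, not the best metric
        return max(self.models[name], key=lambda mv: mv.version)


registry = ToyRegistry()
registry.register("fraud-detector", 0.85, date(2025, 1, 1))
registry.register("fraud-detector", 0.90, date(2025, 1, 15))
registry.register("fraud-detector", 0.95, date(2025, 1, 20))
latest = registry.latest("fraud-detector")
print(latest.version, latest.accuracy, latest.created)
```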

Model Stage

Stages track where each model version is in its lifecycle:

| Stage | Description | Who Can Access |
|---|---|---|
| Development | Experimental, not tested | Data scientists |
| Staging | Under testing/validation | QA team |
| Production | Live, serving traffic | Production systems |
| Archived | Deprecated, kept for audit | Compliance |

Development ──▶ Staging ──▶ Production
                                │
                                ▼
                            Archived
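The stage diagram can be read as a small state machine. The sketch below enforces it under one assumption: the allowed moves are exactly the forward arrows shown, plus archiving from any active stage. The `Stage` enum and the `transition` helper are illustrative. Real registries are often more permissive than this, so treat the sketch as a policy layer you could add on top.

```python
from enum import Enum


class Stage(Enum):
    DEVELOPMENT = "Development"
    STAGING = "Staging"
    PRODUCTION = "Production"
    ARCHIVED = "Archived"


# Assumed rules read off the diagram: move forward one step at a time,
# and any version that is not yet archived may be archived.
ALLOWED: dict[Stage, set[Stage]] = {
    Stage.DEVELOPMENT: {Stage.STAGING, Stage.ARCHIVED},
    Stage.STAGING: {Stage.PRODUCTION, Stage.ARCHIVED},
    Stage.PRODUCTION: {Stage.ARCHIVED},
    Stage.ARCHIVED: set(),
}


def transition(current: Stage, target: Stage) -> Stage:
    """Return the new stage, or raise if the move skips a step or leaves Archived."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


stage = Stage.DEVELOPMENT
stage = transition(stage, Stage.STAGING)
stage = transition(stage, Stage.PRODUCTION)
```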

Model Registry Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        Model Registry                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                    Metadata Store                         │  │
│  │  • Model name, version, stage                             │  │
│  │  • Training parameters                                    │  │
│  │  • Metrics and tags                                       │  │
│  │  • Lineage (data, code, experiment)                       │  │
│  └───────────────────────────────────────────────────────────┘  │
│                              │                                  │
│                              ▼                                  │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                    Artifact Store                         │  │
│  │  • Model files (pkl, pt, onnx, savedmodel)                │  │
│  │  • Preprocessing pipelines                                │  │
│  │  • Configuration files                                    │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
└────────────────────────────────┬────────────────────────────────┘
                                 │
                 ┌───────────────┼───────────────┐
                 │               │               │
                 ▼               ▼               ▼
              Training        Serving          CI/CD
              Pipeline        System         Pipeline
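This sketch makes the split between the two stores concrete. Plain dicts stand in for the metadata database and the object store, and both are hypothetical. The metadata record holds only a URI that points at the artifact, plus a SHA-256 checksum taken at registration. The checksum is an added design choice, assumed here. It lets a serving or CI/CD consumer detect an artifact that changed after it was registered.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionRecord:
    name: str
    version: int
    stage: str
    artifact_uri: str  # pointer into the artifact store, not the model bytes
    sha256: str        # checksum recorded at registration time (assumed design choice)


# Hypothetical stand-ins: an object store (e.g. S3) and a metadata database
artifact_store: dict[str, bytes] = {}
metadata_store: dict[tuple[str, int], VersionRecord] = {}


def register(name: str, version: int, model_bytes: bytes) -> VersionRecord:
    uri = f"s3://bucket/models/{name}/v{version}/model.pkl"  # only a dict key here
    artifact_store[uri] = model_bytes
    record = VersionRecord(name, version, "Development", uri,
                           hashlib.sha256(model_bytes).hexdigest())
    metadata_store[(name, version)] = record
    return record


def load(name: str, version: int) -> bytes:
    record = metadata_store[(name, version)]    # 1. query the metadata store
    blob = artifact_store[record.artifact_uri]  # 2. follow the pointer to the artifact
    if hashlib.sha256(blob).hexdigest() != record.sha256:
        raise RuntimeError("artifact does not match the registered checksum")
    return blob


register("fraud-detector", 3, b"serialized-model-bytes")
model_bytes = load("fraud-detector", 3)
```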

Key Features

1. Version Control

import mlflow

# Register multiple versions: each call under an existing name adds the next version number
mlflow.register_model("runs:/abc123/model", "fraud-detector")  # creates the model, v1
mlflow.register_model("runs:/def456/model", "fraud-detector")  # v2
mlflow.register_model("runs:/ghi789/model", "fraud-detector")  # v3

2. Stage Transitions

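from mlflow import MlflowClient

client = MlflowClient()

# Promote version 3 to Production.
# Stage APIs are deprecated since MLflow 2.9; newer code uses model version aliases instead.
client.transition_model_version_stage(
    name="fraud-detector",
    version=3,
    stage="Production"
)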
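Since MLflow 2.9, model stages and `transition_model_version_stage` are deprecated in favor of aliases. An alias is a named, movable pointer to a version. You set one with `MlflowClient.set_registered_model_alias(name, alias, version)` and load through a URI such as `models:/fraud-detector@champion`. The sketch below mimics that idea in plain Python so it runs without a tracking server. The `set_alias` and `resolve` helpers and the `champion` alias name are illustrative, not MLflow code.

```python
import re

# Hypothetical alias table: (model name, alias) -> version number
aliases: dict[tuple[str, str], int] = {}


def set_alias(name: str, alias: str, version: int) -> None:
    # Reassigning an alias simply points it at a different version
    aliases[(name, alias)] = version


def resolve(uri: str) -> tuple[str, int]:
    """Resolve 'models:/<name>@<alias>' or 'models:/<name>/<version>' to (name, version)."""
    m = re.fullmatch(r"models:/([^/@]+)(?:@([\w-]+)|/(\d+))", uri)
    if m is None:
        raise ValueError(f"unsupported model URI: {uri}")
    name, alias, version = m.groups()
    if alias is not None:
        return name, aliases[(name, alias)]
    return name, int(version)


set_alias("fraud-detector", "champion", 2)
set_alias("fraud-detector", "champion", 3)  # promotion: move the alias to v3
```

With aliases, promotion becomes a single pointer update. Rolling back means pointing the alias at the previous version again.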

3. Model Lineage

Model: fraud-detector v3
├── Training Run: experiment_123/run_456
├── Dataset: s3://bucket/data/train_2025_01.parquet
├── Code: git@github.com:org/repo.git@commit_abc
├── Environment: python=3.11, sklearn=1.4.0
└── Parent Model: fraud-detector v2
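Lineage is most useful when it can be walked. Assume each version record stores a pointer to its parent, as in the tree above. Tracing a production model back to its first ancestor is then a short loop. The `PARENT` table is hypothetical: only the v3-to-v2 link comes from the example, and the v2-to-v1 link is assumed.

```python
from typing import Optional

ModelRef = tuple[str, int]

# Hypothetical parent pointers; v3 -> v2 is from the example, v2 -> v1 is assumed
PARENT: dict[ModelRef, Optional[ModelRef]] = {
    ("fraud-detector", 3): ("fraud-detector", 2),
    ("fraud-detector", 2): ("fraud-detector", 1),
    ("fraud-detector", 1): None,
}


def ancestry(ref: ModelRef) -> list[ModelRef]:
    """Follow parent links from a version back to its root, guarding against cycles."""
    chain: list[ModelRef] = []
    current: Optional[ModelRef] = ref
    while current is not None:
        if current in chain:
            raise ValueError(f"lineage cycle at {current}")
        chain.append(current)
        current = PARENT.get(current)  # unknown versions end the walk
    return chain


history = ancestry(("fraud-detector", 3))
```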

4. Access Control

| Role | Permissions |
|---|---|
| Data Scientist | Create, read models |
| ML Engineer | Promote to staging |
| DevOps | Promote to production |
| Admin | Delete, archive models |
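Here is a sketch of the role check. It assumes permissions are looked up per role from the table above, and that a user holding several roles gets the union of their permissions. Action strings such as `promote_to_staging` are hypothetical names for the table's entries.

```python
# Permissions per role, mirroring the table; action names are hypothetical
ROLE_PERMISSIONS: dict[str, frozenset[str]] = {
    "Data Scientist": frozenset({"create", "read"}),
    "ML Engineer": frozenset({"promote_to_staging"}),
    "DevOps": frozenset({"promote_to_production"}),
    "Admin": frozenset({"delete", "archive"}),
}


def is_allowed(user_roles: set[str], action: str) -> bool:
    """A user may act if any of their roles grants the action (permissions are unioned)."""
    return any(action in ROLE_PERMISSIONS.get(role, frozenset()) for role in user_roles)


def require(user_roles: set[str], action: str) -> None:
    # Raise instead of returning False so callers cannot silently ignore a denial
    if not is_allowed(user_roles, action):
        raise PermissionError(f"roles {sorted(user_roles)} may not {action}")
```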

Model Metadata

What to Track

| Category | Examples |
|---|---|
| Identity | Name, version, aliases |
| Performance | Accuracy, latency, throughput |
| Training | Hyperparameters, dataset version |
| Lineage | Experiment ID, code commit |
| Operational | Owner, team, SLA requirements |

Example Metadata

model:
  name: fraud-detector
  version: 3
  stage: Production

metrics:
  accuracy: 0.95
  f1_score: 0.93
  latency_p99_ms: 15
  throughput_qps: 1000

training:
  experiment_id: exp_123
  run_id: run_456
  dataset_version: v2.1
  training_date: "2025-01-20"

parameters:
  algorithm: XGBoost
  n_estimators: 100
  max_depth: 10
  learning_rate: 0.1

tags:
  team: risk
  owner: alice@company.com
  compliance: SOC2
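The table and the example can be tied together with a completeness check that runs before promotion. This sketch assumes a hypothetical policy that lists required fields per category. It uses the example metadata as a Python dict, since reading the YAML directly would need a parser such as PyYAML. The `training.code_commit` field is an assumed name for the table's "code commit" entry. The example does not record one, so the check reports it as missing.

```python
# The example metadata above, written as a Python dict
metadata = {
    "model": {"name": "fraud-detector", "version": 3, "stage": "Production"},
    "metrics": {"accuracy": 0.95, "f1_score": 0.93,
                "latency_p99_ms": 15, "throughput_qps": 1000},
    "training": {"experiment_id": "exp_123", "run_id": "run_456",
                 "dataset_version": "v2.1", "training_date": "2025-01-20"},
    "parameters": {"algorithm": "XGBoost", "n_estimators": 100,
                   "max_depth": 10, "learning_rate": 0.1},
    "tags": {"team": "risk", "owner": "alice@company.com", "compliance": "SOC2"},
}

# Hypothetical policy: (section, key) pairs required per "What to Track" category
REQUIRED = {
    "identity": [("model", "name"), ("model", "version")],
    "performance": [("metrics", "accuracy"), ("metrics", "latency_p99_ms")],
    "training": [("parameters", "algorithm"), ("training", "dataset_version")],
    "lineage": [("training", "experiment_id"), ("training", "code_commit")],
    "operational": [("tags", "owner"), ("tags", "team")],
}


def missing_fields(meta: dict) -> dict[str, list[str]]:
    """Return the absent fields grouped by category; an empty dict means complete."""
    gaps: dict[str, list[str]] = {}
    for category, paths in REQUIRED.items():
        absent = [f"{section}.{key}" for section, key in paths
                  if key not in meta.get(section, {})]
        if absent:
            gaps[category] = absent
    return gaps


gaps = missing_fields(metadata)
```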

Model Registry Options

| Tool | Type | Best For |
|---|---|---|
| MLflow | Open-source | General purpose |
| Weights & Biases | Managed | Experiment tracking + registry |
| Neptune | Managed | MLOps teams |
| SageMaker | Cloud | AWS ecosystem |
| Vertex AI | Cloud | GCP ecosystem |

Best Practices

| Practice | Why |
|---|---|
| One model per use case | Clear ownership and versioning |
| Meaningful version descriptions | Know what changed |
| Automate stage transitions | Reduce human error |
| Enforce approval workflows | Governance and compliance |
| Track all metadata | Full reproducibility |
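Two of these practices, automated transitions and approval workflows, combine naturally into a promotion gate. The sketch below assumes a hypothetical policy with three conditions. A candidate replaces the current Production version only if its F1 score beats production by a minimum margin and its p99 latency stays within budget. At least one reviewer must also have approved it. The thresholds and numbers are illustrative.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical policy values; real thresholds depend on the use case
MIN_F1_GAIN = 0.005
MAX_LATENCY_P99_MS = 20.0
REQUIRED_APPROVALS = 1


@dataclass
class Candidate:
    version: int
    f1_score: float
    latency_p99_ms: float
    approvals: set[str] = field(default_factory=set)


def should_promote(candidate: Candidate,
                   production: Optional[Candidate]) -> tuple[bool, list[str]]:
    """Return (decision, blocking reasons); an empty reason list means promote."""
    reasons: list[str] = []
    if production is not None and candidate.f1_score < production.f1_score + MIN_F1_GAIN:
        reasons.append("F1 gain below threshold")
    if candidate.latency_p99_ms > MAX_LATENCY_P99_MS:
        reasons.append("p99 latency over budget")
    if len(candidate.approvals) < REQUIRED_APPROVALS:
        reasons.append("missing approval")
    return not reasons, reasons


# Illustrative numbers: current Production (v2) vs. a new candidate (v3)
production = Candidate(version=2, f1_score=0.90, latency_p99_ms=14.0, approvals={"bob"})
candidate = Candidate(version=3, f1_score=0.93, latency_p99_ms=15.0)
ok, reasons = should_promote(candidate, production)
```

In a CI/CD pipeline, the transition can run automatically once the last blocking reason clears. The approval requirement still keeps a human in the loop.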

Key insight: A model registry transforms model management from ad-hoc file sharing to a governed, auditable process—essential for production ML at scale.

Next, we'll explore MLflow Model Registry in depth.
