Model Registry & Serving
Model Registry Concepts
A model registry is a centralized hub for managing the lifecycle of ML models—from experimentation to production. It brings version control, governance, and collaboration to model management.
Why Model Registry?
Without a registry:
Models scattered across:
├── /home/alice/models/best_model_v2_final_FINAL.pkl
├── /home/bob/experiments/model_2025_01_15.h5
├── s3://bucket/models/classifier/
├── /mnt/shared/archived_models/
└── "I think the production model is in Slack somewhere..."
With a registry:
Model Registry
├── fraud-detector
│ ├── Version 1 (Staging)
│ ├── Version 2 (Production) ← Current
│ └── Version 3 (Development)
├── recommendation-engine
│ ├── Version 1 (Archived)
│ └── Version 2 (Production)
└── churn-predictor
└── Version 1 (Production)
Core Concepts
Model
A trained ML model ready for deployment:
```python
# What gets registered
model = {
    "name": "fraud-detector",
    "version": 3,
    "artifacts": {
        "model.pkl": "s3://bucket/models/fraud/v3/model.pkl",
        "preprocessor.pkl": "s3://bucket/models/fraud/v3/preprocessor.pkl"
    },
    "metrics": {
        "accuracy": 0.95,
        "f1_score": 0.93,
        "auc_roc": 0.98
    },
    "parameters": {
        "n_estimators": 100,
        "max_depth": 10
    },
    "tags": {
        "team": "risk",
        "use_case": "real-time fraud detection"
    }
}
```
Model Version
Each training run produces a new version:
fraud-detector
├── v1: accuracy=0.85, created=2025-01-01
├── v2: accuracy=0.90, created=2025-01-15
└── v3: accuracy=0.95, created=2025-01-20 ← Latest
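Because every version carries immutable metadata, consumers can resolve "latest" or "best" programmatically instead of guessing from filenames. A minimal sketch of that resolution (the version records below are illustrative, not a real registry API):

```python
# Immutable per-version metadata, mirroring the fraud-detector history above.
versions = [
    {"version": 1, "accuracy": 0.85, "created": "2025-01-01"},
    {"version": 2, "accuracy": 0.90, "created": "2025-01-15"},
    {"version": 3, "accuracy": 0.95, "created": "2025-01-20"},
]

# Resolve "latest" by version number and "best" by a tracked metric.
latest = max(versions, key=lambda v: v["version"])
best = max(versions, key=lambda v: v["accuracy"])

print(latest["version"])  # 3
print(best["accuracy"])   # 0.95
```

Here "latest" and "best" happen to coincide; when they don't, the metadata makes the trade-off explicit instead of hiding it in a filename.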
Model Stage
Stages track where a model is in its lifecycle:
| Stage | Description | Who Can Access |
|---|---|---|
| Development | Experimental, not tested | Data scientists |
| Staging | Under testing/validation | QA team |
| Production | Live, serving traffic | Production systems |
| Archived | Deprecated, kept for audit | Compliance |
Development ──▶ Staging ──▶ Production
│
▼
Archived
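The lifecycle above is effectively a small state machine: forward promotions only, plus an "archive from anywhere" edge. A hedged sketch of that rule (stage names follow the table above; the function itself is illustrative, not part of any registry's API):

```python
# Allowed forward promotions; any stage may additionally move to Archived.
PROMOTIONS = {
    "Development": {"Staging"},
    "Staging": {"Production"},
    "Production": set(),
}

def can_transition(current: str, target: str) -> bool:
    """Return True if moving current -> target respects the lifecycle."""
    if target == "Archived":
        return True  # models can be archived from any stage for audit
    return target in PROMOTIONS.get(current, set())
```

Under this rule, `can_transition("Staging", "Production")` succeeds, while skipping straight from Development to Production is rejected.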
Model Registry Architecture
┌─────────────────────────────────────────────────────────────────┐
│ Model Registry │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Metadata Store │ │
│ │ • Model name, version, stage │ │
│ │ • Training parameters │ │
│ │ • Metrics and tags │ │
│ │ • Lineage (data, code, experiment) │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Artifact Store │ │
│ │ • Model files (pkl, pt, onnx, savedmodel) │ │
│ │ • Preprocessing pipelines │ │
│ │ • Configuration files │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
│
┌───────────────┼───────────────┐
│ │ │
▼ ▼ ▼
Training Serving CI/CD
Pipeline System Pipeline
Key Features
1. Version Control
```python
# Register multiple versions under the same registered model name
mlflow.register_model("runs:/abc123/model", "fraud-detector")  # v1
mlflow.register_model("runs:/def456/model", "fraud-detector")  # v2
mlflow.register_model("runs:/ghi789/model", "fraud-detector")  # v3
```
2. Stage Transitions
```python
# Promote model to production
client.transition_model_version_stage(
    name="fraud-detector",
    version=3,
    stage="Production"
)
```
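The payoff of stage transitions is on the serving side: deployments can load a model by stage rather than by version number, so a promotion takes effect without redeploying. MLflow expresses this with a `models:/` URI; the small helper below is illustrative, not part of MLflow's API:

```python
def registry_uri(name: str, stage_or_version: str) -> str:
    """Build an MLflow model-registry URI, e.g. models:/fraud-detector/Production.

    The result is what mlflow.pyfunc.load_model() consumes:
        model = mlflow.pyfunc.load_model(registry_uri("fraud-detector", "Production"))
    """
    return f"models:/{name}/{stage_or_version}"

print(registry_uri("fraud-detector", "Production"))  # models:/fraud-detector/Production
print(registry_uri("fraud-detector", "3"))           # models:/fraud-detector/3
```

A service that always loads the `Production` URI picks up v3 the moment the transition above completes.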
3. Model Lineage
Model: fraud-detector v3
├── Training Run: experiment_123/run_456
├── Dataset: s3://bucket/data/train_2025_01.parquet
├── Code: git@github.com:org/repo.git@commit_abc
├── Environment: python=3.11, sklearn=1.4.0
└── Parent Model: fraud-detector v2
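Lineage like the tree above is typically captured as tags or metadata at registration time, not reconstructed later. A hedged sketch of assembling such a record (field names mirror the tree; this is not a specific registry's schema):

```python
def lineage_record(run_id, dataset_uri, code_ref, environment, parent=None):
    """Bundle lineage facts into one dict to attach as tags when registering."""
    record = {
        "training_run": run_id,
        "dataset": dataset_uri,
        "code": code_ref,
        "environment": environment,
    }
    if parent:
        record["parent_model"] = parent
    return record

tags = lineage_record(
    run_id="experiment_123/run_456",
    dataset_uri="s3://bucket/data/train_2025_01.parquet",
    code_ref="git@github.com:org/repo.git@commit_abc",
    environment="python=3.11, sklearn=1.4.0",
    parent="fraud-detector v2",
)
```

With these tags attached, any production incident can be traced back to the exact data, code, and environment that produced the model.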
4. Access Control
| Role | Permissions |
|---|---|
| Data Scientist | Create, read models |
| ML Engineer | Promote to staging |
| DevOps | Promote to production |
| Admin | Delete, archive models |
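Registries enforce tables like this as role-based checks on each operation. A minimal sketch (role and action names mirror the table; the assumption that broader roles inherit narrower permissions is mine, and real systems usually delegate this to IAM):

```python
# Role -> allowed actions; higher roles inherit lower-role permissions here.
PERMISSIONS = {
    "data_scientist": {"create", "read"},
    "ml_engineer": {"create", "read", "promote_staging"},
    "devops": {"read", "promote_production"},
    "admin": {"create", "read", "promote_staging", "promote_production",
              "delete", "archive"},
}

def is_allowed(role: str, action: str) -> bool:
    """Return True if the role may perform the action; unknown roles get nothing."""
    return action in PERMISSIONS.get(role, set())
```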
Model Metadata
What to Track
| Category | Examples |
|---|---|
| Identity | Name, version, aliases |
| Performance | Accuracy, latency, throughput |
| Training | Hyperparameters, dataset version |
| Lineage | Experiment ID, code commit |
| Operational | Owner, team, SLA requirements |
Example Metadata
```yaml
model:
  name: fraud-detector
  version: 3
  stage: Production
  metrics:
    accuracy: 0.95
    f1_score: 0.93
    latency_p99_ms: 15
    throughput_qps: 1000
  training:
    experiment_id: exp_123
    run_id: run_456
    dataset_version: v2.1
    training_date: "2025-01-20"
  parameters:
    algorithm: XGBoost
    n_estimators: 100
    max_depth: 10
    learning_rate: 0.1
  tags:
    team: risk
    owner: alice@company.com
    compliance: SOC2
```
Model Registry Options
| Tool | Type | Best For |
|---|---|---|
| MLflow | Open-source | General purpose |
| Weights & Biases | Managed | Experiment tracking + registry |
| Neptune | Managed | MLOps teams |
| SageMaker | Cloud | AWS ecosystem |
| Vertex AI | Cloud | GCP ecosystem |
Best Practices
| Practice | Why |
|---|---|
| One model per use case | Clear ownership and versioning |
| Meaningful version descriptions | Know what changed |
| Automate stage transitions | Reduce human error |
| Enforce approval workflows | Governance and compliance |
| Track all metadata | Full reproducibility |
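"Automate stage transitions" and "enforce approval workflows" often combine into a promotion gate: CI compares the candidate version's metrics against the current Production model and only then calls the registry's transition API. A hedged sketch of the gating logic (the metric values and zero-lift threshold are illustrative):

```python
def passes_gate(candidate: dict, production: dict, min_lift: float = 0.0) -> bool:
    """Approve promotion only if the candidate matches or beats Production
    on every metric the Production model reports."""
    return all(candidate[m] >= production[m] + min_lift for m in production)

candidate = {"accuracy": 0.95, "f1_score": 0.93}
production = {"accuracy": 0.90, "f1_score": 0.91}

if passes_gate(candidate, production):
    # In MLflow, this is where CI would call
    # client.transition_model_version_stage(name=..., version=..., stage="Production")
    print("promote")  # prints "promote"
```

Keeping the gate in code means the approval criteria are versioned and auditable, rather than living in someone's head.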
Key insight: A model registry transforms model management from ad-hoc file sharing to a governed, auditable process—essential for production ML at scale.
Next, we'll explore MLflow Model Registry in depth.