Model Registry & Serving

Model Registry Concepts

A model registry is a centralized hub for managing the lifecycle of ML models—from experimentation to production. It brings version control, governance, and collaboration to model management.

Why Model Registry?

Without a registry:

Models scattered across:
├── /home/alice/models/best_model_v2_final_FINAL.pkl
├── /home/bob/experiments/model_2025_01_15.h5
├── s3://bucket/models/classifier/
├── /mnt/shared/archived_models/
└── "I think the production model is in Slack somewhere..."

With a registry:

Model Registry
├── fraud-detector
│   ├── Version 1 (Staging)
│   ├── Version 2 (Production) ← Current
│   └── Version 3 (Development)
├── recommendation-engine
│   ├── Version 1 (Archived)
│   └── Version 2 (Production)
└── churn-predictor
    └── Version 1 (Production)

Core Concepts

Model

A trained ML model ready for deployment:

# What gets registered
model = {
    "name": "fraud-detector",
    "version": 3,
    "artifacts": {
        "model.pkl": "s3://bucket/models/fraud/v3/model.pkl",
        "preprocessor.pkl": "s3://bucket/models/fraud/v3/preprocessor.pkl"
    },
    "metrics": {
        "accuracy": 0.95,
        "f1_score": 0.93,
        "auc_roc": 0.98
    },
    "parameters": {
        "n_estimators": 100,
        "max_depth": 10
    },
    "tags": {
        "team": "risk",
        "use_case": "real-time fraud detection"
    }
}

Model Version

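Each time a model is registered under an existing name, the registry assigns it the next version number. In practice this usually means one new version per successful training run: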

fraud-detector
├── v1: accuracy=0.85, created=2025-01-01
├── v2: accuracy=0.90, created=2025-01-15
└── v3: accuracy=0.95, created=2025-01-20 ← Latest
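
The versioning behavior above can be sketched with a small in-memory registry. This is an illustrative toy under stated assumptions, not how MLflow or any other tool is implemented; `InMemoryRegistry`, `ModelVersion`, and their methods are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ModelVersion:
    name: str
    version: int
    metrics: dict
    stage: str = "Development"


@dataclass
class InMemoryRegistry:
    """Hypothetical toy registry: maps a model name to its ordered versions."""
    _models: dict = field(default_factory=dict)

    def register(self, name: str, metrics: dict) -> ModelVersion:
        versions = self._models.setdefault(name, [])
        # The registry assigns version numbers; callers never pick them.
        mv = ModelVersion(name=name, version=len(versions) + 1, metrics=metrics)
        versions.append(mv)
        return mv

    def latest(self, name: str) -> ModelVersion:
        return self._models[name][-1]


registry = InMemoryRegistry()
registry.register("fraud-detector", {"accuracy": 0.85})
registry.register("fraud-detector", {"accuracy": 0.90})
v3 = registry.register("fraud-detector", {"accuracy": 0.95})
print(v3.version, registry.latest("fraud-detector").metrics)
```

Letting the registry own the counter is what keeps version numbers unique and monotonically increasing per model name.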

Model Stage

Stages track where a model is in its lifecycle:

| Stage | Description | Who Can Access |
|---|---|---|
| Development | Experimental, not tested | Data scientists |
| Staging | Under testing/validation | QA team |
| Production | Live, serving traffic | Production systems |
| Archived | Deprecated, kept for audit | Compliance |

Development ──▶ Staging ──▶ Production ──▶ Archived
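
One way to enforce this lifecycle is a small table of allowed moves. The sketch below is a possible policy, not a rule any particular registry imposes; `ALLOWED_TRANSITIONS` and `transition` are hypothetical names, and the exact set of permitted moves is an assumption.

```python
# Allowed moves between lifecycle stages (an assumed policy for illustration).
ALLOWED_TRANSITIONS = {
    "Development": {"Staging", "Archived"},
    "Staging": {"Production", "Development", "Archived"},
    "Production": {"Archived"},
    "Archived": set(),
}


def transition(current: str, target: str) -> str:
    """Return the new stage, or raise if the move is not in the policy."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"Transition {current} -> {target} is not allowed")
    return target


stage = "Development"
stage = transition(stage, "Staging")
stage = transition(stage, "Production")
print(stage)
```

With this policy, a jump straight from Development to Production raises a `ValueError`, which forces every model through Staging first.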

Model Registry Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        Model Registry                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                    Metadata Store                         │  │
│  │  • Model name, version, stage                             │  │
│  │  • Training parameters                                    │  │
│  │  • Metrics and tags                                       │  │
│  │  • Lineage (data, code, experiment)                       │  │
│  └───────────────────────────────────────────────────────────┘  │
│                              │                                  │
│                              ▼                                  │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                    Artifact Store                         │  │
│  │  • Model files (pkl, pt, onnx, savedmodel)                │  │
│  │  • Preprocessing pipelines                                │  │
│  │  • Configuration files                                    │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
              ┌───────────────┼───────────────┐
              │               │               │
              ▼               ▼               ▼
         Training        Serving         CI/CD
         Pipeline        System          Pipeline
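
The split between the two stores can be illustrated with a rough sketch: the metadata record stays small and queryable, while the artifact store holds the bytes. Everything here is an assumption made for illustration, including the `memory://` URI scheme, the `MetadataRecord` fields, and the use of a dict as a stand-in for object storage.

```python
import hashlib
from dataclasses import dataclass

# Stand-in for object storage such as S3: URI -> raw bytes.
artifact_store: dict[str, bytes] = {}


@dataclass(frozen=True)
class MetadataRecord:
    name: str
    version: int
    artifact_uri: str
    sha256: str  # checksum lets consumers verify the artifact they download


def publish(name: str, version: int, payload: bytes) -> MetadataRecord:
    """Write the bytes to the artifact store and return the metadata row."""
    uri = f"memory://models/{name}/v{version}/model.bin"
    artifact_store[uri] = payload
    return MetadataRecord(name, version, uri, hashlib.sha256(payload).hexdigest())


def fetch(record: MetadataRecord) -> bytes:
    """Resolve the artifact through its metadata and verify its checksum."""
    payload = artifact_store[record.artifact_uri]
    if hashlib.sha256(payload).hexdigest() != record.sha256:
        raise ValueError("Artifact does not match registered checksum")
    return payload


record = publish("fraud-detector", 3, b"serialized-model-bytes")
restored = fetch(record)
print(record.artifact_uri, len(restored))
```

Training pipelines would call something like `publish`, while serving systems and CI/CD pipelines would go through `fetch`, so no consumer needs to know where the files physically live.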

Key Features

1. Version Control

import mlflow

# Register three runs under the same name; the registry assigns versions 1, 2, 3
mlflow.register_model("runs:/abc123/model", "fraud-detector")  # v1
mlflow.register_model("runs:/def456/model", "fraud-detector")  # v2
mlflow.register_model("runs:/ghi789/model", "fraud-detector")  # v3

2. Stage Transitions

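from mlflow.tracking import MlflowClient

client = MlflowClient()

# Promote version 3 to Production.
# Note: stages are deprecated as of MLflow 2.9 in favor of model version aliases.
client.transition_model_version_stage(
    name="fraud-detector",
    version=3,
    stage="Production"
)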
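
In MLflow versions that support model version aliases, promotion can also be expressed by moving an alias rather than changing a stage. The helper below is a minimal sketch: `promote_with_alias` and the alias name `champion` are assumptions chosen for illustration, and `client` is expected to be an `MlflowClient` instance.

```python
def promote_with_alias(client, name: str, version: int, alias: str = "champion") -> str:
    """Point `alias` at `version` and return the URI that serving code would load.

    `client` is assumed to be an `mlflow.MlflowClient`; it is passed in so the
    sketch can be exercised without a tracking server.
    """
    # MlflowClient.set_registered_model_alias(name, alias, version)
    client.set_registered_model_alias(name, alias, str(version))
    # Alias-based model URIs use the form models:/<name>@<alias>.
    return f"models:/{name}@{alias}"
```

Serving code could then load the returned URI with `mlflow.pyfunc.load_model(uri)` and would pick up a new version whenever the alias is moved, without a code change.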

3. Model Lineage

Model: fraud-detector v3
├── Training Run: experiment_123/run_456
├── Dataset: s3://bucket/data/train_2025_01.parquet
├── Code: git@github.com:org/repo.git@commit_abc
├── Environment: python=3.11, sklearn=1.4.0
└── Parent Model: fraud-detector v2
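
A lineage record like the one above could be assembled at training time roughly as follows. This is a sketch under assumptions: the `Lineage` fields, the helper names, and the choice to fingerprint the dataset with SHA-256 are all illustrative rather than taken from any specific registry.

```python
import hashlib
import platform
import subprocess
from dataclasses import asdict, dataclass


@dataclass
class Lineage:
    run_id: str
    dataset_sha256: str
    git_commit: str
    python_version: str


def current_git_commit() -> str:
    """Return HEAD's commit hash, or "unknown" outside a git checkout."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


def capture_lineage(run_id: str, dataset_bytes: bytes) -> dict:
    """Collect what is needed to trace a model back to its data and code."""
    return asdict(Lineage(
        run_id=run_id,
        dataset_sha256=hashlib.sha256(dataset_bytes).hexdigest(),
        git_commit=current_git_commit(),
        python_version=platform.python_version(),
    ))


lineage = capture_lineage("run_456", b"feature_a,label\n1.0,0\n")
print(lineage)
```

Hashing the dataset content, rather than recording only its path, means the record still identifies the exact training data if the file is later overwritten.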

4. Access Control

| Role | Permissions |
|---|---|
| Data Scientist | Create, read models |
| ML Engineer | Promote to staging |
| DevOps | Promote to production |
| Admin | Delete, archive models |
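
The role table can be turned into a simple permission check. The sketch below takes the table literally; the role keys, action names, and `authorize` function are hypothetical, and a real deployment would likely also grant read access to every role.

```python
# Permissions copied from the table above; identifiers are illustrative.
ROLE_PERMISSIONS = {
    "data_scientist": {"create", "read"},
    "ml_engineer": {"promote_to_staging"},
    "devops": {"promote_to_production"},
    "admin": {"delete", "archive"},
}


def authorize(role: str, action: str) -> None:
    """Raise PermissionError unless `role` is allowed to perform `action`."""
    if action not in ROLE_PERMISSIONS.get(role, set()):
        raise PermissionError(f"{role!r} may not {action!r}")


authorize("devops", "promote_to_production")  # passes silently
try:
    authorize("data_scientist", "promote_to_production")
except PermissionError as exc:
    print(exc)
```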

Model Metadata

What to Track

| Category | Examples |
|---|---|
| Identity | Name, version, aliases |
| Performance | Accuracy, latency, throughput |
| Training | Hyperparameters, dataset version |
| Lineage | Experiment ID, code commit |
| Operational | Owner, team, SLA requirements |

Example Metadata

model:
  name: fraud-detector
  version: 3
  stage: Production

metrics:
  accuracy: 0.95
  f1_score: 0.93
  latency_p99_ms: 15
  throughput_qps: 1000

training:
  experiment_id: exp_123
  run_id: run_456
  dataset_version: v2.1
  training_date: "2025-01-20"

parameters:
  algorithm: XGBoost
  n_estimators: 100
  max_depth: 10
  learning_rate: 0.1

tags:
  team: risk
  owner: alice@company.com
  compliance: SOC2
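
Metadata is only useful if it is complete, so a registry workflow might reject records that lack key fields. The check below is a minimal sketch; which fields count as required is an assumption, and `REQUIRED_FIELDS` and `missing_fields` are hypothetical names.

```python
# Fields treated as mandatory here are an illustrative choice, not a standard.
REQUIRED_FIELDS = {
    "model": {"name", "version", "stage"},
    "metrics": {"accuracy"},
    "training": {"run_id", "dataset_version"},
    "tags": {"owner"},
}


def missing_fields(metadata: dict) -> list[str]:
    """Return dotted paths of required fields absent from `metadata`."""
    missing = []
    for section, keys in REQUIRED_FIELDS.items():
        present = metadata.get(section, {})
        missing += [f"{section}.{key}" for key in sorted(keys) if key not in present]
    return missing


metadata = {
    "model": {"name": "fraud-detector", "version": 3, "stage": "Production"},
    "metrics": {"accuracy": 0.95, "f1_score": 0.93},
    "training": {"run_id": "run_456", "dataset_version": "v2.1"},
    "tags": {"team": "risk"},  # owner deliberately omitted
}
problems = missing_fields(metadata)
print(problems)  # ['tags.owner']
```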

Model Registry Options

| Tool | Type | Best For |
|---|---|---|
| MLflow | Open-source | General purpose |
| Weights & Biases | Managed | Experiment tracking + registry |
| Neptune | Managed | MLOps teams |
| SageMaker | Cloud | AWS ecosystem |
| Vertex AI | Cloud | GCP ecosystem |

Best Practices

| Practice | Why |
|---|---|
| One model per use case | Clear ownership and versioning |
| Meaningful version descriptions | Know what changed |
| Automate stage transitions | Reduce human error |
| Enforce approval workflows | Governance and compliance |
| Track all metadata | Full reproducibility |
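
Automating stage transitions usually means encoding the promotion criteria as a check that a CI/CD job can run. The gate below is a sketch under assumptions: the thresholds, the metric names (borrowed from the example metadata), and `should_promote` itself are illustrative, and a real gate would sit in front of a human approval step where governance requires one.

```python
def should_promote(
    candidate: dict,
    production: dict,
    min_gain: float = 0.01,
    max_latency_p99_ms: float = 20.0,
) -> bool:
    """Promote only if accuracy improves enough and latency stays within budget."""
    gain = candidate["accuracy"] - production["accuracy"]
    return gain >= min_gain and candidate["latency_p99_ms"] <= max_latency_p99_ms


production = {"accuracy": 0.90, "latency_p99_ms": 14}
candidate = {"accuracy": 0.95, "latency_p99_ms": 15}
decision = should_promote(candidate, production)
print("promote" if decision else "hold")
```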

Key insight: A model registry transforms model management from ad-hoc file sharing to a governed, auditable process—essential for production ML at scale.

Next, we'll explore MLflow Model Registry in depth.
