Experiment Tracking and Model Registry

Experiment tracking is how teams avoid "which model version is in production?" chaos. Interviewers expect you to know MLflow deeply.

MLflow Core Concepts

Concept        | Purpose                    | Interview Point
Experiment     | Group related runs         | One per project/task
Run            | Single training execution  | Has parameters, metrics, artifacts
Artifact       | Output files               | Models, plots, data samples
Model Registry | Model lifecycle            | Staging → Production transitions
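
These concepts map directly onto the client API. A minimal sketch of walking the hierarchy (experiment → runs → artifacts), assuming a tracking URI is already configured:

from mlflow.tracking import MlflowClient

client = MlflowClient()

# Look up an experiment by name, then list its runs and their artifacts
experiment = client.get_experiment_by_name("fraud_detection_v2")
for run in client.search_runs(experiment_ids=[experiment.experiment_id]):
    print(run.info.run_id, run.data.metrics)
    for artifact in client.list_artifacts(run.info.run_id):
        print("  artifact:", artifact.path)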

Interview Question: MLflow Setup

Question: "Walk me through setting up experiment tracking for a team of 10 ML engineers."

import mlflow
from datetime import datetime

# Server configuration (for team access):
# mlflow server --backend-store-uri postgresql://... --default-artifact-root s3://...
TRACKING_URI = "http://mlflow.internal:5000"
mlflow.set_tracking_uri(TRACKING_URI)

def train_model_with_tracking(params: dict, X_train, y_train, X_test, y_test):
    # Creates the experiment if it doesn't exist, then makes it active
    mlflow.set_experiment("fraud_detection_v2")

    with mlflow.start_run(run_name=f"run_{datetime.now().isoformat()}"):
        # Log hyperparameters
        mlflow.log_params(params)

        # Training (train() and evaluate() are project-specific helpers)
        model = train(X_train, y_train, **params)
        metrics = evaluate(model, X_test, y_test)

        # Log evaluation metrics
        mlflow.log_metrics({
            "precision": metrics["precision"],
            "recall": metrics["recall"],
            "f1": metrics["f1"],
            "auc_roc": metrics["auc_roc"]
        })

        # Infer the input/output schema so serving can validate payloads
        signature = mlflow.models.infer_signature(
            X_train, model.predict(X_train)
        )

        mlflow.sklearn.log_model(
            model,
            artifact_path="model",
            signature=signature,
            registered_model_name="fraud_detector"
        )

        # Log additional artifacts (plots written to disk earlier in the run)
        mlflow.log_artifact("feature_importance.png")
        mlflow.log_artifact("confusion_matrix.png")

        return mlflow.active_run().info.run_id
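
With everyone logging to one server, runs become queryable across the team. A short sketch using mlflow.search_runs (assuming MLflow 2.x; the experiment and metric names come from the example above):

import mlflow

# Find the team's best run by F1 score in the shared experiment
best = mlflow.search_runs(
    experiment_names=["fraud_detection_v2"],
    order_by=["metrics.f1 DESC"],
    max_results=1,
)
print(best[["run_id", "metrics.f1", "metrics.auc_roc"]])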

Model Registry Workflow

import mlflow
from datetime import datetime
from mlflow.tracking import MlflowClient

client = MlflowClient()

def promote_model(model_name: str, run_id: str, stage: str):
    """
    Stages: None → Staging → Production → Archived
    """
    # Get model version from run
    model_uri = f"runs:/{run_id}/model"

    # Register if not exists
    model_version = mlflow.register_model(model_uri, model_name)

    # Transition stage
    client.transition_model_version_stage(
        name=model_name,
        version=model_version.version,
        stage=stage,
        archive_existing_versions=(stage == "Production")
    )

    # Add description
    client.update_model_version(
        name=model_name,
        version=model_version.version,
        description=f"Promoted to {stage} on {datetime.now()}"
    )

    return model_version.version

# Usage (note: mlflow.register_model creates a new version on each call;
# in practice, register once and transition that same version)
promote_model("fraud_detector", "abc123", "Staging")     # After validation
promote_model("fraud_detector", "abc123", "Production")  # After approval
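
In practice, promotion is gated by CI/CD rather than done by hand. A hypothetical gate, reusing client and promote_model from above (promotion_gate and the 0.85 threshold are illustrative, not MLflow APIs):

def promotion_gate(model_name: str, run_id: str, min_f1: float = 0.85):
    # Fetch the candidate run's logged metrics and gate promotion on them
    run = client.get_run(run_id)
    f1 = run.data.metrics.get("f1", 0.0)
    if f1 < min_f1:
        raise ValueError(f"f1={f1:.3f} is below threshold {min_f1}; not promoting")
    return promote_model(model_name, run_id, "Staging")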

Interview Question: Loading Production Model

Question: "How do you ensure your serving code always uses the latest production model?"

import mlflow

def load_production_model(model_name: str):
    """Load the current production model"""

    # Method 1: By stage (recommended)
    model = mlflow.pyfunc.load_model(
        model_uri=f"models:/{model_name}/Production"
    )

    # Method 2: By version (for reproducibility)
    # model = mlflow.pyfunc.load_model(
    #     model_uri=f"models:/{model_name}/3"
    # )

    return model

def predict_with_production_model(features):
    # Reloads from the registry on every call; real serving code should
    # cache the model and refresh periodically (see the sketch below)
    model = load_production_model("fraud_detector")
    return model.predict(features)
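
A minimal sketch of that cache-and-refresh pattern (the interval and helper name are assumptions, not MLflow APIs):

import time

import mlflow

_cache = {"model": None, "loaded_at": 0.0}
REFRESH_SECONDS = 300  # re-check the registry every five minutes

def get_production_model(model_name: str):
    # Reload from the registry only when the cached copy is stale
    if _cache["model"] is None or time.time() - _cache["loaded_at"] > REFRESH_SECONDS:
        _cache["model"] = mlflow.pyfunc.load_model(f"models:/{model_name}/Production")
        _cache["loaded_at"] = time.time()
    return _cache["model"]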

Comparing Tracking Solutions

Feature    | MLflow                 | Weights & Biases                    | Neptune
Deployment | Self-hosted or managed | Cloud-first (self-hosting available)| Cloud-first
Cost       | Free (self-hosted)     | Free tier + paid                    | Free tier + paid
UI         | Basic                  | Rich visualization                  | Rich visualization
Registry   | Built-in               | Limited                             | Limited
Best for   | Full control           | Experiment visualization            | Research teams

Interview Talking Point: "I prefer MLflow for production because we can self-host it within our VPC for data security, and it integrates well with our deployment pipelines."

Common Interview Mistakes

# What NOT to say
mistakes:
  - "We just save models to S3 with timestamps"
    why_bad: "No lineage, can't reproduce, can't audit"

  - "We use Git to version models"
    why_bad: "Git isn't designed for large binaries"

  - "Each engineer uses their own tracking"
    why_bad: "No collaboration, duplicate work, inconsistent"

# What TO say
best_practices:
  - "Centralized tracking server with PostgreSQL backend"
  - "Model signatures for input/output validation"
  - "Automated promotion through CI/CD"
  - "Audit trail for compliance"

Pro Tip: In interviews, mention model signatures: "We use MLflow signatures to catch input schema mismatches before they cause production errors."
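
If pressed on signatures, you can also define one explicitly instead of inferring it (the feature names below are illustrative):

from mlflow.models.signature import ModelSignature
from mlflow.types.schema import ColSpec, Schema

# Declare expected inputs and outputs; pyfunc models check incoming
# payloads against this schema at predict time
input_schema = Schema([
    ColSpec("double", "transaction_amount"),
    ColSpec("string", "merchant_category"),
])
output_schema = Schema([ColSpec("double")])
signature = ModelSignature(inputs=input_schema, outputs=output_schema)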

Next, we'll cover infrastructure monitoring for ML systems.
