Monitoring & Observability

Experiment Tracking and Model Registry


Experiment tracking is how teams avoid "which model version is in production?" chaos. Interviewers expect you to know MLflow deeply.

MLflow Core Concepts

| Concept | Purpose | Interview Point |
| --- | --- | --- |
| Experiment | Groups related runs | One per project/task |
| Run | A single training execution | Has parameters, metrics, artifacts |
| Artifact | Output files | Models, plots, data samples |
| Model Registry | Model lifecycle management | Staging → Production transitions |
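
A quick way to keep these terms straight is to see how they surface in the tracking client. A minimal sketch using the MLflow client API (the experiment and model names are illustrative and assumed to already exist):

from mlflow.tracking import MlflowClient

client = MlflowClient()

# Experiment: a named grouping of runs
exp = client.get_experiment_by_name("fraud_detection_v2")

# Run: one training execution, carrying parameters, metrics, and artifacts
for run in client.search_runs(experiment_ids=[exp.experiment_id], max_results=5):
    print(run.info.run_id, run.data.params, run.data.metrics)

    # Artifact: files logged under the run (models, plots, data samples)
    print([f.path for f in client.list_artifacts(run.info.run_id)])

# Model Registry: versioned models with a lifecycle (assumes the model is registered)
registered = client.get_registered_model("fraud_detector")
print(registered.name, [v.version for v in registered.latest_versions])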

Interview Question: MLflow Setup

Question: "Walk me through setting up experiment tracking for a team of 10 ML engineers."

import mlflow
from datetime import datetime

# Server configuration (for team access)
# mlflow server --backend-store-uri postgresql://... --default-artifact-root s3://...
TRACKING_URI = "http://mlflow.internal:5000"
mlflow.set_tracking_uri(TRACKING_URI)

def train_model_with_tracking(params: dict, X_train, y_train, X_test, y_test):
    # Create the experiment if it doesn't exist and make it the active one
    mlflow.set_experiment("fraud_detection_v2")

    with mlflow.start_run(run_name=f"run_{datetime.now().isoformat()}"):
        # Log parameters
        mlflow.log_params(params)

        # Train and evaluate (train/evaluate are project-specific helpers)
        model = train(X_train, y_train, **params)
        metrics = evaluate(model, X_test, y_test)

        # Log metrics
        mlflow.log_metrics({
            "precision": metrics["precision"],
            "recall": metrics["recall"],
            "f1": metrics["f1"],
            "auc_roc": metrics["auc_roc"]
        })

        # Log model with signature
        signature = mlflow.models.infer_signature(
            X_train, model.predict(X_train)
        )

        mlflow.sklearn.log_model(
            model,
            artifact_path="model",
            signature=signature,
            registered_model_name="fraud_detector"
        )

        # Log additional artifacts (plots are assumed to be written to disk beforehand)
        mlflow.log_artifact("feature_importance.png")
        mlflow.log_artifact("confusion_matrix.png")

        return mlflow.active_run().info.run_id
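
Once runs land in a shared tracking server, anyone on the team can query and compare them instead of trading spreadsheets. A minimal sketch, assuming the experiment and metric names logged above, that shortlists candidate runs before promotion:

import mlflow

mlflow.set_tracking_uri("http://mlflow.internal:5000")

# Returns a pandas DataFrame with one row per run and columns for params/metrics
runs = mlflow.search_runs(
    experiment_names=["fraud_detection_v2"],
    filter_string="metrics.f1 > 0.8",   # only runs that clear a quality bar
    order_by=["metrics.f1 DESC"],
    max_results=5,
)

print(runs[["run_id", "metrics.f1", "metrics.auc_roc"]])
best_run_id = runs.iloc[0]["run_id"]    # feed this into the registry workflow below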

Model Registry Workflow

import mlflow
from datetime import datetime
from mlflow.tracking import MlflowClient

client = MlflowClient()

def promote_model(model_name: str, run_id: str, stage: str):
    """
    Stages: None → Staging → Production → Archived
    """
    # Get model version from run
    model_uri = f"runs:/{run_id}/model"

    # Register a new version (creates the registered model on first call)
    model_version = mlflow.register_model(model_uri, model_name)

    # Transition stage
    client.transition_model_version_stage(
        name=model_name,
        version=model_version.version,
        stage=stage,
        archive_existing_versions=(stage == "Production")
    )

    # Add description
    client.update_model_version(
        name=model_name,
        version=model_version.version,
        description=f"Promoted to {stage} on {datetime.now()}"
    )

    return model_version.version

# Usage ("abc123" is a placeholder run ID): register once, then move that same
# version to Production so the artifact that was validated is the one that ships
version = promote_model("fraud_detector", "abc123", "Staging")  # after validation
client.transition_model_version_stage(
    "fraud_detector", version, "Production", archive_existing_versions=True
)  # after approval

Interview Question: Loading Production Model

Question: "How do you ensure your serving code always uses the latest production model?"

import mlflow

def load_production_model(model_name: str):
    """Load the current production model"""

    # Method 1: By stage (recommended)
    model = mlflow.pyfunc.load_model(
        model_uri=f"models:/{model_name}/Production"
    )

    # Method 2: By version (for reproducibility)
    # model = mlflow.pyfunc.load_model(
    #     model_uri=f"models:/{model_name}/3"
    # )

    return model

def predict_with_production_model(features):
    # Reloads from the registry on every call; see the caching sketch below
    # for how serving code typically avoids that per-request latency
    model = load_production_model("fraud_detector")
    return model.predict(features)
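
Loading from the registry on every prediction adds network latency, so serving code usually caches the model and refreshes it on an interval (or on a deployment webhook). A minimal sketch of a time-based refresh, building on the loader above:

import time
import mlflow

_CACHE = {"model": None, "loaded_at": 0.0}
REFRESH_SECONDS = 300  # re-resolve the Production model every 5 minutes

def get_production_model(model_name: str):
    now = time.time()
    if _CACHE["model"] is None or now - _CACHE["loaded_at"] > REFRESH_SECONDS:
        _CACHE["model"] = mlflow.pyfunc.load_model(f"models:/{model_name}/Production")
        _CACHE["loaded_at"] = now
    return _CACHE["model"]

def predict(features):
    return get_production_model("fraud_detector").predict(features)

In a real service you would also compare the registry's latest version number before re-downloading, so unchanged models aren't reloaded every interval.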

Comparing Tracking Solutions

| Feature | MLflow | Weights & Biases | Neptune |
| --- | --- | --- | --- |
| Deployment | Self-hosted or managed | Cloud-only | Cloud-only |
| Cost | Free (self-hosted) | Free tier + paid | Free tier + paid |
| UI | Basic | Rich visualization | Rich visualization |
| Registry | Built-in | Limited | Limited |
| Best for | Full control | Experiment visualization | Research teams |

Interview Talking Point: "I prefer MLflow for production because we can self-host it within our VPC for data security, and it integrates well with our deployment pipelines."

Common Interview Mistakes

# What NOT to say
mistakes:
  - "We just save models to S3 with timestamps"
    why_bad: "No lineage, can't reproduce, can't audit"

  - "We use Git to version models"
    why_bad: "Git isn't designed for large binaries"

  - "Each engineer uses their own tracking"
    why_bad: "No collaboration, duplicate work, inconsistent"

# What TO say
best_practices:
  - "Centralized tracking server with PostgreSQL backend"
  - "Model signatures for input/output validation"
  - "Automated promotion through CI/CD"
  - "Audit trail for compliance"

Pro Tip: In interviews, mention model signatures: "We use MLflow signatures to catch input schema mismatches before they cause production errors."
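
To make the signature point concrete, here is a small sketch (hypothetical feature names) of what infer_signature captures; in recent MLflow versions, pyfunc-loaded models validate incoming data against this schema at predict time:

import pandas as pd
from mlflow.models import infer_signature

X_sample = pd.DataFrame({"amount": [12.5, 80.0], "num_prior_txns": [3, 7]})
predictions = [0, 1]

signature = infer_signature(X_sample, predictions)
print(signature)  # shows the inferred input column names/types and output type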

Next, we'll cover infrastructure monitoring for ML systems.
