Monitoring & Observability

Experiment Tracking and Model Registry


Experiment tracking is how teams avoid "which model version is in production?" chaos. Interviewers expect you to know MLflow deeply.

MLflow Core Concepts

| Concept | Purpose | Interview Point |
| --- | --- | --- |
| Experiment | Groups related runs | One per project/task |
| Run | A single training execution | Has parameters, metrics, artifacts |
| Artifact | Output files | Models, plots, data samples |
| Model Registry | Model lifecycle management | Staging → Production transitions |
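
A quick way to keep these terms straight is to see how they surface in the tracking client. A minimal sketch using the MLflow client API (the experiment and model names are illustrative and assumed to already exist):

from mlflow.tracking import MlflowClient

client = MlflowClient()

# Experiment: a named grouping of runs
exp = client.get_experiment_by_name("fraud_detection_v2")

# Run: one training execution, carrying parameters, metrics, and artifacts
for run in client.search_runs(experiment_ids=[exp.experiment_id], max_results=5):
    print(run.info.run_id, run.data.params, run.data.metrics)

    # Artifact: files logged under the run (models, plots, data samples)
    print([f.path for f in client.list_artifacts(run.info.run_id)])

# Model Registry: versioned models with a lifecycle (assumes the model is registered)
registered = client.get_registered_model("fraud_detector")
print(registered.name, [v.version for v in registered.latest_versions])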

Interview Question: MLflow Setup

Question: "Walk me through setting up experiment tracking for a team of 10 ML engineers."

import mlflow
from datetime import datetime

# Server configuration (for team access)
# mlflow server --backend-store-uri postgresql://... --default-artifact-root s3://...
TRACKING_URI = "http://mlflow.internal:5000"
mlflow.set_tracking_uri(TRACKING_URI)

def train_model_with_tracking(params: dict, X_train, y_train, X_test, y_test):
    # Create the experiment if it doesn't exist and make it the active one
    mlflow.set_experiment("fraud_detection_v2")

    with mlflow.start_run(run_name=f"run_{datetime.now().isoformat()}"):
        # Log parameters
        mlflow.log_params(params)

        # Train and evaluate (train/evaluate are project-specific helpers)
        model = train(X_train, y_train, **params)
        metrics = evaluate(model, X_test, y_test)

        # Log metrics
        mlflow.log_metrics({
            "precision": metrics["precision"],
            "recall": metrics["recall"],
            "f1": metrics["f1"],
            "auc_roc": metrics["auc_roc"]
        })

        # Log model with signature
        signature = mlflow.models.infer_signature(
            X_train, model.predict(X_train)
        )

        mlflow.sklearn.log_model(
            model,
            artifact_path="model",
            signature=signature,
            registered_model_name="fraud_detector"
        )

        # Log additional artifacts (plots are assumed to be written to disk beforehand)
        mlflow.log_artifact("feature_importance.png")
        mlflow.log_artifact("confusion_matrix.png")

        return mlflow.active_run().info.run_id
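
Once runs land in a shared tracking server, anyone on the team can query and compare them instead of trading spreadsheets. A minimal sketch, assuming the experiment and metric names logged above, that shortlists candidate runs before promotion:

import mlflow

mlflow.set_tracking_uri("http://mlflow.internal:5000")

# Returns a pandas DataFrame with one row per run and columns for params/metrics
runs = mlflow.search_runs(
    experiment_names=["fraud_detection_v2"],
    filter_string="metrics.f1 > 0.8",   # only runs that clear a quality bar
    order_by=["metrics.f1 DESC"],
    max_results=5,
)

print(runs[["run_id", "metrics.f1", "metrics.auc_roc"]])
best_run_id = runs.iloc[0]["run_id"]    # feed this into the registry workflow below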

Model Registry Workflow

import mlflow
from datetime import datetime
from mlflow.tracking import MlflowClient

client = MlflowClient()

def promote_model(model_name: str, run_id: str, stage: str):
    """
    Stages: None → Staging → Production → Archived
    """
    # Get model version from run
    model_uri = f"runs:/{run_id}/model"

    # Register a new version (creates the registered model on first call)
    model_version = mlflow.register_model(model_uri, model_name)

    # Transition stage
    client.transition_model_version_stage(
        name=model_name,
        version=model_version.version,
        stage=stage,
        archive_existing_versions=(stage == "Production")
    )

    # Add description
    client.update_model_version(
        name=model_name,
        version=model_version.version,
        description=f"Promoted to {stage} on {datetime.now()}"
    )

    return model_version.version

# Usage ("abc123" is a placeholder run ID): register once, then move that same
# version to Production so the artifact that was validated is the one that ships
version = promote_model("fraud_detector", "abc123", "Staging")  # after validation
client.transition_model_version_stage(
    "fraud_detector", version, "Production", archive_existing_versions=True
)  # after approval

Interview Question: Loading Production Model

Question: "How do you ensure your serving code always uses the latest production model?"

import mlflow

def load_production_model(model_name: str):
    """Load the current production model"""

    # Method 1: By stage (recommended)
    model = mlflow.pyfunc.load_model(
        model_uri=f"models:/{model_name}/Production"
    )

    # Method 2: By version (for reproducibility)
    # model = mlflow.pyfunc.load_model(
    #     model_uri=f"models:/{model_name}/3"
    # )

    return model

def predict_with_production_model(features):
    # Reloads from the registry on every call; see the caching sketch below
    # for how serving code typically avoids that per-request latency
    model = load_production_model("fraud_detector")
    return model.predict(features)
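
Loading from the registry on every prediction adds network latency, so serving code usually caches the model and refreshes it on an interval (or on a deployment webhook). A minimal sketch of a time-based refresh, building on the loader above:

import time
import mlflow

_CACHE = {"model": None, "loaded_at": 0.0}
REFRESH_SECONDS = 300  # re-resolve the Production model every 5 minutes

def get_production_model(model_name: str):
    now = time.time()
    if _CACHE["model"] is None or now - _CACHE["loaded_at"] > REFRESH_SECONDS:
        _CACHE["model"] = mlflow.pyfunc.load_model(f"models:/{model_name}/Production")
        _CACHE["loaded_at"] = now
    return _CACHE["model"]

def predict(features):
    return get_production_model("fraud_detector").predict(features)

In a real service you would also compare the registry's latest version number before re-downloading, so unchanged models aren't reloaded every interval.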

Comparing Tracking Solutions

| Feature | MLflow | Weights & Biases | Neptune |
| --- | --- | --- | --- |
| Deployment | Self-hosted or managed | Cloud-only | Cloud-only |
| Cost | Free (self-hosted) | Free tier + paid | Free tier + paid |
| UI | Basic | Rich visualization | Rich visualization |
| Registry | Built-in | Limited | Limited |
| Best for | Full control | Experiment visualization | Research teams |

Interview Talking Point: "I prefer MLflow for production because we can self-host it within our VPC for data security, and it integrates well with our deployment pipelines."

Common Interview Mistakes

# What NOT to say
mistakes:
  - "We just save models to S3 with timestamps"
    why_bad: "No lineage, can't reproduce, can't audit"

  - "We use Git to version models"
    why_bad: "Git isn't designed for large binaries"

  - "Each engineer uses their own tracking"
    why_bad: "No collaboration, duplicate work, inconsistent"

# What TO say
best_practices:
  - "Centralized tracking server with PostgreSQL backend"
  - "Model signatures for input/output validation"
  - "Automated promotion through CI/CD"
  - "Audit trail for compliance"

Pro Tip: In interviews, mention model signatures: "We use MLflow signatures to catch input schema mismatches before they cause production errors."
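
To make the signature point concrete, here is a small sketch (hypothetical feature names) of what infer_signature captures; in recent MLflow versions, pyfunc-loaded models validate incoming data against this schema at predict time:

import pandas as pd
from mlflow.models import infer_signature

X_sample = pd.DataFrame({"amount": [12.5, 80.0], "num_prior_txns": [3, 7]})
predictions = [0, 1]

signature = infer_signature(X_sample, predictions)
print(signature)  # shows the inferred input column names/types and output type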

Next, we'll cover infrastructure monitoring for ML systems.
