Monitoring & Observability
Experiment Tracking and Model Registry
4 min read
Experiment tracking is how teams avoid "which model version is in production?" chaos. Interviewers expect you to know MLflow deeply.
MLflow Core Concepts
| Concept | Purpose | Interview Point |
|---|---|---|
| Experiment | Group related runs | One per project/task |
| Run | Single training execution | Has parameters, metrics, artifacts |
| Artifact | Output files | Models, plots, data samples |
| Model Registry | Model lifecycle | Staging → Production transitions |
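To make these concepts concrete before the full setup example, here is a minimal sketch (not from the original; the parameter and metric values are placeholders) showing runs grouped under an experiment and then queried back for comparison:

```python
import mlflow

# Runs nest under an experiment; each run holds params, metrics, and artifacts
mlflow.set_experiment("fraud_detection_v2")

with mlflow.start_run(run_name="baseline"):
    mlflow.log_param("max_depth", 6)        # parameter
    mlflow.log_metric("auc_roc", 0.91)      # metric
    # mlflow.log_artifact("confusion_matrix.png")  # artifact (file must exist)

# Because the experiment groups the runs, they can be compared later
runs = mlflow.search_runs(experiment_names=["fraud_detection_v2"])
print(runs[["run_id", "params.max_depth", "metrics.auc_roc"]])
```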
Interview Question: MLflow Setup
Question: "Walk me through setting up experiment tracking for a team of 10 ML engineers."
```python
from datetime import datetime

import mlflow

# Server configuration (for team access)
# mlflow server --backend-store-uri postgresql://... --default-artifact-root s3://...
TRACKING_URI = "http://mlflow.internal:5000"
mlflow.set_tracking_uri(TRACKING_URI)


def train_model_with_tracking(params: dict, data):
    """Train a model and log everything needed to reproduce the run."""
    # `data` is a (X_train, y_train, X_test, y_test) tuple; `train` and
    # `evaluate` are project-specific helpers defined elsewhere.
    X_train, y_train, X_test, y_test = data

    # Create the experiment if it doesn't exist, and make it active
    mlflow.set_experiment("fraud_detection_v2")

    with mlflow.start_run(run_name=f"run_{datetime.now().isoformat()}"):
        # Log parameters
        mlflow.log_params(params)

        # Training
        model = train(X_train, y_train, **params)
        metrics = evaluate(model, X_test, y_test)

        # Log metrics
        mlflow.log_metrics({
            "precision": metrics["precision"],
            "recall": metrics["recall"],
            "f1": metrics["f1"],
            "auc_roc": metrics["auc_roc"],
        })

        # Log model with signature (input/output schema)
        signature = mlflow.models.infer_signature(
            X_train, model.predict(X_train)
        )
        mlflow.sklearn.log_model(
            model,
            artifact_path="model",
            signature=signature,
            registered_model_name="fraud_detector",
        )

        # Log additional artifacts (generated earlier in the pipeline)
        mlflow.log_artifact("feature_importance.png")
        mlflow.log_artifact("confusion_matrix.png")

        return mlflow.active_run().info.run_id
```
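A quick usage sketch to show the call site; `load_fraud_data` and the parameter values are hypothetical:

```python
# Hypothetical call site: load_fraud_data() is assumed to return
# (X_train, y_train, X_test, y_test); parameter values are illustrative.
params = {"max_depth": 6, "n_estimators": 300, "learning_rate": 0.1}
data = load_fraud_data()

run_id = train_model_with_tracking(params, data)
print(f"Logged run: {run_id}")
```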
Model Registry Workflow
```python
from datetime import datetime

import mlflow
from mlflow.tracking import MlflowClient

client = MlflowClient()


def promote_model(model_name: str, version: str, stage: str):
    """
    Stages: None → Staging → Production → Archived
    """
    # Transition stage (archive the current Production version when promoting)
    client.transition_model_version_stage(
        name=model_name,
        version=version,
        stage=stage,
        archive_existing_versions=(stage == "Production"),
    )

    # Add a description for the audit trail
    client.update_model_version(
        name=model_name,
        version=version,
        description=f"Promoted to {stage} on {datetime.now()}",
    )
    return version


# Usage: register the run's model once (each register_model call creates a
# new version), then move that same version through the stages
model_version = mlflow.register_model("runs:/abc123/model", "fraud_detector")
promote_model("fraud_detector", model_version.version, "Staging")     # After validation
promote_model("fraud_detector", model_version.version, "Production")  # After approval
```
Interview Question: Loading Production Model
Question: "How do you ensure your serving code always uses the latest production model?"
```python
import mlflow


def load_production_model(model_name: str):
    """Load the current Production model from the registry."""
    # Method 1: By stage (recommended) — always resolves to the latest
    # version in the Production stage
    model = mlflow.pyfunc.load_model(
        model_uri=f"models:/{model_name}/Production"
    )

    # Method 2: By version (for reproducibility)
    # model = mlflow.pyfunc.load_model(
    #     model_uri=f"models:/{model_name}/3"
    # )
    return model


def predict_with_production_model(features):
    # Note: loading from the registry on every request is expensive;
    # cache the model in practice (see the sketch below)
    model = load_production_model("fraud_detector")
    return model.predict(features)
```
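A common follow-up is how serving picks up a newly promoted model without a redeploy. A minimal sketch, assuming a simple TTL-based cache (the 5-minute interval is an arbitrary choice):

```python
import time
import mlflow

_cache = {"model": None, "loaded_at": 0.0}
RELOAD_INTERVAL_SECONDS = 300  # assumption: refresh at most every 5 minutes

def get_production_model(model_name: str = "fraud_detector"):
    """Return a cached Production model, reloading it after the TTL expires."""
    now = time.time()
    if _cache["model"] is None or now - _cache["loaded_at"] > RELOAD_INTERVAL_SECONDS:
        _cache["model"] = mlflow.pyfunc.load_model(f"models:/{model_name}/Production")
        _cache["loaded_at"] = now
    return _cache["model"]

def predict(features):
    return get_production_model().predict(features)
```

Alternatives worth mentioning: reload when the registry signals a stage change (e.g., via a webhook or pipeline trigger), or pin an exact version per deployment and roll forward through CI/CD.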
Comparing Tracking Solutions
| Feature | MLflow | Weights & Biases | Neptune |
|---|---|---|---|
| Deployment | Self-hosted or managed | Cloud (self-hosted for enterprise) | Cloud (self-hosted for enterprise) |
| Cost | Free (self-hosted) | Free tier + paid | Free tier + paid |
| UI | Basic | Rich visualization | Rich visualization |
| Registry | Built-in | Limited | Limited |
| Best for | Full control | Experiment visualization | Research teams |
Interview Talking Point: "I prefer MLflow for production because we can self-host it within our VPC for data security, and it integrates well with our deployment pipelines."
Common Interview Mistakes
```yaml
# What NOT to say
mistakes:
  - answer: "We just save models to S3 with timestamps"
    why_bad: "No lineage, can't reproduce, can't audit"
  - answer: "We use Git to version models"
    why_bad: "Git isn't designed for large binaries"
  - answer: "Each engineer uses their own tracking"
    why_bad: "No collaboration, duplicate work, inconsistent"

# What TO say
best_practices:
  - "Centralized tracking server with PostgreSQL backend"
  - "Model signatures for input/output validation"
  - "Automated promotion through CI/CD"
  - "Audit trail for compliance"
```
Pro Tip: In interviews, mention model signatures: "We use MLflow signatures to catch input schema mismatches before they cause production errors."
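If asked to elaborate, a small sketch helps: when a pyfunc model is logged with a signature, MLflow checks DataFrame inputs against that schema at predict time, so a missing or mistyped column fails loudly instead of producing silent garbage. The feature columns below are hypothetical:

```python
import pandas as pd
import mlflow

model = mlflow.pyfunc.load_model("models:/fraud_detector/Production")

# Hypothetical feature columns; they must match the signature logged at training time
good_input = pd.DataFrame({"amount": [129.99], "merchant_risk": [0.42]})
model.predict(good_input)  # OK: schema matches the signature

bad_input = good_input.drop(columns=["merchant_risk"])
model.predict(bad_input)   # Raises an MlflowException: a required input column is missing
```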
Next, we'll cover infrastructure monitoring for ML systems.