# Monitoring & Observability

## Model Drift Detection
Model drift is the silent killer of ML systems. Interviewers test your understanding of drift types, detection methods, and mitigation strategies.
### Types of Drift
| Drift Type | Definition | Example | Detection Method |
|---|---|---|---|
| Data Drift | Input distribution changes | User demographics shift | KS-test, PSI |
| Concept Drift | Relationship between X and Y changes | Fraud patterns evolve | Performance monitoring |
| Label Drift | Target distribution changes | More positive reviews | Chi-square test |
| Upstream Drift | Data pipeline changes | New data source added | Schema validation |
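The chi-square test from the table can be run directly on class counts to flag label drift; the counts below are made up for illustration:

```python
from scipy import stats
import numpy as np

# Hypothetical class counts: [negative, positive]
train_counts = np.array([900, 100])   # training window
prod_counts = np.array([800, 200])    # recent production window

# Chi-square test of homogeneity on the 2x2 contingency table
table = np.vstack([train_counts, prod_counts])
chi2, p_value, dof, expected = stats.chi2_contingency(table)
label_drift = p_value < 0.05
```

Here the positive rate doubled, so the test rejects homogeneity decisively.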
### Interview Question: Detect Production Drift

**Question:** "Your fraud detection model's precision dropped from 95% to 80% over 3 months. How would you diagnose and fix this?"

**Structured answer:**
```python
def diagnose_model_degradation():
    steps = {
        "1_verify_metrics": """
            First, verify the metrics calculation itself:
            - Is the ground truth labeling consistent?
            - Did evaluation methodology change?
            - Sample size sufficient for significance?
        """,
        "2_check_data_drift": """
            Compare production data to training data:
            - Feature distributions (KS-test per feature)
            - Population Stability Index (PSI) overall
            - Missing value patterns
        """,
        "3_check_concept_drift": """
            Analyze model behavior:
            - Prediction distribution shift
            - Confidence score distribution
            - Performance by time cohort
        """,
        "4_identify_root_cause": """
            Potential causes:
            - Fraudsters adapted to model (concept drift)
            - New user segment (data drift)
            - Feature engineering bug (upstream drift)
            - Seasonality (temporal drift)
        """,
        "5_remediation": """
            Based on diagnosis:
            - Retrain on recent data
            - Add new features capturing new patterns
            - Implement online learning
            - Deploy challenger model
        """
    }
    return steps
```
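Step 3's "performance by time cohort" check can be sketched with a small helper; the record schema here is hypothetical:

```python
from collections import defaultdict

def precision_by_cohort(records):
    """records: iterable of (cohort, y_true, y_pred) tuples, e.g. cohort =
    'YYYY-MM' (hypothetical schema). Precision per cohort surfaces gradual
    degradation that an aggregate metric hides."""
    tp, fp = defaultdict(int), defaultdict(int)
    for cohort, y_true, y_pred in records:
        if y_pred == 1:
            if y_true == 1:
                tp[cohort] += 1
            else:
                fp[cohort] += 1
    # Only cohorts with at least one positive prediction appear in the result
    return {c: tp[c] / (tp[c] + fp[c]) for c in set(tp) | set(fp)}

records = [
    ("2024-01", 1, 1), ("2024-01", 1, 1), ("2024-01", 0, 1),
    ("2024-02", 1, 1), ("2024-02", 0, 1),
]
by_cohort = precision_by_cohort(records)
```

A monotone decline across cohorts points to gradual drift; a sharp single-cohort drop points to a pipeline or upstream change.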
### Statistical Tests for Drift Detection

**Kolmogorov-Smirnov Test (KS-test):**
```python
from scipy import stats

def detect_feature_drift(training_data, production_data, threshold=0.05):
    """Detect drift per feature with a two-sample KS-test.

    Both arguments are expected to be pandas DataFrames sharing the
    same columns. NaNs are dropped, since ks_2samp does not handle them.
    """
    drift_results = {}
    for feature in training_data.columns:
        stat, p_value = stats.ks_2samp(
            training_data[feature].dropna(),
            production_data[feature].dropna()
        )
        drift_results[feature] = {
            "ks_statistic": stat,
            "p_value": p_value,
            "drift_detected": p_value < threshold
        }
    return drift_results
```
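A quick sanity check of the KS approach on synthetic data (the shift size and seed are arbitrary):

```python
from scipy import stats
import numpy as np

rng = np.random.default_rng(0)
train_feature = rng.normal(100, 15, 5000)   # training distribution
prod_feature = rng.normal(110, 15, 5000)    # production with a mean shift

# Two-sample KS-test: large samples make even modest shifts significant
stat, p_value = stats.ks_2samp(train_feature, prod_feature)
drift_detected = p_value < 0.05
```

Note the flip side: at production sample sizes, statistical significance is cheap, which is why effect-size measures like PSI are used alongside p-values.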
**Population Stability Index (PSI):**
```python
import numpy as np

def calculate_psi(baseline, current, bins=10):
    """
    PSI interpretation (common rule of thumb):
    - PSI < 0.1: No significant drift
    - 0.1 <= PSI < 0.2: Moderate drift (investigate)
    - PSI >= 0.2: Significant drift (action required)
    """
    # Fix bin edges on the baseline so both samples are binned identically
    edges = np.histogram_bin_edges(baseline, bins=bins)
    # PSI is defined on bin proportions, not densities
    baseline_pct = np.histogram(baseline, bins=edges)[0] / len(baseline)
    current_pct = np.histogram(current, bins=edges)[0] / len(current)
    # Avoid division by zero / log(0) in empty bins
    baseline_pct = np.where(baseline_pct == 0, 0.0001, baseline_pct)
    current_pct = np.where(current_pct == 0, 0.0001, current_pct)
    psi = np.sum((current_pct - baseline_pct) *
                 np.log(current_pct / baseline_pct))
    return psi
```
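A sanity check on synthetic data, using a standalone PSI helper with bin edges fixed on the baseline (seeds and shift sizes are arbitrary):

```python
import numpy as np

def psi(baseline, current, bins=10):
    """PSI on bin proportions; bin edges fixed on the baseline sample."""
    edges = np.histogram_bin_edges(baseline, bins=bins)
    base_p = np.histogram(baseline, bins=edges)[0] / len(baseline)
    curr_p = np.histogram(current, bins=edges)[0] / len(current)
    base_p = np.where(base_p == 0, 1e-4, base_p)
    curr_p = np.where(curr_p == 0, 1e-4, curr_p)
    return float(np.sum((curr_p - base_p) * np.log(curr_p / base_p)))

rng = np.random.default_rng(7)
baseline = rng.normal(0, 1, 10000)
stable = rng.normal(0, 1, 10000)       # same distribution
shifted = rng.normal(0.5, 1, 10000)    # half-sigma mean shift

psi_stable = psi(baseline, stable)
psi_shifted = psi(baseline, shifted)
```

The stable sample lands well under the 0.1 threshold, while the half-sigma shift clears it, matching the interpretation bands above.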
### Drift Detection with Evidently
```python
# Evidently "Report" API (as of the 0.3.x releases; newer versions differ)
from evidently import ColumnMapping
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset, TargetDriftPreset

def generate_drift_report(reference_data, current_data):
    column_mapping = ColumnMapping(
        target="label",
        prediction="prediction",
        numerical_features=["feature_1", "feature_2", "feature_3"],
        categorical_features=["category_a", "category_b"]
    )
    report = Report(metrics=[
        DataDriftPreset(),
        TargetDriftPreset()
    ])
    report.run(
        reference_data=reference_data,
        current_data=current_data,
        column_mapping=column_mapping
    )
    # Export for dashboards
    report.save_html("drift_report.html")
    # Programmatic access
    results = report.as_dict()
    return results
```
### Interview Follow-up: Alerting Strategy

**Question:** "How do you set drift alert thresholds?"

```yaml
# Tiered alerting strategy
alerting_config:
  feature_drift:
    psi_warning: 0.1       # Investigate
    psi_critical: 0.2      # Immediate action
  performance_drift:
    precision_warning: 0.05   # 5% drop
    precision_critical: 0.10  # 10% drop
  prediction_drift:
    distribution_shift_warning: 0.15
    distribution_shift_critical: 0.25
  response:
    warning: "Slack notification + Jira ticket"
    critical: "PagerDuty + auto-trigger retraining"
```
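The tiers above translate directly into code; this small helper (names and thresholds mirror the illustrative config) maps a PSI value to an alert level:

```python
def classify_psi_alert(psi_value, warning=0.1, critical=0.2):
    """Map a PSI value to an alert tier per the tiered config above."""
    if psi_value >= critical:
        return "critical"   # PagerDuty + auto-trigger retraining
    if psi_value >= warning:
        return "warning"    # Slack notification + Jira ticket
    return "ok"

levels = [classify_psi_alert(v) for v in (0.05, 0.15, 0.25)]
```

Checking critical before warning keeps the tiers mutually exclusive; the same pattern applies to the performance and prediction-drift thresholds.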
**Expert Insight:** In interviews, mention that drift detection is useless without accounting for ground-truth latency: "We can only detect concept drift after labels arrive, which may take weeks in fraud detection."
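One way to make that latency explicit is to evaluate only predictions old enough for labels to have settled; the 30-day window and record schema here are placeholders:

```python
from datetime import date, timedelta

# Placeholder label-maturity window (e.g., chargeback settlement time)
LABEL_LATENCY = timedelta(days=30)

def mature_predictions(predictions, today):
    """predictions: list of (prediction_date, payload) pairs (hypothetical
    schema). Keep only entries whose ground truth has had time to arrive."""
    return [(d, p) for d, p in predictions if today - d >= LABEL_LATENCY]

preds = [(date(2024, 1, 15), 0.91), (date(2024, 2, 20), 0.40)]
ready = mature_predictions(preds, today=date(2024, 3, 1))
```

Until a cohort matures, proxy signals (prediction and score distribution drift) are the only early warning available.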
Next, we'll cover experiment tracking and model registry.