Monitoring & Observability

Model Drift Detection


Model drift is the silent killer of ML systems. Interviewers test your understanding of drift types, detection methods, and mitigation strategies.

Types of Drift

| Drift Type | Definition | Example | Detection Method |
|---|---|---|---|
| Data Drift | Input distribution changes | User demographics shift | KS-test, PSI |
| Concept Drift | Relationship between X and Y changes | Fraud patterns evolve | Performance monitoring |
| Label Drift | Target distribution changes | More positive reviews | Chi-square test |
| Upstream Drift | Data pipeline changes | New data source added | Schema validation |
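As a concrete example of the label-drift row, a chi-square test can compare label proportions between training and a recent production window. The counts below are made up for illustration:

```python
from scipy import stats
import numpy as np

# Hypothetical label counts (negative, positive) in training vs. production
train_counts = np.array([900, 100])
prod_counts = np.array([800, 200])

# Expected counts under the training distribution, scaled to the
# production sample size so observed and expected totals match
expected = train_counts / train_counts.sum() * prod_counts.sum()

stat, p_value = stats.chisquare(f_obs=prod_counts, f_exp=expected)
drift_detected = p_value < 0.05
```

Here the positive rate doubling from 10% to 20% yields a tiny p-value, so the test flags label drift.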

Interview Question: Detect Production Drift

Question: "Your fraud detection model's precision dropped from 95% to 80% over 3 months. How would you diagnose and fix this?"

Structured Answer:

def diagnose_model_degradation():
    steps = {
        "1_verify_metrics": """
            First, verify the metrics calculation itself:
            - Is the ground truth labeling consistent?
            - Did evaluation methodology change?
            - Sample size sufficient for significance?
        """,

        "2_check_data_drift": """
            Compare production data to training data:
            - Feature distributions (KS-test per feature)
            - Population Stability Index (PSI) overall
            - Missing value patterns
        """,

        "3_check_concept_drift": """
            Analyze model behavior:
            - Prediction distribution shift
            - Confidence score distribution
            - Performance by time cohort
        """,

        "4_identify_root_cause": """
            Potential causes:
            - Fraudsters adapted to model (concept drift)
            - New user segment (data drift)
            - Feature engineering bug (upstream drift)
            - Seasonality (temporal drift)
        """,

        "5_remediation": """
            Based on diagnosis:
            - Retrain on recent data
            - Add new features capturing new patterns
            - Implement online learning
            - Deploy challenger model
        """
    }
    return steps
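Step 3's "performance by time cohort" can be sketched with pandas; the prediction log below is fabricated, but the pattern (precision decaying month over month) is what you'd look for:

```python
import pandas as pd

# Hypothetical prediction log: timestamp, model prediction, delayed true label
log = pd.DataFrame({
    "ts": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-10",
                          "2024-02-25", "2024-03-08", "2024-03-22"]),
    "pred": [1, 1, 1, 1, 1, 1],
    "label": [1, 1, 1, 0, 1, 0],
})

# Precision per monthly cohort: of the transactions we flagged,
# how many were actually fraud?
log["month"] = log["ts"].dt.to_period("M")
precision_by_month = log.groupby("month")[["pred", "label"]].apply(
    lambda g: (g["pred"] * g["label"]).sum() / g["pred"].sum()
)
```

A steady decline across cohorts points to drift; a single bad month points to an incident or seasonality.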

Statistical Tests for Drift Detection

Kolmogorov-Smirnov Test (KS-test):

from scipy import stats
import numpy as np

def detect_feature_drift(training_data, production_data, threshold=0.05):
    """Detect drift using a two-sample KS-test per feature.

    Both arguments are pandas DataFrames with the same columns.
    """
    drift_results = {}

    for feature in training_data.columns:
        stat, p_value = stats.ks_2samp(
            training_data[feature],
            production_data[feature]
        )

        drift_results[feature] = {
            "ks_statistic": stat,
            "p_value": p_value,
            "drift_detected": p_value < threshold
        }

    return drift_results
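One caveat with per-feature KS tests: with hundreds of features at a 0.05 threshold, several will "drift" by chance alone. A Bonferroni correction is one simple (if conservative) way to control this; the p-values below are invented:

```python
import numpy as np

# Hypothetical raw p-values from per-feature KS tests
p_values = np.array([0.04, 0.001, 0.30, 0.02])
alpha = 0.05

# Divide the significance level by the number of tests so the
# family-wise false-alarm rate stays near alpha
corrected_threshold = alpha / len(p_values)   # 0.0125 here
drifted = p_values < corrected_threshold
```

Only the second feature survives the corrected threshold, whereas the naive 0.05 cutoff would have flagged three.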

Population Stability Index (PSI):

def calculate_psi(baseline, current, bins=10):
    """
    Population Stability Index between two samples.

    Interpretation (common rule of thumb):
    - PSI < 0.1: No significant drift
    - 0.1 <= PSI < 0.2: Moderate drift (investigate)
    - PSI >= 0.2: Significant drift (action required)
    """
    # Bin both samples with the SAME edges (derived from the baseline)
    # so proportions are comparable bucket by bucket
    edges = np.histogram_bin_edges(baseline, bins=bins)
    baseline_counts, _ = np.histogram(baseline, bins=edges)
    current_counts, _ = np.histogram(current, bins=edges)

    # PSI is defined on proportions, not densities
    baseline_pct = baseline_counts / baseline_counts.sum()
    current_pct = current_counts / current_counts.sum()

    # Avoid division by zero in empty buckets
    baseline_pct = np.where(baseline_pct == 0, 0.0001, baseline_pct)
    current_pct = np.where(current_pct == 0, 0.0001, current_pct)

    psi = np.sum((current_pct - baseline_pct) *
                 np.log(current_pct / baseline_pct))

    return psi

Drift Detection with Evidently

from evidently import ColumnMapping
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset, TargetDriftPreset

def generate_drift_report(reference_data, current_data):
    column_mapping = ColumnMapping(
        target="label",
        prediction="prediction",
        numerical_features=["feature_1", "feature_2", "feature_3"],
        categorical_features=["category_a", "category_b"]
    )

    report = Report(metrics=[
        DataDriftPreset(),
        TargetDriftPreset()
    ])

    report.run(
        reference_data=reference_data,
        current_data=current_data,
        column_mapping=column_mapping
    )

    # Export for dashboards
    report.save_html("drift_report.html")

    # Programmatic access
    results = report.as_dict()
    return results

Interview Follow-up: Alerting Strategy

Question: "How do you set drift alert thresholds?"

# Tiered alerting strategy
alerting_config:
  feature_drift:
    psi_warning: 0.1      # Investigate
    psi_critical: 0.2     # Immediate action

  performance_drift:
    precision_warning: 0.05   # 5% drop
    precision_critical: 0.10  # 10% drop

  prediction_drift:
    distribution_shift_warning: 0.15
    distribution_shift_critical: 0.25

  response:
    warning: "Slack notification + Jira ticket"
    critical: "PagerDuty + auto-trigger retraining"
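A minimal evaluator for these tiers might look like the following; the function name and inline responses are illustrative, not from any specific library:

```python
def classify_psi_alert(psi, warning=0.1, critical=0.2):
    """Map a PSI score onto the tiered response levels above."""
    if psi >= critical:
        return "critical"   # PagerDuty + auto-trigger retraining
    if psi >= warning:
        return "warning"    # Slack notification + Jira ticket
    return "ok"
```

In practice this check runs on a schedule (per feature, per model) and the returned tier drives the routing configured above.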

Expert Insight: In interviews, mention that drift detection must account for ground-truth latency: "We can only confirm concept drift after labels arrive, which may take weeks in fraud detection." Until then, you're limited to proxy signals like input and prediction distributions.
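While waiting for labels, a calibrated model's own scores give a rough leading indicator: if scores are well calibrated, the mean score among flagged transactions approximates expected precision. The values below are invented:

```python
import numpy as np

# Hypothetical fraud scores for transactions the model flagged
scores = np.array([0.97, 0.92, 0.88, 0.95, 0.60, 0.55])
flagged = scores >= 0.5

# Under good calibration, this estimates precision before labels arrive;
# a sustained drop hints at drift weeks before ground truth confirms it
expected_precision = scores[flagged].mean()
```

This is only a heuristic: it breaks down exactly when calibration itself drifts, so it complements rather than replaces label-based monitoring.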

Next, we'll cover experiment tracking and model registry.
