Production Monitoring & Next Steps

Alerting & SLOs

Production LLM systems need proactive monitoring. Set quality thresholds, configure alerts, and define Service Level Objectives (SLOs) to catch issues before users do.

What are SLOs for LLMs?

A Service Level Objective pairs a measurable indicator with the threshold the service is expected to meet:

SLO Type     │ Example              │ Threshold
───────────────────────────────────────────────────
Latency      │ P95 response time    │ < 3 seconds
Quality      │ Helpfulness score    │ > 0.8
Availability │ Successful responses │ > 99.5%
Cost         │ Cost per query       │ < $0.05
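To compare a service against thresholds like these, the indicators first have to be computed from request data. The following is a minimal sketch, assuming each request is logged with its latency, success flag, and cost; `RequestRecord`, `p95`, and `measure` are hypothetical names, and the nearest-rank percentile is one of several common definitions.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestRecord:
    latency_s: float   # end-to-end response time in seconds
    ok: bool           # True if a response was returned successfully
    cost_usd: float    # total cost of this query in US dollars


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(95 * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def measure(records: list[RequestRecord]) -> dict[str, float]:
    """Compute latency, availability, and cost indicators for one window."""
    if not records:
        raise ValueError("need at least one request record")
    n = len(records)
    return {
        "latency_p95_s": p95([r.latency_s for r in records]),
        "availability": sum(r.ok for r in records) / n,
        "cost_per_query_usd": sum(r.cost_usd for r in records) / n,
    }
```

Quality scores such as helpfulness come from an evaluator rather than from request logs, so they are left out of this sketch.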

Defining Quality SLOs

Set thresholds for your evaluation metrics:

# Quality SLOs for a support bot.
# "target" is the level to aim for; "warning" and "critical" are alert thresholds.
QUALITY_SLOS = {
    "accuracy": {
        "target": 0.90,
        "warning": 0.85,
        "critical": 0.75
    },
    "helpfulness": {
        "target": 0.85,
        "warning": 0.80,
        "critical": 0.70
    },
    # Lower is better for latency, so the thresholds rise from target to critical.
    "response_time_p95_ms": {
        "target": 2000,
        "warning": 3000,
        "critical": 5000
    }
}

Setting Up Alerts

LangSmith Alerting

LangSmith supports alerting on trace metrics:

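# Configure the alert in the LangSmith UI:
# 1. Navigate to Settings > Alerts
# 2. Create a new alert rule
# 3. Set the conditions
#
# The dict below only illustrates the fields of such a rule;
# it is not a LangSmith SDK call.
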
alert_config = {
    "name": "Quality Drop Alert",
    "condition": "avg(helpfulness_score) < 0.8",
    "window": "1 hour",
    "notification": {
        "type": "slack",
        "channel": "#llm-alerts"
    }
}
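A condition such as `avg(helpfulness_score) < 0.8` over a one-hour window can be illustrated in plain Python. This is a sketch of the idea only, not how LangSmith evaluates its rules; the `WindowedAverage` class and its `min_samples` parameter are hypothetical, and samples are assumed to arrive in timestamp order.

```python
from collections import deque
from datetime import datetime, timedelta


class WindowedAverage:
    """Tracks scores in a sliding time window and flags a low average."""

    def __init__(self, window: timedelta, threshold: float, min_samples: int = 1):
        self.window = window
        self.threshold = threshold
        self.min_samples = min_samples  # avoid alerting on too little data
        self._samples = deque()         # (timestamp, score), oldest first

    def add(self, timestamp: datetime, score: float) -> None:
        self._samples.append((timestamp, score))

    def breached(self, now: datetime) -> bool:
        # Drop samples that have fallen out of the window.
        cutoff = now - self.window
        while self._samples and self._samples[0][0] < cutoff:
            self._samples.popleft()
        if len(self._samples) < self.min_samples:
            return False
        average = sum(score for _, score in self._samples) / len(self._samples)
        return average < self.threshold
```

Requiring a minimum number of samples keeps a single bad response during a quiet hour from firing the alert.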

MLflow Alerting Pattern

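import mlflow

def check_quality_slos(results: dict) -> list:
    """Return the SLO violations found in a dict of evaluation metrics."""
    violations = []

    for metric, thresholds in QUALITY_SLOS.items():
        value = results.get(metric)
        if value is None:
            continue  # metric was not reported in this run

        # Scores breach by falling below a threshold; latency-style metrics
        # (critical > target) breach by rising above it.
        higher_is_better = thresholds["target"] > thresholds["critical"]

        # Check critical first so the most severe level wins.
        for level in ("critical", "warning"):
            limit = thresholds[level]
            breached = value < limit if higher_is_better else value > limit
            if breached:
                violations.append({
                    "metric": metric,
                    "level": level,
                    "value": value,
                    "threshold": limit
                })
                break

    return violations

# After each evaluation. `eval_results` is assumed to be the object returned by
# mlflow.evaluate(...), and `send_alert` is your own notification helper.
violations = check_quality_slos(eval_results.metrics)
if violations:
    send_alert(violations)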

W&B Weave Alerting

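import weave

@weave.op()
async def production_eval_with_alerts():
    """Run evaluation and check SLOs."""
    # `evaluation` (a weave.Evaluation), `production_model`, and
    # `send_slack_alert` are assumed to be defined elsewhere.
    results = await evaluation.evaluate(production_model)

    # Check against SLOs. Adjust this lookup to the shape of the
    # summary your scorers produce.
    accuracy = results.summary["accuracy"]
    if accuracy < 0.85:
        # Trigger alert
        send_slack_alert(
            message=f"Quality SLO breach: accuracy = {accuracy}"
        )

    return results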

Alert Channels

Configure multiple notification channels:

Channel   │ Use Case
──────────────────────────────────────────
Slack     │ Real-time team notifications
Email     │ Detailed reports and summaries
PagerDuty │ Critical on-call alerts
Webhooks  │ Custom integrations
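One way to use several channels is to route by severity. The sketch below assumes violation dicts shaped like those produced by `check_quality_slos` (`metric`, `level`, `value`, `threshold`); the `ROUTES` mapping and the `route` and `format_violation` helpers are hypothetical, and actual delivery to each channel is left to your own integration code.

```python
# Which channels receive which severity. Unknown levels fall back to email.
ROUTES = {
    "warning": ["slack"],
    "critical": ["slack", "pagerduty"],
}


def format_violation(v: dict) -> str:
    """Render one violation as a single line with enough context to act on."""
    return (
        f"[{v['level'].upper()}] {v['metric']} = {v['value']} "
        f"(threshold {v['threshold']})"
    )


def route(violations: list[dict]) -> dict[str, list[str]]:
    """Group formatted messages by the channel that should receive them."""
    outbox: dict[str, list[str]] = {}
    for v in violations:
        for channel in ROUTES.get(v["level"], ["email"]):
            outbox.setdefault(channel, []).append(format_violation(v))
    return outbox
```

Grouping messages per channel before sending also makes it easy to post one combined message instead of one per violation.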

Alert Fatigue Prevention

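Too many alerts train a team to ignore them. To keep alerts actionable:

  1. Set appropriate thresholds: Loose enough that normal variation does not trigger them
  2. Use warning before critical: Catch issues early, before they need a page
  3. Aggregate alerts: Alert on windowed metrics, not on individual requests
  4. Add context: Include the metric, its value, the threshold, and the time window
  5. Define escalation paths: Warning → Critical → Page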
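A common way to avoid repeated notifications for the same ongoing problem is a cooldown per metric and severity. The following is a minimal sketch under that assumption; `AlertThrottle` and `should_send` are hypothetical names. Because the key includes the level, an escalation from warning to critical still goes out immediately.

```python
from datetime import datetime, timedelta


class AlertThrottle:
    """Suppresses repeat alerts for the same (metric, level) during a cooldown."""

    def __init__(self, cooldown: timedelta):
        self.cooldown = cooldown
        self._last_sent = {}  # (metric, level) -> time of last alert sent

    def should_send(self, metric: str, level: str, now: datetime) -> bool:
        key = (metric, level)
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown:
            return False  # still cooling down, stay quiet
        self._last_sent[key] = now
        return True
```

A caller would check `should_send(...)` for each violation and only notify when it returns `True`.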

SLO Dashboard

Track SLO compliance over time:

SLO Dashboard - Last 7 Days
───────────────────────────────────────────
Metric          │ Target │ Current │ Status
───────────────────────────────────────────
Accuracy        │ 90%    │ 92.3%   │ ✅
Helpfulness     │ 85%    │ 87.1%   │ ✅
P95 Latency     │ 2s     │ 1.8s    │ ✅
Error Rate      │ <1%    │ 0.3%    │ ✅
Cost/Query      │ $0.05  │ $0.042  │ ✅
───────────────────────────────────────────
Overall SLO Compliance: 100%
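The status column and the overall figure can be derived from the targets and current values. This sketch assumes compliance is the share of SLOs currently met, which is one possible definition; `SloRow` and `overall_compliance` are hypothetical names, and the numbers are the ones shown in the dashboard above.

```python
from dataclasses import dataclass


@dataclass
class SloRow:
    name: str
    target: float
    current: float
    higher_is_better: bool  # False for latency, error rate, and cost

    @property
    def met(self) -> bool:
        if self.higher_is_better:
            return self.current >= self.target
        return self.current <= self.target


def overall_compliance(rows: list[SloRow]) -> float:
    """Fraction of SLOs that are currently met."""
    return sum(row.met for row in rows) / len(rows)


rows = [
    SloRow("Accuracy", 0.90, 0.923, higher_is_better=True),
    SloRow("Helpfulness", 0.85, 0.871, higher_is_better=True),
    SloRow("P95 Latency (s)", 2.0, 1.8, higher_is_better=False),
    SloRow("Error Rate", 0.01, 0.003, higher_is_better=False),
    SloRow("Cost/Query ($)", 0.05, 0.042, higher_is_better=False),
]
print(f"Overall SLO Compliance: {overall_compliance(rows):.0%}")
```

An alternative definition measures the share of time windows in which each SLO held, which is more informative over longer periods.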

Best Practices

Practice            │ Why
───────────────────────────────────────────────────────────
Start with few SLOs │ Add more as you understand your system
Use error budgets   │ Allow some SLO breaches
Review regularly    │ Adjust thresholds as needed
Document runbooks   │ What to do when alerts fire

Tip: Start with 3-5 key SLOs. You can always add more, but too many early on leads to alert fatigue.
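The "use error budgets" practice can be made concrete with a little arithmetic: an SLO of 99.5% successful responses leaves 0.5% of requests as the budget for failures. The sketch below assumes a request-count-based budget; `error_budget_remaining` is a hypothetical helper.

```python
def error_budget_remaining(slo_target: float, total: int, failed: int) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    allowed_failures = (1.0 - slo_target) * total
    if allowed_failures <= 0:
        raise ValueError("budget is empty: need slo_target < 1.0 and total > 0")
    return 1.0 - failed / allowed_failures


# 10,000 requests at a 99.5% target allow about 50 failures.
# With 20 failures so far, roughly 60% of the budget is left.
remaining = error_budget_remaining(0.995, total=10_000, failed=20)
print(f"Error budget remaining: {remaining:.0%}")
```

A team might hold risky changes, such as prompt or model updates, once the remaining budget approaches zero.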

Next, we'll explore cost tracking and optimization strategies.
