Production Monitoring & Next Steps
Alerting & SLOs
3 min read
Production LLM systems need proactive monitoring. Set quality thresholds, configure alerts, and define Service Level Objectives (SLOs) to catch issues before users do.
What are SLOs for LLMs?
Service Level Objectives define acceptable quality levels:
| SLO Type | Example | Threshold |
|---|---|---|
| Latency | P95 response time | < 3 seconds |
| Quality | Helpfulness score | > 0.8 |
| Availability | Successful responses | > 99.5% |
| Cost | Cost per query | < $0.05 |
Defining Quality SLOs
Set thresholds for your evaluation metrics:
# Quality SLOs for a support bot
QUALITY_SLOS = {
"accuracy": {
"target": 0.90,
"warning": 0.85,
"critical": 0.75
},
"helpfulness": {
"target": 0.85,
"warning": 0.80,
"critical": 0.70
},
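    # Latency thresholds run the other way: higher values are worse,
    # so warning and critical sit above the 2-second target.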
"response_time_p95_ms": {
"target": 2000,
"warning": 3000,
"critical": 5000
}
}
Setting Up Alerts
LangSmith Alerting
LangSmith supports alerting on trace metrics:
# Configure alert in LangSmith UI:
# 1. Navigate to Settings > Alerts
# 2. Create new alert rule
# 3. Set conditions:
alert_config = {
"name": "Quality Drop Alert",
"condition": "avg(helpfulness_score) < 0.8",
"window": "1 hour",
"notification": {
"type": "slack",
"channel": "#llm-alerts"
}
}
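If you'd rather poll programmatically than rely on the UI rule alone, a minimal sketch using the LangSmith SDK could look like the following. The project name and `send_slack_alert` helper are placeholders, and it assumes helpfulness scores are logged as feedback on production runs:

```python
from datetime import datetime, timedelta, timezone
from langsmith import Client

client = Client()
since = datetime.now(timezone.utc) - timedelta(hours=1)

# Fetch the last hour of traced runs for the production project
runs = list(client.list_runs(project_name="support-bot-prod", start_time=since))

# Collect helpfulness feedback scores attached to those runs
scores = [
    fb.score
    for fb in client.list_feedback(run_ids=[run.id for run in runs])
    if fb.key == "helpfulness" and fb.score is not None
]

if scores and sum(scores) / len(scores) < 0.8:
    send_slack_alert(f"Helpfulness averaged {sum(scores) / len(scores):.2f} over the last hour")
```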
MLflow Alerting Pattern
import mlflow
def check_quality_slos(results: dict) -> list:
    """Return any SLO violations found in a dict of evaluation metrics."""
    violations = []
    for metric, thresholds in QUALITY_SLOS.items():
        value = results.get(metric)
        if value is None:
            continue
        # Latency-style metrics get worse as they rise (their thresholds sit above
        # the target); quality metrics get worse as they fall.
        lower_is_better = thresholds["critical"] > thresholds["target"]
        if lower_is_better:
            level = ("critical" if value > thresholds["critical"]
                     else "warning" if value > thresholds["warning"] else None)
        else:
            level = ("critical" if value < thresholds["critical"]
                     else "warning" if value < thresholds["warning"] else None)
        if level:
            violations.append({
                "metric": metric,
                "level": level,
                "value": value,
                "threshold": thresholds[level],
            })
    return violations

# After each evaluation (eval_results.metrics is the metrics dict from mlflow.evaluate)
violations = check_quality_slos(eval_results.metrics)
if violations:
send_alert(violations)
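`send_alert` is left to you; one minimal sketch posts the violations to a Slack incoming webhook (the `SLACK_WEBHOOK_URL` environment variable is an assumption) and could just as easily be swapped for email or PagerDuty:

```python
import json
import os
import urllib.request

def send_alert(violations: list) -> None:
    """Post SLO violations to a Slack incoming webhook."""
    lines = [
        f"[{v['level'].upper()}] {v['metric']}: {v['value']} (threshold: {v['threshold']})"
        for v in violations
    ]
    payload = json.dumps({"text": "SLO violations detected:\n" + "\n".join(lines)}).encode()
    request = urllib.request.Request(
        os.environ["SLACK_WEBHOOK_URL"],
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request)
```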
W&B Weave Alerting
import weave
@weave.op()
async def production_eval_with_alerts():
    """Run the Weave evaluation, then check the results against SLOs."""
    results = await evaluation.evaluate(production_model)
    # Check against SLOs
    accuracy = results.summary["accuracy"]
    if accuracy < 0.85:
        # Trigger alert
        send_slack_alert(
            message=f"Quality SLO breach: accuracy = {accuracy:.3f}"
        )
    return results
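Because the evaluation call is awaited, `production_eval_with_alerts` is a coroutine: call it with `asyncio.run(production_eval_with_alerts())` from a synchronous entry point, or `await` it inside an existing event loop.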
Alert Channels
Configure multiple notification channels:
| Channel | Use Case |
|---|---|
| Slack | Real-time team notifications |
| Email | Detailed reports and summaries |
| PagerDuty | Critical on-call alerts |
| Webhooks | Custom integrations |
Alert Fatigue Prevention
Too many alerts train people to ignore them. To keep alerts actionable:
- Set appropriate thresholds: Not too sensitive
- Use warning before critical: Catch issues early
- Aggregate alerts: Batch or debounce instead of alerting per-request (see the sketch after this list)
- Add context: Include relevant information
- Define escalation paths: Warning → Critical → Page
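Aggregation can be as simple as a per-metric cooldown, so a sustained quality dip produces one notification instead of hundreds. A minimal sketch (the `notify` helper is a placeholder):

```python
import time

_last_alert: dict[str, float] = {}
COOLDOWN_SECONDS = 30 * 60  # at most one alert per metric every 30 minutes

def maybe_alert(metric: str, message: str) -> None:
    """Forward the alert only if this metric hasn't fired recently."""
    now = time.time()
    if now - _last_alert.get(metric, 0.0) >= COOLDOWN_SECONDS:
        _last_alert[metric] = now
        notify(message)  # e.g. Slack, email, PagerDuty
```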
SLO Dashboard
Track SLO compliance over time:
SLO Dashboard - Last 7 Days
───────────────────────────────────────────
Metric │ Target │ Current │ Status
───────────────────────────────────────────
Accuracy │ 90% │ 92.3% │ ✅
Helpfulness │ 85% │ 87.1% │ ✅
P95 Latency │ 2s │ 1.8s │ ✅
Error Rate │ <1% │ 0.3% │ ✅
Cost/Query │ $0.05 │ $0.042 │ ✅
───────────────────────────────────────────
Overall SLO Compliance: 100%
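You can generate a rough version of this view straight from the SLO definitions above; a sketch, assuming `current` holds the latest value for each metric:

```python
def print_slo_status(current: dict) -> None:
    """Print per-metric SLO status plus overall compliance."""
    met = 0
    for metric, t in QUALITY_SLOS.items():
        value = current[metric]
        lower_is_better = t["critical"] > t["target"]
        ok = value <= t["target"] if lower_is_better else value >= t["target"]
        met += ok
        print(f"{metric:<24} target={t['target']:<7} current={value:<7} {'✅' if ok else '❌'}")
    print(f"Overall SLO compliance: {met / len(QUALITY_SLOS):.0%}")
```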
Best Practices
| Practice | Why |
|---|---|
| Start with few SLOs | Add more as you understand your system |
| Use error budgets | Allow some SLO breaches |
| Review regularly | Adjust thresholds as needed |
| Document runbooks | What to do when alerts fire |
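Error budgets make "allow some SLO breaches" concrete: a 99.5% availability SLO over 100,000 requests budgets 500 failures, and you alert on how fast that budget is being spent rather than on every individual miss. A small sketch:

```python
def error_budget_remaining(slo_target: float, good: int, total: int) -> float:
    """Fraction of the error budget left for an availability-style SLO."""
    allowed_failures = (1 - slo_target) * total  # e.g. 500 for 99.5% over 100k requests
    actual_failures = total - good
    if allowed_failures == 0:
        return 0.0
    return max(0.0, 1 - actual_failures / allowed_failures)

# 99.5% target, 99,800 good responses out of 100,000 -> 60% of the budget left
print(error_budget_remaining(0.995, 99_800, 100_000))
```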
Tip: Start with 3-5 key SLOs. You can always add more, but too many early on leads to alert fatigue.
Next, we'll explore cost tracking and optimization strategies.