Production Guardrails Architecture
Choosing the Right Guardrails Stack
3 min read
With many guardrails tools available—NeMo Guardrails, Guardrails AI, LlamaGuard, ShieldGemma, Presidio—how do you choose the right combination? This lesson provides a decision framework based on your requirements.
Tool Comparison Matrix
| Tool | Type | Latency | Accuracy | Customization | Self-Hosted |
|---|---|---|---|---|---|
| NeMo Guardrails | Flow control + LLM | 200-500ms | High | Very high (Colang) | Yes |
| Guardrails AI | Schema validation | 10-50ms | Variable | High (Pydantic) | Yes |
| LlamaGuard 3 8B | Safety classifier | 100-300ms | High | Medium | Yes |
| ShieldGemma 27B | Safety classifier | 300-800ms | Highest | Low | Yes |
| Presidio | PII detection | 20-50ms | High | High | Yes |
| OpenAI Moderation | Content filter | 50-100ms | Good | None | API only |
Decision Framework
By Use Case
┌─────────────────────────────────────────────────────────────────────┐
│ Guardrails Selection Guide │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Need structured output validation? │
│ ├── Yes → Guardrails AI (Pydantic schemas) │
│ └── No ↓ │
│ │
│ Need conversation flow control? │
│ ├── Yes → NeMo Guardrails (Colang rules) │
│ └── No ↓ │
│ │
│ Need PII protection? │
│ ├── Yes → Presidio + your choice of safety classifier │
│ └── No ↓ │
│ │
│ Need content safety classification? │
│ ├── Highest accuracy → ShieldGemma 27B │
│ ├── Fast + accurate → LlamaGuard 3 8B │
│ ├── Ultra-fast → LlamaGuard 3 1B or toxic-bert │
│ └── Simple API → OpenAI Moderation │
│ │
└─────────────────────────────────────────────────────────────────────┘
By Industry Requirements
| Industry | Primary Concerns | Recommended Stack |
|---|---|---|
| Healthcare | PII, medical accuracy | Presidio + LlamaGuard + NeMo (fact-checking) |
| Finance | PII, compliance, fraud | Presidio + Guardrails AI (schema) + LlamaGuard |
| Consumer Apps | Toxicity, speed | toxic-bert → LlamaGuard (escalation) |
| Enterprise Internal | Data leakage, compliance | Presidio + NeMo Guardrails |
| Education | Age-appropriate content | ShieldGemma + NeMo (topic control) |
Building Your Stack
Example 1: High-Security Enterprise
from typing import List
from dataclasses import dataclass
@dataclass
class EnterpriseStack:
"""High-security guardrails stack for enterprise."""
layers = [
# Layer 1: Fast input validation
("blocklist", BlocklistFilter()),
# Layer 2: PII protection (required for enterprise)
("presidio", PresidioFilter(
entities=["PERSON", "EMAIL", "PHONE", "CREDIT_CARD", "SSN"],
action="mask"
)),
# Layer 3: Safety classification
("llamaguard", LlamaGuard8B(
threshold=0.7,
categories=["violence", "hate", "self_harm"]
)),
# Layer 4: Dialog control
("nemo", NeMoGuardrails(
config_path="./config",
enable_fact_checking=True
)),
# Layer 5: Output validation
("guardrails_ai", GuardrailsAI(
schema=ResponseSchema,
on_fail="reask"
)),
]
Example 2: Consumer Chat Application
@dataclass
class ConsumerStack:
"""Fast, user-friendly guardrails for consumer apps."""
layers = [
# Layer 1: Ultra-fast toxicity
("toxic_bert", ToxicBertClassifier(
threshold=0.8,
escalate_threshold=0.5
)),
# Layer 2: Escalation only for uncertain cases
("llamaguard_escalation", LlamaGuard1B(
only_on_escalation=True
)),
# Layer 3: Simple output check
("output_toxic", ToxicBertClassifier(
check_output=True
)),
]
# Total latency target: < 100ms for 90% of requests
Example 3: RAG Application
@dataclass
class RAGStack:
"""Guardrails for Retrieval-Augmented Generation."""
input_layers = [
("blocklist", BlocklistFilter()),
("injection", InjectionClassifier()),
]
retrieval_layers = [
# Check retrieved chunks
("chunk_relevance", RelevanceFilter(min_score=0.7)),
("chunk_toxicity", ToxicityFilter()),
]
output_layers = [
("hallucination", HallucinationDetector(
compare_to_sources=True
)),
("citation", CitationEnforcer()),
("pii", PresidioFilter(action="block")),
]
Cost Considerations
| Approach | Compute Cost | API Cost | Notes |
|---|---|---|---|
| Self-hosted LlamaGuard | GPU required | None | Best for high volume |
| OpenAI Moderation API | None | $0.0001/req | Simple, no GPU |
| ShieldGemma on Cloud | ~$0.01/req | None | High accuracy |
| Hybrid (fast local + API) | Low GPU | Low | Best of both |
# Cost-optimized hybrid approach
async def cost_optimized_check(user_input: str):
# Free local check first
local_result = await toxic_bert.check(user_input)
if local_result.confidence > 0.9:
return local_result # High confidence = no API call
# Only escalate uncertain cases to paid API
return await openai_moderation.check(user_input)
Stack Validation Checklist
Before deploying your guardrails stack:
- Coverage: Does the stack address all threat categories?
- Latency: Total latency within budget (< 500ms typical)?
- Fallbacks: What happens when each component fails?
- Monitoring: Can you observe each layer's performance?
- Updates: How will you update blocklists and models?
- Testing: Do you have adversarial test cases?
Key Takeaway: The best guardrails stack combines complementary tools—fast local filters for obvious cases, accurate classifiers for nuanced decisions, and schema validation for structured outputs.
Next module: Deep dive into input filtering at scale with Presidio and injection detection. :::