Production Guardrails Architecture

Choosing the Right Guardrails Stack

3 min read

With many guardrails tools available—NeMo Guardrails, Guardrails AI, LlamaGuard, ShieldGemma, Presidio—how do you choose the right combination? This lesson provides a decision framework based on your requirements.

Tool Comparison Matrix

Tool Type Latency Accuracy Customization Self-Hosted
NeMo Guardrails Flow control + LLM 200-500ms High Very high (Colang) Yes
Guardrails AI Schema validation 10-50ms Variable High (Pydantic) Yes
LlamaGuard 3 8B Safety classifier 100-300ms High Medium Yes
ShieldGemma 27B Safety classifier 300-800ms Highest Low Yes
Presidio PII detection 20-50ms High High Yes
OpenAI Moderation Content filter 50-100ms Good None API only

Decision Framework

By Use Case

┌─────────────────────────────────────────────────────────────────────┐
│                     Guardrails Selection Guide                       │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  Need structured output validation?                                  │
│  ├── Yes → Guardrails AI (Pydantic schemas)                         │
│  └── No ↓                                                           │
│                                                                      │
│  Need conversation flow control?                                     │
│  ├── Yes → NeMo Guardrails (Colang rules)                           │
│  └── No ↓                                                           │
│                                                                      │
│  Need PII protection?                                                │
│  ├── Yes → Presidio + your choice of safety classifier              │
│  └── No ↓                                                           │
│                                                                      │
│  Need content safety classification?                                 │
│  ├── Highest accuracy → ShieldGemma 27B                             │
│  ├── Fast + accurate → LlamaGuard 3 8B                              │
│  ├── Ultra-fast → LlamaGuard 3 1B or toxic-bert                     │
│  └── Simple API → OpenAI Moderation                                 │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

By Industry Requirements

Industry Primary Concerns Recommended Stack
Healthcare PII, medical accuracy Presidio + LlamaGuard + NeMo (fact-checking)
Finance PII, compliance, fraud Presidio + Guardrails AI (schema) + LlamaGuard
Consumer Apps Toxicity, speed toxic-bert → LlamaGuard (escalation)
Enterprise Internal Data leakage, compliance Presidio + NeMo Guardrails
Education Age-appropriate content ShieldGemma + NeMo (topic control)

Building Your Stack

Example 1: High-Security Enterprise

from typing import List
from dataclasses import dataclass

@dataclass
class EnterpriseStack:
    """High-security guardrails stack for enterprise."""

    layers = [
        # Layer 1: Fast input validation
        ("blocklist", BlocklistFilter()),

        # Layer 2: PII protection (required for enterprise)
        ("presidio", PresidioFilter(
            entities=["PERSON", "EMAIL", "PHONE", "CREDIT_CARD", "SSN"],
            action="mask"
        )),

        # Layer 3: Safety classification
        ("llamaguard", LlamaGuard8B(
            threshold=0.7,
            categories=["violence", "hate", "self_harm"]
        )),

        # Layer 4: Dialog control
        ("nemo", NeMoGuardrails(
            config_path="./config",
            enable_fact_checking=True
        )),

        # Layer 5: Output validation
        ("guardrails_ai", GuardrailsAI(
            schema=ResponseSchema,
            on_fail="reask"
        )),
    ]

Example 2: Consumer Chat Application

@dataclass
class ConsumerStack:
    """Fast, user-friendly guardrails for consumer apps."""

    layers = [
        # Layer 1: Ultra-fast toxicity
        ("toxic_bert", ToxicBertClassifier(
            threshold=0.8,
            escalate_threshold=0.5
        )),

        # Layer 2: Escalation only for uncertain cases
        ("llamaguard_escalation", LlamaGuard1B(
            only_on_escalation=True
        )),

        # Layer 3: Simple output check
        ("output_toxic", ToxicBertClassifier(
            check_output=True
        )),
    ]

    # Total latency target: < 100ms for 90% of requests

Example 3: RAG Application

@dataclass
class RAGStack:
    """Guardrails for Retrieval-Augmented Generation."""

    input_layers = [
        ("blocklist", BlocklistFilter()),
        ("injection", InjectionClassifier()),
    ]

    retrieval_layers = [
        # Check retrieved chunks
        ("chunk_relevance", RelevanceFilter(min_score=0.7)),
        ("chunk_toxicity", ToxicityFilter()),
    ]

    output_layers = [
        ("hallucination", HallucinationDetector(
            compare_to_sources=True
        )),
        ("citation", CitationEnforcer()),
        ("pii", PresidioFilter(action="block")),
    ]

Cost Considerations

Approach Compute Cost API Cost Notes
Self-hosted LlamaGuard GPU required None Best for high volume
OpenAI Moderation API None $0.0001/req Simple, no GPU
ShieldGemma on Cloud ~$0.01/req None High accuracy
Hybrid (fast local + API) Low GPU Low Best of both
# Cost-optimized hybrid approach
async def cost_optimized_check(user_input: str):
    # Free local check first
    local_result = await toxic_bert.check(user_input)

    if local_result.confidence > 0.9:
        return local_result  # High confidence = no API call

    # Only escalate uncertain cases to paid API
    return await openai_moderation.check(user_input)

Stack Validation Checklist

Before deploying your guardrails stack:

  • Coverage: Does the stack address all threat categories?
  • Latency: Total latency within budget (< 500ms typical)?
  • Fallbacks: What happens when each component fails?
  • Monitoring: Can you observe each layer's performance?
  • Updates: How will you update blocklists and models?
  • Testing: Do you have adversarial test cases?

Key Takeaway: The best guardrails stack combines complementary tools—fast local filters for obvious cases, accurate classifiers for nuanced decisions, and schema validation for structured outputs.

Next module: Deep dive into input filtering at scale with Presidio and injection detection. :::

Quiz

Module 1: Production Guardrails Architecture

Take Quiz