Introduction to AI Red Teaming
The OWASP Red Teaming Guide
3 min read
The OWASP Gen AI Red Teaming Guide (January 2025) provides a structured methodology for testing AI systems. This framework ensures comprehensive coverage of attack surfaces.
The Four-Pillar Methodology
OWASP defines four distinct areas that require separate testing approaches:
┌─────────────────────────────────────────────────────────────┐
│ OWASP Gen AI Red Teaming Framework │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ MODEL │ │IMPLEMENTATION│ │INFRASTRUCTURE│ │
│ │ LAYER │ │ LAYER │ │ LAYER │ │
│ ├─────────────┤ ├─────────────┤ ├─────────────┤ │
│ │ • Prompts │ │ • Guardrails│ │ • API keys │ │
│ │ • Training │ │ • RAG │ │ • Endpoints │ │
│ │ • Weights │ │ • Agents │ │ • Networks │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ RUNTIME LAYER │ │
│ ├─────────────────────────────────────────────────┤ │
│ │ • Session context • Memory • Tool execution │ │
│ └─────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Pillar Details
| Pillar | Focus Areas | Example Attacks |
|---|---|---|
| Model | LLM behavior, training data | Jailbreaking, prompt injection |
| Implementation | Application logic, integrations | RAG poisoning, tool abuse |
| Infrastructure | APIs, deployment, secrets | Key extraction, DoS |
| Runtime | Session state, memory | Context manipulation |
Implementing the Framework
from dataclasses import dataclass, field
from enum import Enum
from typing import List
class OWASPPillar(Enum):
MODEL = "model"
IMPLEMENTATION = "implementation"
INFRASTRUCTURE = "infrastructure"
RUNTIME = "runtime"
@dataclass
class RedTeamTestCase:
"""
Test case aligned with OWASP methodology.
"""
pillar: OWASPPillar
name: str
description: str
attack_vectors: List[str] = field(default_factory=list)
success_criteria: str = ""
def to_dict(self) -> dict:
return {
"pillar": self.pillar.value,
"name": self.name,
"description": self.description,
"attack_vectors": self.attack_vectors,
"success_criteria": self.success_criteria,
}
# Example test cases for each pillar
test_cases = [
RedTeamTestCase(
pillar=OWASPPillar.MODEL,
name="Jailbreak Resistance",
description="Test model resistance to jailbreak attempts",
attack_vectors=["DAN prompts", "Role-play", "Hypotheticals"],
success_criteria="Model refuses harmful content generation"
),
RedTeamTestCase(
pillar=OWASPPillar.IMPLEMENTATION,
name="RAG Injection",
description="Test RAG system for document injection",
attack_vectors=["Malicious documents", "Metadata injection"],
success_criteria="Injected content not executed as instructions"
),
RedTeamTestCase(
pillar=OWASPPillar.INFRASTRUCTURE,
name="API Key Extraction",
description="Attempt to extract API credentials",
attack_vectors=["Prompt-based extraction", "Error messages"],
success_criteria="No credentials exposed in responses"
),
RedTeamTestCase(
pillar=OWASPPillar.RUNTIME,
name="Context Poisoning",
description="Test session context manipulation",
attack_vectors=["Multi-turn escalation", "Memory injection"],
success_criteria="Session context properly sanitized"
),
]
Coverage Matrix
Create a coverage matrix to ensure all pillars are tested:
def generate_coverage_matrix(test_cases: List[RedTeamTestCase]) -> dict:
"""Generate a coverage report by pillar."""
coverage = {pillar.value: [] for pillar in OWASPPillar}
for test in test_cases:
coverage[test.pillar.value].append(test.name)
return {
"coverage": coverage,
"summary": {
pillar: len(tests) for pillar, tests in coverage.items()
}
}
matrix = generate_coverage_matrix(test_cases)
print(f"Tests per pillar: {matrix['summary']}")
# Output: Tests per pillar: {'model': 1, 'implementation': 1, ...}
Key Insight: Many organizations focus only on model-layer testing. The OWASP framework ensures you don't miss implementation, infrastructure, or runtime vulnerabilities.
Next, we'll learn how to set scope and rules of engagement for red team exercises. :::