Introduction to AI Red Teaming

The OWASP Red Teaming Guide

3 min read

The OWASP Gen AI Red Teaming Guide (January 2025) provides a structured methodology for testing AI systems. This framework ensures comprehensive coverage of attack surfaces.

The Four-Pillar Methodology

OWASP defines four distinct areas that require separate testing approaches:

┌─────────────────────────────────────────────────────────────┐
│              OWASP Gen AI Red Teaming Framework             │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐         │
│  │   MODEL     │  │IMPLEMENTATION│ │INFRASTRUCTURE│         │
│  │   LAYER     │  │   LAYER     │  │   LAYER     │         │
│  ├─────────────┤  ├─────────────┤  ├─────────────┤         │
│  │ • Prompts   │  │ • Guardrails│  │ • API keys  │         │
│  │ • Training  │  │ • RAG       │  │ • Endpoints │         │
│  │ • Weights   │  │ • Agents    │  │ • Networks  │         │
│  └─────────────┘  └─────────────┘  └─────────────┘         │
│                                                             │
│  ┌─────────────────────────────────────────────────┐       │
│  │              RUNTIME LAYER                       │       │
│  ├─────────────────────────────────────────────────┤       │
│  │ • Session context • Memory • Tool execution     │       │
│  └─────────────────────────────────────────────────┘       │
└─────────────────────────────────────────────────────────────┘

Pillar Details

Pillar Focus Areas Example Attacks
Model LLM behavior, training data Jailbreaking, prompt injection
Implementation Application logic, integrations RAG poisoning, tool abuse
Infrastructure APIs, deployment, secrets Key extraction, DoS
Runtime Session state, memory Context manipulation

Implementing the Framework

from dataclasses import dataclass, field
from enum import Enum
from typing import List

class OWASPPillar(Enum):
    MODEL = "model"
    IMPLEMENTATION = "implementation"
    INFRASTRUCTURE = "infrastructure"
    RUNTIME = "runtime"

@dataclass
class RedTeamTestCase:
    """
    Test case aligned with OWASP methodology.
    """
    pillar: OWASPPillar
    name: str
    description: str
    attack_vectors: List[str] = field(default_factory=list)
    success_criteria: str = ""

    def to_dict(self) -> dict:
        return {
            "pillar": self.pillar.value,
            "name": self.name,
            "description": self.description,
            "attack_vectors": self.attack_vectors,
            "success_criteria": self.success_criteria,
        }

# Example test cases for each pillar
test_cases = [
    RedTeamTestCase(
        pillar=OWASPPillar.MODEL,
        name="Jailbreak Resistance",
        description="Test model resistance to jailbreak attempts",
        attack_vectors=["DAN prompts", "Role-play", "Hypotheticals"],
        success_criteria="Model refuses harmful content generation"
    ),
    RedTeamTestCase(
        pillar=OWASPPillar.IMPLEMENTATION,
        name="RAG Injection",
        description="Test RAG system for document injection",
        attack_vectors=["Malicious documents", "Metadata injection"],
        success_criteria="Injected content not executed as instructions"
    ),
    RedTeamTestCase(
        pillar=OWASPPillar.INFRASTRUCTURE,
        name="API Key Extraction",
        description="Attempt to extract API credentials",
        attack_vectors=["Prompt-based extraction", "Error messages"],
        success_criteria="No credentials exposed in responses"
    ),
    RedTeamTestCase(
        pillar=OWASPPillar.RUNTIME,
        name="Context Poisoning",
        description="Test session context manipulation",
        attack_vectors=["Multi-turn escalation", "Memory injection"],
        success_criteria="Session context properly sanitized"
    ),
]

Coverage Matrix

Create a coverage matrix to ensure all pillars are tested:

def generate_coverage_matrix(test_cases: List[RedTeamTestCase]) -> dict:
    """Generate a coverage report by pillar."""
    coverage = {pillar.value: [] for pillar in OWASPPillar}

    for test in test_cases:
        coverage[test.pillar.value].append(test.name)

    return {
        "coverage": coverage,
        "summary": {
            pillar: len(tests) for pillar, tests in coverage.items()
        }
    }

matrix = generate_coverage_matrix(test_cases)
print(f"Tests per pillar: {matrix['summary']}")
# Output: Tests per pillar: {'model': 1, 'implementation': 1, ...}

Key Insight: Many organizations focus only on model-layer testing. The OWASP framework ensures you don't miss implementation, infrastructure, or runtime vulnerabilities.

Next, we'll learn how to set scope and rules of engagement for red team exercises. :::

Quiz

Module 1: Introduction to AI Red Teaming

Take Quiz