Introduction to AI Red Teaming

The OWASP Red Teaming Guide

3 min read

The OWASP Gen AI Red Teaming Guide (January 2025) provides a structured methodology for testing AI systems. This framework ensures comprehensive coverage of attack surfaces.

The Four-Pillar Methodology

OWASP defines four distinct areas that require separate testing approaches:

┌─────────────────────────────────────────────────────────────┐
│              OWASP Gen AI Red Teaming Framework             │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐         │
│  │   MODEL     │  │IMPLEMENTATION│ │INFRASTRUCTURE│         │
│  │   LAYER     │  │   LAYER     │  │   LAYER     │         │
│  ├─────────────┤  ├─────────────┤  ├─────────────┤         │
│  │ • Prompts   │  │ • Guardrails│  │ • API keys  │         │
│  │ • Training  │  │ • RAG       │  │ • Endpoints │         │
│  │ • Weights   │  │ • Agents    │  │ • Networks  │         │
│  └─────────────┘  └─────────────┘  └─────────────┘         │
│                                                             │
│  ┌─────────────────────────────────────────────────┐       │
│  │              RUNTIME LAYER                       │       │
│  ├─────────────────────────────────────────────────┤       │
│  │ • Session context • Memory • Tool execution     │       │
│  └─────────────────────────────────────────────────┘       │
└─────────────────────────────────────────────────────────────┘

Pillar Details

PillarFocus AreasExample Attacks
ModelLLM behavior, training dataJailbreaking, prompt injection
ImplementationApplication logic, integrationsRAG poisoning, tool abuse
InfrastructureAPIs, deployment, secretsKey extraction, DoS
RuntimeSession state, memoryContext manipulation

Implementing the Framework

from dataclasses import dataclass, field
from enum import Enum
from typing import List

class OWASPPillar(Enum):
    MODEL = "model"
    IMPLEMENTATION = "implementation"
    INFRASTRUCTURE = "infrastructure"
    RUNTIME = "runtime"

@dataclass
class RedTeamTestCase:
    """
    Test case aligned with OWASP methodology.
    """
    pillar: OWASPPillar
    name: str
    description: str
    attack_vectors: List[str] = field(default_factory=list)
    success_criteria: str = ""

    def to_dict(self) -> dict:
        return {
            "pillar": self.pillar.value,
            "name": self.name,
            "description": self.description,
            "attack_vectors": self.attack_vectors,
            "success_criteria": self.success_criteria,
        }

# Example test cases for each pillar
test_cases = [
    RedTeamTestCase(
        pillar=OWASPPillar.MODEL,
        name="Jailbreak Resistance",
        description="Test model resistance to jailbreak attempts",
        attack_vectors=["DAN prompts", "Role-play", "Hypotheticals"],
        success_criteria="Model refuses harmful content generation"
    ),
    RedTeamTestCase(
        pillar=OWASPPillar.IMPLEMENTATION,
        name="RAG Injection",
        description="Test RAG system for document injection",
        attack_vectors=["Malicious documents", "Metadata injection"],
        success_criteria="Injected content not executed as instructions"
    ),
    RedTeamTestCase(
        pillar=OWASPPillar.INFRASTRUCTURE,
        name="API Key Extraction",
        description="Attempt to extract API credentials",
        attack_vectors=["Prompt-based extraction", "Error messages"],
        success_criteria="No credentials exposed in responses"
    ),
    RedTeamTestCase(
        pillar=OWASPPillar.RUNTIME,
        name="Context Poisoning",
        description="Test session context manipulation",
        attack_vectors=["Multi-turn escalation", "Memory injection"],
        success_criteria="Session context properly sanitized"
    ),
]

Coverage Matrix

Create a coverage matrix to ensure all pillars are tested:

def generate_coverage_matrix(test_cases: List[RedTeamTestCase]) -> dict:
    """Generate a coverage report by pillar."""
    coverage = {pillar.value: [] for pillar in OWASPPillar}

    for test in test_cases:
        coverage[test.pillar.value].append(test.name)

    return {
        "coverage": coverage,
        "summary": {
            pillar: len(tests) for pillar, tests in coverage.items()
        }
    }

matrix = generate_coverage_matrix(test_cases)
print(f"Tests per pillar: {matrix['summary']}")
# Output: Tests per pillar: {'model': 1, 'implementation': 1, ...}

Key Insight: Many organizations focus only on model-layer testing. The OWASP framework ensures you don't miss implementation, infrastructure, or runtime vulnerabilities.

Next, we'll learn how to set scope and rules of engagement for red team exercises. :::

Quick check: how does this lesson land for you?

Quiz

Module 1: Introduction to AI Red Teaming

Take Quiz
FREE WEEKLY NEWSLETTER

Stay on the Nerd Track

One email per week — courses, deep dives, tools, and AI experiments.

No spam. Unsubscribe anytime.