Introduction to AI Red Teaming

Red Team vs Blue Team Dynamics

2 min read

Effective AI security requires collaboration between attackers (red team) and defenders (blue team). This adversarial cooperation strengthens overall security posture through continuous improvement cycles.

Team Roles

┌─────────────────────────────────────────────────────────────┐
│                    Security Team Structure                   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   ┌───────────────┐           ┌───────────────┐            │
│   │   RED TEAM    │    VS     │   BLUE TEAM   │            │
│   │   (Offense)   │◄─────────►│   (Defense)   │            │
│   ├───────────────┤           ├───────────────┤            │
│   │ • Find vulns  │           │ • Fix vulns   │            │
│   │ • Attack      │           │ • Detect      │            │
│   │ • Exploit     │           │ • Respond     │            │
│   │ • Report      │           │ • Harden      │            │
│   └───────────────┘           └───────────────┘            │
│                         │                                   │
│                         ▼                                   │
│              ┌───────────────────┐                         │
│              │    PURPLE TEAM    │                         │
│              │  (Collaboration)  │                         │
│              ├───────────────────┤                         │
│              │ • Share findings  │                         │
│              │ • Joint exercises │                         │
│              │ • Improve both    │                         │
│              └───────────────────┘                         │
└─────────────────────────────────────────────────────────────┘

The Purple Team Approach

Modern security favors "purple teaming," in which red and blue teams collaborate actively rather than working in isolation:

from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional
from enum import Enum

class FindingSeverity(Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"
    INFO = "informational"

@dataclass
class SecurityFinding:
    """
    Finding shared between red and blue teams.
    """
    id: str
    title: str
    severity: FindingSeverity
    description: str
    attack_vector: str
    affected_systems: List[str]
    discovered_at: datetime
    discovered_by: str  # Red team member

    # Blue team response fields
    remediation_status: str = "open"
    remediation_owner: Optional[str] = None
    remediation_notes: str = ""
    resolved_at: Optional[datetime] = None

    def time_to_remediate(self) -> Optional[float]:
        """Calculate hours from discovery to resolution."""
        if self.resolved_at:
            delta = self.resolved_at - self.discovered_at
            return delta.total_seconds() / 3600
        return None

@dataclass
class PurpleTeamSession:
    """
    Collaborative session between red and blue teams.
    """
    session_id: str
    date: datetime
    red_team_members: List[str]
    blue_team_members: List[str]
    findings_reviewed: List[SecurityFinding] = field(default_factory=list)
    action_items: List[str] = field(default_factory=list)

    def generate_summary(self) -> dict:
        """Summarize the session: finding counts by severity, action items, and attendance."""
        severity_counts = {}
        for finding in self.findings_reviewed:
            sev = finding.severity.value
            severity_counts[sev] = severity_counts.get(sev, 0) + 1

        return {
            "session_id": self.session_id,
            "total_findings": len(self.findings_reviewed),
            "by_severity": severity_counts,
            "action_items": len(self.action_items),
            "participants": len(self.red_team_members) + len(self.blue_team_members)
        }
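For example, a single reviewed finding and its session summary might look like this (a minimal sketch reusing the classes above; the IDs, dates, and addresses are illustrative):

# Hypothetical finding reviewed in a purple team session (illustrative values)
injection_finding = SecurityFinding(
    id="VULN-2025-042",
    title="Prompt Injection via Uploaded Documents",
    severity=FindingSeverity.HIGH,
    description="Instructions embedded in an uploaded PDF override the system prompt",
    attack_vector="indirect_prompt_injection",
    affected_systems=["document-assistant"],
    discovered_at=datetime(2025, 1, 6, 9, 30),
    discovered_by="bob@redteam.example",
)

session = PurpleTeamSession(
    session_id="PT-2025-01",
    date=datetime(2025, 1, 8),
    red_team_members=["bob@redteam.example"],
    blue_team_members=["carol@blueteam.example"],
    findings_reviewed=[injection_finding],
    action_items=["Sanitize document content before it reaches the model"],
)

print(session.generate_summary())
# {'session_id': 'PT-2025-01', 'total_findings': 1, 'by_severity': {'high': 1},
#  'action_items': 1, 'participants': 2}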

Responsible Disclosure Timeline

Phase            Timeframe   Activities
Discovery        Day 0       Red team finds vulnerability
Initial Report   Day 0-1     Document and notify blue team
Triage           Day 1-3     Blue team assesses severity
Remediation      Day 3-30    Blue team implements fix
Verification     Day 30+     Red team confirms fix works
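One way to track where a finding currently sits in this timeline (a minimal sketch; the phase boundaries simply mirror the day ranges in the table and would be tuned per organization):

def disclosure_phase(finding: SecurityFinding, now: datetime) -> str:
    """Map days since discovery onto the disclosure phases above."""
    if finding.resolved_at is not None:
        return "verification"      # fix shipped; red team re-tests
    days = (now - finding.discovered_at).days
    if days < 1:
        return "initial_report"    # document and notify blue team
    if days <= 3:
        return "triage"            # blue team assesses severity
    if days <= 30:
        return "remediation"       # blue team implements fix
    return "overdue"               # past the 30-day remediation window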

Communication Framework

from dataclasses import dataclass
from enum import Enum
from typing import List

class CommunicationChannel(Enum):
    SECURE_CHAT = "secure_chat"
    ENCRYPTED_EMAIL = "encrypted_email"
    TICKET_SYSTEM = "ticket_system"
    EMERGENCY_CALL = "emergency_call"

@dataclass
class DisclosureProtocol:
    """
    How red team communicates findings to blue team.
    """
    channels: List[CommunicationChannel]
    encryption_required: bool = True
    max_disclosure_delay_hours: int = 24

    def select_channel(self, severity: FindingSeverity) -> CommunicationChannel:
        """Select appropriate channel based on severity."""
        if severity == FindingSeverity.CRITICAL:
            return CommunicationChannel.EMERGENCY_CALL
        elif severity == FindingSeverity.HIGH:
            return CommunicationChannel.SECURE_CHAT
        else:
            return CommunicationChannel.TICKET_SYSTEM

# Example protocol
protocol = DisclosureProtocol(
    channels=[
        CommunicationChannel.EMERGENCY_CALL,
        CommunicationChannel.SECURE_CHAT,
        CommunicationChannel.TICKET_SYSTEM,
    ],
    encryption_required=True,
    max_disclosure_delay_hours=24
)

# Critical finding - use emergency channel
finding = SecurityFinding(
    id="VULN-2025-001",
    title="System Prompt Extraction via Multi-Turn Attack",
    severity=FindingSeverity.CRITICAL,
    description="Full system prompt can be extracted in 5 turns",
    attack_vector="multi_turn_escalation",
    affected_systems=["customer-chatbot"],
    discovered_at=datetime.now(),
    discovered_by="alice@redteam.com"
)

channel = protocol.select_channel(finding.severity)
print(f"Use {channel.value} for {finding.severity.value} finding")

Continuous Red Teaming

The most effective approach is continuous, not one-time testing:

Week 1-2: Initial Assessment
Week 3-4: Blue Team Remediates
Week 5: Re-test Fixed Issues
Week 6+: Continuous Monitoring
(Repeat cycle quarterly)
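A cadence like this can be laid out programmatically; below is a minimal sketch where the week offsets follow the cycle above and the start date is illustrative:

from datetime import timedelta

def quarterly_cycle(start: datetime) -> dict:
    """Key dates for one red/blue cycle beginning at `start`."""
    return {
        "initial_assessment": start,                          # weeks 1-2
        "blue_team_remediation": start + timedelta(weeks=2),  # weeks 3-4
        "retest_fixed_issues": start + timedelta(weeks=4),    # week 5
        "continuous_monitoring": start + timedelta(weeks=5),  # week 6+
        "next_cycle_starts": start + timedelta(weeks=13),     # repeat quarterly
    }

for phase, when in quarterly_cycle(datetime(2025, 1, 6)).items():
    print(f"{phase}: {when:%Y-%m-%d}")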

Key Insight: Red teams that work with (not against) blue teams produce better security outcomes. The goal is organizational improvement, not "winning."

In the next module, we'll set up your red teaming environment with professional tools.
