Introduction to AI Red Teaming
Red Team vs Blue Team Dynamics
Effective AI security requires collaboration between attackers (red team) and defenders (blue team). This adversarial cooperation strengthens overall security posture through continuous improvement cycles.
Team Roles
┌─────────────────────────────────────────────────────────────┐
│                   Security Team Structure                   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   ┌───────────────┐           ┌───────────────┐             │
│   │   RED TEAM    │    VS     │   BLUE TEAM   │             │
│   │   (Offense)   │◄─────────►│   (Defense)   │             │
│   ├───────────────┤           ├───────────────┤             │
│   │ • Find vulns  │           │ • Fix vulns   │             │
│   │ • Attack      │           │ • Detect      │             │
│   │ • Exploit     │           │ • Respond     │             │
│   │ • Report      │           │ • Harden      │             │
│   └───────────────┘           └───────────────┘             │
│                         │                                   │
│                         ▼                                   │
│               ┌───────────────────┐                         │
│               │    PURPLE TEAM    │                         │
│               │  (Collaboration)  │                         │
│               ├───────────────────┤                         │
│               │ • Share findings  │                         │
│               │ • Joint exercises │                         │
│               │ • Improve both    │                         │
│               └───────────────────┘                         │
└─────────────────────────────────────────────────────────────┘
The Purple Team Approach
Modern security practice favors "purple teaming": active collaboration between red and blue teams. The data model below tracks findings as they pass from one team to the other:
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional
from enum import Enum


class FindingSeverity(Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"
    INFO = "informational"


@dataclass
class SecurityFinding:
    """
    Finding shared between red and blue teams.
    """
    id: str
    title: str
    severity: FindingSeverity
    description: str
    attack_vector: str
    affected_systems: List[str]
    discovered_at: datetime
    discovered_by: str  # Red team member

    # Blue team response fields
    remediation_status: str = "open"
    remediation_owner: Optional[str] = None
    remediation_notes: str = ""
    resolved_at: Optional[datetime] = None

    def time_to_remediate(self) -> Optional[float]:
        """Calculate hours from discovery to resolution."""
        if self.resolved_at:
            delta = self.resolved_at - self.discovered_at
            return delta.total_seconds() / 3600
        return None


@dataclass
class PurpleTeamSession:
    """
    Collaborative session between red and blue teams.
    """
    session_id: str
    date: datetime
    red_team_members: List[str]
    blue_team_members: List[str]
    findings_reviewed: List[SecurityFinding] = field(default_factory=list)
    action_items: List[str] = field(default_factory=list)

    def generate_summary(self) -> dict:
        severity_counts = {}
        for finding in self.findings_reviewed:
            sev = finding.severity.value
            severity_counts[sev] = severity_counts.get(sev, 0) + 1

        return {
            "session_id": self.session_id,
            "total_findings": len(self.findings_reviewed),
            "by_severity": severity_counts,
            "action_items": len(self.action_items),
            "participants": len(self.red_team_members) + len(self.blue_team_members)
        }
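One metric a purple team session might review is how quickly findings of each severity get resolved. The sketch below is self-contained and uses a reduced stand-in, `Finding`, in place of the full `SecurityFinding` class. The `mean_hours_to_remediate` helper and the sample data are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class Finding:
    """Hypothetical, reduced stand-in for SecurityFinding."""
    severity: str
    discovered_at: datetime
    resolved_at: Optional[datetime] = None


def mean_hours_to_remediate(findings: List[Finding]) -> Dict[str, float]:
    """Average discovery-to-resolution time in hours, per severity.

    Unresolved findings are skipped, so a severity with no resolved
    findings does not appear in the result.
    """
    hours_by_severity: Dict[str, List[float]] = {}
    for f in findings:
        if f.resolved_at is None:
            continue
        hours = (f.resolved_at - f.discovered_at).total_seconds() / 3600
        hours_by_severity.setdefault(f.severity, []).append(hours)
    return {sev: sum(h) / len(h) for sev, h in hours_by_severity.items()}


# Illustrative data only
start = datetime(2025, 1, 6, 9, 0)
sample = [
    Finding("critical", start, start + timedelta(hours=12)),
    Finding("critical", start, start + timedelta(hours=36)),
    Finding("medium", start, start + timedelta(days=10)),
    Finding("low", start),  # still open, so excluded from the averages
]
print(mean_hours_to_remediate(sample))
```

Skipping open findings keeps the averages honest but can hide a backlog, so a real report would likely show the count of unresolved findings alongside these numbers.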
Responsible Disclosure Timeline
| Phase | Timeframe | Activities |
|---|---|---|
| Discovery | Day 0 | Red team finds vulnerability |
| Initial Report | Day 0-1 | Document and notify blue team |
| Triage | Day 1-3 | Blue team assesses severity |
| Remediation | Day 3-30 | Blue team implements fix |
| Verification | Day 30+ | Red team confirms fix works |
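The timeline above can be turned into concrete dates for a given finding. This is a minimal sketch: `PHASE_WINDOWS`, `phase_schedule`, and `current_phase` are hypothetical names, the day offsets are copied from the table, and "Day 30+" is treated as open-ended. Where two phases share a boundary day, `current_phase` reports the later one.

```python
from datetime import date, timedelta
from typing import Dict, Optional, Tuple

# (start_day, end_day) offsets from discovery, copied from the table.
# end_day=None marks an open-ended phase ("Day 30+").
PHASE_WINDOWS: Dict[str, Tuple[int, Optional[int]]] = {
    "Discovery": (0, 0),
    "Initial Report": (0, 1),
    "Triage": (1, 3),
    "Remediation": (3, 30),
    "Verification": (30, None),
}


def phase_schedule(discovered: date) -> Dict[str, Tuple[date, Optional[date]]]:
    """Map each phase to its (start, end) calendar dates."""
    schedule = {}
    for phase, (start_day, end_day) in PHASE_WINDOWS.items():
        start = discovered + timedelta(days=start_day)
        end = None if end_day is None else discovered + timedelta(days=end_day)
        schedule[phase] = (start, end)
    return schedule


def current_phase(discovered: date, today: date) -> str:
    """Return the latest phase whose window has started by `today`."""
    elapsed = (today - discovered).days
    if elapsed < 0:
        raise ValueError("today is before the discovery date")
    active = "Discovery"
    for phase, (start_day, _) in PHASE_WINDOWS.items():
        if elapsed >= start_day:
            active = phase
    return active


for phase, (start, end) in phase_schedule(date(2025, 3, 1)).items():
    print(f"{phase:15} {start} -> {end or 'open-ended'}")
```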
Communication Framework
from dataclasses import dataclass
from enum import Enum
from typing import List

# Reuses FindingSeverity, SecurityFinding, and datetime from the listing above.


class CommunicationChannel(Enum):
    SECURE_CHAT = "secure_chat"
    ENCRYPTED_EMAIL = "encrypted_email"
    TICKET_SYSTEM = "ticket_system"
    EMERGENCY_CALL = "emergency_call"


@dataclass
class DisclosureProtocol:
    """
    How red team communicates findings to blue team.
    """
    channels: List[CommunicationChannel]
    encryption_required: bool = True
    max_disclosure_delay_hours: int = 24

    def select_channel(self, severity: FindingSeverity) -> CommunicationChannel:
        """Select appropriate channel based on severity."""
        if severity == FindingSeverity.CRITICAL:
            return CommunicationChannel.EMERGENCY_CALL
        elif severity == FindingSeverity.HIGH:
            return CommunicationChannel.SECURE_CHAT
        else:
            return CommunicationChannel.TICKET_SYSTEM


# Example protocol
protocol = DisclosureProtocol(
    channels=[
        CommunicationChannel.EMERGENCY_CALL,
        CommunicationChannel.SECURE_CHAT,
        CommunicationChannel.TICKET_SYSTEM,
    ],
    encryption_required=True,
    max_disclosure_delay_hours=24
)

# Critical finding - use emergency channel
finding = SecurityFinding(
    id="VULN-2025-001",
    title="System Prompt Extraction via Multi-Turn Attack",
    severity=FindingSeverity.CRITICAL,
    description="Full system prompt can be extracted in 5 turns",
    attack_vector="multi_turn_escalation",
    affected_systems=["customer-chatbot"],
    discovered_at=datetime.now(),
    discovered_by="alice@redteam.com"
)

channel = protocol.select_channel(finding.severity)
print(f"Use {channel.value} for {finding.severity.value} finding")
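In the protocol above, `select_channel` does not consult `channels`, and `max_disclosure_delay_hours` is never checked. One possible extension, sketched here under assumptions, falls back to the next configured channel when the preferred one is unavailable and flags reports that miss the disclosure window. The sketch is self-contained and uses plain strings in place of the enums. `CHANNEL_PREFERENCE`, `pick_channel`, `is_overdue`, and the preference order itself are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Hypothetical preference order per severity, most preferred first.
CHANNEL_PREFERENCE: Dict[str, List[str]] = {
    "critical": ["emergency_call", "secure_chat", "encrypted_email"],
    "high": ["secure_chat", "encrypted_email", "ticket_system"],
}
DEFAULT_PREFERENCE = ["ticket_system", "encrypted_email"]


def pick_channel(severity: str, configured: List[str]) -> str:
    """Return the most preferred channel that is actually configured."""
    for channel in CHANNEL_PREFERENCE.get(severity, DEFAULT_PREFERENCE):
        if channel in configured:
            return channel
    raise ValueError(f"no configured channel suits a {severity} finding")


def is_overdue(discovered_at: datetime, reported_at: Optional[datetime],
               now: datetime, max_delay_hours: int = 24) -> bool:
    """True if the finding was reported late, or is unreported past the deadline."""
    deadline = discovered_at + timedelta(hours=max_delay_hours)
    return (reported_at or now) > deadline


configured = ["secure_chat", "ticket_system"]  # no emergency line set up
print(pick_channel("critical", configured))
```

Raising an error when no channel fits is a deliberate choice in this sketch: a critical finding with nowhere to go should be loud rather than silently routed to a low-priority queue.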
Continuous Red Teaming
Red teaming is most effective as a continuous cycle rather than a one-time test:
Week 1-2: Initial Assessment
↓
Week 3-4: Blue Team Remediates
↓
Week 5: Re-test Fixed Issues
↓
Week 6+: Continuous Monitoring
↓
(Repeat cycle quarterly)
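The cycle above could be laid out on a calendar as follows. This is a sketch under assumptions: the phase lengths follow the diagram, a quarter is approximated as 13 weeks, continuous monitoring fills week 6 through the end of the quarter, and `CYCLE_PHASES` and `plan_cycles` are hypothetical names.

```python
from datetime import date, timedelta
from typing import List, Tuple

WEEKS_PER_CYCLE = 13  # assumption: one quarter is about 13 weeks

# (phase name, first week, last week), 1-indexed within a cycle.
CYCLE_PHASES: List[Tuple[str, int, int]] = [
    ("Initial Assessment", 1, 2),
    ("Blue Team Remediates", 3, 4),
    ("Re-test Fixed Issues", 5, 5),
    ("Continuous Monitoring", 6, WEEKS_PER_CYCLE),
]


def plan_cycles(start: date, cycles: int) -> List[Tuple[int, str, date, date]]:
    """Return (cycle number, phase, first day, last day) for each phase."""
    plan = []
    for cycle in range(cycles):
        cycle_start = start + timedelta(weeks=cycle * WEEKS_PER_CYCLE)
        for phase, first_week, last_week in CYCLE_PHASES:
            first_day = cycle_start + timedelta(weeks=first_week - 1)
            last_day = cycle_start + timedelta(weeks=last_week) - timedelta(days=1)
            plan.append((cycle + 1, phase, first_day, last_day))
    return plan


for cycle, phase, first_day, last_day in plan_cycles(date(2025, 1, 6), cycles=2):
    print(f"Cycle {cycle}: {phase:22} {first_day} to {last_day}")
```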
Key Insight: Red teams that work with (not against) blue teams produce better security outcomes. The goal is organizational improvement, not "winning."
In the next module, we'll set up your red teaming environment with professional tools.