Why System Prompts Leak

System prompt leakage has become one of the most significant security concerns in AI deployment. In 2025, OWASP added "System Prompt Leakage" as a new entry in the LLM Top 10, recognizing it as a distinct vulnerability category.

The Scale of the Problem

Two GitHub repositories have collected leaked prompts from 36+ major AI tools:

awesome-ai-system-prompts (106k+ stars): ChatGPT, Claude, Gemini, Cursor, v0, Copilot, Perplexity
leaked-system-prompts (24.8k+ stars): Devin, Manus, Windsurf, Lovable, Same.dev

These leaks reveal proprietary instructions, safety guardrails, tool configurations, and architectural decisions worth millions in R&D.

Why Prompts Are Valuable Targets

Asset Type	What Attackers Learn
Safety Guardrails	How to bypass content restrictions
Tool Definitions	Available functions and parameters
Persona Instructions	How to manipulate behavior
Rate Limits	System constraints to exploit
Architecture	Multi-agent orchestration patterns

Real-World Incidents (2025)

March 2025 - Fortune 500 Financial Services: A customer service AI leaked sensitive account data for weeks through carefully crafted prompt injection. Cost: millions in regulatory fines.

July-August 2025 - Global Data Leakage Wave: Multiple LLM applications exposed user chat records, credentials, and third-party application data through prompt injection attacks.

CVE-2025-54135/54136 - Cursor IDE: Indirect prompt injection in GitHub README files led to remote code execution. Attackers embedded malicious instructions in public documentation.

Why Extraction Works

LLMs have a fundamental design limitation: they cannot reliably distinguish between:

System instructions (from developers)
User input (potentially malicious)
Retrieved content (from RAG/tools)

This is called the context confusion problem.

┌─────────────────────────────────────────┐
│           LLM Context Window            │
├─────────────────────────────────────────┤
│  [SYSTEM PROMPT]     ← Developer trust  │
│  [USER MESSAGE]      ← User trust       │
│  [RAG CONTENT]       ← External trust   │
│                                         │
│  LLM treats ALL as text to process      │
│  Cannot verify true source/intent       │
└─────────────────────────────────────────┘

The OWASP LLM Top 10 2025 Perspective

Prompt Injection remains #1 in OWASP LLM Top 10 2025. Key categories:

Rank	Vulnerability	Relation to Extraction
LLM01	Prompt Injection	Primary extraction vector
LLM07	System Prompt Leakage	NEW in 2025 - Direct extraction
LLM04	Data & Model Poisoning	Can embed extraction payloads
LLM08	Vector & Embedding Weaknesses	RAG-based extraction

Research Reality Check

"It's refreshing to see another major research lab concluding that prompt injection remains an unsolved problem, and attempts to block or filter them have not proven reliable enough to depend on." — Simon Willison, October 2025

The "The Attacker Moves Second" paper (October 2025) from OpenAI, Anthropic, and Google DeepMind tested 12 published defenses and bypassed them with >90% success rate.

Business Impact

System prompt exposure can lead to:

Competitive Intelligence - Rivals learn your AI architecture
Security Bypass - Attackers know your guardrails
Regulatory Violations - EU AI Act (August 2025) requires transparency
Reputation Damage - Public disclosure of internal instructions
Liability - If leaked prompts enable harmful outputs

Key Insight: System prompts aren't secrets, but their disclosure reveals attack surfaces. The goal isn't perfect secrecy—it's defense in depth that assumes leakage will occur.

Next, we'll explore the specific techniques attackers use to extract system prompts. :::