Prompt Extraction Techniques
Why System Prompts Leak
System prompt leakage has become one of the most significant security concerns in AI deployment. In 2025, OWASP added "System Prompt Leakage" as a new entry in the LLM Top 10, recognizing it as a distinct vulnerability category.
The Scale of the Problem
Two GitHub repositories have collected leaked prompts from 36+ major AI tools:
- awesome-ai-system-prompts (106k+ stars): ChatGPT, Claude, Gemini, Cursor, v0, Copilot, Perplexity
- leaked-system-prompts (24.8k+ stars): Devin, Manus, Windsurf, Lovable, Same.dev
These leaks reveal proprietary instructions, safety guardrails, tool configurations, and architectural decisions worth millions in R&D.
Why Prompts Are Valuable Targets
| Asset Type | What Attackers Learn |
|---|---|
| Safety Guardrails | How to bypass content restrictions |
| Tool Definitions | Available functions and parameters |
| Persona Instructions | How to manipulate behavior |
| Rate Limits | System constraints to exploit |
| Architecture | Multi-agent orchestration patterns |
Real-World Incidents (2025)
March 2025 - Fortune 500 Financial Services: A customer service AI leaked sensitive account data for weeks through carefully crafted prompt injection. Cost: millions in regulatory fines.
July-August 2025 - Global Data Leakage Wave: Multiple LLM applications exposed user chat records, credentials, and third-party application data through prompt injection attacks.
CVE-2025-54135/54136 - Cursor IDE: Indirect prompt injection in GitHub README files led to remote code execution. Attackers embedded malicious instructions in public documentation.
Why Extraction Works
LLMs have a fundamental design limitation: they cannot reliably distinguish between:
- System instructions (from developers)
- User input (potentially malicious)
- Retrieved content (from RAG/tools)
This is called the context confusion problem.
┌─────────────────────────────────────────┐
│ LLM Context Window │
├─────────────────────────────────────────┤
│ [SYSTEM PROMPT] ← Developer trust │
│ [USER MESSAGE] ← User trust │
│ [RAG CONTENT] ← External trust │
│ │
│ LLM treats ALL as text to process │
│ Cannot verify true source/intent │
└─────────────────────────────────────────┘
The OWASP LLM Top 10 2025 Perspective
Prompt Injection remains #1 in OWASP LLM Top 10 2025. Key categories:
| Rank | Vulnerability | Relation to Extraction |
|---|---|---|
| LLM01 | Prompt Injection | Primary extraction vector |
| LLM07 | System Prompt Leakage | NEW in 2025 - Direct extraction |
| LLM04 | Data & Model Poisoning | Can embed extraction payloads |
| LLM08 | Vector & Embedding Weaknesses | RAG-based extraction |
Research Reality Check
"It's refreshing to see another major research lab concluding that prompt injection remains an unsolved problem, and attempts to block or filter them have not proven reliable enough to depend on." — Simon Willison, October 2025
The "The Attacker Moves Second" paper (October 2025) from OpenAI, Anthropic, and Google DeepMind tested 12 published defenses and bypassed them with >90% success rate.
Business Impact
System prompt exposure can lead to:
- Competitive Intelligence - Rivals learn your AI architecture
- Security Bypass - Attackers know your guardrails
- Regulatory Violations - EU AI Act (August 2025) requires transparency
- Reputation Damage - Public disclosure of internal instructions
- Liability - If leaked prompts enable harmful outputs
Key Insight: System prompts aren't secrets, but their disclosure reveals attack surfaces. The goal isn't perfect secrecy—it's defense in depth that assumes leakage will occur.
Next, we'll explore the specific techniques attackers use to extract system prompts. :::