Lesson 1 of 18

Prompt Extraction Techniques

Why System Prompts Leak

4 min read

System prompt leakage has become one of the most significant security concerns in AI deployment. In 2025, OWASP added "System Prompt Leakage" as a new entry in the LLM Top 10, recognizing it as a distinct vulnerability category.

The Scale of the Problem

Two GitHub repositories have collected leaked prompts from 36+ major AI tools:

  • awesome-ai-system-prompts (106k+ stars): ChatGPT, Claude, Gemini, Cursor, v0, Copilot, Perplexity
  • leaked-system-prompts (24.8k+ stars): Devin, Manus, Windsurf, Lovable, Same.dev

These leaks reveal proprietary instructions, safety guardrails, tool configurations, and architectural decisions worth millions in R&D.

Why Prompts Are Valuable Targets

Asset Type What Attackers Learn
Safety Guardrails How to bypass content restrictions
Tool Definitions Available functions and parameters
Persona Instructions How to manipulate behavior
Rate Limits System constraints to exploit
Architecture Multi-agent orchestration patterns

Real-World Incidents (2025)

March 2025 - Fortune 500 Financial Services: A customer service AI leaked sensitive account data for weeks through carefully crafted prompt injection. Cost: millions in regulatory fines.

July-August 2025 - Global Data Leakage Wave: Multiple LLM applications exposed user chat records, credentials, and third-party application data through prompt injection attacks.

CVE-2025-54135/54136 - Cursor IDE: Indirect prompt injection in GitHub README files led to remote code execution. Attackers embedded malicious instructions in public documentation.

Why Extraction Works

LLMs have a fundamental design limitation: they cannot reliably distinguish between:

  1. System instructions (from developers)
  2. User input (potentially malicious)
  3. Retrieved content (from RAG/tools)

This is called the context confusion problem.

┌─────────────────────────────────────────┐
│           LLM Context Window            │
├─────────────────────────────────────────┤
│  [SYSTEM PROMPT]     ← Developer trust  │
│  [USER MESSAGE]      ← User trust       │
│  [RAG CONTENT]       ← External trust   │
│                                         │
│  LLM treats ALL as text to process      │
│  Cannot verify true source/intent       │
└─────────────────────────────────────────┘

The OWASP LLM Top 10 2025 Perspective

Prompt Injection remains #1 in OWASP LLM Top 10 2025. Key categories:

Rank Vulnerability Relation to Extraction
LLM01 Prompt Injection Primary extraction vector
LLM07 System Prompt Leakage NEW in 2025 - Direct extraction
LLM04 Data & Model Poisoning Can embed extraction payloads
LLM08 Vector & Embedding Weaknesses RAG-based extraction

Research Reality Check

"It's refreshing to see another major research lab concluding that prompt injection remains an unsolved problem, and attempts to block or filter them have not proven reliable enough to depend on." — Simon Willison, October 2025

The "The Attacker Moves Second" paper (October 2025) from OpenAI, Anthropic, and Google DeepMind tested 12 published defenses and bypassed them with >90% success rate.

Business Impact

System prompt exposure can lead to:

  1. Competitive Intelligence - Rivals learn your AI architecture
  2. Security Bypass - Attackers know your guardrails
  3. Regulatory Violations - EU AI Act (August 2025) requires transparency
  4. Reputation Damage - Public disclosure of internal instructions
  5. Liability - If leaked prompts enable harmful outputs

Key Insight: System prompts aren't secrets, but their disclosure reveals attack surfaces. The goal isn't perfect secrecy—it's defense in depth that assumes leakage will occur.

Next, we'll explore the specific techniques attackers use to extract system prompts. :::

Quiz

Module 1 Quiz: Prompt Extraction Techniques

Take Quiz