Lesson 1 of 18

Prompt Extraction Techniques

Why System Prompts Leak

4 min read

System prompt leakage has become one of the most significant security concerns in AI deployment. In 2025, OWASP added "System Prompt Leakage" as a new entry in the LLM Top 10, recognizing it as a distinct vulnerability category.

The Scale of the Problem

Two GitHub repositories have collected leaked prompts from 36+ major AI tools:

  • awesome-ai-system-prompts (106k+ stars): ChatGPT, Claude, Gemini, Cursor, v0, Copilot, Perplexity
  • leaked-system-prompts (24.8k+ stars): Devin, Manus, Windsurf, Lovable, Same.dev

These leaks reveal proprietary instructions, safety guardrails, tool configurations, and architectural decisions worth millions in R&D.

Why Prompts Are Valuable Targets

Asset TypeWhat Attackers Learn
Safety GuardrailsHow to bypass content restrictions
Tool DefinitionsAvailable functions and parameters
Persona InstructionsHow to manipulate behavior
Rate LimitsSystem constraints to exploit
ArchitectureMulti-agent orchestration patterns

Real-World Incidents (2025)

March 2025 - Fortune 500 Financial Services: A customer service AI leaked sensitive account data for weeks through carefully crafted prompt injection. Cost: millions in regulatory fines.

July-August 2025 - Global Data Leakage Wave: Multiple LLM applications exposed user chat records, credentials, and third-party application data through prompt injection attacks.

CVE-2025-54135/54136 - Cursor IDE: Indirect prompt injection in GitHub README files led to remote code execution. Attackers embedded malicious instructions in public documentation.

Why Extraction Works

LLMs have a fundamental design limitation: they cannot reliably distinguish between:

  1. System instructions (from developers)
  2. User input (potentially malicious)
  3. Retrieved content (from RAG/tools)

This is called the context confusion problem.

┌─────────────────────────────────────────┐
│           LLM Context Window            │
├─────────────────────────────────────────┤
│  [SYSTEM PROMPT]     ← Developer trust  │
│  [USER MESSAGE]      ← User trust       │
│  [RAG CONTENT]       ← External trust   │
│                                         │
│  LLM treats ALL as text to process      │
│  Cannot verify true source/intent       │
└─────────────────────────────────────────┘

The OWASP LLM Top 10 2025 Perspective

Prompt Injection remains #1 in OWASP LLM Top 10 2025. Key categories:

RankVulnerabilityRelation to Extraction
LLM01Prompt InjectionPrimary extraction vector
LLM07System Prompt LeakageNEW in 2025 - Direct extraction
LLM04Data & Model PoisoningCan embed extraction payloads
LLM08Vector & Embedding WeaknessesRAG-based extraction

Research Reality Check

"It's refreshing to see another major research lab concluding that prompt injection remains an unsolved problem, and attempts to block or filter them have not proven reliable enough to depend on." — Simon Willison, October 2025

The "The Attacker Moves Second" paper (October 2025) from OpenAI, Anthropic, and Google DeepMind tested 12 published defenses and bypassed them with >90% success rate.

Business Impact

System prompt exposure can lead to:

  1. Competitive Intelligence - Rivals learn your AI architecture
  2. Security Bypass - Attackers know your guardrails
  3. Regulatory Violations - EU AI Act (August 2025) requires transparency
  4. Reputation Damage - Public disclosure of internal instructions
  5. Liability - If leaked prompts enable harmful outputs

Key Insight: System prompts aren't secrets, but their disclosure reveals attack surfaces. The goal isn't perfect secrecy—it's defense in depth that assumes leakage will occur.

Next, we'll explore the specific techniques attackers use to extract system prompts. :::

Quick check: how does this lesson land for you?

Quiz

Module 1: Prompt Extraction Techniques

Take Quiz
FREE WEEKLY NEWSLETTER

Stay on the Nerd Track

One email per week — courses, deep dives, tools, and AI experiments.

No spam. Unsubscribe anytime.