Lesson 5 of 18

Analyzing Leaked Prompts

Case Studies: Major AI Tool Prompts

5 min read

Let's analyze specific leaked prompts from major AI coding assistants to understand their security implications and architectural decisions.

Case Study 1: Cursor (January 2026)

Background: Cursor reached $500M ARR with 8 parallel Background Agents, making it the highest-revenue AI coding tool.

Security-Relevant Excerpts

Git Safety Protocol:

Git Safety Protocol:
- NEVER update the git config
- NEVER run destructive/irreversible git commands
  (like push --force, hard reset, etc) unless explicitly requested
- NEVER skip hooks (--no-verify, --no-gpg-sign, etc)
- NEVER run force push to main/master
- CRITICAL: If commit FAILED or was REJECTED by hook,
  NEVER amend - fix the issue and create a NEW commit

Security Analysis:

  • Explicit "NEVER" statements create bypass targets
  • Exceptions like "unless explicitly requested" enable social engineering
  • The distinction between failed/rejected commits reveals internal state handling

Tool Approval System:

You can use the following tools without requiring user approval:
Bash(ls:*), Bash(find:*), Bash(grep:*), Bash(npm install:*),
Bash(git status:*), Bash(git diff:*), Bash(git log:*)

Vulnerability: Attackers who can inject commands through these approved patterns bypass the confirmation system. For example, ls; malicious_command might execute if shell parsing isn't properly sanitized.

CVE-2025-54135/54136 Impact

The Cursor vulnerabilities demonstrated:

  1. Attack vector: Malicious instructions in GitHub README files
  2. Exploitation: When Cursor indexed the files, instructions executed
  3. Impact: Remote code execution through prompt injection
  4. Defense gap: No input sanitization for retrieved RAG content

Case Study 2: Claude Code (Anthropic)

Background: Anthropic's official CLI, known for detailed safety protocols.

MCP Integration Security

{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem"],
      "env": {
        "ALLOWED_PATHS": "/home/user/projects"
      }
    }
  }
}

Security Analysis:

  • MCP servers execute external code with defined permissions
  • ALLOWED_PATHS creates a sandbox boundary
  • Attack surface: Can users manipulate MCP server configurations?

The Read-Before-Edit Rule

IMPORTANT: You must use your Read tool at least once in the
conversation before editing. This tool will error if you
attempt an edit without reading the file.

Why This Matters:

  • Prevents blind code modifications that could introduce vulnerabilities
  • Forces context awareness before changes
  • Security implication: What happens if an attacker poisons the read cache?

Defense-in-Depth Example

IMPORTANT: Assist with authorized security testing, defensive
security, CTF challenges, and educational contexts. Refuse
requests for destructive techniques, DoS attacks, mass targeting,
supply chain compromise, or detection evasion for malicious purposes.

Dual-use acknowledgment: The prompt explicitly handles the gray area between legitimate security research and malicious use—something many prompts fail to address.

Case Study 3: Devin 2.0 (Cognition Labs)

Background: Dropped from $500/month to $20/month in January 2026, reaching $4B valuation.

Confidence Evaluation System

CONFIDENCE EVALUATION:
Before ANY potentially impactful action, assess confidence:

HIGH CONFIDENCE (>80%):
- Clear user intent
- Familiar technology stack
- Well-documented approach
→ Proceed with execution

MEDIUM CONFIDENCE (50-80%):
- Some ambiguity in requirements
- Unfamiliar but documented technology
- Multiple valid approaches
→ Execute with verification checkpoint

LOW CONFIDENCE (<50%):
- Unclear requirements
- Undocumented or experimental approach
- High-risk changes
→ STOP and ask for clarification

Security Implications:

  1. Threshold manipulation: If attackers can artificially inflate confidence scores, they bypass safety checks
  2. Verification checkpoints: What constitutes "verification"? Is it bypassable?
  3. "Clear user intent": Attackers craft prompts that simulate high clarity

Multi-Agent Dispatch

Agent Dispatch Protocol:
- researcher: Information gathering, web search
- analyzer: Code analysis, security review
- writer: Documentation, code generation
- supervisor: Coordination, quality control

Attack surface: Inter-agent communication could be intercepted or poisoned. The "Prompt Infection" research showed self-replicating attacks between LLM agents.

Case Study 4: Windsurf (Codeium)

Background: Named Gartner Leader 2025, known for Memory System and Turbo Mode.

Memory System Security

Memory System:
- Persistent context across sessions
- User preferences and patterns stored
- Project-specific knowledge retained

Security Concerns:

  • What happens when memory is poisoned in an earlier session?
  • Can attackers inject persistent instructions through memory?
  • Memory corruption could affect all future sessions

Turbo Mode

Turbo Mode:
- Reduced safety checks for speed
- Parallel execution of multiple operations
- Cached responses for common patterns

Tradeoff: Speed vs. security. Turbo mode explicitly reduces safety checks, creating a target for attackers who can trigger this mode.

Common Vulnerability Patterns

From analyzing 36+ leaked prompts:

Pattern Frequency Severity Example
Explicit exceptions 85% High "unless explicitly requested"
Tool auto-approval 70% Critical Pre-approved command patterns
NEVER statements 90% Medium Creates obvious bypass targets
Confidence thresholds 40% High Manipulable decision boundaries
Memory/persistence 35% Critical Session poisoning vectors

Defense Lessons

1. Avoid binary exceptions:

# Bad: Creates bypass target
❌ "Never do X unless user explicitly asks"

# Better: Require verification workflow
✓ "X requires multi-step confirmation with explicit acknowledgment"

2. Sanitize approved patterns:

# Bad: Pattern allows injection
❌ "Approved: Bash(ls:*)"

# Better: Strict parameter validation
✓ "Approved: Bash(ls) with validated path arguments only"

3. Defense-in-depth for memory:

# Validate stored context before use
# Expire sensitive information
# Log memory access patterns for anomaly detection

Key Insight: Leaked prompts are security blueprints in reverse. Every "NEVER" tells attackers what to try, every exception reveals a loophole, and every confidence threshold shows where manipulation is possible.

Next, we'll examine the specific security vulnerabilities these patterns create. :::

Quiz

Module 2: Analyzing Leaked Prompts

Take Quiz