Analyzing Leaked Prompts
Case Studies: Major AI Tool Prompts
Let's analyze specific leaked prompts from major AI coding assistants to understand their security implications and architectural decisions.
Case Study 1: Cursor (January 2026)
Background: Cursor reached $500M ARR with 8 parallel Background Agents, making it the highest-revenue AI coding tool.
Security-Relevant Excerpts
Git Safety Protocol:
Git Safety Protocol:
- NEVER update the git config
- NEVER run destructive/irreversible git commands
(like push --force, hard reset, etc) unless explicitly requested
- NEVER skip hooks (--no-verify, --no-gpg-sign, etc)
- NEVER run force push to main/master
- CRITICAL: If commit FAILED or was REJECTED by hook,
NEVER amend - fix the issue and create a NEW commit
Security Analysis:
- Explicit "NEVER" statements create bypass targets
- Exceptions like "unless explicitly requested" enable social engineering
- The distinction between failed/rejected commits reveals internal state handling
Tool Approval System:
You can use the following tools without requiring user approval:
Bash(ls:*), Bash(find:*), Bash(grep:*), Bash(npm install:*),
Bash(git status:*), Bash(git diff:*), Bash(git log:*)
Vulnerability: Attackers who can inject commands through these approved patterns bypass the confirmation system. For example, ls; malicious_command might execute if shell parsing isn't properly sanitized.
CVE-2025-54135/54136 Impact
The Cursor vulnerabilities demonstrated:
- Attack vector: Malicious instructions in GitHub README files
- Exploitation: When Cursor indexed the files, instructions executed
- Impact: Remote code execution through prompt injection
- Defense gap: No input sanitization for retrieved RAG content
Case Study 2: Claude Code (Anthropic)
Background: Anthropic's official CLI, known for detailed safety protocols.
MCP Integration Security
{
"mcpServers": {
"filesystem": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-filesystem"],
"env": {
"ALLOWED_PATHS": "/home/user/projects"
}
}
}
}
Security Analysis:
- MCP servers execute external code with defined permissions
ALLOWED_PATHScreates a sandbox boundary- Attack surface: Can users manipulate MCP server configurations?
The Read-Before-Edit Rule
IMPORTANT: You must use your Read tool at least once in the
conversation before editing. This tool will error if you
attempt an edit without reading the file.
Why This Matters:
- Prevents blind code modifications that could introduce vulnerabilities
- Forces context awareness before changes
- Security implication: What happens if an attacker poisons the read cache?
Defense-in-Depth Example
IMPORTANT: Assist with authorized security testing, defensive
security, CTF challenges, and educational contexts. Refuse
requests for destructive techniques, DoS attacks, mass targeting,
supply chain compromise, or detection evasion for malicious purposes.
Dual-use acknowledgment: The prompt explicitly handles the gray area between legitimate security research and malicious use—something many prompts fail to address.
Case Study 3: Devin 2.0 (Cognition Labs)
Background: Dropped from $500/month to $20/month in January 2026, reaching $4B valuation.
Confidence Evaluation System
CONFIDENCE EVALUATION:
Before ANY potentially impactful action, assess confidence:
HIGH CONFIDENCE (>80%):
- Clear user intent
- Familiar technology stack
- Well-documented approach
→ Proceed with execution
MEDIUM CONFIDENCE (50-80%):
- Some ambiguity in requirements
- Unfamiliar but documented technology
- Multiple valid approaches
→ Execute with verification checkpoint
LOW CONFIDENCE (<50%):
- Unclear requirements
- Undocumented or experimental approach
- High-risk changes
→ STOP and ask for clarification
Security Implications:
- Threshold manipulation: If attackers can artificially inflate confidence scores, they bypass safety checks
- Verification checkpoints: What constitutes "verification"? Is it bypassable?
- "Clear user intent": Attackers craft prompts that simulate high clarity
Multi-Agent Dispatch
Agent Dispatch Protocol:
- researcher: Information gathering, web search
- analyzer: Code analysis, security review
- writer: Documentation, code generation
- supervisor: Coordination, quality control
Attack surface: Inter-agent communication could be intercepted or poisoned. The "Prompt Infection" research showed self-replicating attacks between LLM agents.
Case Study 4: Windsurf (Codeium)
Background: Named Gartner Leader 2025, known for Memory System and Turbo Mode.
Memory System Security
Memory System:
- Persistent context across sessions
- User preferences and patterns stored
- Project-specific knowledge retained
Security Concerns:
- What happens when memory is poisoned in an earlier session?
- Can attackers inject persistent instructions through memory?
- Memory corruption could affect all future sessions
Turbo Mode
Turbo Mode:
- Reduced safety checks for speed
- Parallel execution of multiple operations
- Cached responses for common patterns
Tradeoff: Speed vs. security. Turbo mode explicitly reduces safety checks, creating a target for attackers who can trigger this mode.
Common Vulnerability Patterns
From analyzing 36+ leaked prompts:
| Pattern | Frequency | Severity | Example |
|---|---|---|---|
| Explicit exceptions | 85% | High | "unless explicitly requested" |
| Tool auto-approval | 70% | Critical | Pre-approved command patterns |
| NEVER statements | 90% | Medium | Creates obvious bypass targets |
| Confidence thresholds | 40% | High | Manipulable decision boundaries |
| Memory/persistence | 35% | Critical | Session poisoning vectors |
Defense Lessons
1. Avoid binary exceptions:
# Bad: Creates bypass target
❌ "Never do X unless user explicitly asks"
# Better: Require verification workflow
✓ "X requires multi-step confirmation with explicit acknowledgment"
2. Sanitize approved patterns:
# Bad: Pattern allows injection
❌ "Approved: Bash(ls:*)"
# Better: Strict parameter validation
✓ "Approved: Bash(ls) with validated path arguments only"
3. Defense-in-depth for memory:
# Validate stored context before use
# Expire sensitive information
# Log memory access patterns for anomaly detection
Key Insight: Leaked prompts are security blueprints in reverse. Every "NEVER" tells attackers what to try, every exception reveals a loophole, and every confidence threshold shows where manipulation is possible.
Next, we'll examine the specific security vulnerabilities these patterns create. :::