Other Critical Vulnerabilities

Sensitive Data Exposure

2 min read

LLMs can leak sensitive information in multiple ways: from training data memorization, context window exposure, or improper handling of user data.

Training Data Extraction

LLMs memorize portions of their training data. Attackers can extract this:

# Extraction attempt
prompt = """
The following is an excerpt from the OpenAI employee handbook:
"Chapter 1: Employee Benefits
All full-time employees receive..."

Complete the next paragraph:
"""

# If the model was trained on this document, it might reproduce it

Types of Exposed Data

Source Sensitive Data Risk
Training data PII, credentials, code Identity theft, breaches
Context window Previous user messages Privacy violation
RAG documents Internal documents Data breach
Tool outputs API responses Information disclosure

Context Window Leakage

In multi-user systems, conversation isolation is critical:

# Vulnerable: Shared context across users
class BadChatbot:
    def __init__(self):
        self.conversation_history = []  # Shared across all users!

    def chat(self, user_id: str, message: str) -> str:
        self.conversation_history.append(message)
        # Other users' messages are in context
        return llm.generate(self.conversation_history)

# Secure: Isolated contexts per user
class SecureChatbot:
    def __init__(self):
        self.conversations = {}  # Per-user isolation

    def chat(self, user_id: str, message: str) -> str:
        if user_id not in self.conversations:
            self.conversations[user_id] = []
        self.conversations[user_id].append(message)
        return llm.generate(self.conversations[user_id])

PII Detection and Redaction

import re
from typing import Tuple

def detect_and_redact_pii(text: str) -> Tuple[str, list]:
    """Detect and redact PII from text."""
    findings = []
    redacted = text

    patterns = {
        'email': r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b',
        'phone': r'\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b',
        'ssn': r'\b\d{3}-\d{2}-\d{4}\b',
        'credit_card': r'\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b',
        'ip_address': r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b',
    }

    for pii_type, pattern in patterns.items():
        matches = re.findall(pattern, text)
        for match in matches:
            findings.append({'type': pii_type, 'value': match})
            redacted = redacted.replace(match, f'[REDACTED_{pii_type.upper()}]')

    return redacted, findings

# Usage
user_input = "My email is john@example.com and SSN is 123-45-6789"
safe_input, detected = detect_and_redact_pii(user_input)
# safe_input: "My email is [REDACTED_EMAIL] and SSN is [REDACTED_SSN]"

Output Filtering

def filter_sensitive_output(response: str) -> str:
    """Remove sensitive data from LLM outputs."""
    # Redact PII
    response, _ = detect_and_redact_pii(response)

    # Remove potential credentials
    credential_patterns = [
        r'(?:password|pwd|pass)[:\s]*\S+',
        r'(?:api[_-]?key|apikey)[:\s]*\S+',
        r'(?:secret|token)[:\s]*\S+',
        r'Bearer\s+\S+',
    ]

    for pattern in credential_patterns:
        response = re.sub(pattern, '[REDACTED_CREDENTIAL]', response, flags=re.I)

    return response

Key Takeaway: Implement PII detection on both inputs and outputs. Assume the LLM might try to output sensitive data, and filter accordingly. :::

Quiz

Module 3: Other Critical Vulnerabilities

Take Quiz