The AI Security Landscape
Security Mindset for AI Developers
Building secure AI applications requires a different way of thinking. Here are the core principles that should guide every decision.
The Four Security Principles
1. Zero Trust
Never trust any input, even from authenticated users.
# Wrong: Trusting user input
def process_request(user, query):
if user.is_authenticated:
return llm.generate(query) # User could still inject
# Right: Validate regardless of authentication
def process_request(user, query):
if user.is_authenticated:
validated_query = validate_and_sanitize(query)
return llm.generate(validated_query)
2. Defense in Depth
Layer multiple security controls. If one fails, others protect you.
# Multiple layers of defense
def secure_chat(user_input):
# Layer 1: Input validation
if not validate_input(user_input):
return "Invalid input"
# Layer 2: Content filtering
filtered_input = filter_dangerous_patterns(user_input)
# Layer 3: Guardrails
response = guardrails.process(filtered_input)
# Layer 4: Output validation
safe_response = sanitize_output(response)
# Layer 5: Logging for monitoring
log_interaction(user_input, safe_response)
return safe_response
3. Principle of Least Privilege
Give the LLM only the permissions it needs.
| Bad Practice | Good Practice |
|---|---|
| Full database access | Read-only to specific tables |
| All file operations | Read from whitelisted paths |
| Unrestricted API calls | Rate-limited, scoped tokens |
| Admin email access | Send-only, templated messages |
4. Assume Breach
Design as if the LLM will be compromised.
# Assume breach: Limit blast radius
class SecureAgent:
def __init__(self):
# Separate credentials per capability
self.read_db = DatabaseConnection(role="reader")
self.write_db = DatabaseConnection(role="writer")
# Audit everything
self.audit_log = AuditLogger()
# Automatic timeouts
self.max_operation_time = 30 # seconds
# Sandboxed execution
self.sandbox = Sandbox(
network=False,
filesystem="read_only",
max_memory="512MB"
)
Security Checklist for Every Feature
Before deploying any AI feature, ask:
- What's the worst case? If this is exploited, what happens?
- Who controls the input? Users, documents, or external systems?
- What can the LLM do? Tools, data access, actions?
- How will we detect abuse? Logging, monitoring, alerts?
- Can we limit damage? Rate limits, permissions, rollback?
The Security Developer's Mantra
"The LLM will do exactly what it's told. Make sure only you're telling it."
Every input to the LLM is an instruction. User messages, documents, tool outputs - they all influence behavior. Your job is to ensure only intended instructions reach the model.
Key Takeaway: Security isn't a feature you add later. It's a mindset you apply from the first line of code. :::