Lab

Implement Safety Guardrails for Production Agents

60 min
Advanced

Instructions

Objective

Create a comprehensive safety guardrails system that validates and filters Computer Use agent actions before they're executed. This is critical for production deployments where autonomous agents could cause damage.

Background

Anthropic achieved a 50% reduction in prompt injection success rates by implementing careful safety measures. Your guardrails should:

  • Block dangerous actions
  • Require confirmation for sensitive operations
  • Log all actions for audit
  • Rate-limit expensive operations

Requirements

Create a class SafetyGuardrails with these methods:

1. validate_action(action: dict) -> ValidationResult

Check if an action is safe to execute:

@dataclass
class ValidationResult:
    allowed: bool
    reason: str
    requires_confirmation: bool
    risk_level: str  # "low", "medium", "high", "blocked"
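
A minimal sketch of how validate_action might combine the helper methods required below; the action names ("type", "navigate", "click", "left_click"), reason strings, and exact risk assignments here are illustrative rather than prescribed by the lab:

class SafetyGuardrails:
    def validate_action(self, action: dict) -> ValidationResult:
        kind = action.get("action", "")
        # check_rate_limit and log_action (described below) would also hook in here.
        if kind == "type" and self.is_dangerous_command(action.get("text", "")):
            return ValidationResult(False, "Blocked: dangerous command detected", False, "blocked")
        if kind == "navigate" and self.is_sensitive_url(action.get("url", "")):
            return ValidationResult(True, "Sensitive URL: confirmation required", True, "high")
        if kind in ("type", "click", "left_click"):
            return ValidationResult(True, "OK", False, "medium")
        return ValidationResult(True, "OK", False, "low")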

2. is_dangerous_command(text: str) -> bool

Check if the typed text contains dangerous commands such as the following (a regex-based sketch follows the list):

  • rm -rf or rm -r (file deletion)
  • sudo (privilege escalation)
  • curl | bash (piped scripts)
  • chmod 777 (dangerous permissions)
  • > /dev/sda (disk writes)
  • Password/credential patterns
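
One way to implement this check is a small list of compiled regular expressions matched against the lowercased text; the patterns below are a starting point, not an exhaustive blocklist:

import re

# Illustrative patterns; a production blocklist would be broader and well tested.
_DANGEROUS_PATTERNS = [
    re.compile(p) for p in (
        r"\brm\s+-r[f]?\b",                        # recursive file deletion
        r"\bsudo\b",                               # privilege escalation
        r"curl[^|\n]*\|\s*(ba|z)?sh",              # piping a downloaded script into a shell
        r"\bchmod\s+777\b",                        # world-writable permissions
        r">\s*/dev/sd[a-z]",                       # writing directly to a disk device
        r"(password|passwd|api[_-]?key)\s*[:=]",   # credential-looking assignments
    )
]

def is_dangerous_command(text: str) -> bool:
    """Return True if the text matches any known-dangerous pattern."""
    lowered = text.lower()
    return any(pattern.search(lowered) for pattern in _DANGEROUS_PATTERNS)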

3. is_sensitive_url(url: str) -> bool

Check if a URL points to a sensitive destination (see the sketch after the list):

  • Banking sites (contains "bank", "paypal", "stripe")
  • Admin panels (contains "admin", "console", "dashboard")
  • Authentication pages (contains "login", "signin", "oauth")
  • Cloud consoles (AWS, GCP, Azure)
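
A keyword-matching sketch, assuming plain substring checks against the lowercased URL are enough for this lab:

# Keywords drawn from the categories above; extend the tuple as needed.
_SENSITIVE_KEYWORDS = (
    "bank", "paypal", "stripe",                                          # banking and payments
    "admin", "console", "dashboard",                                     # admin panels
    "login", "signin", "oauth",                                          # authentication pages
    "aws.amazon.com", "console.cloud.google.com", "portal.azure.com",    # cloud consoles
)

def is_sensitive_url(url: str) -> bool:
    """Return True if the URL contains any sensitive keyword."""
    lowered = url.lower()
    return any(keyword in lowered for keyword in _SENSITIVE_KEYWORDS)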

4. check_rate_limit(action_type: str) -> bool

Implement per-minute rate limiting (a deque-based sketch follows the list):

  • Maximum 10 clicks per minute
  • Maximum 50 keystrokes per minute
  • Maximum 5 URL navigations per minute
  • Returns True if within limit, False if exceeded
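
A sliding-window sketch using the bounded-deque idea from the Hints section; how raw Computer Use action names map onto the "click", "type", and "navigate" buckets is left to your implementation:

import time
from collections import deque

# Per-minute budgets taken from the requirements above.
_LIMITS = {"click": 10, "type": 50, "navigate": 5}

class RateLimiter:
    """Allows at most N actions of each type per rolling 60-second window."""

    def __init__(self) -> None:
        # One timestamp deque per action type, capped at that type's limit.
        self._history = {name: deque(maxlen=limit) for name, limit in _LIMITS.items()}

    def check_rate_limit(self, action_type: str) -> bool:
        window = self._history.get(action_type)
        if window is None:
            return True  # unlisted action types are not limited in this sketch
        now = time.monotonic()
        if len(window) == window.maxlen and now - window[0] < 60:
            return False  # the Nth-most-recent action is still inside the window
        window.append(now)  # record the allowed action
        return True

Note that in this sketch a rejected action is not recorded and so does not consume budget; counting rejected attempts as well is an equally reasonable reading of the requirement.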

5. log_action(action: dict, result: ValidationResult) -> None

Log actions with timestamp, action details, and validation result. Store in memory for audit retrieval.

6. get_audit_log() -> list[dict]

Return all logged actions with timestamps.
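
A sketch of both methods, continuing the SafetyGuardrails class and assuming the ValidationResult dataclass from requirement 1; entries are snapshotted on write and copied on read so the history stays effectively append-only:

import time

class SafetyGuardrails:
    def __init__(self) -> None:
        self._audit_log: list[dict] = []  # in-memory, append-only store

    def log_action(self, action: dict, result: ValidationResult) -> None:
        # Snapshot the action so later mutation of the dict cannot rewrite history.
        self._audit_log.append({
            "timestamp": time.time(),
            "action": dict(action),
            "allowed": result.allowed,
            "reason": result.reason,
            "requires_confirmation": result.requires_confirmation,
            "risk_level": result.risk_level,
        })

    def get_audit_log(self) -> list[dict]:
        # Hand back copies so callers cannot mutate the stored entries.
        return [dict(entry) for entry in self._audit_log]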

Example Usage

guardrails = SafetyGuardrails()

action = {
    "action": "type",
    "text": "sudo rm -rf /"
}

result = guardrails.validate_action(action)
# result.allowed = False
# result.reason = "Blocked: dangerous command detected (sudo, rm -rf)"
# result.risk_level = "blocked"
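
A sensitive navigation, by contrast, should come back allowed but gated on confirmation; the URL and field values below are illustrative:

action = {
    "action": "navigate",
    "url": "https://www.examplebank.com/login"
}

result = guardrails.validate_action(action)
# result.allowed = True
# result.requires_confirmation = True
# result.risk_level = "high"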

Risk Levels

Level     Examples                          Response
low       mouse_move, screenshot            Allow
medium    type normal text, click           Allow
high      navigate to bank, type password   Require confirmation
blocked   rm -rf, sudo, curl | bash         Block entirely
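
How the calling loop turns these levels into behavior is up to you. One possible shape is sketched below; ask_user_confirmation and execute_action are hypothetical stand-ins for a real confirmation UI and a Computer Use action executor:

def ask_user_confirmation(prompt: str) -> bool:
    # Placeholder: a production deployment would route this to a human reviewer.
    return input(f"{prompt} Proceed? [y/N] ").strip().lower() == "y"

def execute_action(action: dict) -> None:
    # Placeholder for the code that actually performs the desktop/browser action.
    print(f"Executing: {action['action']}")

def handle(action: dict, guardrails: SafetyGuardrails) -> None:
    result = guardrails.validate_action(action)
    guardrails.log_action(action, result)  # every decision goes into the audit trail
    if not result.allowed:
        print(f"Refused ({result.risk_level}): {result.reason}")
    elif result.requires_confirmation:
        if ask_user_confirmation(result.reason):
            execute_action(action)
    else:
        execute_action(action)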

Hints

  • Use regex for pattern matching dangerous commands
  • Store rate limit timestamps in a deque with maxlen
  • The audit log should be immutable (append-only)
  • Consider both typed text and keyboard shortcuts

Grading Rubric

  • validate_action correctly categorizes actions by risk level (25 points)
  • is_dangerous_command detects all dangerous patterns (20 points)
  • is_sensitive_url identifies sensitive destinations (15 points)
  • check_rate_limit properly enforces time-based limits (20 points)
  • Audit logging captures complete action details (20 points)

Your Solution

This lab requires Python; submit your implementation in Python.