Production Deployment & Safety

Monitoring and Observability

Production Computer Use agents need comprehensive monitoring to ensure reliability, debug issues, and maintain security.

What to Monitor

| Category | Metrics |
|---|---|
| Performance | Response time, loop iterations, completion rate |
| Costs | Tokens used, screenshots processed, API calls |
| Reliability | Error rate, retry count, timeout frequency |
| Security | Blocked actions, suspicious patterns, failed auth |
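
One way to carry these categories through the rest of the code is a single metrics snapshot per session. A minimal sketch (the AgentMetrics name and field set are illustrative, not from any library):

from dataclasses import dataclass, asdict

@dataclass
class AgentMetrics:
    # Performance
    response_time_ms: float = 0.0
    loop_iterations: int = 0
    completed: bool = False
    # Costs
    input_tokens: int = 0
    output_tokens: int = 0
    screenshots_processed: int = 0
    api_calls: int = 0
    # Reliability
    errors: int = 0
    retries: int = 0
    timeouts: int = 0
    # Security
    blocked_actions: int = 0

    def as_dict(self) -> dict:
        # Flatten for export to a logger or metrics backend
        return asdict(self)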

Logging Framework

import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("computer_use_agent")

class AgentLogger:
    def __init__(self, session_id: str):
        self.session_id = session_id
        self.start_time = datetime.now(timezone.utc)

    def log_action(self, action: dict, result: dict):
        # One structured JSON line per action keeps logs machine-parseable
        logger.info(json.dumps({
            "session": self.session_id,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "action_type": action.get("type"),
            "coordinates": action.get("coordinate"),
            "success": result.get("success"),
            "duration_ms": result.get("duration_ms")
        }))

    def log_screenshot(self, size_bytes: int):
        # Track screenshot volume separately; screenshots are a major driver of token costs
        logger.info(json.dumps({
            "session": self.session_id,
            "event": "screenshot",
            "size_bytes": size_bytes
        }))
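
Wiring it up takes only a few lines. A usage sketch, where the uuid-based session ID and the basicConfig call are illustrative choices rather than requirements:

import uuid

logging.basicConfig(level=logging.INFO)  # emit INFO-level lines to stderr

agent_logger = AgentLogger(session_id=str(uuid.uuid4()))
agent_logger.log_action(
    {"type": "left_click", "coordinate": [512, 384]},
    {"success": True, "duration_ms": 143},
)
agent_logger.log_screenshot(size_bytes=204_800)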

Session Recording

Record full sessions for debugging:

import json
import time

class SessionRecorder:
    def __init__(self):
        self.screenshots = []
        self.actions = []

    def record_frame(self, screenshot_b64: str, action: dict):
        # Capture the screen state alongside the action taken on it
        self.screenshots.append(screenshot_b64)
        self.actions.append({
            "timestamp": time.time(),
            "action": action
        })

    def save_recording(self, path: str):
        # Persist actions and frames together; frames can later be decoded
        # and stitched into a video or reviewed as a frame sequence
        with open(path, "w") as f:
            json.dump({
                "actions": self.actions,
                "frame_count": len(self.screenshots),
                "screenshots": self.screenshots
            }, f)
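
In an agent loop, recording slots in right after each action executes. A sketch of that integration, where take_screenshot, decide_next_action, and execute_action are placeholders for your own tool implementations:

max_steps = 50  # illustrative safety cap

recorder = SessionRecorder()
for step in range(max_steps):
    screenshot_b64 = take_screenshot()           # placeholder: capture tool
    action = decide_next_action(screenshot_b64)  # placeholder: model call
    execute_action(action)                       # placeholder: action executor
    recorder.record_frame(screenshot_b64, action)
    if action.get("type") == "done":
        break

recorder.save_recording("session_001.json")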

Cost Tracking

class CostTracker:
    # Dollar cost per token; adjust these rates to your model's pricing.
    # Screenshot images are billed as input tokens, so no separate image rate is needed.
    COSTS = {
        "input_token": 0.003 / 1000,   # per input token
        "output_token": 0.015 / 1000,  # per output token
    }

    def __init__(self):
        self.total_cost = 0.0
        self.call_count = 0

    def add_usage(self, response):
        # Token counts are exposed on the API response's usage object
        usage = response.usage
        cost = (
            usage.input_tokens * self.COSTS["input_token"] +
            usage.output_tokens * self.COSTS["output_token"]
        )
        self.total_cost += cost
        self.call_count += 1

        return {
            "call_cost": cost,
            "total_cost": self.total_cost,
            "call_count": self.call_count
        }
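
Wired into an API loop, the tracker can also enforce a per-task budget. A sketch assuming the Anthropic Python SDK, with the budget value purely illustrative:

from anthropic import Anthropic

client = Anthropic()
tracker = CostTracker()
BUDGET_PER_TASK = 1.00  # illustrative cap in dollars

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Open the settings page."}],
)
stats = tracker.add_usage(response)
if stats["total_cost"] > BUDGET_PER_TASK:
    raise RuntimeError(f"Task exceeded budget: ${stats['total_cost']:.2f}")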

Health Checks

async def health_check():
    checks = {
        "api_connection": await test_api_connection(),
        "display_available": await test_display(),
        "disk_space": check_disk_space(),
        "memory_available": check_memory()
    }

    healthy = all(checks.values())
    return {"healthy": healthy, "checks": checks}

Alerting

def check_and_alert(metrics):
    alerts = []

    if metrics["error_rate"] > 0.1:
        alerts.append("High error rate: >10%")

    if metrics["avg_response_time"] > 30:
        alerts.append("Slow responses: >30s average")

    if metrics["cost_per_task"] > 1.0:
        alerts.append("High costs: >$1 per task")

    if alerts:
        send_alert(alerts)
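
send_alert can be as simple as a webhook POST. A standard-library sketch; the webhook URL and JSON payload shape are placeholders for whatever your incident channel expects:

import json
import urllib.request

WEBHOOK_URL = "https://hooks.example.com/alerts"  # placeholder endpoint

def send_alert(alerts: list[str]):
    # POST the alert messages as JSON to an incident-channel webhook
    body = json.dumps({"text": "\n".join(alerts)}).encode()
    req = urllib.request.Request(
        WEBHOOK_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=10)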

Dashboard Metrics

Essential metrics for your dashboard:

| Metric | Target | Alert Threshold |
|---|---|---|
| Success rate | >95% | <90% |
| Avg completion time | <60s | >120s |
| Cost per task | <$0.50 | >$1.00 |
| Error rate | <5% | >10% |
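
These figures can be derived from the per-session records you are already logging. A sketch, where the session dicts and their "success", "duration_s", and "cost" fields are assumed shapes rather than anything the earlier classes emit directly:

def compute_dashboard_metrics(sessions: list[dict]) -> dict:
    # Aggregate one dashboard row from a list of completed-session records
    if not sessions:
        return {}
    total = len(sessions)
    successes = sum(1 for s in sessions if s["success"])
    return {
        "success_rate": successes / total,
        "avg_completion_time_s": sum(s["duration_s"] for s in sessions) / total,
        "cost_per_task": sum(s["cost"] for s in sessions) / total,
        "error_rate": 1 - successes / total,
    }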

Debugging Tools

# Replay failed sessions step by step
def replay_session(session_id: str):
    # load_session is assumed to yield (screenshot, action) pairs
    # from a saved recording
    session = load_session(session_id)
    for i, (screenshot, action) in enumerate(session):
        print(f"Step {i}: {action}")
        display_screenshot(screenshot)
        input("Press Enter to continue...")

Tip: Store session recordings for 7-30 days to debug issues reported by users.
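
A small cleanup job can enforce that retention window. A sketch assuming recordings are stored as JSON files in one directory, with the age limit illustrative:

import time
from pathlib import Path

def prune_recordings(directory: str, max_age_days: int = 30):
    # Delete recording files older than the retention window
    cutoff = time.time() - max_age_days * 86400
    for path in Path(directory).glob("*.json"):
        if path.stat().st_mtime < cutoff:
            path.unlink()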

Congratulations! You've completed the course. Time to build your own Computer Use agents.
