Production Deployment & Safety

Monitoring and Observability


Production Computer Use agents need comprehensive monitoring to ensure reliability, debug issues, and maintain security.

What to Monitor

| Category | Metrics |
| --- | --- |
| Performance | Response time, loop iterations, completion rate |
| Costs | Tokens used, screenshots processed, API calls |
| Reliability | Error rate, retry count, timeout frequency |
| Security | Blocked actions, suspicious patterns, failed auth |
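The table names the metrics but not how to compute them. The following is a minimal sketch that assumes you keep one record per finished task. `TaskRecord`, its fields, and the rule that a task counts as errored if it hit at least one error are illustrative choices, not a fixed schema.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class TaskRecord:
    """One finished agent task (hypothetical schema for illustration)."""
    completed: bool       # did the agent finish the task?
    errors: int           # tool/API errors encountered during the task
    retries: int          # retries issued during the task
    duration_s: float     # wall-clock time for the whole task, in seconds
    loop_iterations: int  # number of agent loop turns
    cost_usd: float       # dollar cost derived from token usage

def summarize(tasks: list[TaskRecord]) -> dict:
    """Roll per-task records up into performance, cost, and reliability metrics."""
    if not tasks:
        return {}
    n = len(tasks)
    return {
        "completion_rate": sum(t.completed for t in tasks) / n,
        # Assumption: a task is "errored" if it saw at least one error
        "error_rate": sum(t.errors > 0 for t in tasks) / n,
        "avg_retries": mean(t.retries for t in tasks),
        "avg_completion_time": mean(t.duration_s for t in tasks),
        "avg_loop_iterations": mean(t.loop_iterations for t in tasks),
        "cost_per_task": sum(t.cost_usd for t in tasks) / n,
    }
```

Security metrics such as blocked actions are left out here because they are usually counted at the point where an action is rejected, not per task.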

Logging Framework

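import logging
from datetime import datetime, timezone

# Without a configured handler, INFO records are silently dropped
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("computer_use_agent")

class AgentLogger:
    """Logs one structured record (a dict) per agent event."""

    def __init__(self, session_id: str):
        self.session_id = session_id
        self.start_time = datetime.now(timezone.utc)

    def _emit(self, event: dict) -> None:
        # Every record carries the session ID, a UTC timestamp, and elapsed time
        now = datetime.now(timezone.utc)
        logger.info({
            "session": self.session_id,
            "timestamp": now.isoformat(),
            "elapsed_s": round((now - self.start_time).total_seconds(), 3),
            **event,
        })

    def log_action(self, action: dict, result: dict):
        self._emit({
            "event": "action",
            # Computer Use tool inputs name the action under "action"
            # (e.g. "left_click") and the target under "coordinate"
            "action_type": action.get("action"),
            "coordinates": action.get("coordinate"),
            "success": result.get("success"),
            "duration_ms": result.get("duration_ms"),
        })

    def log_screenshot(self, size_bytes: int):
        self._emit({"event": "screenshot", "size_bytes": size_bytes})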
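The logger above passes dicts to `logger.info`, and the default formatter renders them with Python's `repr`, which log pipelines cannot parse as JSON. One option, sketched here with only the standard library, is a custom `logging.Formatter` that writes one JSON object per line. `JsonFormatter` and `configure_agent_logging` are illustrative names, not part of any library.

```python
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON line; dict messages become top-level fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "logger": record.name}
        if isinstance(record.msg, dict):
            payload.update(record.msg)            # structured event dict
        else:
            payload["message"] = record.getMessage()
        return json.dumps(payload, default=str)   # default=str covers datetimes etc.

def configure_agent_logging(stream=sys.stdout) -> logging.Logger:
    """Attach a JSON-lines handler to the "computer_use_agent" logger."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    log = logging.getLogger("computer_use_agent")
    log.setLevel(logging.INFO)
    log.handlers[:] = [handler]   # replace, so repeated calls don't duplicate lines
    log.propagate = False         # don't also print through the root logger
    return log
```

JSON lines can be shipped as-is to most log aggregators and filtered by `session` or `event`.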

Session Recording

Record full sessions for debugging:

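import json
import time

class SessionRecorder:
    """Keeps every screenshot and action of one session for later replay."""

    def __init__(self):
        self.screenshots = []   # base64-encoded screenshots, one per frame
        self.actions = []       # {"timestamp", "action"} entries, same order

    def record_frame(self, screenshot_b64: str, action: dict):
        self.screenshots.append(screenshot_b64)
        self.actions.append({
            "timestamp": time.time(),
            "action": action
        })

    def save_recording(self, path: str, include_frames: bool = True):
        # Writes a single JSON file; pass include_frames=False to keep only
        # the action log, since screenshots make the file large
        data = {"actions": self.actions, "frame_count": len(self.screenshots)}
        if include_frames:
            data["screenshots"] = self.screenshots
        with open(path, "w", encoding="utf-8") as f:
            json.dump(data, f)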
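The recorder holds screenshots in memory as base64 strings. To inspect frames in an ordinary image viewer, a sketch like the one below writes each screenshot to its own file next to a manifest. It assumes the screenshots are base64-encoded PNGs (change the extension otherwise) and that `actions` lines up one-to-one with the screenshots, as in `SessionRecorder`; `save_frames` is a hypothetical helper.

```python
import base64
import json
from pathlib import Path

def save_frames(screenshots_b64: list[str], actions: list[dict], out_dir: str) -> Path:
    """Write each screenshot to NNNN.png and return the path of manifest.json."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    manifest = []
    for i, (shot, entry) in enumerate(zip(screenshots_b64, actions)):
        frame_name = f"{i:04d}.png"   # zero-padded so files sort in order
        (out / frame_name).write_bytes(base64.b64decode(shot))
        manifest.append({"frame": frame_name, **entry})
    manifest_path = out / "manifest.json"
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return manifest_path
```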

Cost Tracking

class CostTracker:
    # Example rates for Claude Sonnet 4.6 ($3 / $15 per 1M tokens).
    # Adjust these if you use Opus 4.6 ($5 / $25) or Haiku 4.5 ($1 / $5).
    # Screenshots are billed as input tokens and are already included in
    # usage.input_tokens, so they need no separate rate.
    COSTS = {
        "input_token": 3.00 / 1_000_000,    # $3 per 1M input tokens
        "output_token": 15.00 / 1_000_000,  # $15 per 1M output tokens
    }

    def __init__(self):
        self.total_cost = 0.0
        self.call_count = 0

    def add_usage(self, response):
        # `response` is a Messages API response; `usage` holds its token counts
        usage = response.usage
        cost = (
            usage.input_tokens * self.COSTS["input_token"] +
            usage.output_tokens * self.COSTS["output_token"]
        )
        self.total_cost += cost
        self.call_count += 1

        return {
            "call_cost": cost,
            "total_cost": self.total_cost,
            "call_count": self.call_count
        }
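A tracker reports spend after the fact. Because each turn of the agent loop resends the conversation, including earlier screenshots, input tokens tend to grow with every iteration, so a stuck task can get expensive quickly. A per-task cap that raises once spend passes a limit is one way to stop it early. `TaskBudget` and `BudgetExceeded` are hypothetical names, and the $1 default simply mirrors the alert level used later in this lesson. As a worked example at $3 / $15 per 1M tokens, a call with 100,000 input and 2,000 output tokens costs 0.30 + 0.03 = $0.33.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a task's spend passes its cap (hypothetical helper)."""

class TaskBudget:
    """Per-task spending cap; call charge() after every API response."""

    def __init__(self, max_usd: float = 1.00,
                 input_per_mtok: float = 3.0, output_per_mtok: float = 15.0):
        self.max_usd = max_usd
        self.input_rate = input_per_mtok / 1_000_000    # USD per input token
        self.output_rate = output_per_mtok / 1_000_000  # USD per output token
        self.spent = 0.0

    def charge(self, input_tokens: int, output_tokens: int) -> float:
        """Add one call's cost; raise BudgetExceeded if the cap is passed."""
        self.spent += input_tokens * self.input_rate + output_tokens * self.output_rate
        if self.spent > self.max_usd:
            raise BudgetExceeded(f"task spent ${self.spent:.4f} > ${self.max_usd:.2f}")
        return self.spent
```

In an agent loop you would pass `response.usage.input_tokens` and `response.usage.output_tokens` to `charge()` and end the task cleanly when `BudgetExceeded` is raised.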

Health Checks

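async def health_check():
    # test_api_connection, test_display, check_disk_space and check_memory
    # are placeholders for your own probes; each returns True when healthy
    checks = {
        "api_connection": await test_api_connection(),
        "display_available": await test_display(),
        "disk_space": check_disk_space(),
        "memory_available": check_memory()
    }

    healthy = all(checks.values())
    return {"healthy": healthy, "checks": checks}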
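A health check should not hang because one probe hangs. The sketch below shows possible stdlib-only versions of two of the probes and wraps each one in a timeout, treating a timeout or exception as unhealthy. The probe bodies are assumptions: the display check only looks at the `DISPLAY` environment variable (an X11/Xvfb setup), and `run_probe` and `health_check_with_timeouts` are hypothetical names.

```python
import asyncio
import os
import shutil

# Hypothetical probe implementations; swap in your own checks.
def check_disk_space(path: str = "/", min_free_gb: float = 1.0) -> bool:
    """True if at least min_free_gb GiB is free on the volume holding `path`."""
    return shutil.disk_usage(path).free >= min_free_gb * 1024**3

async def test_display() -> bool:
    """Minimal X11 check: is DISPLAY set? Assumes an Xvfb/X11 setup."""
    return bool(os.environ.get("DISPLAY"))

async def run_probe(name: str, coro, timeout_s: float = 5.0) -> tuple[str, bool]:
    """Await one probe; a timeout or any exception counts as unhealthy."""
    try:
        return name, bool(await asyncio.wait_for(coro, timeout_s))
    except Exception:
        return name, False

async def health_check_with_timeouts() -> dict:
    # Run probes concurrently; the blocking disk check runs in a worker thread
    results = await asyncio.gather(
        run_probe("display_available", test_display()),
        run_probe("disk_space", asyncio.to_thread(check_disk_space)),
    )
    checks = dict(results)
    return {"healthy": all(checks.values()), "checks": checks}
```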

Alerting

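def check_and_alert(metrics: dict) -> list[str]:
    # Expects error_rate as a fraction, avg_response_time in seconds and
    # cost_per_task in USD; send_alert is your own notification hook
    alerts = []

    if metrics["error_rate"] > 0.1:
        alerts.append("High error rate: >10%")

    if metrics["avg_response_time"] > 30:
        alerts.append("Slow responses: >30s average")

    if metrics["cost_per_task"] > 1.0:
        alerts.append("High costs: >$1 per task")

    if alerts:
        send_alert(alerts)
    return alerts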
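An error rate averaged over the agent's whole lifetime reacts slowly once many tasks have run. One way to feed `error_rate`, assumed here rather than prescribed by the lesson, is to compute it over a fixed window of recent tasks using `collections.deque(maxlen=...)`, which discards the oldest entry automatically. `RollingErrorRate` is a hypothetical helper.

```python
from collections import deque

class RollingErrorRate:
    """Fraction of failed tasks among the most recent `window` tasks."""

    def __init__(self, window: int = 50):
        self.outcomes = deque(maxlen=window)   # True = task failed

    def record(self, failed: bool) -> None:
        self.outcomes.append(failed)

    @property
    def value(self) -> float:
        # 0.0 before any task is recorded, to avoid dividing by zero
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 0.0
```

With a small window a single failure moves the rate a lot, so you may want a minimum number of recorded tasks before alerting on it.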

Dashboard Metrics

Essential metrics for your dashboard:

| Metric | Target | Alert threshold |
| --- | --- | --- |
| Success rate | >95% | <90% |
| Avg completion time | <60 s | >120 s |
| Cost per task | <$0.50 | >$1.00 |
| Error rate | <5% | >10% |
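Encoding the dashboard table as data keeps every panel and alert rule consistent. This sketch treats a value that meets the target as `ok`, a value past the alert threshold as `alert`, and anything in between as `warn`. The metric keys and units (fractions, seconds, USD) are assumptions.

```python
# (target, alert_threshold, higher_is_better), taken from the dashboard table
THRESHOLDS = {
    "success_rate":        (0.95, 0.90, True),
    "avg_completion_time": (60.0, 120.0, False),   # seconds
    "cost_per_task":       (0.50, 1.00, False),    # USD
    "error_rate":          (0.05, 0.10, False),
}

def classify(metric: str, value: float) -> str:
    """Return 'ok' (meets target), 'alert' (past threshold) or 'warn' (in between)."""
    target, alert_at, higher_is_better = THRESHOLDS[metric]
    if higher_is_better:
        if value > target:
            return "ok"
        return "alert" if value < alert_at else "warn"
    if value < target:
        return "ok"
    return "alert" if value > alert_at else "warn"
```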

Debugging Tools

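# Replay failed sessions step by step.
# load_session and display_screenshot are your own helpers; load_session
# should return a sequence of (screenshot, action) pairs.
def replay_session(session_id: str):
    session = load_session(session_id)
    for i, (screenshot, action) in enumerate(session):
        print(f"Step {i}: {action}")
        display_screenshot(screenshot)
        input("Press Enter to continue...")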
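Stepping through every frame is slow for long sessions. A small helper can jump straight to the first failing step and return a few steps around it for focused replay. It assumes each recorded step is a dict with `action` and `result` keys, where `result["success"]` mirrors the field logged by `log_action`; `first_failure` and `failure_context` are hypothetical names.

```python
from typing import Optional

def first_failure(steps: list[dict]) -> Optional[int]:
    """Index of the first step whose result reports success=False, else None."""
    for i, step in enumerate(steps):
        if step.get("result", {}).get("success") is False:
            return i
    return None

def failure_context(steps: list[dict], radius: int = 2) -> list[tuple[int, dict]]:
    """The failing step plus up to `radius` steps on either side."""
    idx = first_failure(steps)
    if idx is None:
        return []
    lo, hi = max(0, idx - radius), min(len(steps), idx + radius + 1)
    return [(i, steps[i]) for i in range(lo, hi)]
```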

Tip: Store session recordings for 7-30 days to debug issues reported by users.
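A scheduled cleanup job can enforce the retention window. A minimal sketch, assuming recordings are stored as `*.json` files under one directory and that file modification time is a good enough proxy for recording age; `prune_recordings` is a hypothetical helper.

```python
import time
from pathlib import Path

def prune_recordings(root: str, max_age_days: float = 30,
                     pattern: str = "*.json") -> list[Path]:
    """Delete recordings under `root` whose mtime is older than max_age_days."""
    cutoff = time.time() - max_age_days * 86_400   # 86,400 seconds per day
    removed = []
    # Materialize the listing first so we don't delete while iterating
    for path in list(Path(root).rglob(pattern)):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

Running it daily (for example from cron) with `max_age_days` between 7 and 30 matches the retention range suggested above.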

Congratulations! You've completed the course. Time to build your own Computer Use agents.
