Production Deployment & Safety
Monitoring and Observability
5 min read
Production Computer Use agents need comprehensive monitoring to ensure reliability, debug issues, and maintain security.
What to Monitor
| Category | Metrics |
|---|---|
| Performance | Response time, loop iterations, completion rate |
| Costs | Tokens used, screenshots processed, API calls |
| Reliability | Error rate, retry count, timeout frequency |
| Security | Blocked actions, suspicious patterns, failed auth |
Logging Framework
import logging
from datetime import datetime
logger = logging.getLogger("computer_use_agent")
class AgentLogger:
def __init__(self, session_id: str):
self.session_id = session_id
self.start_time = datetime.now()
def log_action(self, action: dict, result: dict):
logger.info({
"session": self.session_id,
"timestamp": datetime.now().isoformat(),
"action_type": action.get("type"),
"coordinates": action.get("coordinate"),
"success": result.get("success"),
"duration_ms": result.get("duration_ms")
})
def log_screenshot(self, size_bytes: int):
logger.info({
"session": self.session_id,
"event": "screenshot",
"size_bytes": size_bytes
})
Session Recording
Record full sessions for debugging:
class SessionRecorder:
def __init__(self):
self.screenshots = []
self.actions = []
def record_frame(self, screenshot_b64: str, action: dict):
self.screenshots.append(screenshot_b64)
self.actions.append({
"timestamp": time.time(),
"action": action
})
def save_recording(self, path: str):
# Save as video or frame sequence
with open(path, 'w') as f:
json.dump({
"actions": self.actions,
"frame_count": len(self.screenshots)
}, f)
Cost Tracking
class CostTracker:
COSTS = {
"input_token": 0.003 / 1000, # Per token
"output_token": 0.015 / 1000, # Per token
"image_token": 0.003 / 1000, # Per image token
}
def __init__(self):
self.total_cost = 0
self.call_count = 0
def add_usage(self, response):
usage = response.usage
cost = (
usage.input_tokens * self.COSTS["input_token"] +
usage.output_tokens * self.COSTS["output_token"]
)
self.total_cost += cost
self.call_count += 1
return {
"call_cost": cost,
"total_cost": self.total_cost,
"call_count": self.call_count
}
Health Checks
async def health_check():
checks = {
"api_connection": await test_api_connection(),
"display_available": await test_display(),
"disk_space": check_disk_space(),
"memory_available": check_memory()
}
healthy = all(checks.values())
return {"healthy": healthy, "checks": checks}
Alerting
def check_and_alert(metrics):
alerts = []
if metrics["error_rate"] > 0.1:
alerts.append("High error rate: >10%")
if metrics["avg_response_time"] > 30:
alerts.append("Slow responses: >30s average")
if metrics["cost_per_task"] > 1.0:
alerts.append("High costs: >$1 per task")
if alerts:
send_alert(alerts)
Dashboard Metrics
Essential metrics for your dashboard:
| Metric | Target | Alert Threshold |
|---|---|---|
| Success rate | >95% | <90% |
| Avg completion time | <60s | >120s |
| Cost per task | <$0.50 | >$1.00 |
| Error rate | <5% | >10% |
Debugging Tools
# Replay failed sessions
def replay_session(session_id: str):
session = load_session(session_id)
for i, (screenshot, action) in enumerate(session):
print(f"Step {i}: {action}")
display_screenshot(screenshot)
input("Press Enter to continue...")
Tip: Store session recordings for 7-30 days to debug issues reported by users.
Congratulations! You've completed the course. Time to build your own Computer Use agents. :::