How Computer Use Works

Computer Use operates through a continuous feedback loop between Claude and your computer. Understanding this loop is essential for building effective agents.

The Agentic Loop

┌─────────────────────────────────────────────────┐
│  1. Claude receives task + screenshot          │
│                    ↓                            │
│  2. Claude analyzes screen, decides action      │
│                    ↓                            │
│  3. Claude returns tool call (click, type, etc) │
│                    ↓                            │
│  4. Your code executes the action               │
│                    ↓                            │
│  5. New screenshot captured                     │
│                    ↓                            │
│  6. Loop continues until task complete          │
└─────────────────────────────────────────────────┘

The Computer Tool

The computer_20250124 tool version provides these actions:

Action	Description
`screenshot`	Capture current screen state
`mouse_move`	Move cursor to x,y coordinates
`left_click`	Click left mouse button
`right_click`	Click right mouse button
`double_click`	Double-click left button
`triple_click`	Select entire line/paragraph
`left_mouse_down`	Hold left button (for dragging)
`left_mouse_up`	Release left button
`scroll`	Scroll up/down/left/right
`type`	Type text string
`key`	Press keyboard key or combo
`hold_key`	Hold a key while performing action
`wait`	Pause for specified duration

Required Headers

To use Computer Use, include these beta headers:

headers = {
    "anthropic-beta": "computer-use-2025-01-24"
}

Screen Resolution

Claude works best with specific resolutions. The recommended setup:

# Optimal resolutions for Computer Use
RECOMMENDED_RESOLUTIONS = [
    (1024, 768),   # XGA - fast, efficient
    (1280, 800),   # WXGA - good balance
    (1920, 1080),  # Full HD - more detail
]

Tip: Lower resolutions mean faster screenshot processing and lower token costs.

Vision-Based Understanding

Claude uses its vision capabilities to:

Identify UI elements (buttons, text fields, menus)
Read text on screen
Understand layout and spatial relationships
Track changes between screenshots

This means Claude can work with any application without needing specialized integrations.

Next, we'll look at the Agent SDK that simplifies building these loops. :::

The Agentic Loop

The Computer Tool

Required Headers

Screen Resolution

Vision-Based Understanding

Quiz

Stay on the Nerd Track