Understanding Computer Use
How Computer Use Works
5 min read
Computer Use operates through a continuous feedback loop between Claude and your computer. Understanding this loop is essential for building effective agents.
The Agentic Loop
┌─────────────────────────────────────────────────┐
│ 1. Claude receives task + screenshot │
│ ↓ │
│ 2. Claude analyzes screen, decides action │
│ ↓ │
│ 3. Claude returns tool call (click, type, etc) │
│ ↓ │
│ 4. Your code executes the action │
│ ↓ │
│ 5. New screenshot captured │
│ ↓ │
│ 6. Loop continues until task complete │
└─────────────────────────────────────────────────┘
The Computer Tool
The computer_20250124 tool version provides these actions:
| Action | Description |
|---|---|
screenshot |
Capture current screen state |
mouse_move |
Move cursor to x,y coordinates |
left_click |
Click left mouse button |
right_click |
Click right mouse button |
double_click |
Double-click left button |
triple_click |
Select entire line/paragraph |
left_mouse_down |
Hold left button (for dragging) |
left_mouse_up |
Release left button |
scroll |
Scroll up/down/left/right |
type |
Type text string |
key |
Press keyboard key or combo |
hold_key |
Hold a key while performing action |
wait |
Pause for specified duration |
Required Headers
To use Computer Use, include these beta headers:
headers = {
"anthropic-beta": "computer-use-2025-01-24"
}
Screen Resolution
Claude works best with specific resolutions. The recommended setup:
# Optimal resolutions for Computer Use
RECOMMENDED_RESOLUTIONS = [
(1024, 768), # XGA - fast, efficient
(1280, 800), # WXGA - good balance
(1920, 1080), # Full HD - more detail
]
Tip: Lower resolutions mean faster screenshot processing and lower token costs.
Vision-Based Understanding
Claude uses its vision capabilities to:
- Identify UI elements (buttons, text fields, menus)
- Read text on screen
- Understand layout and spatial relationships
- Track changes between screenshots
This means Claude can work with any application without needing specialized integrations.
Next, we'll look at the Agent SDK that simplifies building these loops. :::