Understanding Computer Use

How Computer Use Works

5 min read

Computer Use operates through a continuous feedback loop between Claude and your computer. Understanding this loop is essential for building effective agents.

The Agentic Loop

┌─────────────────────────────────────────────────┐
│  1. Claude receives task + screenshot          │
│                    ↓                            │
│  2. Claude analyzes screen, decides action      │
│                    ↓                            │
│  3. Claude returns tool call (click, type, etc) │
│                    ↓                            │
│  4. Your code executes the action               │
│                    ↓                            │
│  5. New screenshot captured                     │
│                    ↓                            │
│  6. Loop continues until task complete          │
└─────────────────────────────────────────────────┘

The Computer Tool

The computer_20250124 tool version provides these actions:

ActionDescription
screenshotCapture current screen state
mouse_moveMove cursor to x,y coordinates
left_clickClick left mouse button
right_clickClick right mouse button
double_clickDouble-click left button
triple_clickSelect entire line/paragraph
left_mouse_downHold left button (for dragging)
left_mouse_upRelease left button
scrollScroll up/down/left/right
typeType text string
keyPress keyboard key or combo
hold_keyHold a key while performing action
waitPause for specified duration

Required Headers

To use Computer Use, include these beta headers:

headers = {
    "anthropic-beta": "computer-use-2025-01-24"
}

Screen Resolution

Claude works best with specific resolutions. The recommended setup:

# Optimal resolutions for Computer Use
RECOMMENDED_RESOLUTIONS = [
    (1024, 768),   # XGA - fast, efficient
    (1280, 800),   # WXGA - good balance
    (1920, 1080),  # Full HD - more detail
]

Tip: Lower resolutions mean faster screenshot processing and lower token costs.

Vision-Based Understanding

Claude uses its vision capabilities to:

  1. Identify UI elements (buttons, text fields, menus)
  2. Read text on screen
  3. Understand layout and spatial relationships
  4. Track changes between screenshots

This means Claude can work with any application without needing specialized integrations.

Next, we'll look at the Agent SDK that simplifies building these loops. :::

Quiz

Module 1: Understanding Computer Use

Take Quiz
Was this lesson helpful?

Sign in to rate

FREE WEEKLY NEWSLETTER

Stay on the Nerd Track

One email per week — courses, deep dives, tools, and AI experiments.

No spam. Unsubscribe anytime.