Build a Web Data Extraction Agent

45 min
Intermediate

Instructions

Objective

Build a Python agent that uses Claude's Computer Use capability to navigate a website and extract structured data (like product listings) into JSON format.

Background

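Unlike traditional web scraping, which parses HTML, Computer Use agents interact with websites visually: they work from screenshots and issue mouse and keyboard actions. This approach works even on JavaScript-heavy sites, where content is rendered client-side and HTML-only scrapers often fail.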

Requirements

Create a class WebDataExtractor with the following methods:

1. generate_navigation_sequence(url: str) -> list[dict]

Returns Computer Use tool calls to:

  • Focus the URL bar (Ctrl+L)
  • Type the URL
  • Press Enter to navigate
  • Wait for page load
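
A minimal sketch of the four steps above, written as a standalone function rather than a method for brevity. The helper computer_call and the optional modifier and wait_duration parameters are assumptions added for illustration; the default of 3000 is copied from the lab's sample wait call, which does not state its unit.

```python
def computer_call(**tool_input) -> dict:
    """Wrap an action in the tool_use envelope used by this lab (hypothetical helper)."""
    return {"type": "tool_use", "name": "computer", "input": tool_input}


def generate_navigation_sequence(
    url: str, modifier: str = "ctrl", wait_duration: int = 3000
) -> list[dict]:
    """Return the tool calls that open `url` in an already focused browser."""
    return [
        computer_call(action="key", text=f"{modifier}+l"),   # focus the URL bar
        computer_call(action="type", text=url),              # type the URL
        computer_call(action="key", text="Return"),          # press Enter
        computer_call(action="wait", duration=wait_duration),  # wait for page load
    ]
```

Calling generate_navigation_sequence("https://example.com") yields four dicts in the order listed above; passing modifier="cmd" would cover the macOS shortcut.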

2. generate_scroll_sequence(direction: str, amount: int) -> list[dict]

Returns tool calls to scroll:

  • direction: "up" or "down"
  • amount: number of scroll units

Use the scroll action from computer_20250124.
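
One possible shape for this method, shown as a standalone function. It assumes a single scroll call carrying the whole amount is acceptable, and it borrows the [512, 384] coordinate from the lab's sample call as a default; the coordinate parameter and the input validation are assumptions, not stated requirements.

```python
def generate_scroll_sequence(
    direction: str, amount: int, coordinate: tuple[int, int] = (512, 384)
) -> list[dict]:
    """Return a one-element list holding a scroll tool call."""
    if direction not in ("up", "down"):
        raise ValueError(f"direction must be 'up' or 'down', got {direction!r}")
    if amount < 1:
        raise ValueError("amount must be a positive number of scroll units")
    return [
        {
            "type": "tool_use",
            "name": "computer",
            "input": {
                "action": "scroll",
                "coordinate": list(coordinate),  # point on screen to scroll at
                "direction": direction,
                "amount": amount,
            },
        }
    ]
```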

3. parse_screenshot_data(screenshot_description: str) -> list[dict]

Given a text description of what Claude sees on screen, extract structured data:

[
    {
        "title": "Product Name",
        "price": "$99.99",
        "rating": "4.5/5",
        "in_stock": True
    }
]
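
The lab does not specify what the screenshot description looks like, so the sketch below rests on a loud assumption: each product sits on its own line with fields separated by "|", for example "Wireless Mouse | $24.99 | 4.5/5 | In Stock". The regular expressions and the rule for in_stock are illustrative choices, not the grader's definition.

```python
import re

# Assumed formats: prices like $24.99 or $1,299.00, ratings like 4.5/5.
PRICE_RE = re.compile(r"\$\d[\d,]*(?:\.\d{2})?")
RATING_RE = re.compile(r"\d(?:\.\d+)?/5")


def parse_screenshot_data(screenshot_description: str) -> list[dict]:
    """Extract one dict per line that contains a price."""
    products = []
    for line in screenshot_description.splitlines():
        price = PRICE_RE.search(line)
        if not price:
            continue  # skip lines that do not look like a product row
        rating = RATING_RE.search(line)
        lowered = line.lower()
        products.append(
            {
                "title": line.split("|")[0].strip(),  # first field is the name
                "price": price.group(),
                "rating": rating.group() if rating else None,
                "in_stock": "in stock" in lowered and "out of stock" not in lowered,
            }
        )
    return products
```

If the descriptions in your tests use a different layout, only the title extraction and the two patterns should need adjusting.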

4. generate_extraction_workflow(url: str, scroll_pages: int) -> list[dict]

Returns a complete workflow:

  1. Navigate to the URL
  2. Take a screenshot
  3. Scroll down
  4. Take a screenshot
  5. Repeat steps 3 and 4 for a total of scroll_pages iterations
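
The workflow can be sketched as a flat list of tool calls. This version assumes that "repeat" means one scroll plus one screenshot per iteration, and it hard-codes the sample values from the lab's tool call formats (ctrl+l, a wait of 3000, a scroll of 3 units at [512, 384]); the _call helper is hypothetical.

```python
def _call(**tool_input) -> dict:
    """Hypothetical helper: wrap an action in the lab's tool_use envelope."""
    return {"type": "tool_use", "name": "computer", "input": tool_input}


def generate_extraction_workflow(url: str, scroll_pages: int) -> list[dict]:
    """Navigate, screenshot, then scroll and screenshot `scroll_pages` times."""
    if scroll_pages < 0:
        raise ValueError("scroll_pages must be non-negative")
    steps = [
        _call(action="key", text="ctrl+l"),
        _call(action="type", text=url),
        _call(action="key", text="Return"),
        _call(action="wait", duration=3000),
        _call(action="screenshot"),  # first screenshot, taken after the wait
    ]
    for _ in range(scroll_pages):
        steps.append(
            _call(action="scroll", coordinate=[512, 384], direction="down", amount=3)
        )
        steps.append(_call(action="screenshot"))
    return steps
```

Under these assumptions the result has 5 + 2 * scroll_pages entries. Given the hint about taking screenshots after wait actions, inserting a short wait between each scroll and its screenshot may also be reasonable.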

Tool Call Formats

# URL bar focus
{"type": "tool_use", "name": "computer", "input": {"action": "key", "text": "ctrl+l"}}

# Type URL
{"type": "tool_use", "name": "computer", "input": {"action": "type", "text": "https://example.com"}}

# Press Enter
{"type": "tool_use", "name": "computer", "input": {"action": "key", "text": "Return"}}

# Wait for load
{"type": "tool_use", "name": "computer", "input": {"action": "wait", "duration": 3000}}

# Scroll down
{"type": "tool_use", "name": "computer", "input": {"action": "scroll", "coordinate": [512, 384], "direction": "down", "amount": 3}}

# Take screenshot
{"type": "tool_use", "name": "computer", "input": {"action": "screenshot"}}
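
To check generated sequences against the formats above before submitting, a small validator along these lines may help. The required-field table is derived only from the six sample calls shown here, and validate_tool_call is a hypothetical helper, not part of any library.

```python
# Fields each action carries in the lab's sample calls (besides "action").
REQUIRED_INPUT_FIELDS = {
    "key": {"text"},
    "type": {"text"},
    "wait": {"duration"},
    "scroll": {"coordinate", "direction", "amount"},
    "screenshot": set(),
}


def validate_tool_call(call: dict) -> list[str]:
    """Return a list of problems; an empty list means the call looks valid."""
    problems = []
    if call.get("type") != "tool_use":
        problems.append("type must be 'tool_use'")
    if call.get("name") != "computer":
        problems.append("name must be 'computer'")
    tool_input = call.get("input")
    if not isinstance(tool_input, dict):
        return problems + ["input must be a dict"]
    action = tool_input.get("action")
    if action not in REQUIRED_INPUT_FIELDS:
        return problems + [f"unknown action: {action!r}"]
    missing = REQUIRED_INPUT_FIELDS[action] - set(tool_input)
    problems.extend(f"missing field: {name}" for name in sorted(missing))
    return problems
```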

Hints

  • Always wait after navigation so the page has time to load
  • Set coordinate in the scroll action to the point on screen where the scroll should happen
  • Take screenshots after wait actions, so they capture the loaded page
  • Handle both macOS (cmd) and Linux (ctrl) modifier keys in keyboard shortcuts
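
For the last hint, one way to pick the modifier is to key off a platform string. The sketch below defaults to sys.platform, which describes the machine running the script; if the controlled desktop is a different machine or container, pass its platform explicitly. Both function names are assumptions made for illustration.

```python
import sys


def modifier_for(platform: str = sys.platform) -> str:
    """Return "cmd" on macOS ("darwin") and "ctrl" elsewhere."""
    return "cmd" if platform == "darwin" else "ctrl"


def focus_url_bar_call(platform: str = sys.platform) -> dict:
    """Build the URL bar focus call with the platform's modifier key."""
    return {
        "type": "tool_use",
        "name": "computer",
        "input": {"action": "key", "text": f"{modifier_for(platform)}+l"},
    }
```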

Grading Rubric

generate_navigation_sequence produces valid URL navigation sequence (25 points)
generate_scroll_sequence uses correct scroll action format (20 points)
parse_screenshot_data extracts structured data correctly (25 points)
generate_extraction_workflow combines all steps correctly (30 points)
