Build a Web Data Extraction Agent

Objective

Build a Python agent that uses Claude's Computer Use capability to navigate a website and extract structured data (like product listings) into JSON format.

Background

Unlike traditional web scraping that parses HTML, Computer Use agents interact with websites visually. This works even on JavaScript-heavy sites where traditional scrapers fail.

Requirements

Create a class WebDataExtractor with the following methods:

1. `generate_navigation_sequence(url: str) -> list[dict]`

Returns Computer Use tool calls to:

Focus the URL bar (Ctrl+L)
Type the URL
Press Enter to navigate
Wait for page load

2. `generate_scroll_sequence(direction: str, amount: int) -> list[dict]`

Returns tool calls to scroll:

direction: "up" or "down"
amount: number of scroll units

Use the scroll action from computer_20250124.

3. `parse_screenshot_data(screenshot_description: str) -> list[dict]`

Given a text description of what Claude sees on screen, extract structured data:

[
    {
        "title": "Product Name",
        "price": "$99.99",
        "rating": "4.5/5",
        "in_stock": True
    }
]

4. `generate_extraction_workflow(url: str, scroll_pages: int) -> list[dict]`

Returns a complete workflow:

Navigate to URL
Take screenshot
Scroll down
Take screenshot
Repeat for scroll_pages iterations

Tool Call Formats

# URL bar focus
{"type": "tool_use", "name": "computer", "input": {"action": "key", "text": "ctrl+l"}}

# Type URL
{"type": "tool_use", "name": "computer", "input": {"action": "type", "text": "https://example.com"}}

# Press Enter
{"type": "tool_use", "name": "computer", "input": {"action": "key", "text": "Return"}}

# Wait for load
{"type": "tool_use", "name": "computer", "input": {"action": "wait", "duration": 3000}}

# Scroll down
{"type": "tool_use", "name": "computer", "input": {"action": "scroll", "coordinate": [512, 384], "direction": "down", "amount": 3}}

# Take screenshot
{"type": "tool_use", "name": "computer", "input": {"action": "screenshot"}}

Hints

Always wait after navigation for page to load
Use coordinate in scroll to specify where to scroll
Screenshots should be taken after wait actions
Handle both Mac (cmd) and Linux (ctrl) keyboard shortcuts

What to Submit

Your submission should contain 1 file section in the editor below: a Python file with the complete WebDataExtractor class.

Instructions