Lab
Build a Web Data Extraction Agent
45 min
Intermediate3 Free Attempts
Instructions
Objective
Build a Python agent that uses Claude's Computer Use capability to navigate a website and extract structured data (like product listings) into JSON format.
Background
Unlike traditional web scraping that parses HTML, Computer Use agents interact with websites visually. This works even on JavaScript-heavy sites where traditional scrapers fail.
Requirements
Create a class WebDataExtractor with the following methods:
1. generate_navigation_sequence(url: str) -> list[dict]
Returns Computer Use tool calls to:
- Focus the URL bar (Ctrl+L)
- Type the URL
- Press Enter to navigate
- Wait for page load
2. generate_scroll_sequence(direction: str, amount: int) -> list[dict]
Returns tool calls to scroll:
direction: "up" or "down"amount: number of scroll units
Use the scroll action from computer_20250124.
3. parse_screenshot_data(screenshot_description: str) -> list[dict]
Given a text description of what Claude sees on screen, extract structured data:
[
{
"title": "Product Name",
"price": "$99.99",
"rating": "4.5/5",
"in_stock": True
}
]
4. generate_extraction_workflow(url: str, scroll_pages: int) -> list[dict]
Returns a complete workflow:
- Navigate to URL
- Take screenshot
- Scroll down
- Take screenshot
- Repeat for
scroll_pagesiterations
Tool Call Formats
# URL bar focus
{"type": "tool_use", "name": "computer", "input": {"action": "key", "text": "ctrl+l"}}
# Type URL
{"type": "tool_use", "name": "computer", "input": {"action": "type", "text": "https://example.com"}}
# Press Enter
{"type": "tool_use", "name": "computer", "input": {"action": "key", "text": "Return"}}
# Wait for load
{"type": "tool_use", "name": "computer", "input": {"action": "wait", "duration": 3000}}
# Scroll down
{"type": "tool_use", "name": "computer", "input": {"action": "scroll", "coordinate": [512, 384], "direction": "down", "amount": 3}}
# Take screenshot
{"type": "tool_use", "name": "computer", "input": {"action": "screenshot"}}
Hints
- Always wait after navigation for page to load
- Use
coordinatein scroll to specify where to scroll - Screenshots should be taken after wait actions
- Handle both Mac (cmd) and Linux (ctrl) keyboard shortcuts
Grading Rubric
generate_navigation_sequence produces valid URL navigation sequence25 points
generate_scroll_sequence uses correct scroll action format20 points
parse_screenshot_data extracts structured data correctly25 points
generate_extraction_workflow combines all steps correctly30 points
Your Solution
This lab requires Python
🐍Python(required)
3 free attempts remaining