Browser Automation with Claude
Data Extraction from Web Pages
5 min read
Claude's vision capabilities make it excellent at reading and extracting structured data from any web page.
Basic Extraction
task = """
Go to the product listing page and extract:
1. Product names
2. Prices
3. Ratings (if shown)
4. Return as a structured list
"""
Claude returns structured data:
1. Wireless Mouse - $29.99 - 4.5 stars
2. Mechanical Keyboard - $79.99 - 4.8 stars
3. USB-C Hub - $45.99 - 4.2 stars
Table Extraction
task = """
Find the pricing table and extract:
1. All plan names
2. Prices for each plan
3. Key features listed
Format as a comparison table
"""
Multi-Page Extraction
task = """
Go through the first 3 pages of search results:
1. On each page, extract all article titles and dates
2. Click 'Next' to go to the next page
3. Compile all results into a single list
4. Report total count
"""
Structured Output
Request specific formats:
task = """
Extract company information and return as JSON:
- Company name
- Address
- Phone number
- Email
- Business hours
"""
Claude responds:
{
"name": "Acme Corp",
"address": "123 Main St, City, ST 12345",
"phone": "(555) 123-4567",
"email": "contact@acme.com",
"hours": "Mon-Fri 9AM-5PM"
}
Handling Dynamic Content
task = """
On the stock dashboard:
1. Wait for live data to load
2. Extract current prices for AAPL, GOOGL, MSFT
3. Note the timestamp of the data
4. If prices change while watching, report updates
"""
Image and Media Extraction
Claude can describe visual content:
task = """
On the product page:
1. Describe the main product image
2. List all visible product features from images
3. Note any badges or promotional overlays
"""
Comparison Extraction
task = """
Compare these two product pages:
1. Open first product in current tab
2. Open second product in new tab
3. Extract specs from both
4. Create a comparison table
5. Highlight differences
"""
Error Handling
task = """
Extract pricing information:
1. If price is not visible, check for 'See Price' button
2. If login required, report that
3. If price varies, note all options
4. If page error, try refreshing once
"""
Best Practices
| Practice | Benefit |
|---|---|
| Verify page fully loaded | Complete data |
| Scroll to reveal content | Catch hidden items |
| Request structured format | Easier to process |
| Handle missing data | Robust extraction |
Tip: For large datasets, break extraction into smaller chunks to avoid context limits.
In the final module, we'll cover production deployment and safety. :::