Browser Automation with Claude

Data Extraction from Web Pages

Claude's vision capabilities let it read a rendered web page much as a person would, which makes it well suited to extracting structured data from the pages it can view.

Basic Extraction

task = """
Go to the product listing page and extract:
1. Product names
2. Prices
3. Ratings (if shown)
4. Return as a structured list
"""

Claude returns the data as a structured list, for example:

1. Wireless Mouse - $29.99 - 4.5 stars
2. Mechanical Keyboard - $79.99 - 4.8 stars
3. USB-C Hub - $45.99 - 4.2 stars
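
A numbered list like the one above is easy to read but still needs parsing before other code can use it. The sketch below is one way to do that. It assumes the reply follows the "name - $price - rating stars" layout shown in the example; `parse_products` and `LINE_RE` are illustrative names, not part of any Claude API.

```python
import re
from typing import Optional, TypedDict


class Product(TypedDict):
    name: str
    price: float
    rating: Optional[float]


# Matches lines such as "1. Wireless Mouse - $29.99 - 4.5 stars".
# The rating part is optional because the task only asks for it "if shown".
LINE_RE = re.compile(
    r"^\s*\d+\.\s*(?P<name>.+?)\s+-\s+\$(?P<price>\d[\d,]*(?:\.\d+)?)"
    r"(?:\s+-\s+(?P<rating>\d+(?:\.\d+)?)\s+stars?)?\s*$"
)


def parse_products(text: str) -> list[Product]:
    """Turn a numbered product list into dictionaries, skipping other lines."""
    products: list[Product] = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore headings, blank lines, or commentary
        rating = match.group("rating")
        products.append(
            {
                "name": match.group("name"),
                "price": float(match.group("price").replace(",", "")),
                "rating": float(rating) if rating else None,
            }
        )
    return products
```

Lines that do not fit the pattern are skipped rather than raising, so any surrounding commentary in the reply is ignored. If silent skipping is a concern, compare the number of parsed items against the number of items you expected.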

Table Extraction

task = """
Find the pricing table and extract:
1. All plan names
2. Prices for each plan
3. Key features listed
Format as a comparison table
"""

Multi-Page Extraction

task = """
Go through the first 3 pages of search results:
1. On each page, extract all article titles and dates
2. Click 'Next' to go to the next page
3. Compile all results into a single list
4. Report total count
"""

Structured Output

Request specific formats:

task = """
Extract company information and return as JSON:
- Company name
- Address
- Phone number
- Email
- Business hours
"""

Claude responds:

{
  "name": "Acme Corp",
  "address": "123 Main St, City, ST 12345",
  "phone": "(555) 123-4567",
  "email": "contact@acme.com",
  "hours": "Mon-Fri 9AM-5PM"
}
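
It is worth validating a JSON reply before passing it on, since a field may be missing from the page or the JSON may arrive wrapped in a sentence of explanation. The sketch below assumes the key names from the example response above; `parse_company` and `REQUIRED_KEYS` are illustrative names.

```python
import json

# Key names taken from the example response; adjust to match your own prompt.
REQUIRED_KEYS = ("name", "address", "phone", "email", "hours")


def parse_company(reply: str) -> dict:
    """Extract the JSON object from a reply and check the expected fields exist."""
    # The reply may contain text around the JSON, so take the outermost {...} span.
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    data = json.loads(reply[start : end + 1])  # raises ValueError on invalid JSON
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    return data
```

Raising on missing fields makes gaps visible early. If some fields are legitimately optional on the pages you scrape, ask for `null` values in the prompt and relax the check accordingly.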

Handling Dynamic Content

task = """
On the stock dashboard:
1. Wait for live data to load
2. Extract current prices for AAPL, GOOGL, MSFT
3. Note the timestamp of the data
4. If prices change while watching, report updates
"""

Image and Media Extraction

Claude can describe visual content:

task = """
On the product page:
1. Describe the main product image
2. List all visible product features from images
3. Note any badges or promotional overlays
"""

Comparison Extraction

task = """
Compare these two product pages:
1. Open first product in current tab
2. Open second product in new tab
3. Extract specs from both
4. Create a comparison table
5. Highlight differences
"""

Error Handling

task = """
Extract pricing information:
1. If price is not visible, check for 'See Price' button
2. If login required, report that
3. If price varies, note all options
4. If page error, try refreshing once
"""

Best Practices

| Practice | Benefit |
| --- | --- |
| Verify page fully loaded | Complete data |
| Scroll to reveal content | Catch hidden items |
| Request structured format | Easier to process |
| Handle missing data | Robust extraction |

Tip: For large datasets, break extraction into smaller chunks to avoid context limits.
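
One way to apply this tip is to generate one task per slice of the dataset. The sketch below assumes the items on the page can be addressed by position; `chunk_ranges`, `chunk_tasks`, and the task wording are illustrative, and a suitable chunk size depends on how much text each item carries.

```python
def chunk_ranges(total_items: int, chunk_size: int) -> list[tuple[int, int]]:
    """Split items 1..total_items into inclusive (start, end) ranges."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (start, min(start + chunk_size - 1, total_items))
        for start in range(1, total_items + 1, chunk_size)
    ]


def chunk_tasks(total_items: int, chunk_size: int) -> list[str]:
    """Build one extraction task string per chunk."""
    return [
        f"Extract items {start} through {end} from the listing and return them as JSON."
        for start, end in chunk_ranges(total_items, chunk_size)
    ]
```

For example, `chunk_ranges(250, 100)` yields `[(1, 100), (101, 200), (201, 250)]`, so 250 items become three separate tasks whose results you can merge afterwards.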

In the final module, we'll cover production deployment and safety.
