Analyze a Computer Use API Response

Objective

In this lab, you'll write a Python function that parses Computer Use API responses and extracts key information about the agent's actions.

When Claude uses the Computer Use tool, it returns structured JSON describing actions like mouse clicks, keyboard input, and screenshots. Understanding this structure is essential for building monitoring and debugging systems.

Requirements

Create a function parse_computer_use_response(response: dict) -> dict that:

Extracts action type: Identify whether it's a mouse_move, left_click, type, key, screenshot, scroll, or wait action
Extracts coordinates: For mouse actions, return {"x": int, "y": int}
Extracts text content: For type actions, return the typed text
Identifies tool version: Extract computer_20250124 or older version

Returns structured output:

{
    "action_type": str,
    "coordinates": {"x": int, "y": int} | None,
    "text": str | None,
    "tool_version": str,
    "is_screenshot_request": bool
}

Example Input

response = {
    "type": "tool_use",
    "name": "computer",
    "input": {
        "action": "left_click",
        "coordinate": [512, 384]
    }
}

Example Output

{
    "action_type": "left_click",
    "coordinates": {"x": 512, "y": 384},
    "text": None,
    "tool_version": "computer_20250124",
    "is_screenshot_request": False
}

Hints

The coordinate field is a list [x, y], not a dict
Screenshot actions have action: "screenshot"
The type action has a text field with the content to type
Handle missing fields gracefully with defaults

What to Submit

Your submission should contain 1 file section in the editor below: a Python file with the complete implementation.

Instructions

Objective

Background

Requirements

Example Input

Example Output

Hints

What to Submit

Grading Rubric

Checklist

Your Solution

Stay on the Nerd Track