Analyze a Computer Use API Response
Instructions
Objective
In this lab, you'll write a Python function that parses Computer Use API responses and extracts key information about the agent's actions.
Background
When Claude uses the Computer Use tool, it returns structured JSON describing actions like mouse clicks, keyboard input, and screenshots. Understanding this structure is essential for building monitoring and debugging systems.
Requirements
Create a function parse_computer_use_response(response: dict) -> dict that:
- Extracts action type: Identify whether it's a mouse_move, left_click, type, key, screenshot, scroll, or wait action
- Extracts coordinates: For mouse actions, return {"x": int, "y": int}
- Extracts text content: For type actions, return the typed text
- Identifies tool version: Extract computer_20250124 or an older version
- Returns structured output:
{
    "action_type": str,
    "coordinates": {"x": int, "y": int} | None,
    "text": str | None,
    "tool_version": str,
    "is_screenshot_request": bool
}
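One way to pin down the output shape is with typed dictionaries. The following is only a sketch: the class names Coordinates and ParsedAction are hypothetical and not part of the lab's required interface, and Optional is used in place of the `| None` syntax so the block also runs on older Python versions.

```python
from typing import Optional, TypedDict


class Coordinates(TypedDict):
    """Pixel position of a mouse action (hypothetical helper type)."""
    x: int
    y: int


class ParsedAction(TypedDict):
    """Shape of the dict the parser is expected to return (hypothetical name)."""
    action_type: str
    coordinates: Optional[Coordinates]
    text: Optional[str]
    tool_version: str
    is_screenshot_request: bool


# A value that satisfies the schema for a screenshot request.
example: ParsedAction = {
    "action_type": "screenshot",
    "coordinates": None,
    "text": None,
    "tool_version": "computer_20250124",
    "is_screenshot_request": True,
}
```

TypedDict adds no runtime checks; it only documents the keys and lets a static type checker flag a missing or misspelled field.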
Example Input
response = {
    "type": "tool_use",
    "name": "computer",
    "input": {
        "action": "left_click",
        "coordinate": [512, 384]
    }
}
Example Output
{
    "action_type": "left_click",
    "coordinates": {"x": 512, "y": 384},
    "text": None,
    "tool_version": "computer_20250124",
    "is_screenshot_request": False
}
Hints
- The coordinate field is a list [x, y], not a dict
- Screenshot actions have action: "screenshot"
- The type action has a text field with the content to type
- Handle missing fields gracefully with defaults