Lab

Build a Tool-Calling Agent Framework

35 min
Advanced
Unlimited free attempts

Instructions

In this lab, you'll build the foundation of every agentic system: a tool-calling agent in Python. Your agent will maintain a registry of tools, use an LLM to decide which tools to call, validate parameters, execute tools safely, and handle errors gracefully.

This is the same pattern used by major agent frameworks such as LangGraph, CrewAI, and the OpenAI Agents SDK. By building it from scratch, you'll understand exactly how tool calling works under the hood.

Architecture Overview

User Message
Agent Loop ←──────────────────┐
     ↓                        │
LLM (decides: respond or      │
     call tool)               │
     ↓                        │
[If tool call]                │
     ↓                        │
Validate Parameters           │
     ↓                        │
Execute Tool (with timeout)   │
     ↓                        │
Inject Result → back to LLM ──┘
[If final response]
Return to User

Step 1: Tool Registry

Build a ToolRegistry class that manages tool definitions. Each tool has:

  • A unique name (string identifier)
  • A description (what the tool does — this is sent to the LLM)
  • A parameters JSON Schema (defines expected input types)
  • A handler function (the actual implementation)

The registry should support (see the sketch after this list):

  • register(tool) — Add a tool to the registry
  • unregister(name) — Remove a tool by name
  • get(name) — Retrieve a tool by name
  • list_tools() — Return all tool definitions (for sending to the LLM)
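
One way this could look in code. This is a minimal sketch: the method names follow the requirements above, but the Tool dataclass and its field layout are illustrative, not required.

from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Tool:
    name: str                    # unique identifier
    description: str             # sent to the LLM so it knows when to use the tool
    parameters: dict             # JSON Schema describing the expected arguments
    handler: Callable[..., Any]  # the actual implementation

class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        if tool.name in self._tools:
            raise ValueError(f"Tool '{tool.name}' is already registered")
        self._tools[tool.name] = tool

    def unregister(self, name: str) -> None:
        if name not in self._tools:
            raise KeyError(f"No tool named '{name}'")
        del self._tools[name]

    def get(self, name: str) -> Tool:
        if name not in self._tools:
            raise KeyError(f"No tool named '{name}'")
        return self._tools[name]

    def list_tools(self) -> list[dict]:
        # Serializable definitions to send to the LLM -- deliberately omits the handler
        return [
            {"name": t.name, "description": t.description, "parameters": t.parameters}
            for t in self._tools.values()
        ]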

Step 2: Parameter Validation

Before executing any tool, validate the provided arguments against the tool's JSON Schema:

  • Check required fields are present
  • Check types match (string, number, boolean, array, object)
  • Return a clear error message if validation fails

You can implement a simple validator yourself or use the jsonschema library.
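
A hand-rolled validator along these lines is enough for this lab. This is a sketch; the validate_parameters helper and ValidationError class are illustrative names, not a required API.

class ValidationError(Exception):
    """Raised when tool arguments don't match the tool's JSON Schema."""

# JSON Schema type names mapped to the Python types accepted for them
TYPE_MAP = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "array": list,
    "object": dict,
}

def validate_parameters(schema: dict, arguments: dict) -> None:
    errors = []

    # Required fields must be present
    for field in schema.get("required", []):
        if field not in arguments:
            errors.append(f"missing required field '{field}'")

    # Provided fields must match the declared type
    for field, value in arguments.items():
        spec = schema.get("properties", {}).get(field)
        if spec is None or "type" not in spec:
            continue
        expected = TYPE_MAP.get(spec["type"])
        # bool is a subclass of int in Python, so reject booleans for numeric types
        if spec["type"] in ("number", "integer") and isinstance(value, bool):
            errors.append(f"'{field}' should be {spec['type']}, got boolean")
        elif expected is not None and not isinstance(value, expected):
            errors.append(f"'{field}' should be {spec['type']}, got {type(value).__name__}")

    if errors:
        raise ValidationError("; ".join(errors))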

Step 3: Tool Executor

Build a ToolExecutor that safely runs tool handlers (sketched after the list below):

  • Execute the tool's handler function with the validated arguments
  • Enforce a configurable timeout (default: 30 seconds)
  • Catch exceptions and return structured error results
  • Track execution metadata: start time, end time, duration, success/failure
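
A sketch of one way to structure this, using concurrent.futures as suggested in the hints. The ExecutionResult fields mirror the metadata listed above; exact names are up to you.

import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class ExecutionResult:
    success: bool
    output: Any
    error: Optional[str]
    start_time: float
    end_time: float
    duration_ms: float

class ToolExecutor:
    def __init__(self, timeout_seconds: float = 30.0):
        self.timeout_seconds = timeout_seconds

    def execute(self, tool, arguments: dict) -> ExecutionResult:
        start = time.time()
        pool = ThreadPoolExecutor(max_workers=1)
        try:
            future = pool.submit(tool.handler, **arguments)
            output = future.result(timeout=self.timeout_seconds)
            return self._result(True, output, None, start)
        except FutureTimeout:
            return self._result(False, None, f"timed out after {self.timeout_seconds}s", start)
        except Exception as exc:  # the handler itself raised
            return self._result(False, None, f"{type(exc).__name__}: {exc}", start)
        finally:
            # Don't block on a hung handler; note the worker thread keeps running in the background
            pool.shutdown(wait=False)

    def _result(self, success, output, error, start) -> ExecutionResult:
        end = time.time()
        return ExecutionResult(success, output, error, start, end, (end - start) * 1000)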

Step 4: LLM Tool Selection

Build a ToolSelector that asks the LLM which tool(s) to call:

  • Format the available tools as a structured prompt for the LLM
  • Parse the LLM's response to extract tool call decisions
  • Support the LLM deciding to call no tools (direct response)
  • Support the LLM deciding to call multiple tools in sequence

Use structured output parsing: the LLM should return JSON with tool name and arguments.
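
One possible prompt-and-parse shape is sketched below. Assumptions: the llm argument is any callable that takes a prompt string and returns the model's text, and the JSON format shown is just one workable convention, not a fixed spec.

import json

SELECTION_PROMPT = """You are an agent that can call tools.

Available tools (name, description, parameter schema):
{tools}

User message: {message}

Reply with JSON only, in exactly one of these forms:
{{"action": "tool_call", "tool": "<tool name>", "arguments": {{...}}}}
{{"action": "respond", "content": "<your final answer>"}}
"""

class ToolSelector:
    def __init__(self, llm):
        self.llm = llm  # assumption: callable(prompt: str) -> str

    def select(self, message: str, tools: list) -> dict:
        prompt = SELECTION_PROMPT.format(tools=json.dumps(tools, indent=2), message=message)
        raw = self.llm(prompt)
        try:
            decision = json.loads(raw)
        except json.JSONDecodeError:
            # Malformed JSON from the model: fall back to treating the text as a direct response
            return {"action": "respond", "content": raw}
        if not isinstance(decision, dict) or decision.get("action") not in ("tool_call", "respond"):
            return {"action": "respond", "content": raw}
        return decision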

Step 5: Agent Conversation Loop

Build the main Agent class that orchestrates everything (a sketch follows the list):

  • Accept a user message
  • Send it to the LLM along with available tool definitions
  • If the LLM decides to call a tool: validate, execute, inject result, loop back
  • If the LLM provides a final response: return it to the user
  • Support multi-step reasoning (the LLM may call multiple tools before responding)
  • Enforce a maximum number of tool calls per turn (to prevent infinite loops)
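
Tying the pieces together, the loop might look roughly like this. It reuses the classes sketched in the earlier steps, so names like validate_parameters and ValidationError are the illustrative ones from above, not a required API.

class Agent:
    def __init__(self, registry, selector, executor, logger, max_iterations: int = 10):
        self.registry = registry
        self.selector = selector
        self.executor = executor
        self.logger = logger
        self.max_iterations = max_iterations  # cap on tool calls per turn

    def run(self, user_message: str) -> str:
        context = user_message
        for _ in range(self.max_iterations):
            decision = self.selector.select(context, self.registry.list_tools())
            self.logger.log_decision(decision)

            if decision.get("action") != "tool_call":
                # Final answer: return it to the user
                return decision.get("content", "")

            name = decision.get("tool", "")
            args = decision.get("arguments", {})
            self.logger.log_tool_call(name, args)
            try:
                tool = self.registry.get(name)
                validate_parameters(tool.parameters, args)
                result = self.executor.execute(tool, args)
                self.logger.log_result(name, result)
                outcome = result.output if result.success else f"ERROR: {result.error}"
            except (KeyError, ValidationError) as exc:
                self.logger.log_error(str(exc))
                outcome = f"ERROR: {exc}"

            # Inject the tool result into the context and loop back to the LLM
            context = f"{context}\n\nResult of tool '{name}': {outcome}"

        return "Stopped after reaching the maximum number of tool calls for this turn."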

Step 6: Audit Logging

Add an AuditLogger that records every action (one possible shape is sketched below):

  • Tool call attempts (tool name, arguments, timestamp)
  • Tool execution results (success/failure, duration, output)
  • LLM decisions (tool call vs. direct response)
  • Error events (validation failures, timeouts, exceptions)
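
A straightforward in-memory logger is enough here. This sketch follows the timestamp/event_type/data shape from the rubric; the event type strings are illustrative.

import time
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class AuditEntry:
    timestamp: float
    event_type: str
    data: dict

class AuditLogger:
    def __init__(self) -> None:
        self._entries: list[AuditEntry] = []

    def _log(self, event_type: str, data: dict) -> None:
        self._entries.append(AuditEntry(time.time(), event_type, data))

    def log_tool_call(self, tool_name: str, arguments: dict) -> None:
        self._log("tool_call", {"tool": tool_name, "arguments": arguments})

    def log_result(self, tool_name: str, result: Any) -> None:
        self._log("tool_result", {"tool": tool_name, "result": result})

    def log_decision(self, decision: dict) -> None:
        self._log("llm_decision", {"decision": decision})

    def log_error(self, message: str) -> None:
        self._log("error", {"message": message})

    def get_entries(self, event_type: Optional[str] = None) -> list:
        # Optionally filter by event type
        if event_type is None:
            return list(self._entries)
        return [e for e in self._entries if e.event_type == event_type]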

What to Submit

The editor has 6 file sections with TODO comments. Replace each TODO with your Python code. The AI grader will evaluate each section against the rubric.

Hints

  • For the JSON Schema validator, focus on type and required checks — you don't need a full JSON Schema implementation
  • For the timeout, submit the handler to a concurrent.futures.ThreadPoolExecutor and pass a timeout to the future's result() call
  • For LLM tool selection, define a clear prompt format and expected JSON response structure
  • The agent loop should have a max_iterations parameter (default: 10) to prevent runaway execution

Grading Rubric

  • ToolRegistry implements register (with a duplicate-name check), unregister, and get (both of which raise appropriate errors), plus list_tools returning serializable dicts without the handler. (20 points)
  • ParameterValidator checks required fields and types for string, number, integer, boolean, array, and object, collects all errors, and raises ValidationError with descriptive messages. (15 points)
  • ToolExecutor runs the handler with a ThreadPoolExecutor timeout, records start_time/end_time/duration_ms, and returns a structured ExecutionResult for the success, timeout, and exception cases. (15 points)
  • ToolSelector formats a clear prompt from the tools and user message, parses the LLM's JSON response for both tool_call and respond actions, and handles malformed JSON gracefully as a direct response. (20 points)
  • Agent.run implements the full conversation loop: gets tools, calls the selector, handles tool calls (validate → execute → inject result), loops back for multi-step reasoning, respects max_iterations, and handles errors gracefully. (15 points)
  • AuditLogger implements log_tool_call, log_result, log_decision, and log_error with proper AuditEntry creation (timestamp, event_type, data), plus get_entries with optional filtering. (15 points)

