Build a Tool-Calling Agent Framework
Instructions
In this lab, you'll build the foundation of every agentic system: a tool-calling agent in Python. Your agent will maintain a registry of tools, use an LLM to decide which tools to call, validate parameters, execute tools safely, and handle errors gracefully.
This is the pattern used by every major agent framework (LangGraph, CrewAI, OpenAI Agents SDK). By building it from scratch, you'll understand exactly how tool calling works under the hood.
Architecture Overview
User Message
     ↓
Agent Loop ←──────────────────┐
     ↓                        │
LLM (decides: respond or      │
     call tool)               │
     ↓                        │
[If tool call]                │
     ↓                        │
Validate Parameters           │
     ↓                        │
Execute Tool (with timeout)   │
     ↓                        │
Inject Result → back to LLM ──┘
     ↓
[If final response]
     ↓
Return to User
Step 1: Tool Registry
Build a ToolRegistry class that manages tool definitions. Each tool has:
- A unique name (string identifier)
- A description (what the tool does — this is sent to the LLM)
- A parameters JSON Schema (defines expected input types)
- A handler function (the actual implementation)

The registry should support:

- register(tool) — Add a tool to the registry
- unregister(name) — Remove a tool by name
- get(name) — Retrieve a tool by name
- list_tools() — Return all tool definitions (for sending to the LLM)
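A minimal sketch of what this could look like, assuming a small Tool dataclass. The field and method signatures here are illustrative choices, not a required interface:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Tool:
    name: str                    # unique identifier
    description: str             # sent to the LLM so it knows when to use the tool
    parameters: Dict[str, Any]   # JSON Schema describing the expected arguments
    handler: Callable[..., Any]  # the actual implementation


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        if tool.name in self._tools:
            raise ValueError(f"Tool already registered: {tool.name}")
        self._tools[tool.name] = tool

    def unregister(self, name: str) -> None:
        self._tools.pop(name, None)

    def get(self, name: str) -> Optional[Tool]:
        return self._tools.get(name)

    def list_tools(self) -> List[Dict[str, Any]]:
        # Definitions in the shape the LLM prompt will consume: name, description, schema.
        return [
            {"name": t.name, "description": t.description, "parameters": t.parameters}
            for t in self._tools.values()
        ]
```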
Step 2: Parameter Validation
Before executing any tool, validate the provided arguments against the tool's JSON Schema:
- Check required fields are present
- Check types match (string, number, boolean, array, object)
- Return a clear error message if validation fails
You can implement a simple validator yourself or use the jsonschema library.
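A simple hand-rolled validator covering only the required and type checks described above might look like this; the error-message format is an assumption:

```python
from typing import Any, Dict, List, Optional

# Map JSON Schema type names to the Python types used for the simple checks below.
_TYPE_MAP = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "array": list,
    "object": dict,
}


def validate_arguments(schema: Dict[str, Any], args: Dict[str, Any]) -> Optional[str]:
    """Return None if args satisfy the schema, otherwise a human-readable error."""
    errors: List[str] = []

    for field in schema.get("required", []):
        if field not in args:
            errors.append(f"missing required field '{field}'")

    for field, value in args.items():
        spec = schema.get("properties", {}).get(field)
        if spec is None:
            continue  # ignore fields the schema doesn't describe
        expected = _TYPE_MAP.get(spec.get("type"))
        if expected and not isinstance(value, expected):
            errors.append(f"field '{field}' should be {spec['type']}, got {type(value).__name__}")
        # Booleans are ints in Python, so reject True/False where a number is expected.
        if spec.get("type") in ("number", "integer") and isinstance(value, bool):
            errors.append(f"field '{field}' should be {spec['type']}, got bool")

    return "; ".join(errors) if errors else None
```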
Step 3: Tool Executor
Build a ToolExecutor that safely runs tool handlers:
- Execute the tool's handler function with the validated arguments
- Enforce a configurable timeout (default: 30 seconds)
- Catch exceptions and return structured error results
- Track execution metadata: start time, end time, duration, success/failure
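One possible executor, reusing the Tool shape from Step 1 and using concurrent.futures for the timeout. The ToolResult fields are an assumed shape you can adjust:

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FuturesTimeout
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class ToolResult:
    success: bool
    output: Any = None
    error: str = ""
    duration: float = 0.0


class ToolExecutor:
    def __init__(self, timeout: float = 30.0) -> None:
        self.timeout = timeout

    def execute(self, tool, arguments: Dict[str, Any]) -> ToolResult:
        start = time.time()
        # Run the handler in a worker thread so we can stop waiting after the timeout.
        pool = ThreadPoolExecutor(max_workers=1)
        future = pool.submit(tool.handler, **arguments)
        try:
            output = future.result(timeout=self.timeout)
            return ToolResult(True, output=output, duration=time.time() - start)
        except FuturesTimeout:
            return ToolResult(False, error=f"timed out after {self.timeout}s",
                              duration=time.time() - start)
        except Exception as exc:  # any handler failure becomes a structured error
            return ToolResult(False, error=f"{type(exc).__name__}: {exc}",
                              duration=time.time() - start)
        finally:
            # Don't block on a runaway handler; the worker thread is abandoned.
            pool.shutdown(wait=False)
```

Note that a thread-based timeout cannot kill a runaway handler; it only stops waiting for it, which is usually acceptable for a lab-scale agent.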
Step 4: LLM Tool Selection
Build a ToolSelector that asks the LLM which tool(s) to call:
- Format the available tools as a structured prompt for the LLM
- Parse the LLM's response to extract tool call decisions
- Support the LLM deciding to call no tools (direct response)
- Support the LLM deciding to call multiple tools in sequence
Use structured output parsing: the LLM should return JSON with tool name and arguments.
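A sketch of the selector. Here llm_call is a placeholder for whatever model client you use, and the prompt wording and JSON shapes are assumptions you can adapt:

```python
import json
from typing import Any, Dict, List


class ToolSelector:
    """Asks the LLM to choose between calling tools and answering directly."""

    def __init__(self, llm_call) -> None:
        # llm_call: any callable that takes a prompt string and returns the model's text.
        self.llm_call = llm_call

    def build_prompt(self, message: str, tool_definitions: List[Dict[str, Any]]) -> str:
        return (
            "You are an agent that can call tools.\n\n"
            "Available tools:\n"
            + json.dumps(tool_definitions, indent=2)
            + "\n\nRespond with JSON only, in one of these two shapes:\n"
            '{"tool_calls": [{"name": "<tool name>", "arguments": {...}}]}\n'
            '{"response": "<final answer to the user>"}\n\n'
            "User message: " + message
        )

    def select(self, message: str, tool_definitions: List[Dict[str, Any]]) -> Dict[str, Any]:
        raw = self.llm_call(self.build_prompt(message, tool_definitions))
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            # Treat unparseable output as a direct response rather than crashing.
            return {"response": raw}
```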
Step 5: Agent Conversation Loop
Build the main Agent class that orchestrates everything:
- Accept a user message
- Send it to the LLM along with available tool definitions
- If the LLM decides to call a tool: validate, execute, inject result, loop back
- If the LLM provides a final response: return it to the user
- Support multi-step reasoning (the LLM may call multiple tools before responding)
- Enforce a maximum number of tool calls per turn (to prevent infinite loops)
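A sketch of the loop, assuming the ToolRegistry, validate_arguments, ToolExecutor, and ToolSelector shapes from the earlier sketches are in scope; your own interfaces may differ:

```python
import json


class Agent:
    def __init__(self, registry, selector, executor, max_iterations: int = 10) -> None:
        self.registry = registry
        self.selector = selector
        self.executor = executor
        self.max_iterations = max_iterations  # hard cap on tool-call rounds per turn

    def run(self, user_message: str) -> str:
        message = user_message
        for _ in range(self.max_iterations):
            decision = self.selector.select(message, self.registry.list_tools())

            # The LLM answered directly: no tools needed, return to the user.
            if "tool_calls" not in decision:
                return decision.get("response", "")

            # Otherwise validate and execute each requested tool call.
            results = []
            for call in decision["tool_calls"]:
                tool = self.registry.get(call["name"])
                if tool is None:
                    results.append({"tool": call["name"], "error": "unknown tool"})
                    continue
                error = validate_arguments(tool.parameters, call.get("arguments", {}))
                if error:
                    results.append({"tool": call["name"], "error": error})
                    continue
                result = self.executor.execute(tool, call["arguments"])
                results.append({"tool": call["name"], "output": result.output,
                                "error": result.error})

            # Inject the results so the next LLM call can reason over them.
            message = user_message + "\n\nTool results:\n" + json.dumps(results, default=str)

        return "Stopped: reached the maximum number of tool-call iterations."
```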
Step 6: Audit Logging
Add an AuditLogger that records every action:
- Tool call attempts (tool name, arguments, timestamp)
- Tool execution results (success/failure, duration, output)
- LLM decisions (tool call vs. direct response)
- Error events (validation failures, timeouts, exceptions)
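A minimal logger built on Python's standard logging module, emitting one JSON line per event; the event names and record fields are an assumed convention:

```python
import json
import logging
from datetime import datetime, timezone
from typing import Any, Dict


class AuditLogger:
    """Writes one JSON line per agent event so a run can be replayed and inspected."""

    def __init__(self, name: str = "agent.audit") -> None:
        self.logger = logging.getLogger(name)
        self.logger.setLevel(logging.INFO)
        if not self.logger.handlers:
            # Default to stderr; swap in a FileHandler to persist the audit trail.
            self.logger.addHandler(logging.StreamHandler())

    def log(self, event: str, **details: Any) -> None:
        record: Dict[str, Any] = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "event": event,  # e.g. "tool_call", "tool_result", "llm_decision", "error"
            **details,
        }
        self.logger.info(json.dumps(record, default=str))


# Example events the agent might emit at each step:
# audit.log("tool_call", tool="search", arguments={"query": "weather"})
# audit.log("tool_result", tool="search", success=True, duration=0.42)
# audit.log("llm_decision", kind="direct_response")
# audit.log("error", kind="validation", detail="missing required field 'query'")
```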
What to Submit
The editor has 6 file sections with TODO comments. Replace each TODO with your Python code. The AI grader will evaluate each section against the rubric.
Hints
- For the JSON Schema validator, focus on type and required checks — you don't need a full JSON Schema implementation
- For the timeout, use Python's concurrent.futures.ThreadPoolExecutor with a timeout parameter
- For LLM tool selection, define a clear prompt format and expected JSON response structure
- The agent loop should have a max_iterations parameter (default: 10) to prevent runaway execution
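To see how the pieces could fit together, here is a hypothetical end-to-end run built on the class shapes sketched in the steps above. The add tool and the fake_llm stand-in are purely illustrative; in your submission the selector would call a real model client:

```python
# fake_llm stands in for a real model client: it first requests the add tool,
# then produces a final answer once tool results appear in the prompt.
def fake_llm(prompt: str) -> str:
    if "Tool results" in prompt:
        return '{"response": "17 plus 25 is 42."}'
    return '{"tool_calls": [{"name": "add", "arguments": {"a": 17, "b": 25}}]}'


registry = ToolRegistry()
registry.register(Tool(
    name="add",
    description="Add two numbers and return the sum.",
    parameters={
        "type": "object",
        "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
        "required": ["a", "b"],
    },
    handler=lambda a, b: a + b,
))

agent = Agent(
    registry=registry,
    selector=ToolSelector(llm_call=fake_llm),
    executor=ToolExecutor(timeout=30),
    max_iterations=10,
)

print(agent.run("What is 17 plus 25?"))  # -> "17 plus 25 is 42."
```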