Lab

Build a Tool-Calling Agent Framework

35 min
Advanced
Unlimited free attempts

Instructions

In this lab, you'll build the foundation of every agentic system: a tool-calling agent in Python. Your agent will maintain a registry of tools, use an LLM to decide which tools to call, validate parameters, execute tools safely, and handle errors gracefully.

This is the same pattern used by major agent frameworks such as LangGraph, CrewAI, and the OpenAI Agents SDK. By building it from scratch, you'll understand exactly how tool calling works under the hood.

Architecture Overview

User Message
Agent Loop ←──────────────────┐
     ↓                        │
LLM (decides: respond or      │
     call tool)               │
     ↓                        │
[If tool call]                │
     ↓                        │
Validate Parameters           │
     ↓                        │
Execute Tool (with timeout)   │
     ↓                        │
Inject Result → back to LLM ──┘
[If final response]
Return to User

Step 1: Tool Registry

Build a ToolRegistry class that manages tool definitions. Each tool has:

  • A unique name (string identifier)
  • A description (what the tool does — this is sent to the LLM)
  • A parameters JSON Schema (defines expected input types)
  • A handler function (the actual implementation)

The registry should support (see the sketch after this list):

  • register(tool) — Add a tool to the registry
  • unregister(name) — Remove a tool by name
  • get(name) — Retrieve a tool by name
  • list_tools() — Return all tool definitions (for sending to the LLM)
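
One way this could look in code. This is a minimal sketch: the method names follow the requirements above, but the Tool dataclass and its field layout are illustrative, not required.

from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Tool:
    name: str                    # unique identifier
    description: str             # sent to the LLM so it knows when to use the tool
    parameters: dict             # JSON Schema describing the expected arguments
    handler: Callable[..., Any]  # the actual implementation

class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        if tool.name in self._tools:
            raise ValueError(f"Tool '{tool.name}' is already registered")
        self._tools[tool.name] = tool

    def unregister(self, name: str) -> None:
        if name not in self._tools:
            raise KeyError(f"No tool named '{name}'")
        del self._tools[name]

    def get(self, name: str) -> Tool:
        if name not in self._tools:
            raise KeyError(f"No tool named '{name}'")
        return self._tools[name]

    def list_tools(self) -> list[dict]:
        # Serializable definitions to send to the LLM -- deliberately omits the handler
        return [
            {"name": t.name, "description": t.description, "parameters": t.parameters}
            for t in self._tools.values()
        ]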

Step 2: Parameter Validation

Before executing any tool, validate the provided arguments against the tool's JSON Schema:

  • Check required fields are present
  • Check types match (string, number, boolean, array, object)
  • Return a clear error message if validation fails

You can implement a simple validator yourself or use the jsonschema library.
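
A hand-rolled validator along these lines is enough for this lab. This is a sketch; the validate_parameters helper and ValidationError class are illustrative names, not a required API.

class ValidationError(Exception):
    """Raised when tool arguments don't match the tool's JSON Schema."""

# JSON Schema type names mapped to the Python types accepted for them
TYPE_MAP = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "array": list,
    "object": dict,
}

def validate_parameters(schema: dict, arguments: dict) -> None:
    errors = []

    # Required fields must be present
    for field in schema.get("required", []):
        if field not in arguments:
            errors.append(f"missing required field '{field}'")

    # Provided fields must match the declared type
    for field, value in arguments.items():
        spec = schema.get("properties", {}).get(field)
        if spec is None or "type" not in spec:
            continue
        expected = TYPE_MAP.get(spec["type"])
        # bool is a subclass of int in Python, so reject booleans for numeric types
        if spec["type"] in ("number", "integer") and isinstance(value, bool):
            errors.append(f"'{field}' should be {spec['type']}, got boolean")
        elif expected is not None and not isinstance(value, expected):
            errors.append(f"'{field}' should be {spec['type']}, got {type(value).__name__}")

    if errors:
        raise ValidationError("; ".join(errors))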

Step 3: Tool Executor

Build a ToolExecutor that safely runs tool handlers (sketched after the list below):

  • Execute the tool's handler function with the validated arguments
  • Enforce a configurable timeout (default: 30 seconds)
  • Catch exceptions and return structured error results
  • Track execution metadata: start time, end time, duration, success/failure
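
A sketch of one way to structure this, using concurrent.futures as suggested in the hints. The ExecutionResult fields mirror the metadata listed above; exact names are up to you.

import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class ExecutionResult:
    success: bool
    output: Any
    error: Optional[str]
    start_time: float
    end_time: float
    duration_ms: float

class ToolExecutor:
    def __init__(self, timeout_seconds: float = 30.0):
        self.timeout_seconds = timeout_seconds

    def execute(self, tool, arguments: dict) -> ExecutionResult:
        start = time.time()
        pool = ThreadPoolExecutor(max_workers=1)
        try:
            future = pool.submit(tool.handler, **arguments)
            output = future.result(timeout=self.timeout_seconds)
            return self._result(True, output, None, start)
        except FutureTimeout:
            return self._result(False, None, f"timed out after {self.timeout_seconds}s", start)
        except Exception as exc:  # the handler itself raised
            return self._result(False, None, f"{type(exc).__name__}: {exc}", start)
        finally:
            # Don't block on a hung handler; note the worker thread keeps running in the background
            pool.shutdown(wait=False)

    def _result(self, success, output, error, start) -> ExecutionResult:
        end = time.time()
        return ExecutionResult(success, output, error, start, end, (end - start) * 1000)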

Step 4: LLM Tool Selection

Build a ToolSelector that asks the LLM which tool(s) to call:

  • Format the available tools as a structured prompt for the LLM
  • Parse the LLM's response to extract tool call decisions
  • Support the LLM deciding to call no tools (direct response)
  • Support the LLM deciding to call multiple tools in sequence

Use structured output parsing: the LLM should return JSON with tool name and arguments.
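
One possible prompt-and-parse shape is sketched below. Assumptions: the llm argument is any callable that takes a prompt string and returns the model's text, and the JSON format shown is just one workable convention, not a fixed spec.

import json

SELECTION_PROMPT = """You are an agent that can call tools.

Available tools (name, description, parameter schema):
{tools}

User message: {message}

Reply with JSON only, in exactly one of these forms:
{{"action": "tool_call", "tool": "<tool name>", "arguments": {{...}}}}
{{"action": "respond", "content": "<your final answer>"}}
"""

class ToolSelector:
    def __init__(self, llm):
        self.llm = llm  # assumption: callable(prompt: str) -> str

    def select(self, message: str, tools: list) -> dict:
        prompt = SELECTION_PROMPT.format(tools=json.dumps(tools, indent=2), message=message)
        raw = self.llm(prompt)
        try:
            decision = json.loads(raw)
        except json.JSONDecodeError:
            # Malformed JSON from the model: fall back to treating the text as a direct response
            return {"action": "respond", "content": raw}
        if not isinstance(decision, dict) or decision.get("action") not in ("tool_call", "respond"):
            return {"action": "respond", "content": raw}
        return decision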

Step 5: Agent Conversation Loop

Build the main Agent class that orchestrates everything (a sketch follows the list):

  • Accept a user message
  • Send it to the LLM along with available tool definitions
  • If the LLM decides to call a tool: validate, execute, inject result, loop back
  • If the LLM provides a final response: return it to the user
  • Support multi-step reasoning (the LLM may call multiple tools before responding)
  • Enforce a maximum number of tool calls per turn (to prevent infinite loops)
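
Tying the pieces together, the loop might look roughly like this. It reuses the classes sketched in the earlier steps, so names like validate_parameters and ValidationError are the illustrative ones from above, not a required API.

class Agent:
    def __init__(self, registry, selector, executor, logger, max_iterations: int = 10):
        self.registry = registry
        self.selector = selector
        self.executor = executor
        self.logger = logger
        self.max_iterations = max_iterations  # cap on tool calls per turn

    def run(self, user_message: str) -> str:
        context = user_message
        for _ in range(self.max_iterations):
            decision = self.selector.select(context, self.registry.list_tools())
            self.logger.log_decision(decision)

            if decision.get("action") != "tool_call":
                # Final answer: return it to the user
                return decision.get("content", "")

            name = decision.get("tool", "")
            args = decision.get("arguments", {})
            self.logger.log_tool_call(name, args)
            try:
                tool = self.registry.get(name)
                validate_parameters(tool.parameters, args)
                result = self.executor.execute(tool, args)
                self.logger.log_result(name, result)
                outcome = result.output if result.success else f"ERROR: {result.error}"
            except (KeyError, ValidationError) as exc:
                self.logger.log_error(str(exc))
                outcome = f"ERROR: {exc}"

            # Inject the tool result into the context and loop back to the LLM
            context = f"{context}\n\nResult of tool '{name}': {outcome}"

        return "Stopped after reaching the maximum number of tool calls for this turn."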

Step 6: Audit Logging

Add an AuditLogger that records every action (one possible shape is sketched below):

  • Tool call attempts (tool name, arguments, timestamp)
  • Tool execution results (success/failure, duration, output)
  • LLM decisions (tool call vs. direct response)
  • Error events (validation failures, timeouts, exceptions)
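
A straightforward in-memory logger is enough here. This sketch follows the timestamp/event_type/data shape from the rubric; the event type strings are illustrative.

import time
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class AuditEntry:
    timestamp: float
    event_type: str
    data: dict

class AuditLogger:
    def __init__(self) -> None:
        self._entries: list[AuditEntry] = []

    def _log(self, event_type: str, data: dict) -> None:
        self._entries.append(AuditEntry(time.time(), event_type, data))

    def log_tool_call(self, tool_name: str, arguments: dict) -> None:
        self._log("tool_call", {"tool": tool_name, "arguments": arguments})

    def log_result(self, tool_name: str, result: Any) -> None:
        self._log("tool_result", {"tool": tool_name, "result": result})

    def log_decision(self, decision: dict) -> None:
        self._log("llm_decision", {"decision": decision})

    def log_error(self, message: str) -> None:
        self._log("error", {"message": message})

    def get_entries(self, event_type: Optional[str] = None) -> list:
        # Optionally filter by event type
        if event_type is None:
            return list(self._entries)
        return [e for e in self._entries if e.event_type == event_type]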

What to Submit

The editor has 6 file sections with TODO comments. Replace each TODO with your Python code. The AI grader will evaluate each section against the rubric.

Hints

  • For the JSON Schema validator, focus on type and required checks — you don't need a full JSON Schema implementation
  • For the timeout, submit the handler to a concurrent.futures.ThreadPoolExecutor and pass a timeout to the future's result() call
  • For LLM tool selection, define a clear prompt format and expected JSON response structure
  • The agent loop should have a max_iterations parameter (default: 10) to prevent runaway execution

Grading Rubric

  • ToolRegistry implements register (with a duplicate-name check), unregister, and get (both of which raise appropriate errors), plus list_tools returning serializable dicts without the handler. (20 points)
  • ParameterValidator checks required fields and types for string, number, integer, boolean, array, and object, collects all errors, and raises ValidationError with descriptive messages. (15 points)
  • ToolExecutor runs the handler with a ThreadPoolExecutor timeout, records start_time/end_time/duration_ms, and returns a structured ExecutionResult for the success, timeout, and exception cases. (15 points)
  • ToolSelector formats a clear prompt from the tools and user message, parses the LLM's JSON response for both tool_call and respond actions, and handles malformed JSON gracefully as a direct response. (20 points)
  • Agent.run implements the full conversation loop: gets tools, calls the selector, handles tool calls (validate → execute → inject result), loops back for multi-step reasoning, respects max_iterations, and handles errors gracefully. (15 points)
  • AuditLogger implements log_tool_call, log_result, log_decision, and log_error with proper AuditEntry creation (timestamp, event_type, data), plus get_entries with optional filtering. (15 points)

