AI Agent Engineer Interviews: Design, Build & Deploy Production Agentic Systems
Lab

Build an Autonomous Task Execution Agent

45 min
Advanced
Unlimited free attempts

Instructions

In this lab, you will build a complete autonomous task execution agent in Python. Your agent will decompose goals into subtask DAGs, execute them using the ReAct pattern, detect when it is stuck, replan when execution diverges, consult humans at checkpoints, and respect execution budgets.

This is a common architecture behind production coding agents, research assistants, and workflow automation systems.

Architecture Overview

User Goal
     |
TaskPlanner: decompose into subtask DAG
     |
ReactLoop: for each subtask
     |
+---> Thought -> Action -> Observation ---+
|                                         |
|    StepExecutor: invoke tool, parse     |
|                                         |
+---- SelfReflection: stuck? looping? ----+
            |
       [If stuck]
            |
       Replanner: generate new plan
            |
       HumanCheckpoint: ask if needed
            |
       BudgetManager: enforce limits
            |
       [Continue or terminate gracefully]

Step 1: Task Planner

Build a TaskPlanner that breaks a high-level goal into a directed acyclic graph (DAG) of subtasks. Each subtask has:

  • A unique task_id
  • A description of what to accomplish
  • A list of dependencies (other task IDs that must complete first)
  • A status (pending, in_progress, completed, failed)

The planner should:

  • Accept a goal string and produce a list of SubTask objects
  • Determine execution order from the dependency graph
  • Return which tasks are ready to execute (all dependencies satisfied)
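
The ordering and readiness checks above can be sketched as follows. This is a minimal illustration, not the lab's reference solution: the SubTask field names follow the list above, the function names topological_sort and get_ready_tasks are borrowed from the grading rubric, and the LLM-based goal decomposition is left out, so the tasks are assumed to have been built already.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SubTask:
    task_id: str
    description: str
    dependencies: list[str] = field(default_factory=list)
    status: str = "pending"  # pending, in_progress, completed, failed


def topological_sort(tasks: list[SubTask]) -> list[str]:
    """Order task IDs with Kahn's algorithm; raise ValueError on a cycle."""
    indegree = {t.task_id: 0 for t in tasks}
    dependents: dict[str, list[str]] = {t.task_id: [] for t in tasks}
    for task in tasks:
        for dep in task.dependencies:
            if dep not in indegree:
                raise ValueError(f"unknown dependency {dep!r} in {task.task_id!r}")
            indegree[task.task_id] += 1
            dependents[dep].append(task.task_id)

    # Start with tasks that depend on nothing, then release their dependents.
    queue = deque(tid for tid, degree in indegree.items() if degree == 0)
    order: list[str] = []
    while queue:
        tid = queue.popleft()
        order.append(tid)
        for nxt in dependents[tid]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    # Any task never released is part of a dependency cycle.
    if len(order) != len(tasks):
        raise ValueError("dependency cycle detected")
    return order


def get_ready_tasks(tasks: list[SubTask]) -> list[SubTask]:
    """Return pending tasks whose dependencies have all completed."""
    done = {t.task_id for t in tasks if t.status == "completed"}
    return [
        t for t in tasks
        if t.status == "pending" and all(dep in done for dep in t.dependencies)
    ]
```

Kahn's algorithm is used here because cycle detection falls out of it naturally: if the sorted order is shorter than the task list, some tasks could never be released.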

Step 2: ReAct Loop

Build a ReactLoop that executes a single subtask using the Thought-Action-Observation cycle:

  • Thought: The LLM reasons about what to do next for this subtask
  • Action: The LLM selects and calls a tool
  • Observation: The tool result is captured
  • The cycle repeats until the LLM produces a final answer or the step limit is reached
  • Each cycle step is recorded in a structured history
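
One way to sketch this cycle is shown below. Several details are assumptions made for illustration: the LLM is modeled as a plain callable that takes a prompt string and returns a JSON string with thought, action, and args keys, the final answer is signaled by a "finish" action carrying args["answer"], and execute_action stands in for the Step Executor from the next step.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ReactStep:
    step_number: int
    thought: str
    action: str
    args: dict
    observation: str


def format_history(history: list[ReactStep]) -> str:
    """Render prior steps as text so they can be fed back into the prompt."""
    return "\n".join(
        f"Step {s.step_number}: Thought: {s.thought} | "
        f"Action: {s.action} {s.args} | Observation: {s.observation}"
        for s in history
    )


def run_react(
    subtask: str,
    llm: Callable[[str], str],                    # assumed: prompt in, JSON text out
    execute_action: Callable[[str, dict], str],   # assumed: tool name + args in, text out
    max_steps: int = 5,
) -> tuple[Optional[str], list[ReactStep]]:
    """Run Thought -> Action -> Observation until "finish" or the step limit."""
    history: list[ReactStep] = []
    for step_number in range(1, max_steps + 1):
        prompt = (
            f"Subtask: {subtask}\n"
            f"History:\n{format_history(history)}\n"
            'Reply with JSON: {"thought": "...", "action": "...", "args": {}}'
        )
        try:
            decision = json.loads(llm(prompt))
            thought, action = decision["thought"], decision["action"]
            args = decision.get("args", {})
        except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
            # Record the malformed reply so the next prompt can correct it.
            history.append(ReactStep(step_number, "", "invalid_response", {},
                                     "could not parse LLM output as JSON"))
            continue

        if action == "finish":
            history.append(ReactStep(step_number, thought, action, args, ""))
            return args.get("answer"), history

        observation = execute_action(action, args)
        history.append(ReactStep(step_number, thought, action, args, observation))

    return None, history  # step limit reached without a final answer
```

Returning the history alongside the answer keeps the loop usable by a reflection module, which needs the full step record rather than only the result.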

Step 3: Step Executor

Build a StepExecutor that handles tool invocation:

  • Accept an action (tool name + arguments) from the ReAct loop
  • Look up the tool, execute it, and parse the result
  • Return a structured StepResult with the output, success/failure status, and execution duration
  • Handle tool errors gracefully (missing tools, execution failures, timeouts)
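
The error handling described above might look like the sketch below. The tool registry is assumed to be a plain dict mapping names to callables, and the error field on StepResult is an addition for illustration. The timeout uses a worker thread from concurrent.futures, which stops waiting for a slow tool but cannot forcibly kill it, so a real deployment may need process-level isolation instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class StepResult:
    output: Any
    success: bool
    duration_ms: float
    error: Optional[str] = None  # assumed extra field carrying the failure reason


class StepExecutor:
    def __init__(self, tools: dict[str, Callable[..., Any]],
                 timeout_seconds: float = 5.0):
        self.tools = tools
        self.timeout_seconds = timeout_seconds

    def execute(self, tool_name: str, args: dict) -> StepResult:
        start = time.perf_counter()

        def result(output: Any, success: bool, error: Optional[str] = None) -> StepResult:
            elapsed_ms = (time.perf_counter() - start) * 1000
            return StepResult(output, success, elapsed_ms, error)

        tool = self.tools.get(tool_name)
        if tool is None:
            return result(None, False, f"unknown tool: {tool_name}")

        # Run the tool in a worker thread so the wait can be bounded.
        pool = ThreadPoolExecutor(max_workers=1)
        future = pool.submit(tool, **args)
        try:
            return result(future.result(timeout=self.timeout_seconds), True)
        except FutureTimeout:
            return result(None, False, f"timed out after {self.timeout_seconds}s")
        except Exception as exc:  # tool raised: report it instead of crashing the agent
            return result(None, False, f"{type(exc).__name__}: {exc}")
        finally:
            pool.shutdown(wait=False)  # do not block on a tool that is still running
```

Every failure mode returns a StepResult rather than raising, so the ReAct loop can treat the error text as an observation and let the LLM react to it.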

Step 4: Self-Reflection

Build a SelfReflection module that evaluates execution progress:

  • Analyze the history of Thought/Action/Observation steps
  • Detect stuck states: repeated identical actions, oscillating between two actions, no progress
  • Detect goal drift: actions are diverging from the subtask objective
  • Return a ReflectionResult with a diagnosis and recommendation (continue, replan, or escalate)
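
A heuristic version of these checks is sketched below. It assumes the history has been reduced to (action_signature, observation) string pairs, uses a simple keyword match as a stand-in for goal-drift detection, and maps each diagnosis to a recommendation in a way chosen purely for illustration. An LLM-based analysis could replace or supplement any of these rules.

```python
from dataclasses import dataclass


@dataclass
class ReflectionResult:
    diagnosis: str
    recommendation: str  # "continue", "replan", or "escalate"


def reflect(
    history: list[tuple[str, str]],     # assumed shape: (action signature, observation)
    goal_keywords: tuple[str, ...] = (),
    window: int = 3,
) -> ReflectionResult:
    actions = [action for action, _ in history]
    observations = [obs for _, obs in history]

    # Repeated identical actions: the last `window` actions are all the same.
    recent = actions[-window:]
    if len(recent) == window and len(set(recent)) == 1:
        return ReflectionResult(f"repeated action {recent[0]!r}", "replan")

    # Oscillation: the last four actions follow an A, B, A, B pattern.
    last4 = actions[-4:]
    if (len(last4) == 4 and last4[0] == last4[2]
            and last4[1] == last4[3] and last4[0] != last4[1]):
        return ReflectionResult(
            f"oscillating between {last4[0]!r} and {last4[1]!r}", "replan")

    # No progress: different actions keep producing the same observation.
    recent_obs = observations[-window:]
    if len(recent_obs) == window and len(set(recent_obs)) == 1:
        return ReflectionResult("no progress: identical observations", "escalate")

    # Goal drift (keyword heuristic): recent observations never mention the goal.
    if goal_keywords and len(recent_obs) == window:
        text = " ".join(recent_obs).lower()
        if not any(keyword.lower() in text for keyword in goal_keywords):
            return ReflectionResult("possible goal drift", "replan")

    return ReflectionResult("making progress", "continue")
```

The checks run from most to least specific, so a loop of identical actions is reported as a loop rather than as the no-progress state it also causes.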

Step 5: Replanner

Build a Replanner that generates a new plan when the original fails:

  • Accept the original plan, failed steps, and the reflection diagnosis
  • Generate a revised plan that avoids the failed approaches
  • Track a plan diff: what changed between the original and revised plans
  • Maintain a history of plan revisions to prevent cycling between plans
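
The plan diff and revision history can be sketched without an LLM, as below. Plans are assumed to be lists of (task_id, description) pairs, and the RevisionHistory class and its propose method are hypothetical names introduced for this example. Generating the revised plan itself is left to the LLM call, which is not shown.

```python
from dataclasses import dataclass

Plan = list[tuple[str, str]]  # assumed representation: (task_id, description) pairs


@dataclass
class PlanDiff:
    added: list[str]
    removed: list[str]
    modified: list[str]


def compute_diff(old: Plan, new: Plan) -> PlanDiff:
    """Compare two plans by task ID and report what changed."""
    old_map, new_map = dict(old), dict(new)
    return PlanDiff(
        added=[tid for tid in new_map if tid not in old_map],
        removed=[tid for tid in old_map if tid not in new_map],
        modified=[tid for tid in new_map
                  if tid in old_map and old_map[tid] != new_map[tid]],
    )


class RevisionHistory:
    """Keeps every plan tried so far so a replan cannot cycle back to one."""

    def __init__(self, original: Plan):
        self.revisions: list[Plan] = [list(original)]

    def propose(self, new_plan: Plan) -> PlanDiff:
        """Record a revised plan and return its diff against the previous one."""
        candidate = list(new_plan)
        if candidate in self.revisions:
            raise ValueError("revised plan repeats an earlier plan")
        diff = compute_diff(self.revisions[-1], candidate)
        self.revisions.append(candidate)
        return diff
```

Rejecting an exact repeat is the simplest guard against plan cycling. A stricter version might also reject plans that differ from an earlier one only in wording.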

Step 6: Human Checkpoint

Build a HumanCheckpoint module that pauses execution for human input:

  • Evaluate whether a checkpoint is needed based on: confidence score, action category (irreversible?), and number of replans so far
  • Format a clear summary of the current state for the human
  • Accept human decisions: approve, modify plan, or abort
  • Support a configurable confidence threshold (default: 0.7)
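
The checkpoint decision can be sketched as a few small functions. The max_replans default of 2, the message layout, and the choice to treat an unrecognized reply as an abort are all assumptions made for this example. The prompt_human parameter is any callable that shows a message and returns the human's reply, such as the built-in input.

```python
from enum import Enum
from typing import Callable


class HumanDecision(Enum):
    APPROVE = "approve"
    MODIFY = "modify"
    ABORT = "abort"


def should_checkpoint(
    confidence: float,
    irreversible: bool,
    replan_count: int,
    confidence_threshold: float = 0.7,
    max_replans: int = 2,  # assumed limit; the lab does not specify one
) -> bool:
    """Pause when confidence is low, the action cannot be undone, or replans pile up."""
    return (
        confidence < confidence_threshold
        or irreversible
        or replan_count >= max_replans
    )


def format_checkpoint_message(
    subtask: str, proposed_action: str, confidence: float, replan_count: int
) -> str:
    return (
        "Checkpoint: human input needed\n"
        f"  Subtask: {subtask}\n"
        f"  Proposed action: {proposed_action}\n"
        f"  Confidence: {confidence:.2f}\n"
        f"  Replans so far: {replan_count}\n"
        "Reply with approve, modify, or abort."
    )


def parse_decision(response: str) -> HumanDecision:
    """Map free-text input to a decision; unrecognized input aborts (fail safe)."""
    text = response.strip().lower()
    for decision in HumanDecision:
        if text.startswith(decision.value):
            return decision
    return HumanDecision.ABORT


def request_checkpoint(
    message: str, prompt_human: Callable[[str], str] = input
) -> HumanDecision:
    return parse_decision(prompt_human(message))
```

Passing prompt_human as a parameter keeps the module testable: a test can supply a canned reply instead of blocking on the console.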

Step 7: Budget Manager

Build a BudgetManager that enforces execution limits:

  • Track steps taken, tokens consumed, and elapsed wall-clock time
  • Configurable limits for each dimension (max_steps, max_tokens, max_time_seconds)
  • At each step, check whether any budget is exceeded
  • When a budget runs low (usage has reached 80% or more of a limit, leaving 20% or less), signal the agent to wrap up
  • When budget is exceeded, trigger graceful termination: summarize progress so far rather than failing silently
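
The tracking and threshold logic above can be sketched as follows. The BudgetStatus values come from the hints, and record_step, check_budget, and get_remaining are named after the rubric. Everything else is an assumption: the warn_at fraction, the injectable clock parameter (added so tests need not wait for real time), the summary method, and the choice to treat reaching a limit as EXCEEDED. All limits are assumed to be positive.

```python
import time
from enum import Enum
from typing import Callable


class BudgetStatus(Enum):
    OK = "ok"
    WARNING = "warning"
    EXCEEDED = "exceeded"


class BudgetManager:
    def __init__(
        self,
        max_steps: int,
        max_tokens: int,
        max_time_seconds: float,
        warn_at: float = 0.8,                     # warn once 80% of any limit is used
        clock: Callable[[], float] = time.time,   # injectable for testing
    ):
        self.limits = {
            "steps": max_steps,
            "tokens": max_tokens,
            "time_seconds": max_time_seconds,
        }
        self.warn_at = warn_at
        self.clock = clock
        self.started = clock()
        self.steps = 0
        self.tokens = 0

    def record_step(self, tokens_used: int) -> None:
        self.steps += 1
        self.tokens += tokens_used

    def usage(self) -> dict[str, float]:
        return {
            "steps": self.steps,
            "tokens": self.tokens,
            "time_seconds": self.clock() - self.started,
        }

    def get_remaining(self) -> dict[str, float]:
        used = self.usage()
        return {key: max(0, self.limits[key] - used[key]) for key in self.limits}

    def check_budget(self) -> BudgetStatus:
        used = self.usage()
        fractions = [used[key] / self.limits[key] for key in self.limits]
        if any(fraction >= 1.0 for fraction in fractions):
            return BudgetStatus.EXCEEDED  # assumption: reaching a limit counts
        if any(fraction >= self.warn_at for fraction in fractions):
            return BudgetStatus.WARNING
        return BudgetStatus.OK

    def summary(self) -> str:
        """Human-readable progress line for graceful termination."""
        used = self.usage()
        return ", ".join(
            f"{key} {used[key]:g}/{self.limits[key]:g}" for key in self.limits
        )
```

An agent loop would typically call check_budget after each record_step, ask the LLM to wrap up on WARNING, and on EXCEEDED stop and return summary() together with whatever partial results exist.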

What to Submit

The editor has 7 file sections with TODO comments. Replace each TODO with your Python code. The AI grader will evaluate each section against the rubric.

Hints

  • For the DAG, a simple topological sort determines execution order; use Kahn's algorithm or a recursive DFS
  • For stuck detection, compare the last N actions in history: if they repeat, the agent is looping
  • For the plan diff, store plans as lists and compute added/removed/modified steps
  • The budget manager should return a BudgetStatus enum: OK, WARNING, EXCEEDED
  • Use time.time() for wall-clock tracking and a simple counter for step/token budgets

Grading Rubric

  • TaskPlanner decomposes goals into SubTask DAGs with plan_goal (LLM-based decomposition with JSON parsing), get_ready_tasks (returns tasks with all dependencies completed), and topological_sort (valid ordering with cycle detection via Kahn's algorithm or DFS) (20 points)
  • ReactLoop implements the full Thought-Action-Observation cycle with a run method (loops up to max_steps, parses LLM JSON for thought/action/args, handles the finish action, calls execute_action, records ReactStep history) and format_history for prompt building (20 points)
  • SelfReflection detects stuck states (repeated identical actions, oscillation between two actions, identical observations) and goal drift (keywords missing from observations, optional LLM-based analysis), returning a ReflectionResult with an appropriate recommendation and confidence (15 points)
  • Replanner generates revised plans via LLM with a replan method (includes original plan, failures, diagnosis, and revision history in the prompt), compute_diff (identifies added/removed/modified tasks), and get_revision_summary (human-readable history to prevent plan cycling) (15 points)
  • HumanCheckpoint implements should_checkpoint (checks confidence threshold, irreversible actions, replan count), request_checkpoint (formats message, calls prompt_human, parses response to HumanDecision), and format_checkpoint_message with a clear status summary (10 points)
  • BudgetManager tracks steps/tokens/time with record_step, check_budget (returns EXCEEDED/WARNING/OK with a BudgetReport), format_progress_report (percentage-based summary with warnings), and get_remaining (dict of remaining budget per dimension) (10 points)
  • Progress reporting throughout: ReactLoop logs each step with step_number and status, StepExecutor records duration_ms, BudgetManager produces per-step budget reports, and all modules provide human-readable status messages suitable for logging (10 points)
