Lab

Build an Autonomous Task Execution Agent

45 min

Advanced

Unlimited free attempts

Instructions

In this lab, you will build a complete autonomous task execution agent in Python. Your agent will decompose goals into subtask DAGs, execute them using the ReAct pattern, detect when it is stuck, replan when execution diverges, consult humans at checkpoints, and respect execution budgets.

This is the architecture behind production coding agents, research assistants, and workflow automation systems.

Architecture Overview

User Goal
     |
TaskPlanner: decompose into subtask DAG
     |
ReactLoop: for each subtask
     |
+---> Thought -> Action -> Observation ---+
|                                         |
|    StepExecutor: invoke tool, parse     |
|                                         |
+---- SelfReflection: stuck? looping? ----+
            |
       [If stuck]
            |
       Replanner: generate new plan
            |
       HumanCheckpoint: ask if needed
            |
       BudgetManager: enforce limits
            |
       [Continue or terminate gracefully]

Step 1: Task Planner

Build a TaskPlanner that breaks a high-level goal into a directed acyclic graph (DAG) of subtasks. Each subtask has:

A unique task_id
A description of what to accomplish
A list of dependencies (other task IDs that must complete first)
A status (pending, in_progress, completed, failed)

The planner should:

Accept a goal string and produce a list of SubTask objects
Determine execution order from the dependency graph
Return which tasks are ready to execute (all dependencies satisfied)
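
The ordering and readiness checks above can be sketched as follows. This is a minimal illustration, not the lab's reference solution: the function names follow the grading rubric, the status strings are the ones listed above, and the LLM-based decomposition step is left out so the example runs on a hand-written plan. The sample plan is invented for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SubTask:
    task_id: str
    description: str
    dependencies: list[str] = field(default_factory=list)
    status: str = "pending"  # pending | in_progress | completed | failed


def topological_sort(tasks: list[SubTask]) -> list[str]:
    """Return task IDs in a valid execution order using Kahn's algorithm."""
    by_id = {t.task_id: t for t in tasks}
    indegree = {tid: 0 for tid in by_id}
    dependents: dict[str, list[str]] = {tid: [] for tid in by_id}
    for task in tasks:
        for dep in task.dependencies:
            if dep not in by_id:
                raise ValueError(f"{task.task_id} depends on unknown task {dep}")
            indegree[task.task_id] += 1
            dependents[dep].append(task.task_id)

    # Start with tasks that have no dependencies, then release their dependents.
    queue = deque(tid for tid, degree in indegree.items() if degree == 0)
    order = []
    while queue:
        tid = queue.popleft()
        order.append(tid)
        for child in dependents[tid]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    # Any task left unordered is part of a cycle.
    if len(order) != len(by_id):
        raise ValueError("dependency cycle detected")
    return order


def get_ready_tasks(tasks: list[SubTask]) -> list[SubTask]:
    """Return pending tasks whose dependencies have all completed."""
    completed = {t.task_id for t in tasks if t.status == "completed"}
    return [
        t for t in tasks
        if t.status == "pending" and all(d in completed for d in t.dependencies)
    ]


# Hypothetical plan, written by hand instead of produced by an LLM.
plan = [
    SubTask("t1", "Collect sources"),
    SubTask("t2", "Summarize sources", dependencies=["t1"]),
    SubTask("t3", "Draft outline", dependencies=["t1"]),
    SubTask("t4", "Write report", dependencies=["t2", "t3"]),
]
print(topological_sort(plan))
print([t.task_id for t in get_ready_tasks(plan)])
```

Kahn's algorithm gives cycle detection for free: if the sorted order is shorter than the task list, some tasks never reached an in-degree of zero.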

Step 2: ReAct Loop

Build a ReactLoop that executes a single subtask using the Thought-Action-Observation cycle:

Thought: The LLM reasons about what to do next for this subtask
Action: The LLM selects and calls a tool
Observation: The tool result is captured
The cycle repeats until the LLM produces a final answer or the step limit is reached
Each cycle step is recorded in a structured history
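
One possible shape for this cycle is sketched below. It assumes the LLM is an injected callable that returns a JSON string with `thought`, `action`, and `args` keys, and that a `finish` action carries the final answer in `args["answer"]`. Those conventions, the prompt text, and the scripted `fake_llm` and `fake_execute` helpers are all assumptions made for the example.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ReactStep:
    step_number: int
    thought: str
    action: str
    args: dict
    observation: str


class ReactLoop:
    def __init__(self, llm: Callable[[str], str],
                 execute_action: Callable[[str, dict], str], max_steps: int = 5):
        self.llm = llm
        self.execute_action = execute_action
        self.max_steps = max_steps
        self.history: list[ReactStep] = []

    def format_history(self) -> str:
        """Render earlier steps so they can be included in the next prompt."""
        return "\n".join(
            f"Step {s.step_number}: Thought: {s.thought} | "
            f"Action: {s.action}({s.args}) | Observation: {s.observation}"
            for s in self.history
        )

    def run(self, subtask: str) -> Optional[str]:
        """Return the final answer, or None if the step limit is reached."""
        for step_number in range(1, self.max_steps + 1):
            prompt = f"Subtask: {subtask}\n{self.format_history()}\nReply in JSON."
            try:
                reply = json.loads(self.llm(prompt))
                if not isinstance(reply, dict):
                    raise ValueError("reply is not a JSON object")
            except ValueError:  # json.JSONDecodeError is a subclass of ValueError
                self.history.append(ReactStep(
                    step_number, "", "invalid_reply", {}, "Reply was not a JSON object"))
                continue
            thought = reply.get("thought", "")
            action = reply.get("action", "")
            args = reply.get("args", {})
            if action == "finish":
                self.history.append(ReactStep(step_number, thought, action, args, ""))
                return args.get("answer", "")
            observation = self.execute_action(action, args)
            self.history.append(
                ReactStep(step_number, thought, action, args, observation))
        return None


# Scripted stand-ins for a real LLM and tool layer.
scripted = iter([
    '{"thought": "I need the sum", "action": "add", "args": {"a": 2, "b": 3}}',
    '{"thought": "I have the result", "action": "finish", "args": {"answer": "5"}}',
])


def fake_llm(prompt: str) -> str:
    return next(scripted)


def fake_execute(action: str, args: dict) -> str:
    if action == "add":
        return str(args["a"] + args["b"])
    return f"unknown tool: {action}"


loop = ReactLoop(fake_llm, fake_execute)
answer = loop.run("Add 2 and 3")
print(answer)
print(loop.format_history())
```

Recording malformed replies as steps, rather than raising, keeps the history complete so a later reflection pass can see that the model is failing to follow the output format.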

Step 3: Step Executor

Build a StepExecutor that handles tool invocation:

Accept an action (tool name + arguments) from the ReAct loop
Look up the tool, execute it, and parse the result
Return a structured StepResult with the output, success/failure status, and execution duration
Handle tool errors gracefully (missing tools, execution failures, timeouts)
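
A minimal sketch of such an executor follows, assuming tools are plain Python callables held in a dictionary and invoked with keyword arguments. The timeout uses a worker thread from `concurrent.futures`. A timed-out call is reported as a failure, but the underlying thread is not killed and keeps running in the background, which is a known limitation of this approach. The `error` field and the sample tools are additions for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class StepResult:
    output: Any
    success: bool
    duration_ms: float
    error: Optional[str] = None


class StepExecutor:
    def __init__(self, tools: dict[str, Callable[..., Any]],
                 timeout_seconds: float = 5.0):
        self.tools = tools
        self.timeout_seconds = timeout_seconds

    @staticmethod
    def _result(start: float, output: Any = None,
                error: Optional[str] = None) -> StepResult:
        duration_ms = (time.perf_counter() - start) * 1000
        return StepResult(output, error is None, duration_ms, error)

    def execute(self, tool_name: str, args: dict) -> StepResult:
        """Run one tool call and always return a StepResult, never raise."""
        start = time.perf_counter()
        tool = self.tools.get(tool_name)
        if tool is None:
            return self._result(start, error=f"unknown tool: {tool_name}")
        pool = ThreadPoolExecutor(max_workers=1)
        try:
            future = pool.submit(tool, **args)
            return self._result(start, output=future.result(timeout=self.timeout_seconds))
        except FutureTimeout:
            return self._result(
                start, error=f"{tool_name} timed out after {self.timeout_seconds}s")
        except Exception as exc:  # tool raised, or was called with bad arguments
            return self._result(start, error=f"{type(exc).__name__}: {exc}")
        finally:
            pool.shutdown(wait=False)  # do not block on a still-running tool


# Hypothetical tools for demonstration.
def add(a: int, b: int) -> int:
    return a + b


def divide(a: int, b: int) -> float:
    return a / b


def slow() -> str:
    time.sleep(0.2)
    return "done"


executor = StepExecutor({"add": add, "divide": divide, "slow": slow},
                        timeout_seconds=0.05)
for name, call_args in [("add", {"a": 1, "b": 2}), ("divide", {"a": 1, "b": 0}),
                        ("missing", {}), ("slow", {})]:
    result = executor.execute(name, call_args)
    print(name, result.success, result.output, result.error)
```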

Step 4: Self-Reflection

Build a SelfReflection module that evaluates execution progress:

Analyze the history of Thought/Action/Observation steps
Detect stuck states: repeated identical actions, oscillation between two actions, or no progress (for example, identical observations)
Detect goal drift: actions that diverge from the subtask objective
Return a ReflectionResult with a diagnosis and recommendation (continue, replan, or escalate)
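
The heuristic checks above could look something like the sketch below. It assumes the history has been reduced to `(action, observation)` string pairs and that goal drift is approximated by objective keywords missing from recent observations. The window size, the confidence values, and the mapping from each diagnosis to a recommendation are arbitrary choices made for the example, not values given by the lab.

```python
from dataclasses import dataclass


@dataclass
class ReflectionResult:
    diagnosis: str
    recommendation: str  # continue | replan | escalate
    confidence: float


def reflect(history: list[tuple[str, str]], objective_keywords: list[str],
            window: int = 3) -> ReflectionResult:
    """Apply simple heuristics to (action, observation) pairs, newest last."""
    actions = [action for action, _ in history]
    observations = [observation for _, observation in history]

    # Same action repeated for the whole window.
    recent_actions = actions[-window:]
    if len(recent_actions) == window and len(set(recent_actions)) == 1:
        return ReflectionResult("repeated identical action", "replan", 0.9)

    # A, B, A, B pattern over the last four actions.
    last_four = actions[-4:]
    if (len(last_four) == 4 and last_four[0] == last_four[2]
            and last_four[1] == last_four[3] and last_four[0] != last_four[1]):
        return ReflectionResult("oscillating between two actions", "replan", 0.8)

    # Different actions, but the environment keeps returning the same thing.
    recent_obs = observations[-window:]
    if len(recent_obs) == window and len(set(recent_obs)) == 1:
        return ReflectionResult("no progress: identical observations", "escalate", 0.7)

    # None of the objective keywords appear in any recent observation.
    if recent_obs and objective_keywords and not any(
            keyword.lower() in obs.lower()
            for keyword in objective_keywords for obs in recent_obs):
        return ReflectionResult("possible goal drift", "replan", 0.5)

    return ReflectionResult("making progress", "continue", 0.6)


looping = [("search:invoice", "r1"), ("search:invoice", "r2"), ("search:invoice", "r3")]
print(reflect(looping, ["invoice"]))
```

Keyword matching is a crude proxy for drift. The rubric also mentions optional LLM-based analysis, which would replace or supplement the last check.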

Step 5: Replanner

Build a Replanner that generates a new plan when the original fails:

Accept the original plan, failed steps, and the reflection diagnosis
Generate a revised plan that avoids the failed approaches
Track a plan diff: what changed between the original and revised plans
Maintain a history of plan revisions to prevent cycling between plans
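
The diff and revision-history pieces can be sketched without an LLM. The example below assumes a plan is a mapping from task ID to description, which is a simplification chosen for brevity. The `RevisionHistory` class and its `propose` method are hypothetical names, and generating the revised plan itself is out of scope here.

```python
from typing import Optional


def compute_diff(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two plans keyed by task ID and list what changed."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "modified": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


class RevisionHistory:
    """Keeps every plan tried so far so a replanner can refuse to cycle."""

    def __init__(self, original: dict[str, str]):
        self.plans: list[dict[str, str]] = [dict(original)]

    def propose(self, revised: dict[str, str]) -> Optional[dict[str, list[str]]]:
        """Record a revision and return its diff, or None if it was already tried."""
        if any(revised == seen for seen in self.plans):
            return None
        diff = compute_diff(self.plans[-1], revised)
        self.plans.append(dict(revised))
        return diff


# Hypothetical plans for demonstration.
original = {"t1": "Scrape the site", "t2": "Parse the HTML", "t3": "Write report"}
revised = {"t1": "Call the public API", "t3": "Write report", "t4": "Validate data"}

revisions = RevisionHistory(original)
print(revisions.propose(revised))
print(revisions.propose(original))  # reverting to an earlier plan is rejected
```

Comparing whole plans catches exact repeats only. A plan that differs in wording but not in approach would still pass, so a real replanner may also want to include the revision history in its prompt, as the rubric describes.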

Step 6: Human Checkpoint

Build a HumanCheckpoint module that pauses execution for human input:

Evaluate whether a checkpoint is needed based on the confidence score, whether the action is irreversible, and the number of replans so far
Format a clear summary of the current state for the human
Accept human decisions: approve, modify plan, or abort
Support a configurable confidence threshold (default: 0.7)
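
A rough sketch of the checkpoint logic follows. The function names come from the grading rubric and the 0.7 default from the list above. The set of irreversible actions, the `max_replans` limit of 2, and the rule that an unrecognized reply is treated as an abort are assumptions made for the example. `prompt_human` is an injected callable so the code can run without a real user.

```python
from enum import Enum
from typing import Callable

# Hypothetical action names; a real agent would define its own categories.
IRREVERSIBLE_ACTIONS = {"delete_file", "send_email", "deploy"}


class HumanDecision(Enum):
    APPROVE = "approve"
    MODIFY = "modify"
    ABORT = "abort"


def should_checkpoint(confidence: float, action: str, replan_count: int,
                      confidence_threshold: float = 0.7,
                      max_replans: int = 2) -> list[str]:
    """Return the reasons a checkpoint is needed; an empty list means none."""
    reasons = []
    if confidence < confidence_threshold:
        reasons.append(
            f"confidence {confidence:.2f} is below threshold {confidence_threshold:.2f}")
    if action in IRREVERSIBLE_ACTIONS:
        reasons.append(f"action '{action}' is irreversible")
    if replan_count >= max_replans:
        reasons.append(f"plan already revised {replan_count} times")
    return reasons


def format_checkpoint_message(subtask: str, action: str, reasons: list[str]) -> str:
    lines = [f"Subtask: {subtask}", f"Proposed action: {action}", "Checkpoint reasons:"]
    lines += [f"  - {reason}" for reason in reasons]
    lines.append("Reply with approve, modify, or abort.")
    return "\n".join(lines)


def request_checkpoint(subtask: str, action: str, reasons: list[str],
                       prompt_human: Callable[[str], str]) -> HumanDecision:
    reply = prompt_human(format_checkpoint_message(subtask, action, reasons))
    try:
        return HumanDecision(reply.strip().lower())
    except ValueError:
        # Assumption: anything unrecognized is treated as the safest option.
        return HumanDecision.ABORT


reasons = should_checkpoint(confidence=0.55, action="deploy", replan_count=0)
if reasons:
    decision = request_checkpoint(
        "Release v2", "deploy", reasons, prompt_human=lambda message: "approve")
    print(decision)
```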

Step 7: Budget Manager

Build a BudgetManager that enforces execution limits:

Track steps taken, tokens consumed, and elapsed wall-clock time
Configurable limits for each dimension (max_steps, max_tokens, max_time_seconds)
At each step, check whether any budget is exceeded
When a budget runs low (usage is within 20% of a limit, that is, at 80% of the limit or more), signal the agent to wrap up
When budget is exceeded, trigger graceful termination: summarize progress so far rather than failing silently
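
The tracking and threshold logic can be sketched as below. Method names follow the grading rubric, and the `BudgetStatus` values follow the hints. Two choices are assumptions of this sketch: reaching a limit exactly is treated as EXCEEDED, and the clock is injectable so the behavior can be tested without waiting. All limits are assumed to be positive.

```python
import time
from enum import Enum
from typing import Callable


class BudgetStatus(Enum):
    OK = "ok"
    WARNING = "warning"
    EXCEEDED = "exceeded"


class BudgetManager:
    def __init__(self, max_steps: int, max_tokens: int, max_time_seconds: float,
                 warning_fraction: float = 0.8,
                 clock: Callable[[], float] = time.time):
        self.limits = {"steps": max_steps, "tokens": max_tokens,
                       "time_seconds": max_time_seconds}
        self.warning_fraction = warning_fraction
        self.clock = clock
        self.started_at = clock()
        self.steps = 0
        self.tokens = 0

    def record_step(self, tokens_used: int) -> None:
        self.steps += 1
        self.tokens += tokens_used

    def usage(self) -> dict[str, float]:
        return {"steps": self.steps, "tokens": self.tokens,
                "time_seconds": self.clock() - self.started_at}

    def get_remaining(self) -> dict[str, float]:
        used = self.usage()
        return {name: max(0, limit - used[name]) for name, limit in self.limits.items()}

    def check_budget(self) -> BudgetStatus:
        used = self.usage()
        fractions = [used[name] / limit for name, limit in self.limits.items()]
        if any(fraction >= 1.0 for fraction in fractions):
            return BudgetStatus.EXCEEDED
        if any(fraction >= self.warning_fraction for fraction in fractions):
            return BudgetStatus.WARNING
        return BudgetStatus.OK

    def format_progress_report(self) -> str:
        used = self.usage()
        parts = [f"{name}: {used[name] / limit:.0%} of {limit}"
                 for name, limit in self.limits.items()]
        return f"[{self.check_budget().name}] " + ", ".join(parts)


# A frozen clock keeps the demonstration deterministic.
budget = BudgetManager(max_steps=10, max_tokens=1000, max_time_seconds=60,
                       clock=lambda: 0.0)
for _ in range(8):
    budget.record_step(tokens_used=10)
print(budget.format_progress_report())
```

On EXCEEDED, the calling loop would stop issuing new actions and emit a summary of completed subtasks, which is the graceful termination described above.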

What to Submit

The editor has 7 file sections with TODO comments. Replace each TODO with your Python code. The AI grader will evaluate each section against the rubric.

Hints

For the DAG, a simple topological sort determines execution order. Use Kahn's algorithm or recursive DFS
For stuck detection, compare the last N actions in history: if they repeat, the agent is looping
For the plan diff, store plans as lists and compute added/removed/modified steps
The budget manager should return a BudgetStatus enum: OK, WARNING, EXCEEDED
Use time.time() for wall-clock tracking (time.monotonic() also works and is unaffected by system clock adjustments) and a simple counter for step/token budgets

Grading Rubric

TaskPlanner decomposes goals into SubTask DAGs with plan_goal (LLM-based decomposition with JSON parsing), get_ready_tasks (returns tasks with all dependencies completed), and topological_sort (valid ordering with cycle detection via Kahn's algorithm or DFS) (20 points)

ReactLoop implements the full Thought-Action-Observation cycle with a run method (loops up to max_steps, parses LLM JSON for thought/action/args, handles the finish action, calls execute_action, records ReactStep history) and format_history for prompt building (20 points)

SelfReflection detects stuck states (repeated identical actions, oscillation between two actions, identical observations) and goal drift (keywords missing from observations, optional LLM-based analysis), returning a ReflectionResult with an appropriate recommendation and confidence (15 points)

Replanner generates revised plans via LLM with a replan method (includes the original plan, failures, diagnosis, and revision history in the prompt), compute_diff (identifies added/removed/modified tasks), and get_revision_summary (human-readable history to prevent plan cycling) (15 points)

HumanCheckpoint implements should_checkpoint (checks confidence threshold, irreversible actions, replan count), request_checkpoint (formats message, calls prompt_human, parses response to HumanDecision), and format_checkpoint_message with a clear status summary (10 points)

BudgetManager tracks steps/tokens/time with record_step, check_budget (returns EXCEEDED/WARNING/OK with BudgetReport), format_progress_report (percentage-based summary with warnings), and get_remaining (dict of remaining budget per dimension) (10 points)

Progress reporting throughout: ReactLoop logs each step with step_number and status, StepExecutor records duration_ms, BudgetManager produces per-step budget reports, and all modules provide human-readable status messages suitable for logging (10 points)