Build an Autonomous Task Execution Agent
Instructions
In this lab, you will build a complete autonomous task execution agent in Python. Your agent will decompose goals into subtask DAGs, execute them using the ReAct pattern, detect when it is stuck, replan when execution diverges, consult humans at checkpoints, and respect execution budgets.
This is a common architecture for production coding agents, research assistants, and workflow automation systems.
Architecture Overview
User Goal
|
TaskPlanner: decompose into subtask DAG
|
ReactLoop: for each subtask
|
+---> Thought -> Action -> Observation ---+
|                                         |
|     StepExecutor: invoke tool, parse    |
|                                         |
+---- SelfReflection: stuck? looping? ----+
|
[If stuck]
|
Replanner: generate new plan
|
HumanCheckpoint: ask if needed
|
BudgetManager: enforce limits
|
[Continue or terminate gracefully]
Step 1: Task Planner
Build a TaskPlanner that breaks a high-level goal into a directed acyclic graph (DAG) of subtasks. Each subtask has:
- A unique task_id
- A description of what to accomplish
- A list of dependencies (other task IDs that must complete first)
- A status (pending, in_progress, completed, failed)
The planner should:
- Accept a goal string and produce a list of SubTask objects
- Determine execution order from the dependency graph
- Return which tasks are ready to execute (all dependencies satisfied)
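The sketch below covers only the DAG side of the planner. It assumes the goal has already been decomposed into SubTask objects, for example by an LLM call that is omitted here. The method names execution_order and ready_tasks are illustrative, not prescribed by the lab. Execution order comes from Kahn's algorithm, which also detects cycles.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum


class TaskStatus(Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class SubTask:
    task_id: str
    description: str
    dependencies: list[str] = field(default_factory=list)
    status: TaskStatus = TaskStatus.PENDING


class TaskPlanner:
    def __init__(self, subtasks: list[SubTask]):
        # Assumption: an LLM (not shown) already decomposed the goal into subtasks.
        self.tasks = {t.task_id: t for t in subtasks}

    def execution_order(self) -> list[str]:
        """Kahn's algorithm: repeatedly emit a task whose dependencies are all emitted."""
        in_degree = {tid: len(t.dependencies) for tid, t in self.tasks.items()}
        dependents: dict[str, list[str]] = {tid: [] for tid in self.tasks}
        for tid, task in self.tasks.items():
            for dep in task.dependencies:
                if dep not in self.tasks:
                    raise ValueError(f"{tid} depends on unknown task {dep}")
                dependents[dep].append(tid)

        queue = deque(tid for tid, deg in in_degree.items() if deg == 0)
        order: list[str] = []
        while queue:
            tid = queue.popleft()
            order.append(tid)
            for child in dependents[tid]:
                in_degree[child] -= 1
                if in_degree[child] == 0:
                    queue.append(child)

        # Any task that never reaches in-degree 0 sits on a cycle.
        if len(order) != len(self.tasks):
            raise ValueError("dependency graph contains a cycle")
        return order

    def ready_tasks(self) -> list[SubTask]:
        """Pending tasks whose dependencies have all completed."""
        return [
            t for t in self.tasks.values()
            if t.status is TaskStatus.PENDING
            and all(self.tasks[d].status is TaskStatus.COMPLETED for d in t.dependencies)
        ]


planner = TaskPlanner([
    SubTask("search", "Find relevant sources"),
    SubTask("read", "Extract key points", ["search"]),
    SubTask("summarize", "Write the summary", ["read"]),
])
print(planner.execution_order())  # ['search', 'read', 'summarize']
```

Kahn's algorithm fits well here because any task left over when the queue empties signals a cycle. The planner can therefore reject a malformed plan before execution starts.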
Step 2: ReAct Loop
Build a ReactLoop that executes a single subtask using the Thought-Action-Observation cycle:
- Thought: The LLM reasons about what to do next for this subtask
- Action: The LLM selects and calls a tool
- Observation: The tool result is captured
- The cycle repeats until the LLM produces a final answer or the step limit is reached
- Each cycle step is recorded in a structured history
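Here is one way to structure the loop. It assumes the LLM is wrapped in a callable that returns a dict with a "thought" key plus either "final_answer" or "action" (a tool name and an argument dict). ReactStep, LoopResult, and the llm/execute parameters are hypothetical names, and the scripted function at the bottom only stands in for a real model.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ReactStep:
    thought: str
    action: Optional[tuple[str, dict[str, Any]]]  # (tool_name, arguments); None on the final step
    observation: Optional[str]


@dataclass
class LoopResult:
    final_answer: Optional[str]
    history: list[ReactStep]
    hit_step_limit: bool


class ReactLoop:
    def __init__(self, llm: Callable[[str, list[ReactStep]], dict],
                 execute: Callable[[str, dict[str, Any]], str], max_steps: int = 10):
        self.llm = llm          # assumed model wrapper: (subtask, history) -> decision dict
        self.execute = execute  # runs a tool and returns its output as observation text
        self.max_steps = max_steps

    def run(self, subtask: str) -> LoopResult:
        history: list[ReactStep] = []
        for _ in range(self.max_steps):
            decision = self.llm(subtask, history)           # Thought
            if "final_answer" in decision:
                history.append(ReactStep(decision["thought"], None, None))
                return LoopResult(decision["final_answer"], history, False)
            tool, args = decision["action"]                 # Action
            observation = self.execute(tool, args)          # Observation
            history.append(ReactStep(decision["thought"], (tool, args), observation))
        return LoopResult(None, history, True)              # step limit reached


def scripted_llm(subtask: str, history: list[ReactStep]) -> dict:
    # Stand-in for a real model: call one tool, then answer with its output.
    if not history:
        return {"thought": "I need the sum.", "action": ("add", {"a": 2, "b": 3})}
    return {"thought": "The tool returned the sum.", "final_answer": history[-1].observation}


loop = ReactLoop(scripted_llm, lambda tool, args: str(args["a"] + args["b"]))
result = loop.run("Add 2 and 3")
print(result.final_answer)  # 5
```

Keeping the full history in a list of dataclasses makes it easy to hand to the self-reflection step. It also makes it easy to serialize for debugging.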
Step 3: Step Executor
Build a StepExecutor that handles tool invocation:
- Accept an action (tool name + arguments) from the ReAct loop
- Look up the tool, execute it, and parse the result
- Return a structured StepResult with the output, success/failure status, and execution duration
- Handle tool errors gracefully (missing tools, execution failures, timeouts)
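The executor sketch below assumes tools are plain Python callables registered in a dict and invoked with keyword arguments. The error and tool_name fields on StepResult are illustrative additions, and parsing the raw output into observation text is left to the caller. Timeouts are enforced with concurrent.futures, with the caveat noted in the comments.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class StepResult:
    tool_name: str
    output: Any
    success: bool
    error: Optional[str]
    duration_seconds: float


class StepExecutor:
    def __init__(self, tools: dict[str, Callable[..., Any]], timeout_seconds: float = 30.0):
        self.tools = tools
        self.timeout_seconds = timeout_seconds
        # Caveat: Python threads cannot be killed, so a timed-out tool keeps running
        # in the background; the executor simply stops waiting for it.
        self._pool = ThreadPoolExecutor(max_workers=4)

    def execute(self, tool_name: str, arguments: dict[str, Any]) -> StepResult:
        start = time.perf_counter()
        tool = self.tools.get(tool_name)
        if tool is None:
            return StepResult(tool_name, None, False, f"unknown tool: {tool_name}",
                              time.perf_counter() - start)
        future = self._pool.submit(tool, **arguments)
        try:
            output = future.result(timeout=self.timeout_seconds)
            return StepResult(tool_name, output, True, None, time.perf_counter() - start)
        except FutureTimeout:
            error = f"timed out after {self.timeout_seconds}s"
        except Exception as exc:  # report tool failures instead of crashing the loop
            error = f"{type(exc).__name__}: {exc}"
        return StepResult(tool_name, None, False, error, time.perf_counter() - start)


executor = StepExecutor({"add": lambda a, b: a + b})
print(executor.execute("add", {"a": 2, "b": 3}).output)  # 5
print(executor.execute("search", {}).error)              # unknown tool: search
```

Because every failure mode returns a StepResult rather than raising, the ReAct loop can record the error as an ordinary observation. The model then gets a chance to recover.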
Step 4: Self-Reflection
Build a SelfReflection module that evaluates execution progress:
- Analyze the history of Thought/Action/Observation steps
- Detect stuck states: repeated identical actions, oscillating between two actions, no progress
- Detect goal drift: actions are diverging from the subtask objective
- Return a ReflectionResult with a diagnosis and recommendation (continue, replan, or escalate)
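This is a heuristic sketch of stuck and drift detection. It assumes each action is rendered as a string such as "search('asyncio')", so identical calls compare equal. The drift check is a deliberately crude stand-in: it measures keyword overlap between recent actions and the subtask description, whereas a real agent might instead ask the LLM to judge relevance. The Recommendation enum, the thresholds, and the rule of escalating after max_replans replans are all assumptions.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Recommendation(Enum):
    CONTINUE = "continue"
    REPLAN = "replan"
    ESCALATE = "escalate"


@dataclass
class ReflectionResult:
    diagnosis: str
    recommendation: Recommendation


def keywords(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class SelfReflection:
    def __init__(self, window: int = 3, min_overlap: float = 0.1, max_replans: int = 2):
        self.window = window            # how many recent steps to inspect
        self.min_overlap = min_overlap  # below this keyword overlap, call it drift
        self.max_replans = max_replans  # after this many replans, escalate instead

    def diagnose(self, subtask: str, actions: list[str],
                 observations: list[str]) -> Optional[str]:
        recent = actions[-self.window:]
        if len(recent) == self.window and len(set(recent)) == 1:
            return f"stuck: repeated action {recent[0]!r}"
        last4 = actions[-4:]
        if (len(last4) == 4 and last4[0] == last4[2] and last4[1] == last4[3]
                and last4[0] != last4[1]):
            return f"stuck: oscillating between {last4[0]!r} and {last4[1]!r}"
        recent_obs = observations[-self.window:]
        if len(recent_obs) == self.window and len(set(recent_obs)) == 1:
            return "stuck: recent observations bring no new information"
        goal = keywords(subtask)
        if len(recent) == self.window and goal:
            overlap = len(goal & keywords(" ".join(recent))) / len(goal)
            if overlap < self.min_overlap:
                return "drift: recent actions share almost no keywords with the subtask"
        return None

    def evaluate(self, subtask: str, actions: list[str], observations: list[str],
                 replans_so_far: int = 0) -> ReflectionResult:
        problem = self.diagnose(subtask, actions, observations)
        if problem is None:
            return ReflectionResult("making progress", Recommendation.CONTINUE)
        if replans_so_far >= self.max_replans:
            return ReflectionResult(problem, Recommendation.ESCALATE)
        return ReflectionResult(problem, Recommendation.REPLAN)


reflector = SelfReflection()
print(reflector.evaluate("find asyncio docs", ["search('asyncio')"] * 3, ["a", "b", "c"]).diagnosis)
# stuck: repeated action "search('asyncio')"
```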
Step 5: Replanner
Build a Replanner that generates a new plan when the original fails:
- Accept the original plan, failed steps, and the reflection diagnosis
- Generate a revised plan that avoids the failed approaches
- Track a plan diff: what changed between the original and revised plans
- Maintain a history of plan revisions to prevent cycling between plans
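The replanner sketch rests on two assumptions. First, each plan step is a (step_id, description) pair, so "modified" can mean the same ID with a new description. Second, the LLM call is hidden behind a hypothetical propose callable. Candidates that reuse a failed step or repeat any earlier plan are rejected, which is how the revision history prevents cycling.

```python
from dataclasses import dataclass
from typing import Callable

Step = tuple[str, str]  # (step_id, description)


@dataclass
class PlanDiff:
    added: list[str]      # step IDs only in the new plan
    removed: list[str]    # step IDs only in the old plan
    modified: list[str]   # step IDs in both plans whose description changed


def plan_diff(old: list[Step], new: list[Step]) -> PlanDiff:
    old_map, new_map = dict(old), dict(new)
    return PlanDiff(
        added=[sid for sid in new_map if sid not in old_map],
        removed=[sid for sid in old_map if sid not in new_map],
        modified=[sid for sid in new_map if sid in old_map and old_map[sid] != new_map[sid]],
    )


class Replanner:
    def __init__(self, propose: Callable[[list[Step], list[Step], str], list[Step]],
                 max_attempts: int = 3):
        # Assumption: `propose` wraps an LLM prompt built from the old plan,
        # the failed steps, and the reflection diagnosis.
        self.propose = propose
        self.max_attempts = max_attempts
        self.history: list[tuple[Step, ...]] = []  # every plan adopted so far

    def replan(self, plan: list[Step], failed_steps: list[Step],
               diagnosis: str) -> tuple[list[Step], PlanDiff]:
        if not self.history:
            self.history.append(tuple(plan))
        for _ in range(self.max_attempts):
            candidate = self.propose(plan, failed_steps, diagnosis)
            if any(step in failed_steps for step in candidate):
                continue  # reuses an approach that already failed
            if tuple(candidate) in self.history:
                continue  # would cycle back to an earlier plan
            self.history.append(tuple(candidate))
            return candidate, plan_diff(plan, candidate)
        raise RuntimeError("no acceptable revised plan; escalate to a human")


original = [("s1", "search the web"), ("s2", "summarize findings")]
revised = [("s1", "search arXiv"), ("s2", "summarize findings"), ("s3", "cite sources")]
replanner = Replanner(lambda plan, failed, diagnosis: revised)
new_plan, diff = replanner.replan(original, [("s1", "search the web")], "repeated search")
print(diff)  # PlanDiff(added=['s3'], removed=[], modified=['s1'])
```

When no acceptable plan emerges within max_attempts, the RuntimeError is a natural trigger for the human checkpoint rather than a silent failure.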
Step 6: Human Checkpoint
Build a HumanCheckpoint module that pauses execution for human input:
- Evaluate whether a checkpoint is needed based on the confidence score, the action category (is it irreversible?), and the number of replans so far
- Format a clear summary of the current state for the human
- Accept human decisions: approve, modify plan, or abort
- Configurable confidence threshold (default: 0.7)
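A sketch of the checkpoint logic follows. It assumes each action carries a category string, and the IRREVERSIBLE set is a hypothetical list of categories that cannot be undone. The ask parameter defaults to the built-in input(), so the class can be exercised with a stub, and the reply format is only an illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Decision(Enum):
    APPROVE = "approve"
    MODIFY = "modify"
    ABORT = "abort"


@dataclass
class CheckpointOutcome:
    decision: Decision
    new_plan: Optional[list[str]] = None  # only set when decision is MODIFY


class HumanCheckpoint:
    # Hypothetical categories treated as irreversible; adapt to your tool set.
    IRREVERSIBLE = {"delete", "deploy", "payment", "send_email"}

    def __init__(self, confidence_threshold: float = 0.7,
                 max_replans_before_asking: int = 2,
                 ask: Callable[[str], str] = input):
        self.confidence_threshold = confidence_threshold
        self.max_replans_before_asking = max_replans_before_asking
        self.ask = ask  # defaults to input(); replace with a UI or chat callback

    def is_needed(self, confidence: float, action_category: str, replans: int) -> bool:
        return (confidence < self.confidence_threshold
                or action_category in self.IRREVERSIBLE
                or replans >= self.max_replans_before_asking)

    def summarize(self, goal: str, completed: list[str], next_action: str,
                  confidence: float, replans: int) -> str:
        return "\n".join([
            f"Goal: {goal}",
            f"Completed: {', '.join(completed) or 'nothing yet'}",
            f"Next action: {next_action}",
            f"Confidence: {confidence:.2f} | Replans so far: {replans}",
            "Reply with: approve | modify <step; step; ...> | abort",
        ])

    def request(self, summary: str) -> CheckpointOutcome:
        reply = self.ask(summary + "\n> ").strip()
        if reply.lower() == "approve":
            return CheckpointOutcome(Decision.APPROVE)
        if reply.lower().startswith("modify"):
            steps = [s.strip() for s in reply[len("modify"):].split(";") if s.strip()]
            return CheckpointOutcome(Decision.MODIFY, steps)
        return CheckpointOutcome(Decision.ABORT)  # unrecognized replies fail safe


checkpoint = HumanCheckpoint(ask=lambda prompt: "approve")
if checkpoint.is_needed(confidence=0.55, action_category="deploy", replans=0):
    outcome = checkpoint.request(checkpoint.summarize("Ship v2", ["run tests"], "deploy", 0.55, 0))
    print(outcome.decision)  # Decision.APPROVE
```

Treating any unrecognized reply as ABORT is a conservative choice. It means a typo never lets an irreversible action through.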
Step 7: Budget Manager
Build a BudgetManager that enforces execution limits:
- Track steps taken, tokens consumed, and elapsed wall-clock time
- Configurable limits for each dimension (max_steps, max_tokens, max_time_seconds)
- At each step, check whether any budget is exceeded
- When any budget runs low (usage reaches 80% of its limit, i.e. within 20% of it), signal the agent to wrap up
- When budget is exceeded, trigger graceful termination: summarize progress so far rather than failing silently
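The budget manager sketch assumes the agent reports the tokens used on each step. Reaching a limit exactly is treated as EXCEEDED, since no further step would fit. WARNING fires once any dimension reaches 80% of its limit. The clock parameter defaults to time.time but can be swapped for a fake clock in tests, and exceeded_limits and termination_summary are illustrative names.

```python
import time
from enum import Enum
from typing import Callable


class BudgetStatus(Enum):
    OK = "ok"
    WARNING = "warning"
    EXCEEDED = "exceeded"


class BudgetManager:
    def __init__(self, max_steps: int = 50, max_tokens: int = 100_000,
                 max_time_seconds: float = 300.0, warn_fraction: float = 0.8,
                 clock: Callable[[], float] = time.time):
        self.limits = {"steps": max_steps, "tokens": max_tokens, "seconds": max_time_seconds}
        self.warn_fraction = warn_fraction  # 0.8 = warn when within 20% of a limit
        self.clock = clock
        self.start = clock()
        self.steps = 0
        self.tokens = 0

    def usage(self) -> dict[str, float]:
        return {"steps": self.steps, "tokens": self.tokens,
                "seconds": self.clock() - self.start}

    def record_step(self, tokens_used: int) -> BudgetStatus:
        self.steps += 1
        self.tokens += tokens_used
        return self.check()

    def exceeded_limits(self) -> list[str]:
        # Reaching a limit counts as exceeded: no further step fits in the budget.
        return [k for k, used in self.usage().items() if used >= self.limits[k]]

    def check(self) -> BudgetStatus:
        fractions = [used / self.limits[k] for k, used in self.usage().items()]
        if any(f >= 1.0 for f in fractions):
            return BudgetStatus.EXCEEDED
        if any(f >= self.warn_fraction for f in fractions):
            return BudgetStatus.WARNING
        return BudgetStatus.OK

    def termination_summary(self, completed: list[str], pending: list[str]) -> str:
        # Summarize progress instead of failing silently.
        u = self.usage()
        reached = ", ".join(self.exceeded_limits()) or "none"
        return (f"Stopped early (limits reached: {reached}) after {u['steps']} steps, "
                f"{u['tokens']} tokens, {u['seconds']:.1f}s. "
                f"Completed: {', '.join(completed) or 'none'}. "
                f"Not finished: {', '.join(pending) or 'none'}.")


budget = BudgetManager(max_steps=10, max_tokens=1_000, max_time_seconds=60)
print(budget.record_step(tokens_used=850))  # BudgetStatus.WARNING
```

In this design, the agent calls record_step after every ReAct cycle and starts wrapping up on WARNING. On EXCEEDED, it returns termination_summary(...) instead of raising.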
What to Submit
The editor has 7 file sections with TODO comments. Replace each TODO with your Python code. The AI grader will evaluate each section against the rubric.
Hints
- For the DAG, a topological sort determines execution order; use Kahn's algorithm or a recursive DFS.
- For stuck detection, compare the last N actions in history: if they repeat, the agent is looping
- For the plan diff, store plans as lists and compute added/removed/modified steps
- The budget manager should return a BudgetStatus enum: OK, WARNING, EXCEEDED
- Use time.time() for wall-clock tracking and a simple counter for step/token budgets