Multi-Agent Systems with LangGraph
The Supervisor Pattern
Why Multi-Agent Systems Matter
Real Production Scenario (January 2026):
A fintech company needed to analyze loan applications with multiple specialized checks: credit score verification, income analysis, fraud detection, and regulatory compliance. A single monolithic agent became unmanageable at 5,000 lines of code. By refactoring to a supervisor pattern with 4 specialized sub-agents, they reduced code complexity by 60%, improved accuracy by 25%, and could independently update each specialist without redeploying the entire system.
This lesson teaches you how to implement the supervisor pattern in pure LangGraph to coordinate multiple specialized agents in production workflows.
What is the Supervisor Pattern?
The supervisor pattern is a multi-agent architecture where:
- One supervisor agent coordinates the workflow
- Multiple specialist agents handle specific tasks
- The supervisor decides which agent to invoke next based on task requirements
- Results flow back to the supervisor for final synthesis
```
             ┌─────────────────┐
             │   SUPERVISOR    │
             │  (Coordinator)  │
             └────────┬────────┘
                      │
     ┌────────────────┼────────────────┐
     ▼                ▼                ▼
┌──────────┐     ┌──────────┐     ┌──────────┐
│ Research │     │ Analysis │     │  Writer  │
│  Agent   │     │  Agent   │     │  Agent   │
└──────────┘     └──────────┘     └──────────┘
```
Key Insight: The supervisor never does the actual work. It only coordinates, routes, and quality-checks. This separation is crucial for maintainability.
Production Supervisor Implementation (LangGraph 1.0.5)
```python
from typing import TypedDict, Annotated, List, Optional
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
import operator

# -----------------------------------------------------------------------------
# State Definition
# -----------------------------------------------------------------------------

class SupervisorState(TypedDict):
    """State shared between supervisor and all agents."""
    # Input
    task: str
    # Supervisor decisions
    next_agent: Optional[str]
    agent_instructions: Optional[str]
    # Agent outputs (accumulate from all agents)
    agent_outputs: Annotated[List[dict], operator.add]
    # Workflow control
    completed_agents: Annotated[List[str], operator.add]
    iteration: int
    max_iterations: int
    # Final output
    final_response: Optional[str]

# -----------------------------------------------------------------------------
# LLM Setup
# -----------------------------------------------------------------------------

supervisor_llm = ChatOpenAI(model="gpt-4o", temperature=0)
agent_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.3)

# -----------------------------------------------------------------------------
# Supervisor Node
# -----------------------------------------------------------------------------

def supervisor_node(state: SupervisorState) -> dict:
    """
    The supervisor analyzes the task and decides which agent to invoke next.
    This is the 'brain' of the multi-agent system.
    """
    task = state["task"]
    completed = state.get("completed_agents", [])
    outputs = state.get("agent_outputs", [])
    iteration = state.get("iteration", 0)
    max_iterations = state.get("max_iterations", 10)

    # Check termination conditions
    if iteration >= max_iterations:
        return {
            "next_agent": "synthesizer",
            "agent_instructions": "Maximum iterations reached. Synthesize available results.",
            "iteration": iteration + 1
        }

    # Build context from previous outputs
    context = ""
    for output in outputs:
        context += f"\n[{output['agent']}]: {output['result'][:500]}..."

    # Supervisor prompt
    supervisor_prompt = f"""You are a supervisor coordinating a multi-agent research team.

TASK: {task}

COMPLETED AGENTS: {completed}

PREVIOUS OUTPUTS:
{context if context else "None yet"}

AVAILABLE AGENTS:
- researcher: Gathers information and facts
- analyst: Analyzes data and identifies patterns
- writer: Synthesizes findings into a coherent response
- synthesizer: Creates the final output (use when all needed work is done)

RULES:
1. Don't repeat agents unless necessary
2. Choose 'synthesizer' when you have enough information
3. Provide clear instructions for the chosen agent

Respond in this exact format:
NEXT_AGENT: [agent_name]
INSTRUCTIONS: [specific instructions for that agent]"""

    response = supervisor_llm.invoke(supervisor_prompt)
    content = response.content

    # Parse the response
    lines = content.strip().split("\n")
    next_agent = "synthesizer"  # Default to ending the workflow
    instructions = ""
    for line in lines:
        if line.startswith("NEXT_AGENT:"):
            next_agent = line.replace("NEXT_AGENT:", "").strip().lower()
        elif line.startswith("INSTRUCTIONS:"):
            instructions = line.replace("INSTRUCTIONS:", "").strip()

    # Validate the agent name
    valid_agents = ["researcher", "analyst", "writer", "synthesizer"]
    if next_agent not in valid_agents:
        next_agent = "synthesizer"

    return {
        "next_agent": next_agent,
        "agent_instructions": instructions,
        "iteration": iteration + 1
    }

# -----------------------------------------------------------------------------
# Specialist Agent Nodes
# -----------------------------------------------------------------------------

def researcher_node(state: SupervisorState) -> dict:
    """Research agent gathers information."""
    task = state["task"]
    instructions = state.get("agent_instructions", "")

    prompt = f"""You are a research specialist.

MAIN TASK: {task}
SPECIFIC INSTRUCTIONS: {instructions}

Gather relevant information, facts, and data. Be thorough and cite sources when possible.
Provide your findings in a structured format."""

    response = agent_llm.invoke(prompt)
    return {
        "agent_outputs": [{
            "agent": "researcher",
            "result": response.content,
            "instructions_received": instructions
        }],
        "completed_agents": ["researcher"]
    }

def analyst_node(state: SupervisorState) -> dict:
    """Analysis agent identifies patterns and insights."""
    task = state["task"]
    instructions = state.get("agent_instructions", "")
    previous_outputs = state.get("agent_outputs", [])

    # Get research outputs for analysis
    research_context = ""
    for output in previous_outputs:
        if output["agent"] == "researcher":
            research_context += output["result"] + "\n"

    prompt = f"""You are an analysis specialist.

MAIN TASK: {task}
SPECIFIC INSTRUCTIONS: {instructions}

RESEARCH DATA TO ANALYZE:
{research_context if research_context else "No prior research available."}

Analyze the data, identify patterns, draw insights, and highlight key findings.
Provide your analysis in a structured format."""

    response = agent_llm.invoke(prompt)
    return {
        "agent_outputs": [{
            "agent": "analyst",
            "result": response.content,
            "instructions_received": instructions
        }],
        "completed_agents": ["analyst"]
    }

def writer_node(state: SupervisorState) -> dict:
    """Writer agent synthesizes findings into coherent text."""
    task = state["task"]
    instructions = state.get("agent_instructions", "")
    previous_outputs = state.get("agent_outputs", [])

    # Compile all previous outputs
    context = ""
    for output in previous_outputs:
        context += f"\n## {output['agent'].upper()} OUTPUT:\n{output['result']}\n"

    prompt = f"""You are a writing specialist.

MAIN TASK: {task}
SPECIFIC INSTRUCTIONS: {instructions}

TEAM OUTPUTS:
{context if context else "No prior outputs available."}

Write a clear, coherent synthesis of the findings. Focus on clarity and actionability."""

    response = agent_llm.invoke(prompt)
    return {
        "agent_outputs": [{
            "agent": "writer",
            "result": response.content,
            "instructions_received": instructions
        }],
        "completed_agents": ["writer"]
    }

def synthesizer_node(state: SupervisorState) -> dict:
    """Final synthesis node - creates the final response."""
    task = state["task"]
    previous_outputs = state.get("agent_outputs", [])

    # Compile all outputs
    context = ""
    for output in previous_outputs:
        context += f"\n## {output['agent'].upper()}:\n{output['result']}\n"

    prompt = f"""Create a final, comprehensive response to this task:

TASK: {task}

TEAM CONTRIBUTIONS:
{context}

Synthesize all contributions into a single, well-structured final response.
Ensure nothing important is missed and the response directly addresses the original task."""

    response = supervisor_llm.invoke(prompt)
    return {
        "final_response": response.content,
        "completed_agents": ["synthesizer"]
    }

# -----------------------------------------------------------------------------
# Routing Logic
# -----------------------------------------------------------------------------

def route_supervisor(state: SupervisorState) -> str:
    """Route to the next agent based on the supervisor's decision."""
    next_agent = state.get("next_agent", "synthesizer")
    routing = {
        "researcher": "researcher",
        "analyst": "analyst",
        "writer": "writer",
        "synthesizer": "synthesizer"
    }
    return routing.get(next_agent, "synthesizer")

# -----------------------------------------------------------------------------
# Build the Graph
# -----------------------------------------------------------------------------

def build_supervisor_graph():
    """Build the complete supervisor multi-agent graph."""
    graph = StateGraph(SupervisorState)

    # Add all nodes
    graph.add_node("supervisor", supervisor_node)
    graph.add_node("researcher", researcher_node)
    graph.add_node("analyst", analyst_node)
    graph.add_node("writer", writer_node)
    graph.add_node("synthesizer", synthesizer_node)

    # Entry point
    graph.set_entry_point("supervisor")

    # Supervisor routes to agents
    graph.add_conditional_edges(
        "supervisor",
        route_supervisor,
        {
            "researcher": "researcher",
            "analyst": "analyst",
            "writer": "writer",
            "synthesizer": "synthesizer"
        }
    )

    # All agents return to the supervisor (except the synthesizer)
    graph.add_edge("researcher", "supervisor")
    graph.add_edge("analyst", "supervisor")
    graph.add_edge("writer", "supervisor")

    # The synthesizer ends the workflow
    graph.add_edge("synthesizer", END)

    return graph.compile()

# -----------------------------------------------------------------------------
# Usage
# -----------------------------------------------------------------------------

if __name__ == "__main__":
    app = build_supervisor_graph()

    result = app.invoke({
        "task": "Analyze the impact of AI on software development productivity in 2026",
        "agent_outputs": [],
        "completed_agents": [],
        "iteration": 0,
        "max_iterations": 10
    })

    print(result["final_response"])
```
Key Supervisor Pattern Principles
| Principle | Implementation |
|---|---|
| Single coordinator | One supervisor node makes all routing decisions |
| Specialist isolation | Each agent has clear, bounded responsibility |
| Shared state | All agents read/write to common state structure |
| Iteration control | Max iterations prevent infinite loops |
| Result accumulation | `operator.add` reducer collects all agent outputs |
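The result-accumulation row can be made concrete without calling any model: LangGraph merges each node's partial update into state using the reducer declared in `Annotated`. The `apply_update` helper below is a hypothetical, simplified stand-in for that merge step, shown only to illustrate the semantics:

```python
import operator
from typing import Annotated, Any, Dict, get_args, get_origin


def apply_update(hints: Dict[str, Any], state: dict, update: dict) -> dict:
    """Merge a node's partial update into state. If a key's type hint
    declares a reducer via Annotated (e.g. operator.add), apply it;
    otherwise the new value overwrites the old one. This mimics, in
    simplified form, what LangGraph does per state channel."""
    new_state = dict(state)
    for key, value in update.items():
        hint = hints.get(key)
        if get_origin(hint) is Annotated:
            _, reducer = get_args(hint)[:2]
            new_state[key] = reducer(state.get(key, []), value)
        else:
            new_state[key] = value
    return new_state


hints = {
    "agent_outputs": Annotated[list, operator.add],  # accumulate
    "iteration": int,                                # last write wins
}

state = {"agent_outputs": [{"agent": "researcher"}], "iteration": 1}
update = {"agent_outputs": [{"agent": "analyst"}], "iteration": 2}
merged = apply_update(hints, state, update)
# merged["agent_outputs"] keeps both entries; "iteration" is overwritten
```

This is why every specialist node can safely return a one-element `agent_outputs` list: the reducer concatenates, so no agent's output clobbers another's.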
Supervisor Decision Strategies
Strategy 1: Sequential Pipeline
```python
# Supervisor always routes: researcher -> analyst -> writer -> synthesizer
def sequential_supervisor(state):
    completed = state.get("completed_agents", [])
    if "researcher" not in completed:
        return {"next_agent": "researcher"}
    elif "analyst" not in completed:
        return {"next_agent": "analyst"}
    elif "writer" not in completed:
        return {"next_agent": "writer"}
    return {"next_agent": "synthesizer"}
```
Strategy 2: LLM-Driven (Dynamic)
```python
# Supervisor uses an LLM to decide based on task and context
# (shown in the main implementation above).
# Best for complex tasks where routing depends on content.
```
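A more robust take on the LLM-driven strategy replaces the fragile `NEXT_AGENT:` string parsing with structured output. A sketch assuming `langchain-openai` and Pydantic are installed; `with_structured_output` is a real `ChatOpenAI` method, while `llm_supervisor` is an illustrative name:

```python
from typing import Literal

from pydantic import BaseModel, Field


class SupervisorDecision(BaseModel):
    """Schema the supervisor LLM must fill instead of free-form text."""
    next_agent: Literal["researcher", "analyst", "writer", "synthesizer"] = Field(
        description="The agent to invoke next"
    )
    instructions: str = Field(description="Specific instructions for that agent")


def llm_supervisor(state: dict) -> dict:
    """Supervisor step using structured output instead of hand-parsing
    'NEXT_AGENT:' lines from the model's reply."""
    # Imported lazily so this sketch can be loaded without OpenAI credentials
    from langchain_openai import ChatOpenAI

    decider = ChatOpenAI(model="gpt-4o", temperature=0).with_structured_output(
        SupervisorDecision
    )
    decision = decider.invoke(
        f"Task: {state['task']}\n"
        f"Completed agents: {state.get('completed_agents', [])}\n"
        "Choose the next agent and give it instructions."
    )
    # The Literal type guarantees next_agent is one of the valid routes,
    # so no post-hoc validation against valid_agents is needed
    return {
        "next_agent": decision.next_agent,
        "agent_instructions": decision.instructions,
        "iteration": state.get("iteration", 0) + 1,
    }
```

Because the schema constrains `next_agent` to the four valid names, an invalid route raises a validation error at the boundary instead of silently defaulting mid-workflow.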
Strategy 3: Conditional Routing
```python
# Route based on task-type keywords
def conditional_supervisor(state):
    task = state["task"].lower()
    if "analyze" in task or "data" in task:
        return {"next_agent": "analyst"}
    elif "write" in task or "report" in task:
        return {"next_agent": "writer"}
    return {"next_agent": "researcher"}
```
Quality Checking in Supervisor
Production supervisors should validate agent outputs before proceeding:
```python
def supervisor_with_quality_check(state: SupervisorState) -> dict:
    """Supervisor that validates agent outputs before proceeding."""
    outputs = state.get("agent_outputs", [])

    # Check the last agent's output quality
    if outputs:
        last_output = outputs[-1]
        result = last_output.get("result", "")

        # Output too short - retry with more specific instructions
        if len(result) < 100:
            return {
                "next_agent": last_output["agent"],
                "agent_instructions": (
                    "Previous output was too brief. Please provide more detail. "
                    f"Original task: {state['task']}"
                )
            }

        # Agent reported difficulty - try an alternative approach
        if "error" in result.lower() or "unable" in result.lower():
            return {
                "next_agent": "researcher",
                "agent_instructions": "Previous approach had issues. Try a different research strategy."
            }

    # Proceed with normal routing (placeholder for the routing logic shown earlier)
    return normal_routing_logic(state)
```
Adding Custom Specialist Agents
```python
from typing import Callable, Optional

def create_specialist_agent(
    name: str,
    system_prompt: str,
    llm: Optional[ChatOpenAI] = None
) -> Callable:
    """Factory function to create specialist agent nodes."""
    llm = llm or ChatOpenAI(model="gpt-4o-mini", temperature=0.3)

    def agent_node(state: SupervisorState) -> dict:
        task = state["task"]
        instructions = state.get("agent_instructions", "")
        previous_outputs = state.get("agent_outputs", [])

        # Build context from previous outputs
        context = "\n".join(
            f"[{o['agent']}]: {o['result'][:500]}"
            for o in previous_outputs
        )

        prompt = f"""{system_prompt}

TASK: {task}
INSTRUCTIONS: {instructions}
CONTEXT: {context}"""

        response = llm.invoke(prompt)
        return {
            "agent_outputs": [{
                "agent": name,
                "result": response.content,
                "instructions_received": instructions
            }],
            "completed_agents": [name]
        }

    return agent_node

# Create custom specialists
code_reviewer = create_specialist_agent(
    name="code_reviewer",
    system_prompt="You are a senior code reviewer. Analyze code for bugs, security issues, and best practices."
)

security_analyst = create_specialist_agent(
    name="security_analyst",
    system_prompt="You are a security specialist. Identify vulnerabilities and recommend mitigations."
)

# Add to the graph
graph.add_node("code_reviewer", code_reviewer)
graph.add_node("security_analyst", security_analyst)
```
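Registering the nodes alone is not enough, though: the supervisor can only reach agents that appear in its conditional-edge mapping and in its prompt's agent list. A sketch of the extra routing wiring, following the route-map style of the main implementation (`EXTENDED_ROUTES` and `route_supervisor_extended` are illustrative names):

```python
# Extended routing map: every new specialist needs an entry here, a
# return edge back to the supervisor, and a mention in the supervisor
# prompt's AVAILABLE AGENTS list.
EXTENDED_ROUTES = {
    "researcher": "researcher",
    "analyst": "analyst",
    "writer": "writer",
    "code_reviewer": "code_reviewer",
    "security_analyst": "security_analyst",
    "synthesizer": "synthesizer",
}


def route_supervisor_extended(state: dict) -> str:
    # Unknown or missing decisions fall back to the synthesizer
    return EXTENDED_ROUTES.get(state.get("next_agent"), "synthesizer")
```

This router would be wired in with `graph.add_conditional_edges("supervisor", route_supervisor_extended, EXTENDED_ROUTES)`, plus a `graph.add_edge("code_reviewer", "supervisor")` (and likewise for `security_analyst`) so each new specialist hands control back to the supervisor.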
Error Handling in Supervisor Pattern
```python
def robust_agent_node(state: SupervisorState) -> dict:
    """Agent with comprehensive error handling."""
    prompt = (
        f"Research this task: {state['task']}\n"
        f"Instructions: {state.get('agent_instructions', '')}"
    )
    try:
        # Normal agent logic
        response = agent_llm.invoke(prompt)
        return {
            "agent_outputs": [{
                "agent": "researcher",
                "result": response.content,
                "status": "success"
            }],
            "completed_agents": ["researcher"]
        }
    except Exception as e:
        # Record the failure but don't crash the workflow
        return {
            "agent_outputs": [{
                "agent": "researcher",
                "result": f"Error: {str(e)}",
                "status": "failed",
                "error_type": type(e).__name__
            }],
            "completed_agents": ["researcher"]  # Mark as attempted
        }


def supervisor_handles_failures(state: SupervisorState) -> dict:
    """Supervisor that handles agent failures gracefully."""
    outputs = state.get("agent_outputs", [])

    # Check for failures
    failed_agents = [o for o in outputs if o.get("status") == "failed"]

    if failed_agents:
        last_failure = failed_agents[-1]

        # Decide: retry, skip, or use an alternative
        if state["iteration"] < 3:  # Retry up to 3 times
            return {
                "next_agent": last_failure["agent"],
                "agent_instructions": "Previous attempt failed. Please try again with a simpler approach."
            }
        else:
            # Skip and continue with what we have
            return {
                "next_agent": "synthesizer",
                "agent_instructions": "Some agents failed. Synthesize with available results."
            }

    # No failures: fall through to normal routing (placeholder for the logic shown earlier)
    return normal_routing(state)
```
Common Interview Questions
Q1: "Why use the supervisor pattern instead of a single agent with tools?"
Strong Answer:
"The supervisor pattern is better when you have complex tasks requiring specialized expertise. A single agent with tools becomes unwieldy with 20+ tools and loses focus. The supervisor pattern provides: (1) Clear separation of concerns - each specialist excels at one thing, (2) Independent scaling and updating of agents, (3) Better observability - you can trace which agent did what, (4) Easier testing - test each specialist in isolation. Use a single agent for simple tasks; use supervisor for complex multi-step workflows."
Q2: "How do you prevent infinite loops in a supervisor pattern?"
Answer:
"Three mechanisms: (1) A hard iteration limit via `max_iterations` in state, checked by the supervisor before each decision. (2) Tracking `completed_agents` to avoid unnecessary repetition. (3) LangGraph's `recursion_limit`, passed in the config at invocation time, as a safety net. The supervisor itself should have logic to route to 'synthesizer' when enough work is done or when iterations are exhausted."
Q3: "How do you handle agent failures in production?"
Answer:
"Wrap each agent node in a try/except, recording failures in state with a 'status' field. The supervisor can then decide whether to retry with different instructions, skip to another agent, or proceed to synthesis with partial results. Never let one agent failure crash the entire workflow. Store failure details in `agent_outputs` for debugging and monitoring."
Q4: "When would you choose LLM-driven vs rule-based supervisor routing?"
Answer:
"Rule-based for predictable, well-defined workflows where the path is known in advance - it's faster and cheaper. LLM-driven for complex tasks where routing depends on content analysis, quality assessment, or dynamic conditions. In practice, use a hybrid: rule-based for obvious decisions, LLM for ambiguous cases."
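The hybrid described above can be sketched as a rule-first router that only pays for an LLM call when no rule matches; `llm_decide` is a hypothetical stand-in for a real supervisor LLM call:

```python
from typing import Callable

RULES = [
    # (predicate over the lowercased task text, agent to route to)
    (lambda t: "security" in t or "vulnerability" in t, "security_analyst"),
    (lambda t: "write" in t or "report" in t, "writer"),
    (lambda t: "analyze" in t or "data" in t, "analyst"),
]


def hybrid_route(task: str, llm_decide: Callable[[str], str]) -> str:
    """Rule-based first (fast, free); LLM only for ambiguous tasks."""
    lowered = task.lower()
    for predicate, agent in RULES:
        if predicate(lowered):
            return agent
    # No rule fired: fall back to the (slower, costlier) LLM decision
    return llm_decide(task)


# Usage with a stubbed LLM fallback:
route = hybrid_route("Write a report on Q3 churn", llm_decide=lambda t: "researcher")
# -> "writer" via rules alone; no LLM call was needed
```

Rule order matters here: more specific rules (security) are checked before broader ones, mirroring how a production supervisor would prioritize specialist routing.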
Key Takeaways
- Supervisor Pattern = one coordinator + multiple specialists
- State accumulation with `operator.add` preserves all agent outputs
- Iteration limits prevent infinite supervisor-agent loops
- LLM-driven routing enables dynamic task decomposition
- Factory functions make adding new specialists easy
- Each agent returns to supervisor for next decision (except final synthesizer)
- Quality checks in supervisor ensure output standards
- Error handling allows graceful degradation
Next: Learn hierarchical multi-agent teams where supervisors manage sub-supervisors in Lesson 2.