Multi-Agent Systems with LangGraph
Hierarchical Teams & Nested Subgraphs
When a Single Supervisor Isn't Enough
Real Production Scenario (January 2026):
An enterprise SaaS company needed to handle complex customer support tickets that required multiple specialized teams: a research team to find relevant documentation, a technical team to diagnose issues, and a communication team to draft responses. A single supervisor with 15+ workers became unmanageable. By restructuring into hierarchical teams with sub-supervisors, they achieved:
- 40% reduction in response time
- 85% first-contact resolution rate
- Clear ownership and accountability per team
- Independent team updates without system-wide redeployment
This lesson teaches you how to build hierarchical multi-agent systems in which supervisors manage teams of workers, and how to use LangGraph subgraphs to encapsulate each team's logic.
Hierarchical Architecture
In a hierarchical system:
- Main supervisor coordinates high-level strategy
- Team supervisors manage their specialized workers
- Workers execute specific tasks
- Results flow up through the hierarchy
               ┌────────────────────┐
               │  MAIN SUPERVISOR   │
               │ (Strategic Router) │
               └─────────┬──────────┘
                         │
       ┌─────────────────┼─────────────────┐
       ▼                 ▼                 ▼
┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│ RESEARCH TEAM│  │TECHNICAL TEAM│  │  COMMS TEAM  │
│  Supervisor  │  │  Supervisor  │  │  Supervisor  │
└──────┬───────┘  └──────┬───────┘  └──────┬───────┘
       │                 │                 │
   ┌───┼───┐         ┌───┼───┐         ┌───┼───┐
   ▼   ▼   ▼         ▼   ▼   ▼         ▼   ▼   ▼
  Doc Web API     Debug Test Fix   Draft Edit Review
Key Insight: Each team is a complete subgraph with its own supervisor pattern. The main supervisor only interacts with team supervisors, not individual workers.
Building Team Subgraphs
Each team is implemented as a self-contained subgraph:
from typing import TypedDict, Annotated, List, Optional
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
import operator

# =============================================================================
# Research Team Subgraph
# =============================================================================
class ResearchTeamState(TypedDict):
    """State for research team operations."""
    query: str
    context: Optional[dict]
    # Routing decision written by the supervisor and read by
    # route_research_worker; declared here so the update is part of the schema
    next_worker: Optional[str]
    # Worker outputs
    doc_results: Optional[str]
    web_results: Optional[str]
    api_results: Optional[str]
    # Team output
    research_summary: Optional[str]
    confidence: float

def research_supervisor(state: ResearchTeamState) -> dict:
    """Research team supervisor decides which worker to invoke."""
    query = state["query"]
    # Determine what kind of research is needed
    if "documentation" in query.lower() or "docs" in query.lower():
        return {"next_worker": "doc_searcher"}
    elif "api" in query.lower() or "endpoint" in query.lower():
        return {"next_worker": "api_researcher"}
    else:
        return {"next_worker": "web_searcher"}

def doc_searcher(state: ResearchTeamState) -> dict:
    """Search internal documentation."""
    llm = ChatOpenAI(model="gpt-4o-mini")
    response = llm.invoke(f"Search documentation for: {state['query']}")
    return {"doc_results": response.content}

def web_searcher(state: ResearchTeamState) -> dict:
    """Search web for information."""
    llm = ChatOpenAI(model="gpt-4o-mini")
    response = llm.invoke(f"Search web for: {state['query']}")
    return {"web_results": response.content}

def api_researcher(state: ResearchTeamState) -> dict:
    """Research API endpoints and usage."""
    llm = ChatOpenAI(model="gpt-4o-mini")
    response = llm.invoke(f"Research API for: {state['query']}")
    return {"api_results": response.content}

def research_synthesizer(state: ResearchTeamState) -> dict:
    """Synthesize all research into summary."""
    llm = ChatOpenAI(model="gpt-4o")
    all_results = []
    if state.get("doc_results"):
        all_results.append(f"Documentation: {state['doc_results']}")
    if state.get("web_results"):
        all_results.append(f"Web: {state['web_results']}")
    if state.get("api_results"):
        all_results.append(f"API: {state['api_results']}")
    combined = "\n\n".join(all_results)
    response = llm.invoke(f"Synthesize this research:\n{combined}")
    return {
        "research_summary": response.content,
        "confidence": 0.85
    }

def route_research_worker(state: ResearchTeamState) -> str:
    return state.get("next_worker", "web_searcher")

def create_research_team():
    """Build and compile the research team subgraph (returns the compiled graph)."""
    graph = StateGraph(ResearchTeamState)
    # Add nodes
    graph.add_node("supervisor", research_supervisor)
    graph.add_node("doc_searcher", doc_searcher)
    graph.add_node("web_searcher", web_searcher)
    graph.add_node("api_researcher", api_researcher)
    graph.add_node("synthesizer", research_synthesizer)
    # Supervisor routes to workers
    graph.add_conditional_edges(
        "supervisor",
        route_research_worker,
        {
            "doc_searcher": "doc_searcher",
            "web_searcher": "web_searcher",
            "api_researcher": "api_researcher"
        }
    )
    # All workers go to synthesizer
    graph.add_edge("doc_searcher", "synthesizer")
    graph.add_edge("web_searcher", "synthesizer")
    graph.add_edge("api_researcher", "synthesizer")
    # Synthesizer ends
    graph.add_edge("synthesizer", END)
    graph.set_entry_point("supervisor")
    # CRITICAL: Compile WITHOUT checkpointer
    return graph.compile()

# =============================================================================
# Technical Team Subgraph
# =============================================================================
class TechnicalTeamState(TypedDict):
    issue: str
    context: Optional[dict]
    # Worker outputs
    diagnosis: Optional[str]
    test_results: Optional[str]
    fix_proposal: Optional[str]
    # Team output
    technical_summary: Optional[str]
    severity: str

def create_technical_team():
    """Build and compile the technical team subgraph (returns the compiled graph)."""
    # Similar structure to research team; the node functions are not shown here
    graph = StateGraph(TechnicalTeamState)
    graph.add_node("supervisor", technical_supervisor)
    graph.add_node("debugger", debugger_node)
    graph.add_node("tester", tester_node)
    graph.add_node("fixer", fixer_node)
    graph.add_node("synthesizer", technical_synthesizer)
    # ... edges ...
    return graph.compile()  # NO checkpointer

# =============================================================================
# Communication Team Subgraph
# =============================================================================
class CommsTeamState(TypedDict):
    content: str
    tone: str
    audience: str
    # Worker outputs
    draft: Optional[str]
    edited: Optional[str]
    reviewed: Optional[str]
    # Team output
    final_message: Optional[str]

def create_comms_team():
    """Build and compile the communications team subgraph (returns the compiled graph)."""
    # The node functions are not shown here
    graph = StateGraph(CommsTeamState)
    graph.add_node("supervisor", comms_supervisor)
    graph.add_node("drafter", drafter_node)
    graph.add_node("editor", editor_node)
    graph.add_node("reviewer", reviewer_node)
    graph.add_node("finalizer", finalizer_node)
    # ... edges ...
    return graph.compile()  # NO checkpointer

Main Supervisor with Team Nodes
The main graph treats each team as a single node:
# =============================================================================
# Main Hierarchical Graph
# =============================================================================
class MainState(TypedDict):
    """State for the main supervisor."""
    # Input
    customer_request: str
    ticket_id: str
    # Team assignments
    teams_needed: List[str]
    current_team: Optional[str]
    # Team results (accumulate from all teams)
    team_results: Annotated[List[dict], operator.add]
    # Control
    iteration: int
    max_iterations: int
    # Output
    final_response: Optional[str]
    resolution_status: str

# Instantiate team subgraphs
research_team = create_research_team()
technical_team = create_technical_team()
comms_team = create_comms_team()

def main_supervisor(state: MainState) -> dict:
    """
    Main supervisor decides which team to engage.
    Only coordinates teams, never individual workers.
    """
    iteration = state.get("iteration", 0)
    # Stop routing once the iteration budget is spent; if max_iterations
    # is absent from the state, no cap is enforced here
    max_iterations = state.get("max_iterations")
    if max_iterations is not None and iteration >= max_iterations:
        return {"current_team": "synthesize", "iteration": iteration + 1}
    llm = ChatOpenAI(model="gpt-4o", temperature=0)
    request = state["customer_request"]
    completed_teams = [r["team"] for r in state.get("team_results", [])]
    # Determine which teams are needed
    prompt = f"""Analyze this customer request and decide which team to engage next.
REQUEST: {request}
TEAMS ALREADY CONSULTED: {completed_teams}
AVAILABLE TEAMS:
- research_team: Find documentation, articles, API references
- technical_team: Diagnose issues, test solutions, propose fixes
- comms_team: Draft customer-facing response
Respond with the team name or 'synthesize' if all needed work is done."""
    response = llm.invoke(prompt)
    next_team = response.content.strip().lower()
    # Anything other than an exact team name falls through to synthesis
    if next_team not in ["research_team", "technical_team", "comms_team"]:
        next_team = "synthesize"
    return {
        "current_team": next_team,
        "iteration": iteration + 1
    }

def research_team_node(state: MainState) -> dict:
    """Execute research team subgraph."""
    # Prepare input for research team
    team_input = {
        "query": state["customer_request"],
        "context": {"ticket_id": state["ticket_id"]}
    }
    # Run the subgraph
    result = research_team.invoke(team_input)
    # Return result in main state format
    return {
        "team_results": [{
            "team": "research_team",
            "summary": result.get("research_summary", ""),
            "confidence": result.get("confidence", 0.5),
            "raw_results": {
                "docs": result.get("doc_results"),
                "web": result.get("web_results"),
                "api": result.get("api_results")
            }
        }]
    }

def technical_team_node(state: MainState) -> dict:
    """Execute technical team subgraph."""
    # Get context from previous research if available
    research_context = None
    for r in state.get("team_results", []):
        if r["team"] == "research_team":
            research_context = r["summary"]
    team_input = {
        "issue": state["customer_request"],
        "context": {
            "ticket_id": state["ticket_id"],
            "research": research_context
        }
    }
    result = technical_team.invoke(team_input)
    return {
        "team_results": [{
            "team": "technical_team",
            "summary": result.get("technical_summary", ""),
            "severity": result.get("severity", "medium"),
            "fix": result.get("fix_proposal")
        }]
    }

def comms_team_node(state: MainState) -> dict:
    """Execute communications team subgraph."""
    # Gather all previous team results for context
    all_context = []
    for r in state.get("team_results", []):
        # Comms results carry "message" rather than "summary", so fall back
        # to it instead of raising KeyError when comms has already run
        all_context.append(f"{r['team']}: {r.get('summary', r.get('message', ''))}")
    team_input = {
        "content": "\n".join(all_context),
        "tone": "professional and helpful",
        "audience": "customer"
    }
    result = comms_team.invoke(team_input)
    return {
        "team_results": [{
            "team": "comms_team",
            "message": result.get("final_message", ""),
            "draft": result.get("draft")
        }]
    }

def final_synthesizer(state: MainState) -> dict:
    """Create final response from all team contributions."""
    llm = ChatOpenAI(model="gpt-4o")
    team_contributions = []
    for r in state.get("team_results", []):
        team_contributions.append(f"## {r['team'].upper()}\n{r.get('summary', r.get('message', ''))}")
    prompt = f"""Create a final customer response based on team contributions:
{chr(10).join(team_contributions)}
Create a professional, helpful response that addresses the customer's needs."""
    response = llm.invoke(prompt)
    return {
        "final_response": response.content,
        "resolution_status": "resolved"
    }

def route_to_team(state: MainState) -> str:
    """Route to appropriate team based on supervisor decision."""
    return state.get("current_team", "synthesize")

# =============================================================================
# Build Main Graph
# =============================================================================
def build_hierarchical_system():
    """Build the complete hierarchical multi-agent system."""
    graph = StateGraph(MainState)
    # Add main supervisor
    graph.add_node("main_supervisor", main_supervisor)
    # Add team nodes (each wraps a subgraph)
    graph.add_node("research_team", research_team_node)
    graph.add_node("technical_team", technical_team_node)
    graph.add_node("comms_team", comms_team_node)
    graph.add_node("synthesize", final_synthesizer)
    # Entry point
    graph.set_entry_point("main_supervisor")
    # Supervisor routes to teams
    graph.add_conditional_edges(
        "main_supervisor",
        route_to_team,
        {
            "research_team": "research_team",
            "technical_team": "technical_team",
            "comms_team": "comms_team",
            "synthesize": "synthesize"
        }
    )
    # Teams return to supervisor for next decision
    graph.add_edge("research_team", "main_supervisor")
    graph.add_edge("technical_team", "main_supervisor")
    graph.add_edge("comms_team", "main_supervisor")
    # Synthesize ends the workflow
    graph.add_edge("synthesize", END)
    # CRITICAL: Only parent graph has checkpointer.
    # PostgresSaver.from_conn_string() is a context manager, so its result
    # cannot be handed to compile() directly. A long-lived connection is
    # opened instead so the checkpointer stays usable after this returns.
    from psycopg import Connection
    from psycopg.rows import dict_row
    from langgraph.checkpoint.postgres import PostgresSaver
    conn = Connection.connect(
        "postgresql://...", autocommit=True, prepare_threshold=0, row_factory=dict_row
    )
    checkpointer = PostgresSaver(conn)
    checkpointer.setup()  # Creates the checkpoint tables on first use
    return graph.compile(checkpointer=checkpointer)
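Routing on the model's raw text depends on it answering with exactly one of the expected names. A minimal sketch of a more tolerant parser follows. `parse_team_choice` and `VALID_TEAMS` are hypothetical names, not part of LangGraph, and skipping teams that were already consulted is an assumption about the desired policy rather than something the scenario above prescribes.

```python
VALID_TEAMS = ("research_team", "technical_team", "comms_team")


def parse_team_choice(
    raw: str, completed: list[str], iteration: int, max_iterations: int
) -> str:
    """Map free-form model output to a routing key, defaulting to 'synthesize'."""
    # Hard stop: once the iteration budget is spent, finish the workflow.
    if iteration >= max_iterations:
        return "synthesize"
    text = raw.strip().lower()
    # Accept the first known team mentioned anywhere in the reply,
    # ignoring teams that have already reported (assumed policy).
    for team in VALID_TEAMS:
        if team in text and team not in completed:
            return team
    return "synthesize"


choice = parse_team_choice("I would pick: Research_Team.", [], 0, 5)
```

Inside `main_supervisor`, the value returned by such a helper could be written to `current_team` in place of the exact-match check.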
Critical: Parent-Only Checkpointing (January 2026)
This is one of the most important patterns in hierarchical LangGraph systems.
# ============================================================================
# CORRECT: Only parent has checkpointer
# ============================================================================
# Subgraphs - NO checkpointer
research_team = create_research_team()  # .compile() with no args
technical_team = create_technical_team()  # .compile() with no args
# Parent graph - HAS checkpointer
# (main_graph stands for the uncompiled StateGraph(MainState) builder)
main_app = main_graph.compile(checkpointer=PostgresSaver(...))
# ============================================================================
# WRONG: Nested checkpointers cause problems
# ============================================================================
# DON'T DO THIS
# (research_graph stands for the uncompiled research StateGraph builder;
# create_research_team() already returns a compiled graph)
research_team = research_graph.compile(
    checkpointer=PostgresSaver(...)  # BAD - creates conflicts
)
Why Parent-Only Checkpointing:
- State Consistency: Parent checkpointer captures complete state including all subgraph results at each step
- No Race Conditions: Multiple checkpointers would try to save state independently
- Single Source of Truth: Resume and time-travel work correctly with unified state
- Simpler Debugging: One place to inspect workflow history
Parallel Team Execution
For independent teams, use the Send API:
from langgraph.types import Send  # Also importable from langgraph.constants in older releases

class ParallelMainState(TypedDict):
    request: str
    team_results: Annotated[List[dict], operator.add]
    final_response: Optional[str]

def dispatch_to_all_teams(state: ParallelMainState) -> List[Send]:
    """Launch all teams in parallel."""
    # Each Send payload becomes the input of the node it targets
    return [
        Send("research_team", {
            "query": state["request"],
            "context": {}
        }),
        Send("technical_team", {
            "issue": state["request"],
            "context": {}
        }),
        Send("comms_team", {
            "content": state["request"],
            "tone": "professional",
            "audience": "customer"
        })
    ]

def collect_and_synthesize(state: ParallelMainState) -> dict:
    """Collect all parallel results and create final response."""
    results = state.get("team_results", [])
    # All teams ran in parallel, results collected via operator.add
    synthesis = f"Collected results from {len(results)} teams"
    return {"final_response": synthesis}

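def make_parallel_team_node(name: str, subgraph, summary_key: str):
    """Wrap a team subgraph so it can be targeted by Send.

    A node reached through Send receives the Send payload as its input, not
    the parent state, so the MainState-based wrappers defined earlier would
    not find customer_request in it.
    """
    def node(team_input: dict) -> dict:
        result = subgraph.invoke(team_input)
        return {"team_results": [{"team": name, "summary": result.get(summary_key, "")}]}
    return node

# Build parallel graph
graph = StateGraph(ParallelMainState)
graph.add_node("dispatcher", lambda s: {})  # Placeholder
graph.add_node("research_team", make_parallel_team_node("research_team", research_team, "research_summary"))
graph.add_node("technical_team", make_parallel_team_node("technical_team", technical_team, "technical_summary"))
graph.add_node("comms_team", make_parallel_team_node("comms_team", comms_team, "final_message"))
graph.add_node("synthesizer", collect_and_synthesize)
# Fan out to all teams
graph.add_conditional_edges("dispatcher", dispatch_to_all_teams)
# All teams converge to synthesizer
graph.add_edge("research_team", "synthesizer")
graph.add_edge("technical_team", "synthesizer")
graph.add_edge("comms_team", "synthesizer")
graph.add_edge("synthesizer", END)
graph.set_entry_point("dispatcher")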
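Parallel fan-out only works because `team_results` is declared with `operator.add` as its reducer: each team returns a one-element list, and the lists are concatenated rather than overwriting one another. The following is a plain-Python sketch of that merge step, not LangGraph's actual implementation. `apply_updates` is a hypothetical helper, and the empty-list default assumes list-valued reducer keys.

```python
import operator
from typing import Callable


def apply_updates(
    state: dict, updates: list[dict], reducers: dict[str, Callable]
) -> dict:
    """Fold node updates into state, combining reducer-backed keys."""
    merged = dict(state)
    for update in updates:
        for key, value in update.items():
            if key in reducers:
                # Reducer-backed key: combine old and new values.
                merged[key] = reducers[key](merged.get(key, []), value)
            else:
                # Plain key: the new value replaces the old one.
                merged[key] = value
    return merged


state = {"request": "Login fails after password reset", "team_results": []}
parallel_updates = [
    {"team_results": [{"team": "research_team"}]},
    {"team_results": [{"team": "technical_team"}]},
    {"team_results": [{"team": "comms_team"}]},
]
merged = apply_updates(state, parallel_updates, {"team_results": operator.add})
teams = [r["team"] for r in merged["team_results"]]
```

After the merge, `merged["team_results"]` holds one entry per team, which is what `collect_and_synthesize` counts.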
State Passing Patterns
Pattern 1: Context Injection
def team_node_with_context(state: MainState) -> dict:
    """Pass context from main state to team."""
    team_input = {
        "query": state["customer_request"],
        "context": {
            "user_id": state.get("user_id"),
            "previous_results": state.get("team_results", []),
            "preferences": state.get("user_preferences", {})
        }
    }
    # team_subgraph stands for any compiled team subgraph, e.g. research_team
    result = team_subgraph.invoke(team_input)
    return {"team_results": [{"team": "name", "result": result}]}

Pattern 2: Result Transformation
from datetime import datetime

def transform_team_result(state: MainState) -> dict:
    """Transform team output to main state format."""
    team_input = {"query": state["customer_request"]}
    # team_subgraph stands for any compiled team subgraph, e.g. research_team
    raw_result = team_subgraph.invoke(team_input)
    # Transform to standardized format
    return {
        "team_results": [{
            "team": "research",
            "summary": raw_result.get("research_summary"),
            "confidence": raw_result.get("confidence", 0.5),
            "metadata": {
                "sources": raw_result.get("sources", []),
                "timestamp": datetime.now().isoformat()
            }
        }]
    }
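Result transformation is easier to keep consistent when every team's output is mapped into one shared envelope, so that downstream code can always read the same keys. A minimal sketch under that assumption follows. `TeamResult` and `to_team_result` are hypothetical names introduced for illustration, and treating every remaining key as metadata is a design choice, not a requirement.

```python
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class TeamResult:
    """Hypothetical common envelope for every team's output."""
    team: str
    summary: str
    confidence: float = 0.5
    metadata: dict[str, Any] = field(default_factory=dict)


def to_team_result(team: str, raw: dict, summary_key: str) -> dict:
    """Convert a raw subgraph result into the shared envelope as a plain dict."""
    envelope = TeamResult(
        team=team,
        summary=raw.get(summary_key) or "",
        confidence=float(raw.get("confidence", 0.5)),
        # Everything that is not the summary or confidence is kept as metadata.
        metadata={
            k: v for k, v in raw.items() if k not in (summary_key, "confidence")
        },
    )
    return asdict(envelope)


research = to_team_result(
    "research_team",
    {"research_summary": "Reset the API key.", "confidence": 0.85, "doc_results": "..."},
    "research_summary",
)
comms = to_team_result(
    "comms_team", {"final_message": "Hi, please reset your key."}, "final_message"
)
```

With an envelope like this, a team whose native output field is `final_message` still exposes a `summary`, so consumers do not need per-team key lookups.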
Error Handling Across Hierarchy
def robust_team_node(state: MainState) -> dict:
    """Team node with comprehensive error handling."""
    try:
        team_input = {"query": state["customer_request"]}
        # team_subgraph stands for any compiled team subgraph, e.g. research_team
        result = team_subgraph.invoke(team_input)
        return {
            "team_results": [{
                "team": "research",
                "status": "success",
                "result": result
            }]
        }
    except TimeoutError:
        return {
            "team_results": [{
                "team": "research",
                "status": "timeout",
                "error": "Team execution timed out"
            }]
        }
    except Exception as e:
        return {
            "team_results": [{
                "team": "research",
                "status": "failed",
                "error": str(e)
            }]
        }

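def main_supervisor_handles_failures(state: MainState) -> dict:
    """Main supervisor adapts to team failures."""
    results = state.get("team_results", [])
    # Only the most recent result matters: team_results accumulates, so an
    # old failure stays in the list even after a successful retry
    last = results[-1] if results else {}
    # Treat timeouts as failures too (robust_team_node reports both)
    if last.get("status") in ("failed", "timeout"):
        # Count the attempt so the retry budget is eventually exhausted
        next_iteration = state["iteration"] + 1
        # Decide: retry, skip, or use alternative
        if state["iteration"] < 3:
            return {"current_team": last["team"], "iteration": next_iteration}  # Retry
        else:
            return {"current_team": "synthesize", "iteration": next_iteration}  # Skip and finish
    # normal_routing stands for the regular supervisor logic (not shown)
    return normal_routing(state)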
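A fixed retry count does not distinguish a team that failed once from one that keeps failing. One way to handle the latter is a circuit breaker that counts a team's most recent consecutive failures and stops routing to it past a threshold. The sketch below assumes results carry the `team` and `status` keys used above. `consecutive_failures`, `breaker_decision`, the threshold of 3, and the `fallback` parameter are hypothetical.

```python
from typing import Optional

FAILURE_STATUSES = ("failed", "timeout")


def consecutive_failures(team_results: list[dict], team: str) -> int:
    """Count how many of the team's most recent results failed in a row."""
    count = 0
    for result in reversed(team_results):
        if result.get("team") != team:
            continue  # Other teams' results do not reset the streak.
        if result.get("status") in FAILURE_STATUSES:
            count += 1
        else:
            break  # A success ends the streak.
    return count


def breaker_decision(
    team_results: list[dict],
    team: str,
    threshold: int = 3,
    fallback: Optional[str] = None,
) -> str:
    """Return the next routing target for a team that may be degraded."""
    if consecutive_failures(team_results, team) < threshold:
        return team  # Breaker closed: retrying is still allowed.
    # Breaker open: use the alternative team, else finish with partial results.
    return fallback or "synthesize"


history = [
    {"team": "research_team", "status": "failed"},
    {"team": "research_team", "status": "success"},
    {"team": "technical_team", "status": "success"},
    {"team": "research_team", "status": "timeout"},
    {"team": "research_team", "status": "failed"},
]
streak = consecutive_failures(history, "research_team")
```

Escalation to a human operator could be modelled as one more routing target returned once the breaker opens.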
Common Interview Questions
Q1: "Why use hierarchical teams instead of a flat supervisor pattern?"
Strong Answer:
"Hierarchical teams are better when you have: (1) Distinct domains of expertise that require specialized coordination - a research team operates differently than a technical team. (2) More than 5-7 workers - cognitive load on a single supervisor becomes too high. (3) Need for independent team updates - you can modify the research team without touching technical team. (4) Clear team boundaries - when workers naturally group by function. Flat is simpler and better for smaller systems with cross-cutting concerns."
Q2: "Why must checkpointing be parent-only in hierarchical systems?"
Answer:
"Three critical reasons: (1) State consistency - the parent checkpointer captures the complete state including all subgraph results at each main step. (2) No race conditions - multiple checkpointers would try to save state independently, creating conflicts. (3) Single source of truth - resume and time-travel debugging work correctly because there's one unified state history. Nested checkpointers would create fragmented, potentially inconsistent state."
Q3: "How do you handle a team that consistently fails?"
Answer:
"Implement circuit breaker at the main supervisor level. Track team failures in state. After N consecutive failures, mark that team as 'degraded' and either: (1) Route to an alternative team if available, (2) Skip that team and synthesize with partial results, (3) Escalate to human operator. Never let one team's issues block the entire workflow. Log all failures for post-mortem analysis."
Q4: "When would you use parallel vs sequential team execution?"
Answer:
"Parallel when teams are independent - research, technical analysis, and initial draft can all run simultaneously. Sequential when there are dependencies - comms team needs research results before drafting. Hybrid approach: run independent teams in parallel first, then sequence dependent teams. Use Send API for parallel, edge chains for sequential."
Key Takeaways
- Hierarchical teams = main supervisor + team supervisors + workers
- Subgraphs encapsulate each team's complete logic
- Parent-only checkpointing is critical for state consistency
- State is passed explicitly between hierarchy levels
- Send API enables parallel team execution
- Error handling at each level with graceful degradation
- Team nodes wrap subgraphs - main graph sees teams as single nodes
Next: Learn inter-agent communication patterns for complex message passing in Lesson 3.
:::