Hierarchical Teams & Nested Subgraphs

When a Single Supervisor Isn't Enough

Real Production Scenario (January 2026):

An enterprise SaaS company needed to handle complex customer support tickets that required multiple specialized teams: a research team to find relevant documentation, a technical team to diagnose issues, and a communication team to draft responses. A single supervisor with 15+ workers became unmanageable. By restructuring into hierarchical teams with sub-supervisors, they achieved:

40% reduction in response time
85% first-contact resolution rate
Clear ownership and accountability per team
Independent team updates without system-wide redeployment

This lesson teaches you how to build hierarchical multi-agent systems in which supervisors manage teams of workers, and how to use LangGraph subgraphs to encapsulate each team's logic.

Hierarchical Architecture

In a hierarchical system:

Main supervisor coordinates high-level strategy
Team supervisors manage their specialized workers
Workers execute specific tasks
Results flow up through the hierarchy

                    ┌────────────────────┐
                    │   MAIN SUPERVISOR  │
                    │ (Strategic Router) │
                    └─────────┬──────────┘
                              │
            ┌─────────────────┼─────────────────┐
            ▼                 ▼                 ▼
    ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
    │ RESEARCH TEAM│  │TECHNICAL TEAM│  │ COMMS TEAM   │
    │  Supervisor  │  │  Supervisor  │  │  Supervisor  │
    └──────┬───────┘  └──────┬───────┘  └──────┬───────┘
           │                 │                 │
       ┌───┼───┐         ┌───┼───┐         ┌───┼───┐
       ▼   ▼   ▼         ▼   ▼   ▼         ▼   ▼   ▼
      Doc Web API      Debug Test Fix    Draft Edit Review

Key Insight: Each team is a complete subgraph with its own supervisor pattern. The main supervisor only interacts with team supervisors, not individual workers.
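
One way to make this boundary concrete is to write the hierarchy down as plain data and query it. The sketch below is illustrative only: `HIERARCHY`, `main_supervisor_targets`, and `owning_team` are hypothetical names, not part of LangGraph, and the short worker names are taken from the diagram above.

```python
from typing import Dict, List

# Hypothetical, framework-free description of the diagram above.
HIERARCHY: Dict[str, List[str]] = {
    "research_team": ["doc", "web", "api"],
    "technical_team": ["debug", "test", "fix"],
    "comms_team": ["draft", "edit", "review"],
}

def main_supervisor_targets() -> List[str]:
    """The main supervisor can only address teams, never workers."""
    return sorted(HIERARCHY)

def owning_team(worker: str) -> str:
    """Return the team whose supervisor is responsible for a worker."""
    for team, workers in HIERARCHY.items():
        if worker in workers:
            return team
    raise KeyError(f"unknown worker: {worker}")
```

Keeping worker names out of `main_supervisor_targets()` is the whole point: adding a fourth research worker changes one list and nothing the main supervisor sees.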

Building Team Subgraphs

Each team is implemented as a self-contained subgraph:

from typing import TypedDict, Annotated, List, Optional
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
import operator

# =============================================================================
# Research Team Subgraph
# =============================================================================
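class ResearchTeamState(TypedDict):
    """State for research team operations."""
    query: str
    context: Optional[dict]

    # Routing decision written by the supervisor. It is declared here so the
    # value is part of the graph state that route_research_worker reads.
    next_worker: Optional[str]

    # Worker outputs
    doc_results: Optional[str]
    web_results: Optional[str]
    api_results: Optional[str]

    # Team output
    research_summary: Optional[str]
    confidence: float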

def research_supervisor(state: ResearchTeamState) -> dict:
    """Research team supervisor decides which worker to invoke."""
    query = state["query"]

    # Determine what kind of research is needed
    if "documentation" in query.lower() or "docs" in query.lower():
        return {"next_worker": "doc_searcher"}
    elif "api" in query.lower() or "endpoint" in query.lower():
        return {"next_worker": "api_researcher"}
    else:
        return {"next_worker": "web_searcher"}

def doc_searcher(state: ResearchTeamState) -> dict:
    """Search internal documentation."""
    llm = ChatOpenAI(model="gpt-4o-mini")
    response = llm.invoke(f"Search documentation for: {state['query']}")
    return {"doc_results": response.content}

def web_searcher(state: ResearchTeamState) -> dict:
    """Search web for information."""
    llm = ChatOpenAI(model="gpt-4o-mini")
    response = llm.invoke(f"Search web for: {state['query']}")
    return {"web_results": response.content}

def api_researcher(state: ResearchTeamState) -> dict:
    """Research API endpoints and usage."""
    llm = ChatOpenAI(model="gpt-4o-mini")
    response = llm.invoke(f"Research API for: {state['query']}")
    return {"api_results": response.content}

def research_synthesizer(state: ResearchTeamState) -> dict:
    """Synthesize all research into summary."""
    llm = ChatOpenAI(model="gpt-4o")

    all_results = []
    if state.get("doc_results"):
        all_results.append(f"Documentation: {state['doc_results']}")
    if state.get("web_results"):
        all_results.append(f"Web: {state['web_results']}")
    if state.get("api_results"):
        all_results.append(f"API: {state['api_results']}")

    combined = "\n\n".join(all_results)
    response = llm.invoke(f"Synthesize this research:\n{combined}")

    return {
        "research_summary": response.content,
        "confidence": 0.85
    }

def route_research_worker(state: ResearchTeamState) -> str:
    return state.get("next_worker", "web_searcher")

def create_research_team() -> StateGraph:
    """Create the research team subgraph."""
    graph = StateGraph(ResearchTeamState)

    # Add nodes
    graph.add_node("supervisor", research_supervisor)
    graph.add_node("doc_searcher", doc_searcher)
    graph.add_node("web_searcher", web_searcher)
    graph.add_node("api_researcher", api_researcher)
    graph.add_node("synthesizer", research_synthesizer)

    # Supervisor routes to workers
    graph.add_conditional_edges(
        "supervisor",
        route_research_worker,
        {
            "doc_searcher": "doc_searcher",
            "web_searcher": "web_searcher",
            "api_researcher": "api_researcher"
        }
    )

    # All workers go to synthesizer
    graph.add_edge("doc_searcher", "synthesizer")
    graph.add_edge("web_searcher", "synthesizer")
    graph.add_edge("api_researcher", "synthesizer")

    # Synthesizer ends
    graph.add_edge("synthesizer", END)

    graph.set_entry_point("supervisor")

    # CRITICAL: Compile WITHOUT checkpointer
    return graph.compile()

# =============================================================================
# Technical Team Subgraph
# =============================================================================
class TechnicalTeamState(TypedDict):
    issue: str
    context: Optional[dict]

    # Worker outputs
    diagnosis: Optional[str]
    test_results: Optional[str]
    fix_proposal: Optional[str]

    # Team output
    technical_summary: Optional[str]
    severity: str

def create_technical_team() -> StateGraph:
    """Create the technical team subgraph."""
    # Similar structure to research team...
    graph = StateGraph(TechnicalTeamState)

    graph.add_node("supervisor", technical_supervisor)
    graph.add_node("debugger", debugger_node)
    graph.add_node("tester", tester_node)
    graph.add_node("fixer", fixer_node)
    graph.add_node("synthesizer", technical_synthesizer)

    # ... edges ...

    return graph.compile()  # NO checkpointer

# =============================================================================
# Communication Team Subgraph
# =============================================================================
class CommsTeamState(TypedDict):
    content: str
    tone: str
    audience: str

    # Worker outputs
    draft: Optional[str]
    edited: Optional[str]
    reviewed: Optional[str]

    # Team output
    final_message: Optional[str]

def create_comms_team() -> StateGraph:
    """Create the communications team subgraph."""
    graph = StateGraph(CommsTeamState)

    graph.add_node("supervisor", comms_supervisor)
    graph.add_node("drafter", drafter_node)
    graph.add_node("editor", editor_node)
    graph.add_node("reviewer", reviewer_node)
    graph.add_node("finalizer", finalizer_node)

    # ... edges ...

    return graph.compile()  # NO checkpointer
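
The keyword routing in `research_supervisor` uses substring checks, so a query containing "rapid" also matches "api". A minimal sketch of a stricter variant follows, assuming whole-word matching is the behavior you want. `ROUTES` and `pick_research_worker` are hypothetical helpers for illustration, not part of the lesson's graph or of LangGraph.

```python
import re

# Hypothetical routing table: worker name -> keywords matched as whole words.
ROUTES = [
    ("doc_searcher", ("documentation", "docs")),
    ("api_researcher", ("api", "endpoint")),
]

def pick_research_worker(query: str, default: str = "web_searcher") -> str:
    """Return the first worker whose keyword appears as a whole word."""
    words = set(re.findall(r"[a-z0-9]+", query.lower()))
    for worker, keywords in ROUTES:
        if words.intersection(keywords):
            return worker
    return default
```

Whole-word matching has its own gap (the plural "endpoints" no longer matches "endpoint"), so treat the keyword lists as something to tune against real tickets.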

Main Supervisor with Team Nodes

The main graph treats each team as a single node:

# =============================================================================
# Main Hierarchical Graph
# =============================================================================
class MainState(TypedDict):
    """State for the main supervisor."""
    # Input
    customer_request: str
    ticket_id: str

    # Team assignments
    teams_needed: List[str]
    current_team: Optional[str]

    # Team results (accumulate from all teams)
    team_results: Annotated[List[dict], operator.add]

    # Control
    iteration: int
    max_iterations: int

    # Output
    final_response: Optional[str]
    resolution_status: str

# Instantiate team subgraphs
research_team = create_research_team()
technical_team = create_technical_team()
comms_team = create_comms_team()

def main_supervisor(state: MainState) -> dict:
    """
    Main supervisor decides which team to engage.
    Only coordinates teams, never individual workers.
    """
    llm = ChatOpenAI(model="gpt-4o", temperature=0)

    request = state["customer_request"]
    completed_teams = [r["team"] for r in state.get("team_results", [])]

    # Determine which teams are needed
    prompt = f"""Analyze this customer request and decide which team to engage next.

REQUEST: {request}

TEAMS ALREADY CONSULTED: {completed_teams}

AVAILABLE TEAMS:
- research_team: Find documentation, articles, API references
- technical_team: Diagnose issues, test solutions, propose fixes
- comms_team: Draft customer-facing response

Respond with the team name or 'synthesize' if all needed work is done."""

    response = llm.invoke(prompt)
    next_team = response.content.strip().lower()

    if next_team not in ["research_team", "technical_team", "comms_team"]:
        next_team = "synthesize"

    return {
        "current_team": next_team,
        "iteration": state.get("iteration", 0) + 1
    }

def research_team_node(state: MainState) -> dict:
    """Execute research team subgraph."""
    # Prepare input for research team
    team_input = {
        "query": state["customer_request"],
        "context": {"ticket_id": state["ticket_id"]}
    }

    # Run the subgraph
    result = research_team.invoke(team_input)

    # Return result in main state format
    return {
        "team_results": [{
            "team": "research_team",
            "summary": result.get("research_summary", ""),
            "confidence": result.get("confidence", 0.5),
            "raw_results": {
                "docs": result.get("doc_results"),
                "web": result.get("web_results"),
                "api": result.get("api_results")
            }
        }]
    }

def technical_team_node(state: MainState) -> dict:
    """Execute technical team subgraph."""
    # Get context from previous research if available
    research_context = None
    for r in state.get("team_results", []):
        if r["team"] == "research_team":
            research_context = r["summary"]

    team_input = {
        "issue": state["customer_request"],
        "context": {
            "ticket_id": state["ticket_id"],
            "research": research_context
        }
    }

    result = technical_team.invoke(team_input)

    return {
        "team_results": [{
            "team": "technical_team",
            "summary": result.get("technical_summary", ""),
            "severity": result.get("severity", "medium"),
            "fix": result.get("fix_proposal")
        }]
    }

def comms_team_node(state: MainState) -> dict:
    """Execute communications team subgraph."""
    # Gather all previous team results for context
    all_context = []
    for r in state.get("team_results", []):
        # Comms results store their text under "message", not "summary"
        all_context.append(f"{r['team']}: {r.get('summary', r.get('message', ''))}")

    team_input = {
        "content": "\n".join(all_context),
        "tone": "professional and helpful",
        "audience": "customer"
    }

    result = comms_team.invoke(team_input)

    return {
        "team_results": [{
            "team": "comms_team",
            "message": result.get("final_message", ""),
            "draft": result.get("draft")
        }]
    }

def final_synthesizer(state: MainState) -> dict:
    """Create final response from all team contributions."""
    llm = ChatOpenAI(model="gpt-4o")

    team_contributions = []
    for r in state.get("team_results", []):
        team_contributions.append(f"## {r['team'].upper()}\n{r.get('summary', r.get('message', ''))}")

    prompt = f"""Create a final customer response based on team contributions:

{chr(10).join(team_contributions)}

Create a professional, helpful response that addresses the customer's needs."""

    response = llm.invoke(prompt)

    return {
        "final_response": response.content,
        "resolution_status": "resolved"
    }

def route_to_team(state: MainState) -> str:
    """Route to appropriate team based on supervisor decision."""
    return state.get("current_team", "synthesize")

# =============================================================================
# Build Main Graph
# =============================================================================
def build_hierarchical_system(checkpointer):
    """Build the complete hierarchical multi-agent system.

    The caller supplies the checkpointer so that its connection stays open
    for as long as the compiled graph is in use.
    """
    graph = StateGraph(MainState)

    # Add main supervisor
    graph.add_node("main_supervisor", main_supervisor)

    # Add team nodes (each wraps a subgraph)
    graph.add_node("research_team", research_team_node)
    graph.add_node("technical_team", technical_team_node)
    graph.add_node("comms_team", comms_team_node)
    graph.add_node("synthesize", final_synthesizer)

    # Entry point
    graph.set_entry_point("main_supervisor")

    # Supervisor routes to teams
    graph.add_conditional_edges(
        "main_supervisor",
        route_to_team,
        {
            "research_team": "research_team",
            "technical_team": "technical_team",
            "comms_team": "comms_team",
            "synthesize": "synthesize"
        }
    )

    # Teams return to supervisor for next decision
    graph.add_edge("research_team", "main_supervisor")
    graph.add_edge("technical_team", "main_supervisor")
    graph.add_edge("comms_team", "main_supervisor")

    # Synthesize ends the workflow
    graph.add_edge("synthesize", END)

    # CRITICAL: Only parent graph has checkpointer
    return graph.compile(checkpointer=checkpointer)

# PostgresSaver.from_conn_string() is a context manager, so the app is built
# and used inside the `with` block.
from langgraph.checkpoint.postgres import PostgresSaver

with PostgresSaver.from_conn_string("postgresql://...") as checkpointer:
    checkpointer.setup()  # creates the checkpoint tables on first use
    app = build_hierarchical_system(checkpointer)
    # invoke app here, while the connection is still open
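
The `main_supervisor` above trusts the model to reply with an exact team name, and `max_iterations` is stored in state but never checked. The sketch below shows one way the decision step could be hardened. It is a framework-free assumption, not the lesson's implementation: `choose_next_team` is a hypothetical helper, and skipping teams that were already consulted is a policy choice you may not want.

```python
from typing import List

VALID_TEAMS = ("research_team", "technical_team", "comms_team")

def choose_next_team(raw_reply: str, completed: List[str],
                     iteration: int, max_iterations: int) -> str:
    """Map a free-form LLM reply to a routing key, defaulting to 'synthesize'."""
    if iteration >= max_iterations:
        return "synthesize"  # hard stop so the graph cannot loop forever
    reply = raw_reply.strip().lower()
    for team in VALID_TEAMS:
        # Accept replies such as "research_team." or "Next: research_team"
        if team in reply and team not in completed:
            return team
    return "synthesize"
```

Inside `main_supervisor`, this would replace the exact-match check on `response.content`, with `completed_teams` and the iteration counters read from state.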

Critical: Parent-Only Checkpointing (January 2026)

This is one of the most important patterns in hierarchical LangGraph systems.

# ============================================================================
# CORRECT: Only parent has checkpointer
# ============================================================================
# Subgraphs - NO checkpointer
research_team = create_research_team()     # .compile() with no args
technical_team = create_technical_team()   # .compile() with no args

# Parent graph - HAS checkpointer
main_app = main_graph.compile(checkpointer=PostgresSaver(...))


# ============================================================================
# WRONG: Nested checkpointers cause problems
# ============================================================================
# DON'T DO THIS (research_graph is the team's uncompiled StateGraph)
research_team = research_graph.compile(
    checkpointer=PostgresSaver(...)  # BAD - creates conflicts
)

Why Parent-Only Checkpointing:

State Consistency: Parent checkpointer captures complete state including all subgraph results at each step
No Race Conditions: Multiple checkpointers would try to save state independently
Single Source of Truth: Resume and time-travel work correctly with unified state
Simpler Debugging: One place to inspect workflow history

Parallel Team Execution

For independent teams, use the Send API:

from langgraph.types import Send  # current import path; langgraph.constants is the older location

class ParallelMainState(TypedDict):
    request: str
    team_results: Annotated[List[dict], operator.add]
    final_response: Optional[str]

def dispatch_to_all_teams(state: ParallelMainState) -> List[Send]:
    """Launch all teams in parallel."""
    return [
        Send("research_team", {
            "query": state["request"],
            "context": {}
        }),
        Send("technical_team", {
            "issue": state["request"],
            "context": {}
        }),
        Send("comms_team", {
            "content": state["request"],
            "tone": "professional",
            "audience": "customer"
        })
    ]

def collect_and_synthesize(state: ParallelMainState) -> dict:
    """Collect all parallel results and create final response."""
    results = state.get("team_results", [])

    # All teams ran in parallel, results collected via operator.add
    synthesis = f"Collected results from {len(results)} teams"

    return {"final_response": synthesis}

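# Build parallel graph
graph = StateGraph(ParallelMainState)
graph.add_node("dispatcher", lambda s: {})  # Placeholder

def make_parallel_team_node(team_name, subgraph, summary_key):
    """Wrap a subgraph so it accepts a Send payload as its input."""
    def node(team_input: dict) -> dict:
        # team_input is the payload built in dispatch_to_all_teams,
        # already in the subgraph's own state format
        result = subgraph.invoke(team_input)
        return {"team_results": [{
            "team": team_name,
            "summary": result.get(summary_key, "")
        }]}
    return node

# The sequential wrappers (research_team_node, ...) read MainState keys such
# as customer_request, so the parallel graph needs its own thin wrappers
graph.add_node("research_team", make_parallel_team_node("research_team", research_team, "research_summary"))
graph.add_node("technical_team", make_parallel_team_node("technical_team", technical_team, "technical_summary"))
graph.add_node("comms_team", make_parallel_team_node("comms_team", comms_team, "final_message"))
graph.add_node("synthesizer", collect_and_synthesize)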

# Fan out to all teams
graph.add_conditional_edges("dispatcher", dispatch_to_all_teams)

# All teams converge to synthesizer
graph.add_edge("research_team", "synthesizer")
graph.add_edge("technical_team", "synthesizer")
graph.add_edge("comms_team", "synthesizer")

graph.add_edge("synthesizer", END)
graph.set_entry_point("dispatcher")
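
The fan-in relies on the `operator.add` reducer declared on `team_results`: each parallel branch returns a one-element list, and the lists are concatenated. The sketch below imitates that merge with the standard library only. It is not how LangGraph schedules nodes, and `fake_team` and `fan_out_fan_in` are hypothetical stand-ins for the real team nodes.

```python
import operator
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Callable, Dict, List

def fake_team(name: str) -> Callable[[dict], dict]:
    """Build a stand-in team node that returns a partial state update."""
    def node(payload: dict) -> dict:
        return {"team_results": [{"team": name, "input_keys": sorted(payload)}]}
    return node

def fan_out_fan_in(jobs: Dict[str, dict]) -> List[dict]:
    """Run one stand-in node per payload, then merge with operator.add."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(fake_team(name), payload)
                   for name, payload in jobs.items()]
        updates = [f.result() for f in futures]
    # Same reducer as Annotated[List[dict], operator.add]: lists concatenate.
    return reduce(operator.add, (u["team_results"] for u in updates), [])
```

Because every branch contributes a list rather than overwriting a value, no branch can clobber another's result, which is why the reducer annotation matters for parallel execution.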

State Passing Patterns

Pattern 1: Context Injection

def team_node_with_context(state: MainState) -> dict:
    """Pass context from main state to team."""
    # team_subgraph stands for any compiled team graph, e.g. research_team
    team_input = {
        "query": state["customer_request"],
        "context": {
            "user_id": state.get("user_id"),
            "previous_results": state.get("team_results", []),
            "preferences": state.get("user_preferences", {})
        }
    }
    result = team_subgraph.invoke(team_input)
    return {"team_results": [{"team": "name", "result": result}]}

Pattern 2: Result Transformation

from datetime import datetime

def transform_team_result(state: MainState) -> dict:
    """Transform team output to main state format."""
    team_input = {"query": state["customer_request"]}
    raw_result = team_subgraph.invoke(team_input)  # any compiled team graph

    # Transform to standardized format
    return {
        "team_results": [{
            "team": "research",
            "summary": raw_result.get("research_summary"),
            "confidence": raw_result.get("confidence", 0.5),
            "metadata": {
                "sources": raw_result.get("sources", []),
                "timestamp": datetime.now().isoformat()
            }
        }]
    }
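
Both patterns get simpler when every team result has the same shape before it enters `team_results`. The sketch below shows one possible normalizer. The output keys `status` and `extras` and the helper name `normalize_team_result` are assumptions made for illustration; only the input keys (`research_summary`, `technical_summary`, `final_message`, and so on) come from the team states defined earlier.

```python
from typing import Any, Dict

def normalize_team_result(team: str, raw: Dict[str, Any]) -> Dict[str, Any]:
    """Map a team's raw output onto one shared shape for the main state."""
    # Each team reports its main text under a different key; take the first set.
    text = (raw.get("research_summary") or raw.get("technical_summary")
            or raw.get("final_message") or "")
    return {
        "team": team,
        "summary": text,
        "status": "success" if text else "empty",
        # Keep team-specific fields without forcing them on every consumer.
        "extras": {k: v for k, v in raw.items()
                   if k in ("confidence", "severity", "fix_proposal")},
    }
```

With a shared `summary` key, downstream nodes no longer need per-team lookups such as `r.get('summary', r.get('message', ''))`.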

Error Handling Across Hierarchy

def robust_team_node(state: MainState) -> dict:
    """Team node with comprehensive error handling."""
    # "team" uses the routing key ("research_team") so that the supervisor
    # can route a retry straight back to this node
    try:
        team_input = {"query": state["customer_request"]}
        result = team_subgraph.invoke(team_input)  # any compiled team graph

        return {
            "team_results": [{
                "team": "research_team",
                "status": "success",
                "result": result
            }]
        }
    except TimeoutError:
        # Only raised if something inside the team enforces a timeout
        return {
            "team_results": [{
                "team": "research_team",
                "status": "timeout",
                "error": "Team execution timed out"
            }]
        }
    except Exception as e:
        return {
            "team_results": [{
                "team": "research_team",
                "status": "failed",
                "error": str(e)
            }]
        }

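def main_supervisor_handles_failures(state: MainState) -> dict:
    """Main supervisor adapts to team failures."""
    results = state.get("team_results", [])
    iteration = state.get("iteration", 0)

    # Look only at the most recent result: team_results accumulates, so an
    # old failure stays in the list even after a successful retry
    last = results[-1] if results else None

    if last and last.get("status") in ("failed", "timeout"):
        # Decide: retry, skip, or use alternative
        if iteration < 3:
            # Retry, and count the attempt so the retry budget is enforced
            return {"current_team": last["team"], "iteration": iteration + 1}
        else:
            return {"current_team": "synthesize"}  # Skip and finish

    # normal_routing is a placeholder for the regular decision logic,
    # e.g. the main_supervisor function defined earlier
    return normal_routing(state)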
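
The `except TimeoutError` branch in `robust_team_node` only fires if something actually enforces a deadline. The sketch below shows one way to impose one around a blocking call using the standard library. `run_with_timeout` is a hypothetical helper, and the approach has a known limitation: a timed-out call keeps running in its worker thread, because Python threads cannot be cancelled.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Any, Callable, Dict

def run_with_timeout(team: str, fn: Callable[[dict], Any],
                     team_input: dict, seconds: float) -> Dict[str, Any]:
    """Run fn(team_input) in a worker thread and report a status dict."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, team_input)
    try:
        result = future.result(timeout=seconds)
        return {"team": team, "status": "success", "result": result}
    except FutureTimeout:
        return {"team": team, "status": "timeout",
                "error": f"no result within {seconds}s"}
    except Exception as exc:  # the team itself raised
        return {"team": team, "status": "failed", "error": str(exc)}
    finally:
        # Do not block on the worker; a timed-out call finishes in the background.
        pool.shutdown(wait=False)
```

A team node could call `run_with_timeout("research_team", team_subgraph.invoke, team_input, 30.0)` and append the returned dict to `team_results`; the 30-second budget is an arbitrary example value.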

Common Interview Questions

Q1: "Why use hierarchical teams instead of a flat supervisor pattern?"

Strong Answer:

"Hierarchical teams are better when you have: (1) Distinct domains of expertise that require specialized coordination - a research team operates differently than a technical team. (2) More than 5-7 workers - cognitive load on a single supervisor becomes too high. (3) Need for independent team updates - you can modify the research team without touching technical team. (4) Clear team boundaries - when workers naturally group by function. Flat is simpler and better for smaller systems with cross-cutting concerns."

Q2: "Why must checkpointing be parent-only in hierarchical systems?"

Answer:

"Three critical reasons: (1) State consistency - the parent checkpointer captures the complete state including all subgraph results at each main step. (2) No race conditions - multiple checkpointers would try to save state independently, creating conflicts. (3) Single source of truth - resume and time-travel debugging work correctly because there's one unified state history. Nested checkpointers would create fragmented, potentially inconsistent state."

Q3: "How do you handle a team that consistently fails?"

Answer:

"Implement a circuit breaker at the main supervisor level. Track team failures in state. After N consecutive failures, mark that team as 'degraded' and either: (1) route to an alternative team if one is available, (2) skip that team and synthesize with partial results, or (3) escalate to a human operator. Never let one team's issues block the entire workflow. Log all failures for post-mortem analysis."

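The circuit breaker described in Q3 can be sketched as a pure function over `team_results`. This is an illustration under assumptions, not a LangGraph feature: `degraded_teams` and `route_with_breaker` are hypothetical helpers, the default threshold of 3 is arbitrary, and entries without a `status` key are treated as successes.

```python
from typing import Dict, List

def degraded_teams(team_results: List[dict], threshold: int = 3) -> List[str]:
    """Return teams whose most recent `threshold` results all failed."""
    history: Dict[str, List[str]] = {}
    for r in team_results:
        history.setdefault(r["team"], []).append(r.get("status", "success"))
    return sorted(
        team for team, statuses in history.items()
        if len(statuses) >= threshold
        and all(s in ("failed", "timeout") for s in statuses[-threshold:])
    )

def route_with_breaker(requested: str, team_results: List[dict]) -> str:
    """Skip a degraded team and finish with partial results instead."""
    return "synthesize" if requested in degraded_teams(team_results) else requested
```

Because the check looks at the trailing window only, a team that recovers after two failures is not marked degraded.
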
Q4: "When would you use parallel vs sequential team execution?"

Answer:

"Parallel when teams are independent: research, technical analysis, and an initial draft can all run simultaneously. Sequential when there are dependencies: the comms team needs research results before drafting. A hybrid approach runs independent teams in parallel first, then sequences the dependent teams. Use the Send API for parallel execution and edge chains for sequential execution."

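The hybrid approach from Q4 amounts to grouping teams into stages by dependency. A minimal sketch using the standard library's `graphlib` (Python 3.9+) follows. `plan_stages` is a hypothetical helper, and the dependency map in the usage note is an assumed example rather than something the lesson prescribes.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

def plan_stages(depends_on: Dict[str, Set[str]]) -> List[List[str]]:
    """Group teams into stages; teams in one stage have no unmet dependencies."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages
```

For example, `plan_stages({"research_team": set(), "technical_team": set(), "comms_team": {"research_team", "technical_team"}})` puts research and technical in the first stage and comms in the second. Each stage maps to one Send fan-out, and consecutive stages map to ordinary edges.
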
Key Takeaways

Hierarchical teams = main supervisor + team supervisors + workers
Subgraphs encapsulate each team's complete logic
Parent-only checkpointing is critical for state consistency
State passed explicitly between hierarchy levels
Send API enables parallel team execution
Error handling at each level with graceful degradation
Team nodes wrap subgraphs - main graph sees teams as single nodes

Next: Learn inter-agent communication patterns for complex message passing in Lesson 3.
