# Multi-Agent Orchestration

*Multi-Agent Architectures and Coordination*

## Why Multi-Agent Systems
A single agent with one LLM call can handle straightforward tasks. But real-world problems quickly outgrow what a single agent can do well. Multi-agent systems address four fundamental challenges:
Task Decomposition — Complex tasks naturally break into subtasks. A "research and write a report" request involves searching, reading, analyzing, and writing. A single agent with one massive prompt tends to lose focus. Multiple specialized agents, each handling one subtask, produce better results.
Specialization — Different tasks require different configurations. A coding agent needs access to a code interpreter and should use a model optimized for code generation. A summarization agent needs a large context window but doesn't need tool access. By giving each agent its own system prompt, tool set, and model selection, you optimize for each subtask.
Parallel Execution — Independent subtasks can run simultaneously. If a customer inquiry needs both order lookup and product information, two agents can fetch that data in parallel rather than sequentially, cutting latency significantly.
Reliability Through Redundancy — If one agent fails or produces poor results, the system can retry with a different agent, escalate to a human, or use a fallback strategy. A single-agent system has a single point of failure.
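The parallel-execution point can be sketched with `Promise.all`. This is a minimal illustration, not a production agent runtime; `lookupOrder` and `fetchProductInfo` are hypothetical stand-ins for real agent or tool calls:

```typescript
// Hypothetical stand-ins for agent calls (each would normally wrap an
// LLM or tool invocation with real latency).
async function lookupOrder(orderId: string): Promise<string> {
  return `order ${orderId}: shipped`;
}

async function fetchProductInfo(sku: string): Promise<string> {
  return `product ${sku}: in stock`;
}

async function handleInquiry(orderId: string, sku: string): Promise<string[]> {
  // Independent subtasks run concurrently; total latency approaches the
  // max of the two calls rather than their sum.
  return Promise.all([lookupOrder(orderId), fetchProductInfo(sku)]);
}
```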
## Orchestration Patterns

### Supervisor Pattern
A central supervisor agent receives the user request, breaks it into subtasks, routes each subtask to the appropriate specialist agent, and aggregates results into a final response.
```
User Request → Supervisor → [Agent A, Agent B, Agent C] → Supervisor → Response
```
The supervisor makes all routing decisions. It sees every intermediate result and decides what happens next. This is the most common pattern in production systems.
Strengths: Centralized control, easy to reason about, straightforward error handling. Weaknesses: The supervisor is a bottleneck and a single point of failure. All traffic flows through it, increasing latency.
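A minimal sketch of the routing-and-aggregation loop, assuming specialists are plain functions keyed by task type (in a real system each would wrap an LLM call):

```typescript
// Each specialist is a function; names and behavior are illustrative.
type Agent = (input: string) => string;

const specialists: Record<string, Agent> = {
  search: (q) => `results for "${q}"`,
  summarize: (text) => `summary of ${text.length} chars`,
};

function supervise(tasks: { kind: string; input: string }[]): string {
  const outputs = tasks.map(({ kind, input }) => {
    const agent = specialists[kind];
    if (!agent) throw new Error(`no specialist for task: ${kind}`);
    return agent(input); // the supervisor sees every intermediate result
  });
  return outputs.join("\n"); // aggregation into a final response
}
```

Because every result flows back through `supervise`, error handling and retries have one natural home, which is exactly why this pattern is easy to reason about.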
### Hierarchical Pattern
A tree of supervisors where a top-level supervisor delegates to mid-level supervisors, who further delegate to worker agents. This extends the supervisor pattern to handle more complex task decomposition.
```
Top Supervisor → [Team Lead A, Team Lead B]
Team Lead A    → [Worker A1, Worker A2]
Team Lead B    → [Worker B1, Worker B2]
```
Strengths: Scales to complex tasks, natural division of responsibility. Weaknesses: Deep hierarchies add latency. Debugging failures across multiple levels is difficult.
### Peer-to-Peer Pattern
Agents communicate directly with each other without a central coordinator. Each agent decides when to hand off work to another agent based on its own assessment of the task.
```
Agent A ←→ Agent B ←→ Agent C
```
The OpenAI Swarm pattern uses this approach: agents have handoff functions that transfer the conversation to another agent when they determine the task is outside their specialization.
Strengths: No bottleneck, agents are loosely coupled, easy to add new agents. Weaknesses: Hard to track overall progress, potential for circular handoffs, no single point of control for error recovery.
### Pipeline Pattern
Agents are arranged in a fixed sequence where each agent's output becomes the next agent's input. This works well for workflows with clear stages.
```
Input → Agent A → Agent B → Agent C → Output
```
Example: A content moderation pipeline where Agent A classifies content type, Agent B checks against policy rules, and Agent C generates the moderation decision.
Strengths: Predictable execution flow, easy to test each stage independently. Weaknesses: Rigid — hard to skip stages or loop back. Not suitable for dynamic tasks.
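The fixed sequence is just function composition, which a short sketch makes concrete (stage functions here are placeholders for the moderation agents described above):

```typescript
// A stage consumes the previous stage's output.
type Stage = (input: string) => string;

// Compose stages left-to-right into a single callable pipeline.
function pipeline(stages: Stage[]): Stage {
  return (input) => stages.reduce((acc, stage) => stage(acc), input);
}

const moderate = pipeline([
  (text) => `classified:${text}`, // Agent A: classify content type
  (text) => `checked:${text}`,    // Agent B: check policy rules
  (text) => `decision:${text}`,   // Agent C: produce moderation decision
]);
```

Each stage can be unit-tested in isolation, which is the main operational benefit of the pattern.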
### Market-Based Pattern
Agents "bid" on tasks based on their capabilities and current load. A task broker assigns work to the best-fit agent. This is less common in LLM systems but appears in distributed computing.
Strengths: Dynamic load balancing, self-organizing. Weaknesses: Complex to implement, overkill for most LLM agent use cases.
## State Management Approaches
How agents share information is a critical architectural decision:
### Shared State
All agents read from and write to a common state object. This is the approach used by LangGraph, where a typed state dictionary is passed through the graph and each node (agent) can read and update it.
```typescript
interface SharedState {
  messages: Message[];
  currentAgent: string;
  taskResults: Record<string, any>;
  metadata: Record<string, any>;
}
```
Trade-offs: Simple to implement. But concurrent writes can cause conflicts. You need a conflict resolution strategy (last-write-wins, merge functions, or optimistic locking).
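One of those strategies, optimistic locking with a version counter, can be sketched in a few lines (the field names are assumptions, not a specific framework's API):

```typescript
interface VersionedState {
  version: number;
  taskResults: Record<string, unknown>;
}

// Optimistic update: the write is rejected if the state moved on since
// the agent read it, forcing a re-read and retry instead of a silent
// lost update.
function tryUpdate(
  state: VersionedState,
  readVersion: number,
  key: string,
  value: unknown
): VersionedState | null {
  if (readVersion !== state.version) return null; // conflict detected
  return {
    version: state.version + 1,
    taskResults: { ...state.taskResults, [key]: value },
  };
}
```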
### Message Passing
Agents communicate by sending structured messages to each other. Each agent maintains its own internal state and only exposes information through messages.
```typescript
interface AgentMessage {
  from: string;
  to: string;
  type: "request" | "response" | "handoff";
  payload: any;
  conversationId: string;
}
```
Trade-offs: Clean separation of concerns. Agents are independently testable. But message routing adds complexity, and you need a message broker or event bus.
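A minimal in-memory bus shows the routing contract; a production system would use a real broker or event bus. The message shape mirrors the `AgentMessage` interface above:

```typescript
interface AgentMessage {
  from: string;
  to: string;
  type: "request" | "response" | "handoff";
  payload: unknown;
  conversationId: string;
}

class MessageBus {
  private handlers = new Map<string, (msg: AgentMessage) => void>();

  // Each agent registers one handler; its internal state stays private.
  register(agentName: string, handler: (msg: AgentMessage) => void): void {
    this.handlers.set(agentName, handler);
  }

  send(msg: AgentMessage): void {
    const handler = this.handlers.get(msg.to);
    if (!handler) throw new Error(`unknown agent: ${msg.to}`);
    handler(msg); // deliver to the receiving agent only
  }
}
```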
### Blackboard Architecture
A shared workspace (the "blackboard") where agents post partial results. Any agent can read the blackboard and contribute when it has something relevant to add. A controller monitors the blackboard and activates agents as needed.
Trade-offs: Flexible and extensible. Good for problems where the solution emerges incrementally. But the control logic can become complex.
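A toy sketch of the control loop, assuming each knowledge source declares a precondition the controller checks against the blackboard (names and shapes are illustrative):

```typescript
type Entry = { key: string; value: string };

class Blackboard {
  entries: Entry[] = [];
  post(key: string, value: string): void {
    this.entries.push({ key, value });
  }
  has(key: string): boolean {
    return this.entries.some((e) => e.key === key);
  }
}

// An agent contributes only when its precondition holds.
interface KnowledgeSource {
  canContribute(b: Blackboard): boolean;
  contribute(b: Blackboard): void;
}

// The controller keeps activating agents until no one can make progress;
// the solution emerges incrementally on the blackboard.
function runController(b: Blackboard, sources: KnowledgeSource[]): void {
  let progressed = true;
  while (progressed) {
    progressed = false;
    for (const s of sources) {
      if (s.canContribute(b)) {
        s.contribute(b);
        progressed = true;
      }
    }
  }
}
```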
## Agent Handoff Protocols
A handoff occurs when one agent transfers control of a conversation to another agent. Designing handoff protocols well is critical for user experience and system reliability.
### What Must Transfer During Handoff
- Conversation history — The full message thread so the receiving agent has context
- Task state — What has been accomplished so far and what remains
- User intent — Why the handoff is happening (the first agent's assessment)
- Metadata — User identity, session ID, priority level, any accumulated tool results
### Handoff Triggers
- Capability boundary — The current agent lacks a required tool or knowledge domain
- Confidence threshold — The agent's confidence in handling the request drops below a threshold
- Explicit routing — The supervisor directs the handoff based on task classification
- Escalation — The task requires human intervention or a higher-capability model
### Handoff Implementation Pattern
```typescript
interface HandoffRequest {
  sourceAgent: string;
  targetAgent: string;
  reason: string;
  conversationHistory: Message[];
  taskState: Record<string, any>;
  priority: "low" | "medium" | "high" | "critical";
}
```
The receiving agent should validate that it can handle the request before accepting. If it cannot, it should reject the handoff with a reason, allowing the orchestrator to try an alternative agent.
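The accept/reject step can be sketched as a capability check. The capability table and the `topic` field are assumptions for illustration, not part of the `HandoffRequest` shape above:

```typescript
type HandoffDecision =
  | { accepted: true }
  | { accepted: false; rejectionReason: string };

// Hypothetical capability registry for two specialist agents.
const capabilities: Record<string, string[]> = {
  billing: ["refund", "invoice"],
  tech: ["bug", "outage"],
};

function evaluateHandoff(targetAgent: string, topic: string): HandoffDecision {
  const caps = capabilities[targetAgent] ?? [];
  if (!caps.includes(topic)) {
    // Rejecting with a reason lets the orchestrator try another agent
    // instead of silently dropping the request.
    return { accepted: false, rejectionReason: `no capability for topic: ${topic}` };
  }
  return { accepted: true };
}
```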
## Failure Modes in Multi-Agent Systems
Understanding failure modes is essential for interviews. You should be able to identify these proactively during system design discussions:
| Failure Mode | Description | Mitigation |
|---|---|---|
| Infinite Delegation Loop | Agent A hands off to Agent B, which hands back to Agent A | Track handoff history; limit max handoffs per request |
| State Corruption | Concurrent agents overwrite each other's state updates | Use versioned state with conflict resolution |
| Resource Exhaustion | Agents spawn too many sub-tasks, consuming all available tokens or API calls | Set per-request token budgets and task limits |
| Deadlock | Agent A waits for Agent B's result while B waits for A | Use timeouts on all inter-agent communication |
| Cascading Failure | One agent's failure causes downstream agents to fail | Circuit breakers isolate failing agents |
| Context Window Overflow | Accumulated conversation history exceeds the model's context limit | Summarize or truncate history before handoff |
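The cascading-failure mitigation can be sketched as a simple circuit breaker wrapped around an agent call; the threshold and reset policy are illustrative choices, not a standard:

```typescript
class CircuitBreaker {
  private failures = 0;
  constructor(private readonly maxFailures = 3) {}

  get open(): boolean {
    return this.failures >= this.maxFailures;
  }

  // Run the agent call unless the breaker is open; on failure, count it
  // and return the fallback so downstream agents keep working.
  call<T>(fn: () => T, fallback: T): T {
    if (this.open) return fallback; // isolate the failing agent
    try {
      const result = fn();
      this.failures = 0; // a success resets the counter
      return result;
    } catch {
      this.failures += 1;
      return fallback;
    }
  }
}
```

A fuller version would also reopen the circuit after a cool-down period (the "half-open" state) so a recovered agent can rejoin.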
## Real-World Multi-Agent Architectures

### Anthropic Claude — Tool Use and MCP
Anthropic's approach centers on a single capable model with extensive tool access via the Model Context Protocol (MCP). Rather than multiple LLM agents, the architecture gives one agent access to many tools through MCP servers. The model decides which tools to call, executes them, and reasons over results in a loop.
This is technically a single-agent architecture with multi-tool orchestration, but the pattern scales to multi-agent when MCP servers themselves contain agent logic.
### OpenAI Swarm Pattern
The Swarm pattern (open-sourced by OpenAI) implements peer-to-peer agent handoffs. Each agent is defined by a system prompt and a set of functions, including handoff_to_* functions that transfer the conversation. The framework manages conversation state and routes messages to the active agent.
Key design principle: agents are lightweight and stateless. All state lives in the conversation context. This makes agents easy to test in isolation and easy to compose.
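The handoff mechanic can be sketched as follows. This mimics the Swarm idea of a response optionally naming a successor agent; the types and `runConversation` loop are illustrative, not the actual Swarm API:

```typescript
interface SwarmAgent {
  name: string;
  // A response either answers or hands the conversation to another agent.
  respond(message: string): { reply: string; handoffTo?: SwarmAgent };
}

function runConversation(start: SwarmAgent, message: string, maxHandoffs = 5): string {
  let agent = start;
  for (let i = 0; i <= maxHandoffs; i++) {
    const { reply, handoffTo } = agent.respond(message);
    if (!handoffTo) return reply;
    agent = handoffTo; // conversation context travels with the handoff
  }
  // Guard against the circular-handoff failure mode noted earlier.
  throw new Error("handoff limit exceeded");
}
```

Because agents hold no state of their own, each `respond` implementation can be tested with a bare message string.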
### LangGraph Supervisor
LangGraph implements the supervisor pattern using a directed graph. The supervisor is a node that routes to specialist nodes based on the current state. Each node updates the shared state and returns control to the supervisor, which decides the next step.
The graph structure makes the control flow explicit and visualizable. State is typed and validated at each transition.
## Interview Deep-Dive: Customer Support Multi-Agent System
This is one of the most common multi-agent design questions in interviews. Here is how to approach it:
### Requirements
- Route customer inquiries to the right specialist (billing, technical support, returns, general)
- Handle multi-topic inquiries (e.g., "I have a billing question AND a technical issue")
- Escalate to humans when agents cannot resolve the issue
- Maintain conversation continuity across handoffs
- Track resolution metrics
### Proposed Architecture
Pattern choice: Supervisor with specialist agents
```
Customer Message
        ↓
Router Agent (classifier)
        ↓
Supervisor
        ↓ routes to
┌───────────────┬────────────┬───────────────┐
│ Billing Agent │ Tech Agent │ Returns Agent │
└───────────────┴────────────┴───────────────┘
        ↓ if unresolved
Escalation Agent → Human Queue
```
Router Agent — Classifies the inquiry type using a fast, inexpensive model. For multi-topic inquiries, it identifies all topics and creates subtasks.
Specialist Agents — Each has a focused system prompt, access to relevant tools (billing system API, knowledge base, order management), and domain-specific few-shot examples.
Escalation Logic:
- Agent confidence below threshold after 3 attempts
- Customer explicitly requests a human
- Sensitive topics (legal, safety) detected by the classifier
- Token budget exceeded without resolution
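The escalation triggers above combine naturally into one predicate. Field names and thresholds here are assumptions for illustration:

```typescript
interface ResolutionAttempt {
  confidence: number; // agent's self-assessed confidence, 0..1
  attempts: number;
  userRequestedHuman: boolean;
  sensitiveTopic: boolean; // flagged by the classifier (legal, safety)
  tokensUsed: number;
}

function shouldEscalate(a: ResolutionAttempt, tokenBudget = 20_000): boolean {
  if (a.userRequestedHuman) return true;           // explicit request
  if (a.sensitiveTopic) return true;               // sensitive topic
  if (a.attempts >= 3 && a.confidence < 0.6) return true; // low confidence
  if (a.tokensUsed > tokenBudget) return true;     // budget exhausted
  return false;
}
```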
State Management: Shared state with the conversation history, current agent assignment, resolution status per topic, and escalation history.
Key interview points to mention:
- Cost optimization: Use a smaller model for routing, a capable model for specialists
- Latency: Parallel execution when multiple topics are detected
- Observability: Trace every routing decision and agent response for quality review
- Testing: Each specialist agent can be tested independently with golden datasets
:::tip
Interview tip: When designing multi-agent systems, always address three things: how agents discover each other's capabilities, how state flows between agents, and what happens when an agent fails. These three concerns separate junior designs from senior ones.
:::

In the lab, you'll implement a multi-agent orchestration framework in TypeScript — building the supervisor, handoff protocol, shared state, and circuit breaker patterns from scratch.