# Multi-Agent Orchestration

*Multi-Agent Architectures and Coordination*

## Why Multi-Agent Systems
A single agent with one LLM call can handle straightforward tasks. But real-world problems quickly outgrow what a single agent can do well. Multi-agent systems address four fundamental challenges:
Task Decomposition — Complex tasks naturally break into subtasks. A "research and write a report" request involves searching, reading, analyzing, and writing. A single agent with one massive prompt tends to lose focus. Multiple specialized agents, each handling one subtask, produce better results.
Specialization — Different tasks require different configurations. A coding agent needs access to a code interpreter and should use a model optimized for code generation. A summarization agent needs a large context window but doesn't need tool access. By giving each agent its own system prompt, tool set, and model selection, you optimize for each subtask.
Parallel Execution — Independent subtasks can run simultaneously. If a customer inquiry needs both order lookup and product information, two agents can fetch that data in parallel rather than sequentially, cutting latency significantly.
Reliability Through Redundancy — If one agent fails or produces poor results, the system can retry with a different agent, escalate to a human, or use a fallback strategy. A single-agent system has a single point of failure.
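The parallel-execution point can be sketched with `Promise.all`. This is a minimal illustration, not a production agent runtime; `lookupOrder` and `fetchProductInfo` are hypothetical stand-ins for real agent or tool calls:

```typescript
// Hypothetical stand-ins for agent calls (each would normally wrap an
// LLM or tool invocation with real latency).
async function lookupOrder(orderId: string): Promise<string> {
  return `order ${orderId}: shipped`;
}

async function fetchProductInfo(sku: string): Promise<string> {
  return `product ${sku}: in stock`;
}

async function handleInquiry(orderId: string, sku: string): Promise<string[]> {
  // Independent subtasks run concurrently; total latency approaches the
  // max of the two calls rather than their sum.
  return Promise.all([lookupOrder(orderId), fetchProductInfo(sku)]);
}
```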
## Orchestration Patterns

### Supervisor Pattern
A central supervisor agent receives the user request, breaks it into subtasks, routes each subtask to the appropriate specialist agent, and aggregates results into a final response.
```
User Request → Supervisor → [Agent A, Agent B, Agent C] → Supervisor → Response
```
The supervisor makes all routing decisions. It sees every intermediate result and decides what happens next. This is the most common pattern in production systems.
Strengths: Centralized control, easy to reason about, straightforward error handling. Weaknesses: The supervisor is a bottleneck and a single point of failure. All traffic flows through it, increasing latency.
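A minimal sketch of the routing-and-aggregation loop, assuming specialists are plain functions keyed by task type (in a real system each would wrap an LLM call):

```typescript
// Each specialist is a function; names and behavior are illustrative.
type Agent = (input: string) => string;

const specialists: Record<string, Agent> = {
  search: (q) => `results for "${q}"`,
  summarize: (text) => `summary of ${text.length} chars`,
};

function supervise(tasks: { kind: string; input: string }[]): string {
  const outputs = tasks.map(({ kind, input }) => {
    const agent = specialists[kind];
    if (!agent) throw new Error(`no specialist for task: ${kind}`);
    return agent(input); // the supervisor sees every intermediate result
  });
  return outputs.join("\n"); // aggregation into a final response
}
```

Because every result flows back through `supervise`, error handling and retries have one natural home, which is exactly why this pattern is easy to reason about.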
### Hierarchical Pattern
A tree of supervisors where a top-level supervisor delegates to mid-level supervisors, who further delegate to worker agents. This extends the supervisor pattern to handle more complex task decomposition.
```
Top Supervisor → [Team Lead A, Team Lead B]
Team Lead A    → [Worker A1, Worker A2]
Team Lead B    → [Worker B1, Worker B2]
```
Strengths: Scales to complex tasks, natural division of responsibility. Weaknesses: Deep hierarchies add latency. Debugging failures across multiple levels is difficult.
### Peer-to-Peer Pattern
Agents communicate directly with each other without a central coordinator. Each agent decides when to hand off work to another agent based on its own assessment of the task.
```
Agent A ←→ Agent B ←→ Agent C
```
The OpenAI Swarm pattern uses this approach: agents have handoff functions that transfer the conversation to another agent when they determine the task is outside their specialization.
Strengths: No bottleneck, agents are loosely coupled, easy to add new agents. Weaknesses: Hard to track overall progress, potential for circular handoffs, no single point of control for error recovery.
### Pipeline Pattern
Agents are arranged in a fixed sequence where each agent's output becomes the next agent's input. This works well for workflows with clear stages.
```
Input → Agent A → Agent B → Agent C → Output
```
Example: A content moderation pipeline where Agent A classifies content type, Agent B checks against policy rules, and Agent C generates the moderation decision.
Strengths: Predictable execution flow, easy to test each stage independently. Weaknesses: Rigid — hard to skip stages or loop back. Not suitable for dynamic tasks.
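The fixed sequence is just function composition, which a short sketch makes concrete (stage functions here are placeholders for the moderation agents described above):

```typescript
// A stage consumes the previous stage's output.
type Stage = (input: string) => string;

// Compose stages left-to-right into a single callable pipeline.
function pipeline(stages: Stage[]): Stage {
  return (input) => stages.reduce((acc, stage) => stage(acc), input);
}

const moderate = pipeline([
  (text) => `classified:${text}`, // Agent A: classify content type
  (text) => `checked:${text}`,    // Agent B: check policy rules
  (text) => `decision:${text}`,   // Agent C: produce moderation decision
]);
```

Each stage can be unit-tested in isolation, which is the main operational benefit of the pattern.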
### Market-Based Pattern
Agents "bid" on tasks based on their capabilities and current load. A task broker assigns work to the best-fit agent. This is less common in LLM systems but appears in distributed computing.
Strengths: Dynamic load balancing, self-organizing. Weaknesses: Complex to implement, overkill for most LLM agent use cases.
## State Management Approaches
How agents share information is a critical architectural decision:
### Shared State
All agents read from and write to a common state object. This is the approach used by LangGraph, where a typed state dictionary is passed through the graph and each node (agent) can read and update it.
```typescript
interface SharedState {
  messages: Message[];
  currentAgent: string;
  taskResults: Record<string, any>;
  metadata: Record<string, any>;
}
```
Trade-offs: Simple to implement. But concurrent writes can cause conflicts. You need a conflict resolution strategy (last-write-wins, merge functions, or optimistic locking).
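One of those strategies, optimistic locking with a version counter, can be sketched in a few lines (the field names are assumptions, not a specific framework's API):

```typescript
interface VersionedState {
  version: number;
  taskResults: Record<string, unknown>;
}

// Optimistic update: the write is rejected if the state moved on since
// the agent read it, forcing a re-read and retry instead of a silent
// lost update.
function tryUpdate(
  state: VersionedState,
  readVersion: number,
  key: string,
  value: unknown
): VersionedState | null {
  if (readVersion !== state.version) return null; // conflict detected
  return {
    version: state.version + 1,
    taskResults: { ...state.taskResults, [key]: value },
  };
}
```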
### Message Passing
Agents communicate by sending structured messages to each other. Each agent maintains its own internal state and only exposes information through messages.
```typescript
interface AgentMessage {
  from: string;
  to: string;
  type: "request" | "response" | "handoff";
  payload: any;
  conversationId: string;
}
```
Trade-offs: Clean separation of concerns. Agents are independently testable. But message routing adds complexity, and you need a message broker or event bus.
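A minimal in-memory bus shows the routing contract; a production system would use a real broker or event bus. The message shape mirrors the `AgentMessage` interface above:

```typescript
interface AgentMessage {
  from: string;
  to: string;
  type: "request" | "response" | "handoff";
  payload: unknown;
  conversationId: string;
}

class MessageBus {
  private handlers = new Map<string, (msg: AgentMessage) => void>();

  // Each agent registers one handler; its internal state stays private.
  register(agentName: string, handler: (msg: AgentMessage) => void): void {
    this.handlers.set(agentName, handler);
  }

  send(msg: AgentMessage): void {
    const handler = this.handlers.get(msg.to);
    if (!handler) throw new Error(`unknown agent: ${msg.to}`);
    handler(msg); // deliver to the receiving agent only
  }
}
```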
### Blackboard Architecture
A shared workspace (the "blackboard") where agents post partial results. Any agent can read the blackboard and contribute when it has something relevant to add. A controller monitors the blackboard and activates agents as needed.
Trade-offs: Flexible and extensible. Good for problems where the solution emerges incrementally. But the control logic can become complex.
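A toy sketch of the control loop, assuming each knowledge source declares a precondition the controller checks against the blackboard (names and shapes are illustrative):

```typescript
type Entry = { key: string; value: string };

class Blackboard {
  entries: Entry[] = [];
  post(key: string, value: string): void {
    this.entries.push({ key, value });
  }
  has(key: string): boolean {
    return this.entries.some((e) => e.key === key);
  }
}

// An agent contributes only when its precondition holds.
interface KnowledgeSource {
  canContribute(b: Blackboard): boolean;
  contribute(b: Blackboard): void;
}

// The controller keeps activating agents until no one can make progress;
// the solution emerges incrementally on the blackboard.
function runController(b: Blackboard, sources: KnowledgeSource[]): void {
  let progressed = true;
  while (progressed) {
    progressed = false;
    for (const s of sources) {
      if (s.canContribute(b)) {
        s.contribute(b);
        progressed = true;
      }
    }
  }
}
```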
## Agent Handoff Protocols
A handoff occurs when one agent transfers control of a conversation to another agent. Designing handoff protocols well is critical for user experience and system reliability.
### What Must Transfer During Handoff
- Conversation history — The full message thread so the receiving agent has context
- Task state — What has been accomplished so far and what remains
- User intent — Why the handoff is happening (the first agent's assessment)
- Metadata — User identity, session ID, priority level, any accumulated tool results
### Handoff Triggers
- Capability boundary — The current agent lacks a required tool or knowledge domain
- Confidence threshold — The agent's confidence in handling the request drops below a threshold
- Explicit routing — The supervisor directs the handoff based on task classification
- Escalation — The task requires human intervention or a higher-capability model
### Handoff Implementation Pattern
```typescript
interface HandoffRequest {
  sourceAgent: string;
  targetAgent: string;
  reason: string;
  conversationHistory: Message[];
  taskState: Record<string, any>;
  priority: "low" | "medium" | "high" | "critical";
}
```
The receiving agent should validate that it can handle the request before accepting. If it cannot, it should reject the handoff with a reason, allowing the orchestrator to try an alternative agent.
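The accept/reject step can be sketched as a capability check. The capability table and the `topic` field are assumptions for illustration, not part of the `HandoffRequest` shape above:

```typescript
type HandoffDecision =
  | { accepted: true }
  | { accepted: false; rejectionReason: string };

// Hypothetical capability registry for two specialist agents.
const capabilities: Record<string, string[]> = {
  billing: ["refund", "invoice"],
  tech: ["bug", "outage"],
};

function evaluateHandoff(targetAgent: string, topic: string): HandoffDecision {
  const caps = capabilities[targetAgent] ?? [];
  if (!caps.includes(topic)) {
    // Rejecting with a reason lets the orchestrator try another agent
    // instead of silently dropping the request.
    return { accepted: false, rejectionReason: `no capability for topic: ${topic}` };
  }
  return { accepted: true };
}
```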
## Failure Modes in Multi-Agent Systems
Understanding failure modes is essential for interviews. You should be able to identify these proactively during system design discussions:
| Failure Mode | Description | Mitigation |
|---|---|---|
| Infinite Delegation Loop | Agent A hands off to Agent B, which hands back to Agent A | Track handoff history; limit max handoffs per request |
| State Corruption | Concurrent agents overwrite each other's state updates | Use versioned state with conflict resolution |
| Resource Exhaustion | Agents spawn too many sub-tasks, consuming all available tokens or API calls | Set per-request token budgets and task limits |
| Deadlock | Agent A waits for Agent B's result while B waits for A | Use timeouts on all inter-agent communication |
| Cascading Failure | One agent's failure causes downstream agents to fail | Circuit breakers isolate failing agents |
| Context Window Overflow | Accumulated conversation history exceeds the model's context limit | Summarize or truncate history before handoff |
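The cascading-failure mitigation can be sketched as a simple circuit breaker wrapped around an agent call; the threshold and reset policy are illustrative choices, not a standard:

```typescript
class CircuitBreaker {
  private failures = 0;
  constructor(private readonly maxFailures = 3) {}

  get open(): boolean {
    return this.failures >= this.maxFailures;
  }

  // Run the agent call unless the breaker is open; on failure, count it
  // and return the fallback so downstream agents keep working.
  call<T>(fn: () => T, fallback: T): T {
    if (this.open) return fallback; // isolate the failing agent
    try {
      const result = fn();
      this.failures = 0; // a success resets the counter
      return result;
    } catch {
      this.failures += 1;
      return fallback;
    }
  }
}
```

A fuller version would also reopen the circuit after a cool-down period (the "half-open" state) so a recovered agent can rejoin.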
## Real-World Multi-Agent Architectures

### Anthropic Claude — Tool Use and MCP
Anthropic's approach centers on a single capable model with extensive tool access via the Model Context Protocol (MCP). Rather than multiple LLM agents, the architecture gives one agent access to many tools through MCP servers. The model decides which tools to call, executes them, and reasons over results in a loop.
This is technically a single-agent architecture with multi-tool orchestration, but the pattern scales to multi-agent when MCP servers themselves contain agent logic.
### OpenAI Swarm Pattern
The Swarm pattern (open-sourced by OpenAI) implements peer-to-peer agent handoffs. Each agent is defined by a system prompt and a set of functions, including handoff_to_* functions that transfer the conversation. The framework manages conversation state and routes messages to the active agent.
Key design principle: agents are lightweight and stateless. All state lives in the conversation context. This makes agents easy to test in isolation and easy to compose.
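The handoff mechanic can be sketched as follows. This mimics the Swarm idea of a response optionally naming a successor agent; the types and `runConversation` loop are illustrative, not the actual Swarm API:

```typescript
interface SwarmAgent {
  name: string;
  // A response either answers or hands the conversation to another agent.
  respond(message: string): { reply: string; handoffTo?: SwarmAgent };
}

function runConversation(start: SwarmAgent, message: string, maxHandoffs = 5): string {
  let agent = start;
  for (let i = 0; i <= maxHandoffs; i++) {
    const { reply, handoffTo } = agent.respond(message);
    if (!handoffTo) return reply;
    agent = handoffTo; // conversation context travels with the handoff
  }
  // Guard against the circular-handoff failure mode noted earlier.
  throw new Error("handoff limit exceeded");
}
```

Because agents hold no state of their own, each `respond` implementation can be tested with a bare message string.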
### LangGraph Supervisor
LangGraph implements the supervisor pattern using a directed graph. The supervisor is a node that routes to specialist nodes based on the current state. Each node updates the shared state and returns control to the supervisor, which decides the next step.
The graph structure makes the control flow explicit and visualizable. State is typed and validated at each transition.
## Interview Deep-Dive: Customer Support Multi-Agent System
This is one of the most common multi-agent design questions in interviews. Here is how to approach it:
### Requirements
- Route customer inquiries to the right specialist (billing, technical support, returns, general)
- Handle multi-topic inquiries (e.g., "I have a billing question AND a technical issue")
- Escalate to humans when agents cannot resolve the issue
- Maintain conversation continuity across handoffs
- Track resolution metrics
### Proposed Architecture
Pattern choice: Supervisor with specialist agents
```
Customer Message
        ↓
Router Agent (classifier)
        ↓
Supervisor
        ↓ routes to
┌───────────────┬────────────┬───────────────┐
│ Billing Agent │ Tech Agent │ Returns Agent │
└───────────────┴────────────┴───────────────┘
        ↓ if unresolved
Escalation Agent → Human Queue
```
Router Agent — Classifies the inquiry type using a fast, inexpensive model. For multi-topic inquiries, it identifies all topics and creates subtasks.
Specialist Agents — Each has a focused system prompt, access to relevant tools (billing system API, knowledge base, order management), and domain-specific few-shot examples.
Escalation Logic:
- Agent confidence below threshold after 3 attempts
- Customer explicitly requests a human
- Sensitive topics (legal, safety) detected by the classifier
- Token budget exceeded without resolution
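The escalation triggers above combine naturally into one predicate. Field names and thresholds here are assumptions for illustration:

```typescript
interface ResolutionAttempt {
  confidence: number; // agent's self-assessed confidence, 0..1
  attempts: number;
  userRequestedHuman: boolean;
  sensitiveTopic: boolean; // flagged by the classifier (legal, safety)
  tokensUsed: number;
}

function shouldEscalate(a: ResolutionAttempt, tokenBudget = 20_000): boolean {
  if (a.userRequestedHuman) return true;           // explicit request
  if (a.sensitiveTopic) return true;               // sensitive topic
  if (a.attempts >= 3 && a.confidence < 0.6) return true; // low confidence
  if (a.tokensUsed > tokenBudget) return true;     // budget exhausted
  return false;
}
```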
State Management: Shared state with the conversation history, current agent assignment, resolution status per topic, and escalation history.
Key interview points to mention:
- Cost optimization: Use a smaller model for routing, a capable model for specialists
- Latency: Parallel execution when multiple topics are detected
- Observability: Trace every routing decision and agent response for quality review
- Testing: Each specialist agent can be tested independently with golden datasets
:::tip
Interview tip: When designing multi-agent systems, always address three things: how agents discover each other's capabilities, how state flows between agents, and what happens when an agent fails. These three concerns separate junior designs from senior ones.
:::

In the lab, you'll implement a multi-agent orchestration framework in TypeScript — building the supervisor, handoff protocol, shared state, and circuit breaker patterns from scratch.