State Management Fundamentals
TypedDict Schemas & State Design Patterns
Why State Design Matters in Production LangGraph
Real Production Scenario (January 2026):
A team at a Fortune 500 company migrated from CrewAI to LangGraph because their multi-agent research system crashed after processing 10K+ documents. The issue? Unstructured state that grew unbounded, eating 64GB of memory.
After redesigning with TypedDict schemas and state size limits, they processed 100K+ documents with <2GB memory footprint.
This lesson teaches you how to design production-ready state schemas that scale and avoid memory bloat, and how to answer the state-design questions that come up in LangGraph interviews at companies like Anthropic and LangChain.
State as the Single Source of Truth
In LangGraph, state is everything. Unlike CrewAI (which hides state in crew objects) or AutoGen (which uses message lists), LangGraph makes state explicit and controllable.
Core Principle:
```python
# State flows through ALL nodes
# Each node reads state → transforms it → returns updates
# Graph merges updates back into state
```
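The read → transform → merge cycle can be sketched without the framework. This is a simplified model and not LangGraph's implementation: keys without a reducer are overwritten (last write wins), and keys with a reducer are combined as reducer(old, new). The names apply_update and research_node are hypothetical.

```python
import operator
from typing import Any, Callable

# Reducers per key; keys not listed here are simply overwritten.
Reducers = dict[str, Callable[[Any, Any], Any]]


def apply_update(state: dict, update: dict, reducers: Reducers) -> dict:
    """Merge a node's partial update into state (simplified model of the graph's merge)."""
    merged = dict(state)  # copy: the caller's state is never mutated in place
    for key, value in update.items():
        if key in reducers and key in merged:
            merged[key] = reducers[key](merged[key], value)  # e.g. list append
        else:
            merged[key] = value  # last write wins
    return merged


def research_node(state: dict) -> dict:
    """Hypothetical node: reads state, returns ONLY the keys it changes."""
    found = [f"doc about {state['query']}"]
    return {"documents": found, "iteration_count": state["iteration_count"] + 1}


state = {"query": "LangGraph state", "documents": [], "iteration_count": 0}
reducers: Reducers = {"documents": operator.add}
for _ in range(2):
    state = apply_update(state, research_node(state), reducers)

print(state["iteration_count"])  # 2
print(len(state["documents"]))   # 2 (appended, not replaced)
```

The node never touches keys it does not return, which is what makes individual nodes easy to reason about and test.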
Why This Matters:
- Debugging: You can inspect exact state at any point
- Resumability: State can be checkpointed and resumed
- Testing: You can unit test nodes with mock state
- Observability: LangSmith traces show state evolution
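Because a node is just a function from state to a partial update, unit testing it needs no graph, checkpointer, or LLM. A minimal sketch, assuming a hypothetical route_node that stops on an error or once iteration_count reaches max_iterations:

```python
from typing import Optional, TypedDict


class RouterState(TypedDict):
    iteration_count: int
    max_iterations: int
    error_message: Optional[str]


def route_node(state: RouterState) -> dict:
    """Hypothetical supervisor step; returns a partial update with the routing decision."""
    if state["error_message"] is not None:
        return {"next_agent": "end"}
    if state["iteration_count"] >= state["max_iterations"]:
        return {"next_agent": "end"}
    return {"next_agent": "researcher"}


# Plain pytest-style tests: a dict literal is all the "mock state" a node needs.
def test_stops_at_iteration_limit() -> None:
    state: RouterState = {"iteration_count": 10, "max_iterations": 10, "error_message": None}
    assert route_node(state) == {"next_agent": "end"}


def test_continues_below_limit() -> None:
    state: RouterState = {"iteration_count": 3, "max_iterations": 10, "error_message": None}
    assert route_node(state) == {"next_agent": "researcher"}


def test_stops_on_error() -> None:
    state: RouterState = {"iteration_count": 0, "max_iterations": 10, "error_message": "timeout"}
    assert route_node(state) == {"next_agent": "end"}


if __name__ == "__main__":
    test_stops_at_iteration_limit()
    test_continues_below_limit()
    test_stops_on_error()
```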
Interview Question (Anthropic L5):
"Why does LangGraph use explicit state instead of implicit message passing like AutoGen?"
Strong Answer:
"Explicit state enables checkpointing, time-travel debugging, and precise control over memory usage. With message-based systems, you lose state history when the process crashes. LangGraph's state-first design makes workflows resumable and production-ready, which is critical for long-running multi-agent tasks that can fail midway."
TypedDict vs Pydantic: When to Use Each
TypedDict (Recommended for Most LangGraph Use Cases)
As of January 2026, typing.TypedDict is the standard way to declare LangGraph state (StateGraph also accepts dataclasses and Pydantic models):
```python
from typing import TypedDict, Annotated, Optional
from langgraph.graph import StateGraph
import operator


class AgentState(TypedDict):
    """
    State for a research agent workflow.
    TypedDict provides static type hints (for IDEs and type checkers)
    without runtime overhead.
    """
    # Required input
    query: str
    # Fields that stay empty or None until a node fills them
    documents: Annotated[list[str], operator.add]  # Accumulates with reducer
    analysis: Optional[str]
    final_report: Optional[str]
    # Control flow fields
    iteration_count: int
    max_iterations: int
    error_message: Optional[str]


# Usage in graph
graph = StateGraph(AgentState)
```
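The reducer on documents travels as Annotated metadata, so it can be recovered at runtime with standard typing helpers. LangGraph inspects this kind of metadata when it builds the graph; the sketch below only shows the mechanism with the standard library and is not LangGraph's code. It assumes Python 3.9+ and redeclares a trimmed AgentState; find_reducers is a hypothetical helper.

```python
import operator
from typing import Annotated, Optional, TypedDict, get_args, get_origin, get_type_hints


class AgentState(TypedDict):
    query: str
    documents: Annotated[list[str], operator.add]
    analysis: Optional[str]


def find_reducers(schema: type) -> dict:
    """Map each field that declares a reducer (via Annotated) to that reducer."""
    reducers = {}
    # include_extras=True keeps the Annotated[...] wrapper instead of stripping it
    for name, hint in get_type_hints(schema, include_extras=True).items():
        if get_origin(hint) is Annotated:
            _base_type, *metadata = get_args(hint)
            callables = [m for m in metadata if callable(m)]
            if callables:
                reducers[name] = callables[0]
    return reducers


print(find_reducers(AgentState))  # maps 'documents' to operator.add; other fields have none
```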
Advantages of TypedDict:
- Zero runtime overhead: Just type hints for IDE autocomplete
- Native LangGraph support: Works seamlessly with StateGraph
- Simple syntax: Familiar to Python developers
- Annotated fields: Supports reducers (more on this in Lesson 2)
When to Use TypedDict:
- ✅ Standard LangGraph workflows (the large majority of cases)
- ✅ Performance-critical applications
- ✅ When state is simple key-value data
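The flip side of zero runtime overhead is zero runtime checking. Calling a TypedDict class just builds a dict, so a wrong type or a missing key passes silently until some node trips over it; only static checkers such as mypy or Pyright catch it. A standard-library demonstration:

```python
from typing import Optional, TypedDict


class AgentState(TypedDict):
    query: str
    analysis: Optional[str]


# Wrong type for "query" and "analysis" missing: Python raises nothing here.
state = AgentState(query=42)

print(isinstance(state, dict))  # True: just a dict at runtime
print(state)                    # {'query': 42}
# A static type checker would flag the AgentState(query=42) call; the interpreter does not.
```

This is the gap Pydantic fills when input comes from outside the graph.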
Pydantic (For Advanced Validation)
```python
from typing import Annotated, Optional
import operator

from pydantic import BaseModel, Field, field_validator
from langgraph.graph import StateGraph


class AgentState(BaseModel):
    """
    Pydantic state with runtime validation.
    Use when you need strict data contracts.
    """
    query: str = Field(..., min_length=10, max_length=500)
    documents: Annotated[list[str], operator.add] = Field(default_factory=list)
    analysis: Optional[str] = None
    iteration_count: int = Field(default=0, ge=0)  # >= 0
    max_iterations: int = Field(default=10, le=100)  # <= 100

    @field_validator('query')
    @classmethod
    def validate_query(cls, v: str) -> str:
        if any(word in v.lower() for word in ['hack', 'exploit']):
            raise ValueError("Query contains forbidden terms")
        return v


# LangGraph accepts the Pydantic model class directly as the state schema
graph = StateGraph(AgentState)
```
When to Use Pydantic:
- ✅ Need runtime validation (e.g., user input from API)
- ✅ Complex business rules (field dependencies, custom validators)
- ✅ Strict data contracts across teams
- ❌ Performance-critical inner loops (validation adds overhead)
Production Pattern (January 2026):
Use TypedDict for state, Pydantic for API input/output validation. Convert at boundaries.
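A minimal sketch of "validate at the boundary, TypedDict inside". To stay runnable without extra dependencies it uses only the standard library: parse_request is a hypothetical stand-in for the Pydantic model above (same length, forbidden-term, and max_iterations rules). With Pydantic v2 you would instead call the model's model_validate(payload) and then model_dump() before entering the graph.

```python
import operator
from typing import Annotated, Optional, TypedDict

FORBIDDEN_TERMS = ("hack", "exploit")


class GraphState(TypedDict):
    """Internal graph state: plain TypedDict, no runtime validation."""
    query: str
    documents: Annotated[list[str], operator.add]
    analysis: Optional[str]
    iteration_count: int
    max_iterations: int


def parse_request(payload: dict) -> GraphState:
    """Validate untrusted API input once, then hand the graph a clean TypedDict."""
    query = payload.get("query")
    if not isinstance(query, str) or not 10 <= len(query) <= 500:
        raise ValueError("query must be a string of 10-500 characters")
    if any(term in query.lower() for term in FORBIDDEN_TERMS):
        raise ValueError("Query contains forbidden terms")
    max_iterations = payload.get("max_iterations", 10)
    if not isinstance(max_iterations, int) or max_iterations > 100:
        raise ValueError("max_iterations must be an int <= 100")
    return {
        "query": query,
        "documents": [],
        "analysis": None,
        "iteration_count": 0,
        "max_iterations": max_iterations,
    }


initial_state = parse_request({"query": "How do LangGraph reducers work?"})
# In an app, initial_state would now be passed to graph.invoke(...)
```

Validation cost is paid once per request, not on every node transition.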
State Structure: Flat vs Nested
Flat State (Recommended)
```python
from typing import TypedDict, Annotated, Optional
import operator


class FlatAgentState(TypedDict):
    """
    Flat structure: all fields at top level.
    Easier to read, update, and checkpoint.
    """
    # Input
    user_query: str
    # Research phase
    search_results: Annotated[list[str], operator.add]
    research_summary: Optional[str]
    # Analysis phase
    key_findings: Annotated[list[str], operator.add]
    analysis_report: Optional[str]
    # Output
    final_answer: Optional[str]
    # Metadata
    iteration: int
    total_tokens_used: int
```
Advantages:
- ✅ Simple to access: `state["user_query"]` (not `state["input"]["query"]`)
- ✅ Easier checkpointing: Flat structure serializes cleanly
- ✅ Reducer functions work naturally on top-level lists
Nested State (Use Sparingly)
```python
from typing import TypedDict


class NestedAgentState(TypedDict):
    """
    Nested structure: grouped by concern.
    More organized but harder to work with.
    """
    input: dict  # {"query": str, "max_results": int}
    research: dict  # {"results": list, "summary": str}
    analysis: dict  # {"findings": list, "report": str}
    output: dict  # {"answer": str, "confidence": float}
    metadata: dict  # {"iteration": int, "tokens": int}
```
When Nested Makes Sense:
- ✅ State has 15+ fields (grouping improves readability)
- ✅ Clear subsystem boundaries (e.g., separate agent teams)
- ❌ Most cases - prefer flat for simplicity
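The practical cost of nesting shows up on updates. By default, LangGraph replaces a key that has no reducer with whatever the node returns, so a node that returns only part of a nested dict silently drops its siblings. The sketch below models that last-write-wins merge in plain Python; merge_overwrite is a hypothetical helper, not a LangGraph API.

```python
def merge_overwrite(state: dict, update: dict) -> dict:
    """Default merge for keys without a reducer: top-level last write wins."""
    return {**state, **update}


# Flat: the node updates one field and the others are untouched.
flat = {"search_results": ["r1", "r2"], "research_summary": None}
flat = merge_overwrite(flat, {"research_summary": "2 results found"})
print(flat["search_results"])  # ['r1', 'r2']

# Nested: returning only the summary replaces the whole "research" dict.
nested = {"research": {"results": ["r1", "r2"], "summary": None}}
broken = merge_overwrite(nested, {"research": {"summary": "2 results found"}})
print(broken["research"])  # {'summary': '2 results found'} (results lost)

# The nested node must copy its siblings explicitly to avoid the loss.
fixed = merge_overwrite(
    nested, {"research": {**nested["research"], "summary": "2 results found"}}
)
print(fixed["research"]["results"])  # ['r1', 'r2']
```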
Production Pattern:
Start flat. Only nest if you have >15 fields or distinct subsystems.
Naming Conventions for Production
Field Naming Strategy:
```python
from typing import Optional, TypedDict


class ProductionAgentState(TypedDict):
    """
    Naming conventions for clarity.
    """
    # Use clear, descriptive names (not abbreviations)
    user_query: str  # ✅ Clear
    # usr_q: str  # ❌ Cryptic

    # Pluralize lists
    documents: list[str]  # ✅ Plural for list
    # document: list[str]  # ❌ Singular confusing

    # Use Optional[] for fields whose value may be None (not yet set)
    analysis: Optional[str]  # ✅ Explicit optionality
    # analysis: str  # ❌ Assumes always present

    # Name control flow fields by role: counter, limit, flag
    iteration_count: int  # ✅ Clear counter
    max_iterations: int  # ✅ Clear limit
    is_finished: bool  # ✅ Boolean with is_ prefix

    # Suffix metadata
    created_at: str  # ✅ Timestamp suffix (_at)
    updated_at: str  # ✅ Timestamp suffix (_at)
    error_message: Optional[str]  # ✅ Error tracking
```
Common Mistakes to Avoid:
| Bad Name | Good Name | Why |
|---|---|---|
| res | search_results | Clarity over brevity |
| doc | documents | Plural for lists |
| err | error_message | Descriptive |
| iter | iteration_count | Explicit purpose |
| done | is_finished | Boolean convention |
Production Pattern: Document Your State
Critical for Team Collaboration:
```python
from typing import TypedDict, Annotated, Optional, Literal
import operator


class ResearchAgentState(TypedDict):
    """
    State for multi-agent research workflow.

    Workflow:
    1. Researcher: query → documents (accumulated)
    2. Analyzer: documents → analysis
    3. Writer: analysis → final_report
    4. Supervisor: Routes between agents, checks iteration limit

    Author: AI Team
    Last Updated: 2026-01-15
    """

    # === INPUT ===
    query: str
    """User's research question. Required. Max 500 chars."""

    # === RESEARCH PHASE ===
    documents: Annotated[list[str], operator.add]
    """
    Accumulated research documents.
    Uses operator.add reducer to append (not replace).
    Max 100 documents to prevent memory bloat.
    """

    # === ANALYSIS PHASE ===
    analysis: Optional[str]
    """Structured analysis of documents. Generated by Analyzer agent."""
    key_findings: Annotated[list[str], operator.add]
    """Top 3-5 findings. Accumulated across iterations."""

    # === OUTPUT ===
    final_report: Optional[str]
    """Final markdown report. Generated by Writer agent."""

    # === CONTROL FLOW ===
    next_agent: Literal["researcher", "analyzer", "writer", "end"]
    """Routing decision by Supervisor. Must be one of the literal values."""
    iteration_count: int
    """Current iteration number. Starts at 0."""
    max_iterations: int
    """Hard limit to prevent infinite loops. Default: 10."""

    # === ERROR HANDLING ===
    error_message: Optional[str]
    """If any node fails, error is stored here for debugging."""
```
Why This Matters:
- New team members can understand the state schema without reading every node
- Shortens onboarding for engineers joining the project
- LangSmith traces are easier to interpret when every state field has a documented meaning
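The documents docstring promises a 100-document cap, but operator.add alone never enforces it. One way to make the cap real is a custom reducer (reducers are covered properly in Lesson 2). LangGraph calls a field's reducer as reducer(current_value, update), and add_capped below follows that contract; add_capped and MAX_DOCUMENTS are hypothetical names for this sketch.

```python
from typing import Annotated, TypedDict

MAX_DOCUMENTS = 100


def add_capped(existing: list[str], new: list[str]) -> list[str]:
    """Append like operator.add, then keep only the newest MAX_DOCUMENTS items."""
    return (existing + new)[-MAX_DOCUMENTS:]


class ResearchState(TypedDict):
    query: str
    documents: Annotated[list[str], add_capped]  # instead of operator.add


# Simulate 3 research rounds of 40 documents each (120 total).
docs: list[str] = []
for round_no in range(3):
    docs = add_capped(docs, [f"doc-{round_no}-{i}" for i in range(40)])

print(len(docs))  # 100
print(docs[0])    # doc-0-20 (the oldest 20 were dropped)
```

Dropping the oldest items is only one policy; summarizing old documents before discarding them is a common alternative when their content still matters.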
Production Pattern: State Size Management
Problem: State grows unbounded in long-running workflows.
Solution: Track size and prune periodically.
```python
import json
from typing import Literal, Optional, TypedDict


class ManagedAgentState(TypedDict):
    """State with size tracking."""
    # Plain list (no reducer): a returned value REPLACES the old list,
    # which is what lets the supervisor prune it.
    documents: list[str]
    analysis: Optional[str]
    final_report: Optional[str]
    next_agent: Literal["researcher", "analyzer", "writer", "end"]
    # Size tracking
    state_size_bytes: int
    max_state_size_bytes: int  # e.g., 10 * 1024 * 1024 (10MB)


def estimate_state_size(state: dict) -> int:
    """
    Estimate state size in bytes from its JSON-serialized length.
    Used in production to prevent memory bloat.
    """
    return len(json.dumps(state, default=str).encode('utf-8'))


def prune_state_if_needed(state: ManagedAgentState) -> dict:
    """
    Return a partial update that prunes documents if state exceeds max size.
    Does not mutate state: LangGraph only persists what a node returns.
    """
    current_size = estimate_state_size(state)
    if current_size <= state["max_state_size_bytes"]:
        return {"state_size_bytes": current_size}
    # Keep only last 10 documents
    pruned_docs = state["documents"][-10:]
    new_size = estimate_state_size({**state, "documents": pruned_docs})
    print(f"⚠️ Pruned state from {current_size} to {new_size} bytes")
    return {"documents": pruned_docs, "state_size_bytes": new_size}


# Usage in node
def supervisor_node(state: ManagedAgentState) -> dict:
    """Supervisor with state management."""
    update = prune_state_if_needed(state)
    # ... routing logic ...
    return {**update, "next_agent": "researcher"}
```
Production Limits (January 2026):
- PostgresSaver: 1GB per checkpoint (configurable)
- Redis: 512MB default (increase for large workflows)
- In-memory: Monitor the serialized size (e.g., with estimate_state_size above; sys.getsizeof() is shallow and undercounts nested data), prune at 100MB+
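Why not rely on sys.getsizeof() alone? It is shallow: for a dict it reports the dict object's own footprint, not the strings and lists it references. A quick comparison using the same JSON-based approach as estimate_state_size (redefined here so the snippet runs on its own):

```python
import json
import sys


def estimate_state_size(state: dict) -> int:
    """Serialized size in bytes (JSON-based estimate)."""
    return len(json.dumps(state, default=str).encode("utf-8"))


small = {"documents": ["short"], "analysis": None}
large = {"documents": ["x" * 1_000_000] * 5, "analysis": None}  # serializes to ~5MB

# sys.getsizeof only measures the dict object itself, so both report the same size.
print(sys.getsizeof(small) == sys.getsizeof(large))  # True
# The serialized estimate reflects the ~5MB of text a checkpointer would have to write.
print(estimate_state_size(large) > 5_000_000)  # True
```

The JSON figure is still an approximation: checkpointers may use their own serialization format, so treat it as a trend signal rather than an exact byte count.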
Common Interview Questions
Q1: "Should state include intermediate results or just final outputs?"
Strong Answer:
"Include intermediate results for debugging and resumability. For example, store both `documents` (raw search results) and `analysis` (processed insights). This allows you to inspect failures mid-workflow and resume from checkpoints without re-running expensive LLM calls. In production, I prune old intermediate results when state exceeds size limits to prevent memory bloat."
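The resumability argument can be seen in a toy model: save state after every node under a thread ID and, on restart, skip nodes whose results are already saved. This is a hypothetical in-memory illustration of the idea behind LangGraph checkpointers, not their API; run, checkpoints, and the node names are invented for the sketch.

```python
from typing import Callable

Node = Callable[[dict], dict]

# Toy checkpointer: thread_id -> (index of next node to run, state after last completed node)
checkpoints: dict[str, tuple[int, dict]] = {}
llm_calls: list[str] = []
analyzer_should_fail = True


def search_node(state: dict) -> dict:
    llm_calls.append("search")  # stands in for an expensive LLM/tool call
    return {"documents": ["doc-1", "doc-2"]}


def analyze_node(state: dict) -> dict:
    llm_calls.append("analyze")
    if analyzer_should_fail:
        raise RuntimeError("analyzer crashed mid-workflow")
    return {"analysis": f"{len(state['documents'])} documents reviewed"}


def run(thread_id: str, initial: dict, nodes: list[Node]) -> dict:
    """Run nodes in order, checkpointing after each; resume from the last checkpoint."""
    start, state = checkpoints.get(thread_id, (0, initial))
    for i in range(start, len(nodes)):
        state = {**state, **nodes[i](state)}
        checkpoints[thread_id] = (i + 1, state)
    return state


pipeline = [search_node, analyze_node]
try:
    run("thread-1", {"query": "q"}, pipeline)
except RuntimeError:
    pass  # crash happens after search_node's result was checkpointed

analyzer_should_fail = False
final = run("thread-1", {"query": "q"}, pipeline)
print(final["analysis"])  # 2 documents reviewed
print(llm_calls)          # ['search', 'analyze', 'analyze']: search ran only once
```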
Q2: "How do you handle state conflicts in concurrent workflows?"
Answer:
"LangGraph uses thread IDs to isolate state between concurrent runs. Each workflow gets its own state namespace in the checkpointer. For shared resources (e.g., database connections), I pass them via config, not state. For team coordination, I use message-passing fields in state like `team_messages: Annotated[list, operator.add]`, where agents append without conflicts."
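Within a single run, the same question arises when two nodes execute in the same step (for example, parallel branches) and both write one key. LangGraph rejects concurrent writes to a key that has no reducer (recent versions raise InvalidUpdateError) and combines them through the reducer when one is declared. The sketch below models that rule in plain Python; merge_step is hypothetical and raises ValueError instead.

```python
import operator
from typing import Any, Callable


def merge_step(state: dict, updates: list[dict],
               reducers: dict[str, Callable[[Any, Any], Any]]) -> dict:
    """Apply all updates produced by nodes that ran in the same step (simplified model)."""
    merged = dict(state)
    written: set[str] = set()
    for update in updates:
        for key, value in update.items():
            if key in reducers:
                merged[key] = reducers[key](merged.get(key, []), value)
            elif key in written:
                raise ValueError(f"Conflicting writes to {key!r} in one step; declare a reducer")
            else:
                merged[key] = value
                written.add(key)
    return merged


reducers = {"team_messages": operator.add}
state = {"team_messages": [], "status": "running"}

# Two parallel agents append to the shared log: both messages survive.
state = merge_step(
    state,
    [{"team_messages": ["researcher: found 3 sources"]},
     {"team_messages": ["analyzer: drafted summary"]}],
    reducers,
)
print(state["team_messages"])  # both messages, in update order

# Two parallel agents overwrite a plain field: rejected as a conflict.
try:
    merge_step(state, [{"status": "done"}, {"status": "retry"}], reducers)
except ValueError as err:
    print(err)
```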
Q3: "When would you use Pydantic instead of TypedDict for state?"
Answer:
"Use Pydantic when state comes from external sources (APIs, user input) that need runtime validation. For example, if users submit queries via REST API, Pydantic validates length, format, and business rules before entering the graph. For internal state within LangGraph nodes, TypedDict is preferred because it has zero runtime overhead and works natively with StateGraph."
Key Takeaways for Production
- ✅ Use TypedDict for most LangGraph state (zero overhead, native support)
- ✅ Start with flat state (only nest if >15 fields)
- ✅ Document all fields with docstrings (clarity for teams)
- ✅ Use Optional[] for fields whose value may be None
- ✅ Track state size in long-running workflows (prune when needed)
- ✅ Name fields clearly (no abbreviations, use conventions)
Next: Learn how to mutate state correctly with reducers and update semantics in Lesson 2.
:::