Recursive Loops & Cycle Detection

Why Recursive Loops Are Essential for AI Agents

Real Production Scenario (January 2026):

A customer support AI agent at a SaaS company needed to handle complex tickets that required multiple rounds of investigation. Initial implementation: a linear workflow that made one LLM call and returned. Result? 40% of tickets required manual escalation because the agent couldn't iterate to find the right solution.

After adding recursive loops with proper cycle detection and iteration limits, the agent resolved 85% of tickets autonomously by iterating until it found the correct answer or hit a safety limit.

This lesson teaches you how to implement production-safe recursive loops in LangGraph, detect cycles, set iteration limits, and avoid the #1 production bug: infinite loops that burn through API credits.

Understanding Graph Cycles in LangGraph

What Is a Cycle?

A cycle occurs when a node can transition back to a previous node, creating a loop:

research → analyze → decide
    ↑                    ↓
    └────────────────────┘
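The loop in this diagram is a cycle in the graph's edge structure, and it can be found mechanically before the graph ever runs. The following is a framework-independent sketch, not a LangGraph API: find_cycle is a hypothetical helper that runs a depth-first search over a plain adjacency map and returns the first cycle it finds.

```python
from typing import Optional

def find_cycle(edges: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one cycle as a node list (start node repeated at the end), or None."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / on the current DFS path / finished
    color = {node: WHITE for node in edges}
    for targets in edges.values():
        for target in targets:
            color.setdefault(target, WHITE)  # include nodes with no outgoing edges
    path: list[str] = []

    def dfs(node: str) -> Optional[list[str]]:
        color[node] = GRAY
        path.append(node)
        for nxt in edges.get(node, []):
            if color[nxt] == GRAY:  # back edge: nxt is already on the current path
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = dfs(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for start in list(color):
        if color[start] == WHITE:
            found = dfs(start)
            if found:
                return found
    return None

# The diagram above: research -> analyze -> decide -> research
print(find_cycle({"research": ["analyze"], "analyze": ["decide"], "decide": ["research"]}))
# ['research', 'analyze', 'decide', 'research']
```

A static check like this only tells you a loop is possible; whether it terminates depends on runtime limits such as iteration counters.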

Why Cycles Are Useful:

Agents that "think again" when results are unsatisfactory
Iterative refinement (draft → review → revise → review → done)
Search-and-refine patterns (search → evaluate → search again if needed)

Why Cycles Are Dangerous:

Infinite loops exhaust memory and API budgets
Without limits, a confused LLM can loop forever
Production systems need hard stops

Basic Recursive Loop Pattern

LangGraph 1.0.5 Implementation (January 2026)

from typing import TypedDict, Annotated, Optional, Literal
import operator
from langgraph.graph import StateGraph, END

class ResearchState(TypedDict):
    """State for iterative research workflow."""
    query: str
    research_results: Annotated[list[str], operator.add]
    analysis: Optional[str]
    is_satisfactory: bool

    # Iteration control
    iteration_count: int
    max_iterations: int

def research_node(state: ResearchState) -> dict:
    """Perform research iteration."""
    # Simulate research (replace with actual LLM call)
    new_findings = f"Research finding #{state['iteration_count'] + 1}"

    return {
        "research_results": [new_findings],
        "iteration_count": state["iteration_count"] + 1
    }

def analyze_node(state: ResearchState) -> dict:
    """Analyze research results and determine if satisfactory."""
    # Simulate analysis (replace with actual LLM call)
    combined_results = "\n".join(state["research_results"])
    analysis = f"Analysis of {len(state['research_results'])} findings"

    # Decision logic (in production, use LLM)
    is_satisfactory = len(state["research_results"]) >= 3

    return {
        "analysis": analysis,
        "is_satisfactory": is_satisfactory
    }

def should_continue(state: ResearchState) -> Literal["research", "end"]:
    """
    Conditional edge: Continue loop or end?
    CRITICAL: Always check iteration limit first!
    """
    # Safety check: Always enforce iteration limit
    if state["iteration_count"] >= state["max_iterations"]:
        print(f"Max iterations ({state['max_iterations']}) reached")
        return "end"

    # Business logic: Is the research satisfactory?
    if state["is_satisfactory"]:
        return "end"

    # Continue iterating
    return "research"

# Build graph with cycle
graph = StateGraph(ResearchState)
graph.add_node("research", research_node)
graph.add_node("analyze", analyze_node)

# Set entry point
graph.set_entry_point("research")

# Create the cycle: research -> analyze -> (research OR end)
graph.add_edge("research", "analyze")
graph.add_conditional_edges(
    "analyze",
    should_continue,
    {
        "research": "research",  # Loop back
        "end": END               # Exit
    }
)

# Compile
app = graph.compile()

# Invoke with safety limit
result = app.invoke({
    "query": "Latest AI trends 2026",
    "research_results": [],
    "analysis": None,
    "is_satisfactory": False,
    "iteration_count": 0,
    "max_iterations": 5  # Hard limit
})

print(f"Completed in {result['iteration_count']} iterations")
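Because should_continue is a plain function of state, its stop logic can be exercised without LangGraph or an LLM. The sketch below is a hypothetical plain-Python driver that mimics the research → analyze → route loop, including the list-append behavior of the operator.add reducer; run_loop and the judge callback are illustrative names, not LangGraph APIs.

```python
from typing import Callable, Literal

def should_continue(state: dict) -> Literal["research", "end"]:
    # Same order as above: hard limit first, then business logic
    if state["iteration_count"] >= state["max_iterations"]:
        return "end"
    if state["is_satisfactory"]:
        return "end"
    return "research"

def run_loop(max_iterations: int, judge: Callable[[list[str]], bool]) -> dict:
    """Simulate research -> analyze -> should_continue until it returns "end"."""
    state = {"research_results": [], "is_satisfactory": False,
             "iteration_count": 0, "max_iterations": max_iterations}
    while True:
        # research_node: append one finding (operator.add semantics) and count the pass
        state["research_results"] = state["research_results"] + [
            f"Research finding #{state['iteration_count'] + 1}"]
        state["iteration_count"] += 1
        # analyze_node: judge whether the accumulated findings are good enough
        state["is_satisfactory"] = judge(state["research_results"])
        if should_continue(state) == "end":
            return state

# Satisfied after 3 findings (the rule used in analyze_node above)
print(run_loop(5, lambda results: len(results) >= 3)["iteration_count"])  # 3
# A judge that is never satisfied still stops at the hard limit
print(run_loop(5, lambda results: False)["iteration_count"])  # 5
```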

Production Pattern: Iteration Limits

Why Limits Are Non-Negotiable

Without limits (production incident):

# ❌ BAD: No iteration limit
def should_continue(state):
    if state["is_satisfactory"]:
        return "end"
    return "research"  # Can loop forever!

# Result: Agent ran 847 iterations in 3 hours
# Cost: $2,400 in API calls before manual kill

With limits (production-safe):

# ✅ GOOD: Always check limit first
def should_continue(state: ResearchState) -> str:
    # Priority 1: Enforce hard limit
    if state["iteration_count"] >= state["max_iterations"]:
        return "end"

    # Priority 2: Business logic
    if state["is_satisfactory"]:
        return "end"

    return "research"

Configurable Limits Pattern

from datetime import datetime
from typing import TypedDict, Annotated
import operator

class ConfigurableLoopState(TypedDict):
    query: str
    results: Annotated[list[str], operator.add]
    iteration_count: int

    # Configurable limits
    max_iterations: int       # Hard limit (e.g., 10)
    min_iterations: int       # Minimum before checking satisfaction
    timeout_seconds: float    # Time-based limit
    start_time: str           # ISO timestamp from datetime.now().isoformat()

def is_satisfactory(results: list[str]) -> bool:
    """Placeholder satisfaction check; replace with your LLM or rule-based judge."""
    return len(results) >= 3

def should_continue_with_config(state: ConfigurableLoopState) -> str:
    """Production-grade continuation check."""

    # Check 1: Hard iteration limit
    if state["iteration_count"] >= state["max_iterations"]:
        print(f"Hard limit reached: {state['max_iterations']} iterations")
        return "end"

    # Check 2: Time-based limit (start_time must use the same naive local clock)
    start = datetime.fromisoformat(state["start_time"])
    elapsed = (datetime.now() - start).total_seconds()
    if elapsed > state["timeout_seconds"]:
        print(f"Timeout reached: {elapsed:.1f}s > {state['timeout_seconds']}s")
        return "end"

    # Check 3: Minimum iterations before early exit
    if state["iteration_count"] < state["min_iterations"]:
        return "continue"  # Force at least N iterations

    # Check 4: Business logic (satisfaction check)
    if is_satisfactory(state["results"]):
        return "end"

    return "continue"
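The time check only works if start_time is written with the same clock and format that should_continue_with_config reads, and the limits only make sense if they are mutually consistent. A small factory keeps the initial state honest; new_configurable_state and its default values are hypothetical, a sketch assuming the naive local datetime.now() convention used in the check.

```python
from datetime import datetime

def new_configurable_state(query: str, max_iterations: int = 10,
                           min_iterations: int = 1,
                           timeout_seconds: float = 120.0) -> dict:
    """Hypothetical factory for a consistent ConfigurableLoopState input."""
    # With the hard limit checked first, min_iterations > max_iterations
    # could never be honored, so reject it up front.
    if not 0 <= min_iterations <= max_iterations:
        raise ValueError("expected 0 <= min_iterations <= max_iterations")
    if timeout_seconds <= 0:
        raise ValueError("timeout_seconds must be positive")
    return {
        "query": query,
        "results": [],
        "iteration_count": 0,
        "max_iterations": max_iterations,
        "min_iterations": min_iterations,
        "timeout_seconds": timeout_seconds,
        # Naive local time, the same clock datetime.now() reads in the check
        "start_time": datetime.now().isoformat(),
    }

state = new_configurable_state("Latest AI trends 2026", max_iterations=10, min_iterations=2)
print(state["iteration_count"], state["min_iterations"])  # 0 2
```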

Cycle Detection in Complex Graphs

Problem: Detecting Infinite Loops

In graphs with multiple cycles, detecting infinite loops requires tracking visited states:

from typing import TypedDict, Annotated
import operator
import hashlib
import json

class CycleAwareState(TypedDict):
    """State with cycle detection."""
    query: str
    results: Annotated[list[str], operator.add]
    iteration_count: int
    max_iterations: int

    # Cycle detection (initialize state_hashes=[] and cycle_detected=False)
    state_hashes: Annotated[list[str], operator.add]
    cycle_detected: bool

def compute_state_hash(state: dict, exclude_keys: list[str]) -> str:
    """
    Hash relevant state fields to detect repeated states.
    Exclude counters, metadata, and append-only logs.
    """
    relevant = {k: v for k, v in state.items() if k not in exclude_keys}
    state_str = json.dumps(relevant, sort_keys=True, default=str)
    return hashlib.sha256(state_str.encode()).hexdigest()[:16]

def process_query(query: str) -> str:
    """Placeholder for the real LLM or tool call."""
    return f"Result for: {query}"

def cycle_aware_node(state: CycleAwareState) -> dict:
    """Node that detects cycles."""
    new_result = process_query(state["query"])

    # Fingerprint the logical state after this step: query plus newest result.
    # `results` is append-only, so hashing the whole list would never repeat.
    current_hash = compute_state_hash(
        {**state, "latest_result": new_result},
        exclude_keys=["results", "iteration_count", "state_hashes", "cycle_detected"]
    )

    # Check for cycle: has this exact logical state been produced before?
    if current_hash in state["state_hashes"]:
        print(f"Cycle detected! State hash {current_hash} seen before")
        return {
            "cycle_detected": True,
            "iteration_count": state["iteration_count"] + 1
        }

    return {
        "results": [new_result],
        "state_hashes": [current_hash],
        "iteration_count": state["iteration_count"] + 1
    }

def should_continue_cycle_aware(state: CycleAwareState) -> str:
    """Check for cycles before continuing."""
    if state.get("cycle_detected", False):
        print("Breaking out of cycle")
        return "end"

    if state["iteration_count"] >= state["max_iterations"]:
        return "end"

    return "continue"
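A fingerprint only helps if the fields it covers can actually repeat. An append-only list such as results grows on every pass, so any hash that includes it is new every time. The standalone sketch below fingerprints the query plus each new output and reports the index of the first repeat; fingerprint and first_repeat are hypothetical helpers, and the outputs are made-up stand-ins for a troubleshooting agent's suggestions.

```python
import hashlib
import json
from typing import Optional

def fingerprint(query: str, output: str) -> str:
    """Short SHA-256 fingerprint of the logical state (query + latest output)."""
    payload = json.dumps({"query": query, "latest_result": output}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()[:16]

def first_repeat(query: str, outputs: list[str]) -> Optional[int]:
    """Index of the first output whose fingerprint was already seen, else None."""
    seen: set[str] = set()
    for i, output in enumerate(outputs):
        h = fingerprint(query, output)
        if h in seen:
            return i
        seen.add(h)
    return None

# An agent oscillating between two suggestions is caught on the third step
outputs = ["Restart the service", "Clear the cache", "Restart the service"]
print(first_repeat("App returns 502", outputs))  # 2
```

The set gives constant-time lookups; in the LangGraph state above, state_hashes is a list because the operator.add reducer concatenates lists.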

Production Pattern: Recursion Depth Limits

LangGraph Built-in Protection

In LangGraph 1.0.5 (January 2026), the framework caps every run at a maximum number of steps (super-steps). The cap is the recursion_limit key of the run config, which you pass when invoking the graph; compile() does not take it:

from langgraph.graph import StateGraph
from langgraph.errors import GraphRecursionError

# Build your graph
graph = StateGraph(ResearchState)
# ... add nodes and edges ...

app = graph.compile()

# The recursion limit is set per run: maximum steps before a forced stop
try:
    result = app.invoke(initial_state, config={"recursion_limit": 25})
except GraphRecursionError as e:
    # This prevents infinite loops at the framework level
    print(f"Recursion limit hit: {e}")
    # Handle gracefully
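LangGraph's recursion limit counts graph steps (super-steps), not loop iterations. In a sequential loop every node execution is one step, so the limit is best derived from how many nodes one pass executes. A minimal arithmetic sketch; recursion_budget and its margin factor are illustrative choices, not LangGraph settings.

```python
import math

def recursion_budget(max_iterations: int, nodes_per_pass: int,
                     fixed_nodes: int = 0, margin: float = 1.5) -> int:
    """Steps a sequential loop may legitimately use, plus headroom.

    max_iterations: the state-level cap enforced by your routing function
    nodes_per_pass: nodes executed once per loop pass (research + analyze = 2)
    fixed_nodes:    nodes that run once outside the loop (setup, formatting)
    margin:         safety factor so the framework limit only fires if the
                    state-level cap itself is broken
    """
    expected_steps = max_iterations * nodes_per_pass + fixed_nodes
    return math.ceil(expected_steps * margin)

print(recursion_budget(5, nodes_per_pass=2))   # research + analyze: 15
print(recursion_budget(10, nodes_per_pass=3))  # research + synthesize + review: 45
```

Use the result as the recursion_limit value. Keeping margin above 1 means the state-level counter, not the framework, is what normally ends the run.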

Layered Protection Strategy

from typing import TypedDict
import asyncio

# Layer 1: State-level iteration counter (business logic)
class State(TypedDict):
    iteration_count: int
    max_iterations: int  # e.g., 10

# Layer 2: Framework recursion limit (framework safety), set in the run config.
# It counts steps, not iterations: a 2-node loop at 10 iterations uses ~20 steps.
RUN_CONFIG = {"recursion_limit": 50}

# Layer 3: Timeout at invoke (infrastructure safety)
async def invoke_with_timeout(app, state, timeout_seconds=300):
    """Invoke with hard timeout (and the Layer 2 recursion limit)."""
    try:
        result = await asyncio.wait_for(
            app.ainvoke(state, config=RUN_CONFIG),
            timeout=timeout_seconds
        )
        return result
    except asyncio.TimeoutError:
        print(f"Hard timeout after {timeout_seconds}s")
        # Without a checkpointer, intermediate progress is lost; only the input is available
        return {"error": "Timeout", "input_state": state}

Real-World Example: Iterative Research Agent

from typing import TypedDict, Annotated, Optional, Literal
import operator
from langgraph.graph import StateGraph, END
from langchain_anthropic import ChatAnthropic

class IterativeResearchState(TypedDict):
    """Production-ready iterative research state."""
    # Input
    query: str

    # Research accumulation
    sources: Annotated[list[str], operator.add]
    findings: Annotated[list[str], operator.add]

    # Synthesis
    draft_report: Optional[str]
    quality_score: float  # 0.0 to 1.0
    quality_threshold: float  # e.g., 0.8

    # Control
    iteration_count: int
    max_iterations: int
    phase: Literal["research", "synthesize", "review", "done"]

    # Observability
    total_tokens: Annotated[int, operator.add]

# Initialize LLM
llm = ChatAnthropic(model="claude-sonnet-4-6")

def research_node(state: IterativeResearchState) -> dict:
    """Gather more sources and findings."""
    # Include only the last 5 findings to keep the prompt short
    recent_findings = "\n".join(state["findings"][-5:])
    prompt = f"""
    Query: {state['query']}

    Existing findings ({len(state['findings'])}):
    {recent_findings}

    Find 2-3 NEW findings not already covered. Be specific and cite sources.
    """

    response = llm.invoke(prompt)
    # parse_findings and extract_sources are application-specific helpers
    new_findings = parse_findings(response.content)

    return {
        "findings": new_findings,
        "sources": extract_sources(response.content),
        # usage_metadata is None when the provider reports no token usage
        "total_tokens": (response.usage_metadata or {}).get("total_tokens", 0),
        "iteration_count": state["iteration_count"] + 1,
        "phase": "synthesize"
    }

def synthesize_node(state: IterativeResearchState) -> dict:
    """Create or update draft report."""
    prompt = f"""
    Create a comprehensive report from these findings:
    {chr(10).join(state['findings'])}

    Previous draft (if any):
    {state['draft_report'] or 'None'}

    Write an improved, well-structured report.
    """

    response = llm.invoke(prompt)

    return {
        "draft_report": response.content,
        # usage_metadata is None when the provider reports no token usage
        "total_tokens": (response.usage_metadata or {}).get("total_tokens", 0),
        "phase": "review"
    }

def review_node(state: IterativeResearchState) -> dict:
    """Evaluate report quality."""
    prompt = f"""
    Evaluate this report on a scale of 0.0 to 1.0:

    {state['draft_report']}

    Criteria:
    - Comprehensiveness (covers all aspects)
    - Accuracy (factually correct)
    - Clarity (well-written)
    - Evidence (cites sources)

    Return ONLY a number between 0.0 and 1.0.
    """

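    response = llm.invoke(prompt)
    try:
        score = float(response.content.strip())
    except ValueError:
        score = 0.0  # Unparseable reply: treat as below threshold; the iteration cap still applies

    return {
        "quality_score": score,
        "total_tokens": (response.usage_metadata or {}).get("total_tokens", 0),
        "phase": "done" if score >= state["quality_threshold"] else "research"
    }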

def route_by_phase(state: IterativeResearchState) -> str:
    """Route based on phase and limits."""
    phase = state["phase"]

    if phase == "research":
        # Hard limit first on the only loop-back edge (review -> research).
        # Checking it right after research would end the run before the
        # newest findings are synthesized and reviewed.
        if state["iteration_count"] >= state["max_iterations"]:
            return "end"
        return "research"
    elif phase == "done":
        return "end"
    elif phase == "synthesize":
        return "synthesize"
    elif phase == "review":
        return "review"
    else:
        return "end"  # Safety fallback

# Build graph
graph = StateGraph(IterativeResearchState)
graph.add_node("research", research_node)
graph.add_node("synthesize", synthesize_node)
graph.add_node("review", review_node)

graph.set_entry_point("research")

# Add conditional routing from each node
for node_name in ["research", "synthesize", "review"]:
    graph.add_conditional_edges(
        node_name,
        route_by_phase,
        {
            "research": "research",
            "synthesize": "synthesize",
            "review": "review",
            "end": END
        }
    )

# Compile; the step limit is passed per run in the config
app = graph.compile()
# Usage: app.invoke(initial_state, config={"recursion_limit": 30})
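parse_findings and extract_sources are called in research_node but never defined, and a bare float() on the review reply raises ValueError if the model wraps the number in words ("Score: 0.85"). The helpers below are hypothetical sketches. They assume the model lists findings as bullet or numbered lines and cites sources as http(s) URLs; parse_score takes the first number in the reply and falls back to a default (0.0, below any sensible threshold) when the number is missing or outside 0.0 to 1.0, so a malformed review keeps the loop going under its iteration cap instead of crashing the run.

```python
import re

# Hypothetical helpers: the reply formats they expect are assumptions.
_BULLET = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+(.*\S)\s*$")
_URL = re.compile(r"https?://[^\s)\]>]+")
_NUMBER = re.compile(r"\d+(?:\.\d+)?|\.\d+")

def parse_findings(text: str) -> list[str]:
    """Return the text of each bullet or numbered line."""
    findings = []
    for line in text.splitlines():
        match = _BULLET.match(line)
        if match:
            findings.append(match.group(1))
    return findings

def extract_sources(text: str) -> list[str]:
    """Return unique http(s) URLs in order of first appearance."""
    seen: dict[str, None] = {}  # dict preserves insertion order
    for url in _URL.findall(text):
        seen.setdefault(url.rstrip(".,;"), None)  # drop trailing punctuation
    return list(seen)

def parse_score(text: str, default: float = 0.0) -> float:
    """First number in the reply if it lies in [0.0, 1.0], otherwise default."""
    match = _NUMBER.search(text)
    if match is None:
        return default
    value = float(match.group())
    return value if 0.0 <= value <= 1.0 else default

reply = "Findings:\n- LLM costs fell (https://a.example/x)\n2. Agents went mainstream, see https://b.example."
print(parse_findings(reply))
print(extract_sources(reply))  # ['https://a.example/x', 'https://b.example']
print(parse_score("Score: 0.85 overall"))  # 0.85
```

Rejecting out-of-range values (such as "8/10") instead of clamping them errs on the side of another research pass rather than a false "done".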

Interview Questions

Q1: "How do you prevent infinite loops in recursive LangGraph workflows?"

Strong Answer:

"I use layered protection: First, state-level iteration counters with business-logic limits (e.g., max 10 iterations). Second, LangGraph's recursion_limit, set in the run config at invoke time, as a framework safety net (e.g., 50 steps). Third, async timeouts at the invoke level for hard time limits. I always check iteration limits BEFORE business logic in conditional edges to ensure hard stops are enforced even if business logic has bugs."

Q2: "When would you use cycle detection vs. simple iteration counters?"

Answer:

"Simple iteration counters work for most cases where you just need to limit total loops. Cycle detection (tracking state hashes) is needed when the same logical state can be reached through different paths, and you want to detect when the agent is 'stuck' repeating the same reasoning. For example, in a troubleshooting agent that keeps suggesting the same solution, cycle detection catches this even if iteration count is low."

Q3: "How do you debug infinite loop issues in production?"