Checkpointing & Persistence

Resuming Workflows & Time-Travel Debugging


Resume After Interruption

# Initial run (assumes app was compiled with a checkpointer)
config = {"configurable": {"thread_id": "research-task-1"}}
result = app.invoke({"query": "AI trends 2026"}, config)

# ... system crashes or user closes browser ...

# Resume later - same thread_id, None as input
result = app.invoke(None, config)
# None means "no new input": the run picks up from the last checkpoint
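The resume behaviour can be pictured with a toy model. The sketch below is not LangGraph code: the `STEPS` list, the `checkpoints` dict, and the `run` function are all hypothetical stand-ins, meant only to show why a run that saves state after every step can continue from the last saved step when it is called again with the same thread ID and no new input.

```python
from typing import Optional

# Toy model only: a hypothetical in-memory checkpoint store, not LangGraph internals.
STEPS = ["search", "analyze", "summarize"]  # hypothetical node names
checkpoints = {}                            # thread_id -> last saved state


def run(thread_id: str, query: Optional[str] = None,
        crash_before: Optional[str] = None) -> dict:
    """Run the steps in order, saving a checkpoint after each completed step."""
    if query is not None:
        state = {"query": query, "done": []}  # new input: start a fresh run
    else:
        saved = checkpoints[thread_id]        # no input: load the last checkpoint
        state = {"query": saved["query"], "done": list(saved["done"])}
    for step in STEPS:
        if step in state["done"]:
            continue                          # already checkpointed, skip it
        if step == crash_before:
            raise RuntimeError(f"crashed before {step}")
        state["done"] = state["done"] + [step]
        checkpoints[thread_id] = {"query": state["query"], "done": list(state["done"])}
    return state


try:
    run("research-task-1", query="AI trends 2026", crash_before="analyze")
except RuntimeError:
    pass  # simulated crash; "search" was already checkpointed

resumed = run("research-task-1")  # same thread_id, no new input
print(resumed["done"])  # ['search', 'analyze', 'summarize']
```

The second call never repeats `search`, because that step was saved before the simulated crash.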

get_state: Inspect Current State

# Get current checkpoint
state = app.get_state(config)

print(f"Current values: {state.values}")
print(f"Next nodes: {state.next}")
print(f"Metadata: {state.metadata}")
print(f"Created at: {state.created_at}")

# Example output:
# Current values: {'query': 'AI trends 2026', 'documents': [...], 'iteration': 3}
# Next nodes: ('analyze',)
# Metadata: {'source': 'loop', 'step': 5, ...}
# Created at: 2026-01-15T10:30:00+00:00

get_state_history: Time-Travel

# List all checkpoints for this thread (newest first)
history = list(app.get_state_history(config))

for i, checkpoint in enumerate(history):
    print(f"Index {i}: step {checkpoint.metadata.get('step')}")
    print(f"  State: {checkpoint.values}")
    print(f"  Next: {checkpoint.next}")

# Time-travel: Resume from specific checkpoint
old_config = history[3].config  # 4th most recent checkpoint, not step 3
result = app.invoke(None, old_config)  # None = continue from that checkpoint
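Because the history is returned newest first, a list index is not the same thing as a step number. One way to avoid off-by-N mistakes is to select a checkpoint by its `step` metadata instead. The helper below is a hypothetical sketch (`find_checkpoint` is not part of LangGraph); it only assumes each snapshot exposes `.metadata` and `.config`, and is exercised here with stand-in objects rather than a real graph.

```python
from types import SimpleNamespace


def find_checkpoint(history, step):
    """Return the first snapshot whose metadata 'step' equals `step`, else None."""
    for snapshot in history:
        if (snapshot.metadata or {}).get("step") == step:
            return snapshot
    return None


# Stand-in snapshots, newest first; the checkpoint IDs are made up for illustration.
fake_history = [
    SimpleNamespace(
        metadata={"step": s},
        config={"configurable": {"checkpoint_id": f"cp-{s}"}},
    )
    for s in (3, 2, 1, 0)
]

target = find_checkpoint(fake_history, step=1)
print(target.config)  # {'configurable': {'checkpoint_id': 'cp-1'}}
```

With a real graph, the same helper could be called as `find_checkpoint(app.get_state_history(config), step=1)`, and the returned snapshot's `config` used for the resume call.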

update_state: Modify State Mid-Workflow

# Get current state
state = app.get_state(config)

# Human review finds issue - update state
app.update_state(
    config,
    {
        "documents": state.values["documents"] + ["New important doc"],
        "needs_reanalysis": True
    }
)

# Continue with modified state (None = no new input)
result = app.invoke(None, config)

Use Cases:

  • Human-in-the-loop corrections
  • Injecting external data
  • Debugging by modifying state
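One detail worth checking before updating a list-valued key: `update_state` passes values through the same reducers that node writes use, so a key with an append-style reducer is merged while a key without one is overwritten. The toy function below (a hypothetical `apply_update`, not LangGraph's implementation) illustrates the difference, assuming a `documents` key whose reducer may or may not be `operator.add`.

```python
import operator


def apply_update(state: dict, update: dict, reducers: dict) -> dict:
    """Merge `update` into `state`: reduce keys that have a reducer, overwrite the rest."""
    merged = dict(state)
    for key, value in update.items():
        if key in reducers and key in merged:
            merged[key] = reducers[key](merged[key], value)
        else:
            merged[key] = value
    return merged


state = {"documents": ["doc A"], "needs_reanalysis": False}
update = {"documents": ["New important doc"], "needs_reanalysis": True}

overwritten = apply_update(state, update, reducers={})
appended = apply_update(state, update, reducers={"documents": operator.add})

print(overwritten["documents"])  # ['New important doc']
print(appended["documents"])     # ['doc A', 'New important doc']
```

If the real state schema uses an append-style reducer for `documents`, passing only the new item is enough; passing the full existing list plus the new item would duplicate the existing entries. Without a reducer, the full list is the right thing to pass.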

Forking: Branch from Checkpoint

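# Get checkpoint to fork from (history is newest first)
history = list(app.get_state_history(config))
fork_point = history[5]  # 6th most recent checkpoint, not necessarily step 5

# Create new thread from that checkpoint
new_config = {"configurable": {"thread_id": "fork-experiment-1"}}

# Copy state to new thread and flag the alternative path.
# as_node attributes the write to a node, and execution continues with that
# node's successors, so check this choice against your graph's edges.
app.update_state(
    new_config,
    {**fork_point.values, "alternative_approach": True},
    as_node=fork_point.next[0] if fork_point.next else None
)

# Run alternative path (None = continue from the copied state)
result = app.invoke(None, new_config)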

Use Cases:

  • A/B testing different approaches
  • Debugging without affecting main flow
  • "What-if" analysis
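For A/B tests and "what-if" runs, the interesting part is usually how the fork's final state differs from the main thread's. A small comparison helper can make that visible. `diff_states` below is a hypothetical utility, not a LangGraph function; it works on plain dicts such as the `values` of two state snapshots, and the sample values are invented for illustration.

```python
def diff_states(main: dict, fork: dict) -> dict:
    """Map each key whose value differs to a (main_value, fork_value) pair."""
    missing = object()  # sentinel so a stored None is not confused with "absent"
    changed = {}
    for key in sorted(set(main) | set(fork)):
        a, b = main.get(key, missing), fork.get(key, missing)
        if a != b:
            changed[key] = (
                None if a is missing else a,
                None if b is missing else b,
            )
    return changed


main_values = {"query": "AI trends 2026", "iteration": 3, "analysis": "baseline"}
fork_values = {
    "query": "AI trends 2026",
    "iteration": 4,
    "analysis": "alternative",
    "alternative_approach": True,
}

for key, (before, after) in diff_states(main_values, fork_values).items():
    print(f"{key}: {before!r} -> {after!r}")
```

With a real graph, the two inputs would typically be `app.get_state(config).values` and `app.get_state(new_config).values`.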

Production Pattern: Checkpoint Inspection UI

def get_workflow_timeline(thread_id: str) -> list[dict]:
    """Build timeline for debugging UI."""
    config = {"configurable": {"thread_id": thread_id}}
    history = list(app.get_state_history(config))

    timeline = []
    for checkpoint in history:
        timeline.append({
            "step": checkpoint.metadata.get("step"),
            # Creation time is a field on the snapshot, not a metadata key
            "timestamp": checkpoint.created_at,
            "node": checkpoint.next[0] if checkpoint.next else "END",
            "state_summary": {
                "documents_count": len(checkpoint.values.get("documents", [])),
                "has_analysis": bool(checkpoint.values.get("analysis")),
                "iteration": checkpoint.values.get("iteration", 0)
            },
            "config": checkpoint.config  # For resume/fork
        })

    return timeline
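A debugging UI usually shows steps oldest first, while the history arrives newest first, so the timeline needs reversing before display. The sketch below is one possible text rendering; `render_timeline` and the sample entries are hypothetical, and the entries only mimic the dict shape built by `get_workflow_timeline`.

```python
def render_timeline(timeline: list) -> list:
    """Format timeline entries oldest-first, one text row per checkpoint."""
    rows = []
    for entry in reversed(timeline):  # input is newest first
        summary = entry["state_summary"]
        rows.append(
            f"step {entry['step']!s:>2} | next: {entry['node']:<10} | "
            f"docs: {summary['documents_count']} | iteration: {summary['iteration']}"
        )
    return rows


# Sample entries, newest first, in the shape produced by get_workflow_timeline.
sample = [
    {"step": 2, "node": "END",
     "state_summary": {"documents_count": 4, "has_analysis": True, "iteration": 1}},
    {"step": 1, "node": "analyze",
     "state_summary": {"documents_count": 4, "has_analysis": False, "iteration": 1}},
]

for row in render_timeline(sample):
    print(row)
```

The `!s` conversion keeps the formatting from failing when a checkpoint has no `step` in its metadata and the value is `None`.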

Interview Questions

Q: How does time-travel debugging work in LangGraph?

"LangGraph stores complete state at each checkpoint. Using get_state_history, you can list all checkpoints for a thread. Each checkpoint has full state, metadata, and config. You can resume from any checkpoint by invoking with that checkpoint's config, or fork to a new thread for experimentation."

Q: What is the difference between resume and fork?

"Resume continues the same thread from its last checkpoint, so the workflow picks up where it left off. Fork creates a new thread starting from a specific checkpoint and leaves the original thread unchanged. Use resume for crash recovery; use fork for A/B testing or 'what-if' analysis."


Key Takeaways

✅ Same thread_id = automatic resume from last checkpoint
✅ get_state inspects current state and next nodes
✅ get_state_history enables time-travel to any checkpoint
✅ update_state for human-in-the-loop modifications
✅ Fork for experimentation without affecting main flow

