Checkpointing & Persistence
Resuming Workflows & Time-Travel Debugging
Resume After Interruption
# Initial run
config = {"configurable": {"thread_id": "research-task-1"}}
result = app.invoke({"query": "AI trends 2026"}, config)
# ... system crashes or user closes browser ...
# Resume later: same thread_id, with None as the input
# (a new input dict is treated as fresh input and restarts the graph from its entry point)
result = app.invoke(None, config)
# Picks up from the last saved checkpoint (app must be compiled with a checkpointer)
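To make the resume mechanism concrete, here is a toy model using only the standard library. It is a sketch of the idea, not LangGraph's implementation: the `CHECKPOINTS` dict, the `run` function, and the `crash_before` parameter are all hypothetical names introduced for illustration.

```python
from typing import Callable, Optional

# Toy model only: a dict keyed by thread_id stands in for a real checkpointer.
CHECKPOINTS: dict = {}

Step = Callable[[dict], dict]


def run(thread_id: str, steps: list, initial: Optional[dict] = None,
        crash_before: Optional[int] = None) -> dict:
    """Run steps in order, saving a checkpoint after each one.

    If the thread already has checkpoints, continue after the last one
    instead of starting over.
    """
    history = CHECKPOINTS.setdefault(thread_id, [])
    if history:
        state = dict(history[-1]["values"])
        start = history[-1]["step"] + 1
    else:
        state = dict(initial or {})
        start = 0
    for i in range(start, len(steps)):
        if crash_before == i:
            raise RuntimeError(f"simulated crash before step {i}")
        state = {**state, **steps[i](state)}
        history.append({"step": i, "values": dict(state)})
    return state


steps = [
    lambda s: {"documents": ["doc-a", "doc-b"]},
    lambda s: {"analysis": f"{len(s['documents'])} documents analyzed"},
    lambda s: {"report": s["analysis"].upper()},
]

try:
    run("research-task-1", steps, {"query": "AI trends 2026"}, crash_before=2)
except RuntimeError:
    pass  # steps 0 and 1 were checkpointed before the crash

# Same thread_id, no new input: only step 2 still has to run.
result = run("research-task-1", steps)
print(result["report"])
```

The second call does no repeated work because the thread's saved history already covers steps 0 and 1. A different thread_id would start from scratch.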
get_state: Inspect Current State
# Get current checkpoint
state = app.get_state(config)
print(f"Current values: {state.values}")
print(f"Next nodes: {state.next}")
print(f"Metadata: {state.metadata}")
print(f"Created at: {state.created_at}")
# Example output (illustrative):
# Current values: {'query': 'AI trends 2026', 'documents': [...], 'iteration': 3}
# Next nodes: ('analyze',)
# Metadata: {'source': 'loop', 'step': 5, ...}
# Created at: 2026-01-15T10:30:00+00:00
get_state_history: Time-Travel
# List all checkpoints for this thread (most recent first)
history = list(app.get_state_history(config))
for i, checkpoint in enumerate(history):
    print(f"Index {i}: step {checkpoint.metadata.get('step')}")
    print(f"  State: {checkpoint.values}")
    print(f"  Next: {checkpoint.next}")
# Time-travel: resume from a specific checkpoint
old_config = history[3].config  # 4th most recent checkpoint
result = app.invoke(None, old_config)  # None input: continue from that checkpoint
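Because the history is ordered newest first, a list index does not correspond to a step number. One way to avoid off-by-position mistakes is to select by the step recorded in the metadata. The helper below is a sketch: `find_checkpoint` is a hypothetical name, and the `SimpleNamespace` objects are stand-ins for real snapshots so the example runs without a graph.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional


def find_checkpoint(history: Iterable[Any], step: int) -> Optional[Any]:
    """Return the first snapshot whose metadata records the given step, if any."""
    for snapshot in history:
        metadata = snapshot.metadata or {}
        if metadata.get("step") == step:
            return snapshot
    return None


# Stand-in snapshots, newest first, mimicking the order of get_state_history.
fake_history = [
    SimpleNamespace(
        metadata={"step": s},
        next=nxt,
        config={"configurable": {"thread_id": "research-task-1",
                                 "checkpoint_id": f"fake-id-{s}"}},
    )
    for s, nxt in [(3, ()), (2, ("report",)), (1, ("analyze",)), (0, ("search",))]
]

target = find_checkpoint(fake_history, step=1)
print(target.next)                       # ('analyze',)
print(fake_history[1].metadata["step"])  # 2, so index 1 is not step 1
```

With a real graph, `find_checkpoint(app.get_state_history(config), step=1).config` would give the config to pass to `invoke`. If a thread has been replayed, several checkpoints can share a step number, and this helper returns the most recent one.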
update_state: Modify State Mid-Workflow
# Get current state
state = app.get_state(config)
# Human review finds issue - update state
# (this full-list form assumes "documents" has no reducer; with an append-style
# reducer, pass only the new item)
app.update_state(
    config,
    {
        "documents": state.values["documents"] + ["New important doc"],
        "needs_reanalysis": True,
    },
)
# Continue with modified state
result = app.invoke(None, config)
Use Cases:
- Human-in-the-loop corrections
- Injecting external data
- Debugging by modifying state
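Values passed to update_state go through the same reducers as node outputs, so whether an update overwrites or appends depends on how the state key is declared. The sketch below models that merge rule in plain Python. `apply_update` is a hypothetical helper written for illustration, not a LangGraph function.

```python
import operator
from typing import Any, Callable, Dict


def apply_update(values: dict, update: dict,
                 reducers: Dict[str, Callable[[Any, Any], Any]]) -> dict:
    """Merge an update into state: reduce where a reducer exists, else overwrite."""
    merged = dict(values)
    for key, new in update.items():
        if key in reducers and key in merged:
            merged[key] = reducers[key](merged[key], new)
        else:
            merged[key] = new
    return merged


values = {"documents": ["doc-a", "doc-b"]}

# No reducer: the update replaces the list, so pass the full new list.
overwrite = apply_update(values, {"documents": values["documents"] + ["new"]}, {})

# Append-style reducer: pass only the new item, otherwise old items repeat.
reducers = {"documents": operator.add}
appended = apply_update(values, {"documents": ["new"]}, reducers)
duplicated = apply_update(values, {"documents": values["documents"] + ["new"]}, reducers)

print(overwrite["documents"])   # ['doc-a', 'doc-b', 'new']
print(appended["documents"])    # ['doc-a', 'doc-b', 'new']
print(duplicated["documents"])  # ['doc-a', 'doc-b', 'doc-a', 'doc-b', 'new']
```

The last case is the common mistake: reading the current list, appending to it, and writing the whole thing back into a key that already appends.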
Forking: Branch from Checkpoint
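# Get checkpoint to fork from (history is newest first, so select by step, not by index)
history = list(app.get_state_history(config))
fork_point = next(c for c in history if c.metadata.get("step") == 5)  # Fork from step 5
# Create new thread from that checkpoint
new_config = {"configurable": {"thread_id": "fork-experiment-1"}}
# Copy state to new thread, adding the flag that selects the alternative path.
# as_node records the write as coming from that node, and execution continues
# with that node's successors. "search" is a placeholder: use the node that ran
# just before fork_point.next (passing fork_point.next[0] would skip that node).
app.update_state(
    new_config,
    {**fork_point.values, "alternative_approach": True},
    as_node="search",
)
# Run alternative path from the copied state
result = app.invoke(None, new_config)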
Use Cases:
- A/B testing different approaches
- Debugging without affecting main flow
- "What-if" analysis
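For A/B or "what-if" comparisons, it helps to see exactly which state keys differ between the main thread and the fork. The helper below is a minimal sketch: `diff_states` is a hypothetical name, and the two dicts stand in for the `.values` of the final snapshots of each thread.

```python
from typing import Any, Dict, Tuple


def diff_states(main: dict, fork: dict) -> Dict[str, Tuple[Any, Any]]:
    """Map each key that differs to a (main_value, fork_value) pair.

    A key missing on one side is reported with None on that side.
    """
    missing = object()
    changed = {}
    for key in sorted(set(main) | set(fork)):
        a, b = main.get(key, missing), fork.get(key, missing)
        if a != b:
            changed[key] = (None if a is missing else a,
                            None if b is missing else b)
    return changed


# Stand-ins for app.get_state(config).values and app.get_state(new_config).values
main_values = {"query": "AI trends 2026", "documents": ["doc-a"], "analysis": "baseline"}
fork_values = {"query": "AI trends 2026", "documents": ["doc-a"],
               "analysis": "alternative", "alternative_approach": True}

print(diff_states(main_values, fork_values))
```

Only `analysis` and `alternative_approach` are reported, since the query and documents are identical in both threads.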
Production Pattern: Checkpoint Inspection UI
def get_workflow_timeline(thread_id: str) -> list[dict]:
    """Build timeline for debugging UI."""
    config = {"configurable": {"thread_id": thread_id}}
    history = list(app.get_state_history(config))  # most recent first
    timeline = []
    for checkpoint in history:
        timeline.append({
            "step": checkpoint.metadata.get("step"),
            # Creation time is stored on the snapshot, not in metadata
            "timestamp": checkpoint.created_at,
            "node": checkpoint.next[0] if checkpoint.next else "END",
            "state_summary": {
                "documents_count": len(checkpoint.values.get("documents", [])),
                "has_analysis": bool(checkpoint.values.get("analysis")),
                "iteration": checkpoint.values.get("iteration", 0),
            },
            "config": checkpoint.config,  # For resume/fork
        })
    return timeline
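As a usage sketch, the timeline entries can be rendered as plain text for a log or terminal view. `format_timeline` is a hypothetical helper, and the sample entries are hand-written in the shape that get_workflow_timeline returns rather than taken from a real run.

```python
def format_timeline(timeline: list) -> str:
    """Render timeline entries as text, oldest step first."""
    def step_key(entry: dict) -> int:
        return -1 if entry["step"] is None else entry["step"]

    lines = []
    for entry in sorted(timeline, key=step_key):
        step = "?" if entry["step"] is None else str(entry["step"])
        summary = entry["state_summary"]
        lines.append(
            f"step {step:>2} | next: {entry['node']:<8} | "
            f"docs: {summary['documents_count']} | "
            f"analysis: {'yes' if summary['has_analysis'] else 'no'}"
        )
    return "\n".join(lines)


# Hand-written sample entries, newest first like the real timeline.
sample = [
    {"step": 2, "timestamp": "2026-01-15T10:30:02+00:00", "node": "END",
     "state_summary": {"documents_count": 2, "has_analysis": True, "iteration": 1},
     "config": {"configurable": {"thread_id": "research-task-1"}}},
    {"step": 1, "timestamp": "2026-01-15T10:30:01+00:00", "node": "analyze",
     "state_summary": {"documents_count": 2, "has_analysis": False, "iteration": 1},
     "config": {"configurable": {"thread_id": "research-task-1"}}},
]

rendered = format_timeline(sample)
print(rendered)
```

Sorting by step puts the oldest checkpoint at the top, which usually reads more naturally in a debugging view than the newest-first order of the raw history.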
Interview Questions
Q: How does time-travel debugging work in LangGraph?
"LangGraph stores complete state at each checkpoint. Using get_state_history, you can list all checkpoints for a thread. Each checkpoint has full state, metadata, and config. You can resume from any checkpoint by invoking with that checkpoint's config, or fork to a new thread for experimentation."
Q: Difference between resume and fork?
"Resume continues the same thread from its last checkpoint—the workflow picks up where it left off. Fork creates a new thread starting from a specific checkpoint—the original thread is unchanged. Use resume for crash recovery; fork for A/B testing or 'what-if' analysis."
Key Takeaways
✅ Same thread_id with invoke(None, config) = resume from last checkpoint
✅ get_state inspects current state and next nodes
✅ get_state_history enables time-travel to any checkpoint
✅ update_state for human-in-the-loop modifications
✅ Fork for experimentation without affecting main flow