LangGraph checkpoint round-trips silently drop typed state — four confirmed failure modes
What you expect
LangGraph’s checkpointing system is designed to persist and restore agent state reliably across interrupts and resumptions. When you store typed state (Pydantic models, Python Enums, custom classes) in a LangGraph checkpoint, you expect the state to be faithfully restored on resume — same values, same types.
GraphRAG’s v3 restructure (Jan 2026) is expected to be a performance-neutral refactor: dropping NetworkX for DataFrames reorganizes dependencies without regressing throughput.
What actually happens
LangGraph: four silent checkpoint serialization failures
Four distinct silent failure modes are tracked in LangGraph bugs filed since January 2026 (all still open as of 2026-02-28, reviewed against v1.0.10):
- JsonPlusSerializer null-on-failure (bug #6970, 2026-02-28): deserialization failure replaces values with `None`; no exception is raised and no warning is emitted.
- StrEnum coerced to str (bug #6598, Jan 2026): `StrEnum` values silently become plain strings after a checkpoint round-trip, so type information is lost.
- Nested Enum fields become None (bug #6718, Feb 2026): nested Enum fields in checkpoint state deserialize as `None` rather than raising an error.
- BinaryOperatorAggregate wrapper leak (bug #6909, 2026-02-27): when a channel starts `MISSING`, `BinaryOperatorAggregate` with `Overwrite` returns the wrapper object rather than the unwrapped payload.
In each case, the graph continues executing on corrupted state with no observable signal that data was lost.
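To see why this kind of type loss is easy to miss, here is a minimal sketch using only the standard library `json` module. It is an analogy, not a reproduction: it does not call LangGraph or its `JsonPlusSerializer`, and `Status` is a hypothetical str-mixin Enum standing in for a `StrEnum` field.

```python
import json
from enum import Enum


class Status(str, Enum):
    """Hypothetical state field; a str-mixin Enum standing in for StrEnum."""
    RUNNING = "running"
    DONE = "done"


original = {"status": Status.DONE}

# Plain JSON has no notion of Enum members, so the member is written out
# as a bare string and comes back as a bare string.
restored = json.loads(json.dumps(original))

# Value comparison still succeeds, which is why the loss goes unnoticed...
equal_by_value = restored["status"] == Status.DONE

# ...but the restored value is no longer an Enum member.
still_enum = isinstance(restored["status"], Status)

print(equal_by_value, still_enum)
```

Equality checks keep passing while `isinstance` checks and Enum-only attributes such as `.name` stop working, so tests that only compare values may not catch the change.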
LangGraph: interrupt state snapshot reports paused graph as complete
Bug #6956 (open, 2026-02-27): get_state().next returns an empty tuple () after resuming from the first of two interrupt() calls in the same node. The graph is still paused — but the snapshot reports it as complete. Code checking state.next to determine whether execution should continue will silently misread a paused graph as finished. Human-in-the-loop workflows with chained interrupts are directly affected.
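One defensive pattern is to stop treating an empty `next` as proof of completion and require the state itself to say so. The sketch below is plain Python and makes no LangGraph calls: `values` and `next_nodes` stand in for a snapshot's state values and its `next` tuple, and `done` is a hypothetical flag that your final node would have to set.

```python
from typing import Any, Mapping, Sequence


def is_finished(values: Mapping[str, Any], next_nodes: Sequence[str]) -> bool:
    """Report completion only when the state explicitly says so.

    `done` is a hypothetical key; the graph's last node is assumed to set it.
    """
    if next_nodes:
        # Nodes are still scheduled, so the run is definitely not finished.
        return False
    # An empty `next` alone is not trusted; the explicit flag decides.
    return bool(values.get("done", False))


# A paused graph misreported with an empty `next` is still seen as unfinished.
print(is_finished({"done": False}, ()))
```

The trade-off is that every terminal path through the graph must remember to set the flag; a path that forgets it will look permanently unfinished, which is at least a visible failure rather than a silent one.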
GraphRAG v3: performance regression and entity dedup bug
Issue #2250 (open, 2026-02-26): GraphRAG v3 (current: v3.0.5) is “extremely slow compared to v2.” The regression is unresolved. v3 removed NetworkX and moved to DataFrame-based graph utilities — the restructure introduced the regression, and projects that benchmarked on v2 must re-benchmark before deploying v3.
Issue #1718 (open, marked fatal): entities with identical names but different semantic types are merged into a single graph node. Multi-hop reasoning across type-differentiated entities produces hallucinated or incorrect answers. The fix requires deduplication by (name, type) tuple, not name alone. PR #2234 (open) addresses same-entity-different-name fragmentation (“Ahab” vs “Captain Ahab”) but is orthogonal to this bug.
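The difference between the two deduplication keys can be sketched in a few lines. This is not GraphRAG code: the entity dictionaries, the `group_entities` helper, and the lowercase name normalization are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List

Entity = Dict[str, str]

# Hypothetical extracted entities: two distinct things share the name "Mercury".
entities: List[Entity] = [
    {"name": "Mercury", "type": "PLANET", "description": "closest planet to the Sun"},
    {"name": "Mercury", "type": "ELEMENT", "description": "liquid metal, symbol Hg"},
    {"name": "mercury", "type": "PLANET", "description": "smallest planet"},
]


def group_entities(
    items: List[Entity], key_fn: Callable[[Entity], Hashable]
) -> Dict[Hashable, List[Entity]]:
    """Group entities that share a dedup key; each group becomes one node."""
    groups: Dict[Hashable, List[Entity]] = defaultdict(list)
    for item in items:
        groups[key_fn(item)].append(item)
    return dict(groups)


# Name-only key: the planet and the element collapse into a single node.
by_name = group_entities(entities, lambda e: e["name"].lower())

# (name, type) key: true duplicates merge, distinct types stay separate.
by_name_and_type = group_entities(entities, lambda e: (e["name"].lower(), e["type"]))

print(len(by_name), len(by_name_and_type))
```

With the name-only key all three records land on one node, so a multi-hop query about the planet can pick up facts about the element.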
Haystack: in-place dataclass mutation
Issue #10702 (open, 2026-02-28): Haystack components mutate dataclasses in-place during pipeline execution. Concurrent Haystack pipelines sharing state objects can see cross-contamination between runs.
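The hazard can be illustrated without Haystack itself. In the sketch below, `Doc` and `tag_in_place` are hypothetical stand-ins for a dataclass and a component that mutates its input, and the two calls run sequentially as a simplified stand-in for concurrent pipeline runs.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Doc:
    """Hypothetical document-like dataclass passed between components."""
    content: str
    meta: Dict[str, str] = field(default_factory=dict)


def tag_in_place(doc: Doc, run_id: str) -> Doc:
    """Hypothetical component that mutates its input instead of copying it."""
    doc.meta["run_id"] = run_id
    return doc


# Two runs share one object: the second run overwrites what the first wrote.
shared = Doc(content="hello")
out_a = tag_in_place(shared, "run-a")
out_b = tag_in_place(shared, "run-b")
print(out_a.meta["run_id"])  # run-b, because out_a and out_b are the same object

# Copying before each run keeps the runs, and the original, isolated.
source = Doc(content="hello")
safe_a = tag_in_place(copy.deepcopy(source), "run-a")
safe_b = tag_in_place(copy.deepcopy(source), "run-b")
print(safe_a.meta["run_id"], safe_b.meta["run_id"], source.meta)
```

`copy.deepcopy` is used rather than a shallow copy because the mutable `meta` dictionary would otherwise still be shared between the copies.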
What this means for you
If you use LangGraph with typed state — Pydantic models, Enums, custom classes — your checkpoint round-trips are silently lossy. The agent continues on corrupted state without raising an exception. This is not a rare edge case: StrEnum is a common Python type, and Enum fields in Pydantic models are idiomatic LangGraph state.
The interrupt snapshot bug affects any human-in-the-loop workflow that chains two interrupt() calls in a single node — state.next is the standard way to check whether a graph needs further input, and it returns the wrong answer in this case.
The GraphRAG regression means any performance measurement done on v2 is invalid for v3.0.5. If your project needs v3’s DataFrame-based graph utilities, re-benchmark before deploying.
What to do
- LangGraph typed state: validate checkpoint state after every resume. Add assertions or Pydantic validators that check expected types before using state values. Do not rely on LangGraph to raise an error if types are corrupted.
- LangGraph chained interrupts: do not use `state.next == ()` as a completion signal if your nodes use multiple `interrupt()` calls. Use explicit completion flags in state instead.
- GraphRAG: run your own performance benchmark before migrating from v2 to v3. Issue #2250 has no fix ETA. If you depend on entity-type disambiguation, issue #1718 is unfixed in v3.0.5.
- Haystack concurrent pipelines: copy state objects before passing them into pipeline runs — do not share mutable state across concurrent executions.
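For the typed-state advice, a post-resume guard could look like the following. It is a minimal sketch in plain Python, assuming the restored state is available as a mapping; `Priority`, `CorruptedStateError`, and `restore_enum` are hypothetical names, not LangGraph APIs.

```python
from enum import Enum
from typing import Any, Mapping, Type, TypeVar

E = TypeVar("E", bound=Enum)


class Priority(str, Enum):
    """Hypothetical Enum field stored in checkpointed state."""
    LOW = "low"
    HIGH = "high"


class CorruptedStateError(ValueError):
    """Raised when a restored value cannot be trusted."""


def restore_enum(values: Mapping[str, Any], key: str, enum_cls: Type[E]) -> E:
    """Return the Enum member for `key`, or fail loudly instead of continuing."""
    raw = values.get(key)
    if raw is None:
        # Covers both a missing key and a value that was nulled on restore.
        raise CorruptedStateError(f"{key!r} is missing or None after resume")
    if isinstance(raw, enum_cls):
        return raw
    try:
        # Re-coerce a plain value (for example a bare string) to its member.
        return enum_cls(raw)
    except ValueError as exc:
        raise CorruptedStateError(f"{key!r} has unexpected value {raw!r}") from exc


# A string that lost its Enum type is repaired; a None is rejected.
print(restore_enum({"priority": "high"}, "priority", Priority))
```

Calling a guard like this at the top of each node that reads the field turns a silent `None` into an immediate error. The same idea applies to Pydantic models by re-validating the restored data before use.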
Falsification criterion: This finding would be disproved by LangGraph releasing a checkpoint serialization test suite that catches all four failure modes and passes in CI, or by the four open issues (#6970, #6598, #6718, #6909) being closed as “not a bug” with a documented rationale.
Evidence
| Tool | Version | Evidence | Result |
|---|---|---|---|
| langchain-ai/langgraph | v1.0.10 | source-reviewed | #6970: JsonPlusSerializer silently returns None on deserialization failure |
| langchain-ai/langgraph | v1.0.10 | source-reviewed | #6598: StrEnum values coerced to str after checkpoint round-trip |
| langchain-ai/langgraph | v1.0.10 | source-reviewed | #6718: nested Enum fields deserialize as None |
| langchain-ai/langgraph | v1.0.10 | source-reviewed | #6909: BinaryOperatorAggregate returns wrapper object instead of payload |
| langchain-ai/langgraph | v1.0.10 | source-reviewed | #6956: get_state().next returns empty tuple for paused graph with chained interrupts |
| microsoft/graphrag | v3.0.5 | source-reviewed | #2250: extreme performance regression vs v2 unresolved |
| microsoft/graphrag | v3.0.5 | source-reviewed | #1718: fatal entity dedup bug (same-name different-type entities merged) open |
| deepset-ai/haystack | v2.25.0 | source-reviewed | #10702: in-place dataclass mutation; concurrent pipelines can cross-contaminate |
Confidence: empirical (3 tools reviewed across 8 open GitHub issues). No runtime execution was performed; evidence is source-reviewed only.
Strongest case against: These are all open GitHub issues, but open issues can be user errors (as happened with LangGraph’s conditional edge issues #4968/#4891/#4226, which were closed as Python syntax errors, not library bugs). The LangGraph serialization bugs were filed by multiple independent users across different failure modes, reducing the likelihood that all four are user errors. GraphRAG #1718 is marked “fatal” by maintainers. The strongest counterfactual is that some of these bugs are already fixed in versions post-v1.0.10/v3.0.5 — check the issues for resolution status before acting.
Open questions: Have any of the LangGraph serialization bugs been closed in versions after v1.0.10? What is the reproduction rate for the interrupt snapshot bug in real multi-interrupt workflows?
Seen different? Contribute your evidence — share a repro or counter-example and we’ll review it against this finding. Reader evidence is what keeps these findings accurate.