gh-148653: refactored marshal for cycle safety and performance#148700

Open
mjbommar wants to merge 6 commits into python:main from mjbommar:marshal-safe-cycle-design

Conversation

@mjbommar
Contributor

@mjbommar mjbommar commented Apr 17, 2026

This is an experimental rewrite of marshal based on conversation with @serhiy-storchaka in #148652 and #148653.

Includes a number of extra docs and data generators that are provided only for reference during discussion.

So far it's green for me on the test suite, but I need to dig in further and assume @serhiy-storchaka will have better intuition than I do on any behavior or performance regressions.

In my first attempt, we hit minor single-digit performance regressions on the loads path, unsurprisingly concentrated in the complicated cases.

Second attempt with improved performance coming in an hour or so.

(edit: now faster than HEAD with real performance and correctness gains)

Assisted by GPT-5.4 xhigh and Opus 4.7

Replace the PyList-backed reference table with a raw growable
PyObject ** array, and encode REF_STATE_INCOMPLETE_HASHABLE in the low
bit of each ref pointer so the parallel state-byte allocation is gone.

Also:
- drop the allow_incomplete_hashable parameter from r_object; it lives
  on RFILE now, auto-reset on entry, flipped via a wrapper at the two
  list-element / dict-value sites.
- force-inline the r_ref_* helpers so the compiler can fold the
  if (flag) guards into the callers as the original R_REF macro did.

Misc/marshal-perf-diary.md records the full experiment ledger: each
idea tested in isolation, results, and the combined stack. Benchmark
harness is /tmp/marshal_bench_cpu_stable.py (200k loads x 11 repeats,
taskset -c 0, best-of-3 pinned-run median).

Combined deltas vs main on loads:

    small_tuple    14.3% faster
    nested_dict     6.9% faster
    code_obj        6.8% faster

dumps is roughly flat to slightly faster. test_marshal passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mjbommar mjbommar changed the title gh-148652: refactored marshal for cycle safety and performance gh-148653: refactored marshal for cycle safety and performance Apr 17, 2026
Appends to Misc/marshal-perf-diary.md the results of the full test
suite rerun (48,932 tests pass, including the new RecursiveGraphTest
combinatoric cases) and a `pyperformance` comparison against main on
the same 10-benchmark marshal-adjacent slice the design doc used.

Significant results on the pyperformance slice:

    python_startup          1.18x faster (t=59.80)
    python_startup_no_site  1.03x faster (t=12.90)

All other slice benchmarks within noise; no regressions.

Adds Misc/marshal-perf-data/ with the raw JSON backing every table in
the diary: all per-experiment microbench runs (exp0..exp9, expC, final)
plus the two pyperf-slice JSONs and a README describing the layout and
reproduction commands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mjbommar
Contributor Author

mjbommar commented Apr 17, 2026

@serhiy-storchaka @StanFromIreland - we're a long way from Kansas now, but I think the end result of this refactor is something much bigger: a 15-20% speedup in Python startup and a ~7-14% speedup in marshal.loads.

(edit: the startup speed looks like a fixed 1-3ms saving, so that % is for bare startup)

I'm going to try a few more complex tests, like running some of my real apps and pipelines through this build, to see if I can find regressions, but I wanted to share.

I have no experience with big CPython changes like this but hopefully I didn't do anything too crazy here from an ABI or compatibility perspective.

Update: Confirmed that the dill and cloudpickle test suites are unchanged (no new failures compared to 3.15 HEAD) and successfully ran a bunch of other day-to-day stuff without issue.

Records the outcome of an independent-library validation pass:

- dill 0.4.1 test suite (30 files) — identical 29/30 pass on baseline
  and HEAD; the single failure is a pre-existing 3.15a8 incompatibility
  in dill's module-state serialization, unrelated to marshal.
- cloudpickle 3.1.2 test suite (upstream) — 243/243 pass on both,
  identical skip/xfail breakdown.
- 1,601 marshal-adjacent stdlib tests (test_importlib, test_zipimport,
  test_compileall, test_py_compile, test_marshal) all pass on HEAD.
- compileall of CPython Lib/: +1.0% (within noise; dumps path untouched).
- Cold-import stress (56 stdlib modules, fresh subprocess): flat.
- Hypothesis fuzz (3500 random round-trips including cyclic shapes
  through mutable bridges): zero correctness regressions; acyclic
  round-trip -10%, list self-cycle -24%, dict value self-cycle -40%.

Nothing in the third-party validation hints at a correctness or
performance regression; several workloads that directly exercise the
changed code path are measurably faster.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Wulian233
Contributor

Please remove your json and md files

@mjbommar
Contributor Author

Please remove your json and md files

Sorry about that. What's the right way to share empirical benchmark data and experimental notes if they get to be too long for GH comments? I think @serhiy-storchaka and others will want to see the full set of micro-benchmarks I ran across all experiments.

