Optimise overload consistency checks for large numbers of literal-containing overloads#21218
Optimise overload consistency checks for large numbers of literal-containing overloads#21218pelson wants to merge 3 commits intopython:masterfrom
Conversation
…n^2) consistency check. Add a fast-path at the top of the inner loop.
…ecking by preparing the list of valid items
bbac93f to
8125e31
Compare
This comment has been minimized.
This comment has been minimized.
…ents build_literal_fingerprint included optional arguments (ARG_OPT / ARG_NAMED_OPT) in the fingerprint used to detect disjoint overloads. When two overloads differed only in an optional Literal arg (e.g. as_index: Literal[True] vs Literal[False]), the fast-path treated them as mutually exclusive and skipped the full overlap check - even though a caller can omit the argument entirely, matching both overloads through their other arguments. Fix by only fingerprinting required arguments (ARG_POS and ARG_NAMED). Optional arguments can always be omitted by the caller, so they can never prove two overloads are disjoint. The real-world symptom was pandas-stubs gaining spurious unused-ignore errors: overloads that master correctly flagged as overlapping were silently skipped by the fast-path, making existing type: ignore redundant.
This was a genuine issue - there was no existing test coverage of the case where optional arguments overlapped (e.g. Note that the test that I added to cover this is also highlighting a problem with the |
|
According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅ |
|
Thanks for the PR! We are busy preparing for the mypy 2.0 release, so reviews may take a bit longer, but performance improvements are highly appreciated. |
Fixes #10004
Introduce an optimisation to reduce the number of calls to (the expensive)
overload_can_never_matchwithin the inner (O(N^2)) loop ofcheck_overlapping_overloadsin the case where the overloads contain literals.The approach is to track all exclusively literal argument types, and to be able to quickly determine if two signatures can't possibly overlap.
For example, given:
We can know by looking at the literals alone that there are no overlapping overloads, so the expensive pairwise check is skipped except for the last override. When the optimisation doesn't apply, we fall back to the existing (and complete) logic.
Please note that I have used Claude to help me find the key parts to change, and to help with the test validation. That said, this was a significant effort on my part to produce something that was tightly scoped and consistent with the approach I saw in
checker.py.In terms of performance:
Script to compute the runtime
Here is the script I used to compute the above runtime (fully Claude generated)
I had also prototyped an algorithmic simplification to reduce the
O(N^2)pairwise search in the case where there are many literals of the same type. This did have a reduction in the runtime, but it was not sufficiently large to justify the additional complexity, and I consider the implementation here to strike a good balance between the two.