Skip to content

gh-142254: rewrite test_deepcopy_keepaliveand _dont_memo_immutable#142255

Closed
Bobronium wants to merge 8 commits intopython:mainfrom
Bobronium:don't-test-deepcopy-implementation-details
Closed

gh-142254: rewrite test_deepcopy_keepaliveand _dont_memo_immutable#142255
Bobronium wants to merge 8 commits intopython:mainfrom
Bobronium:don't-test-deepcopy-implementation-details

Conversation

@Bobronium
Copy link
Copy Markdown
Contributor

@Bobronium Bobronium commented Dec 4, 2025

@aisk aisk added the skip news label Dec 4, 2025
@serhiy-storchaka
Copy link
Copy Markdown
Member

While I applaud the attempt to make tests more implementation independing, I think that using sys.getrefcount() make them more implementation depending and more fragile. We cannot guarantee that the reference count is decreased immediately after removing a reference, that deepcopy() increases it exactly by 1, that the value returned by sys.getrefcount() exactly equals the number of references.

Using a weak reference would be more reliable, but the current code is the simplest code to test the asked behavior, so I do not think that we should change it until we need.

@sergey-miryanov
Copy link
Copy Markdown
Contributor

Maybe marking those tests as cpython_only is sufficient?

@Bobronium
Copy link
Copy Markdown
Contributor Author

@serhiy-storchaka, thanks for the feedback. I was able to get both tests working reliably on CPython 3.10-3.15 as well as PyPy 3.11. Using weakref for keepalive test, as you suggested, and counting individual calls for __deepcopy__ that returns self-ref for memo test.

@sergey-miryanov, I think it might make sense for test_deepcopy_dont_memo_immutable, since changes that it was introduced with were purely about performance and memory pressure, not correctness. So other implementations might deal with that any way they like.

That said, maybe we actually want to test for memory pressure in that case, using tracemalloc, for instance? I'm pretty sure I can get it to work reliably, but I don't have enough context whether that would be acceptable in terms of compatibility with other implementations.

Comment thread Lib/test/test_copy.py
Comment thread Lib/test/test_copy.py Outdated
Comment thread Lib/test/test_copy.py
Comment thread Lib/test/test_copy.py Outdated
@bedevere-app
Copy link
Copy Markdown

bedevere-app bot commented Dec 5, 2025

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@picnixz
Copy link
Copy Markdown
Member

picnixz commented Dec 5, 2025

If the test is really testing the CPython internals, I'd also suggest a @cpython_only decorator. That would make the test more readable and other implementations may test their own implementation.

Copy link
Copy Markdown
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't see benefit of this change. The current code is simple and it works. Until we change the internal structure of memo, there is no need to change the tests.

cc @benjaminp

@picnixz
Copy link
Copy Markdown
Member

picnixz commented Dec 5, 2025

Yes I agree (I left my original review before checking the previous reviews). However we coule mark it cpython_only if pypi & co are bothered by the failures (though I guess they are either skipping it or changing it already)

@serhiy-storchaka
Copy link
Copy Markdown
Member

I do not see anything that would cause a failure on PyPy in the current tests. PyPy uses the same implementation. The new tests are more fragile, they need special treatment to not fail on PyPy. And they are more complicated.

@picnixz
Copy link
Copy Markdown
Member

picnixz commented Dec 5, 2025

Oh no, what I meant is: keep the old test but use a cpython_only decorator. But if you want to force other implementations to follow the same implementation for memo it's also fine (though I don't know if there are other implementations that want to re-implement this)

@Bobronium
Copy link
Copy Markdown
Contributor Author

Bobronium commented Dec 5, 2025

I should clarify that when I mentioned other implementations in #142254, I meant other implementations of deepcopy() (e.g. from PyPI), which are orthogonal to Python implementations, yet also include potential future changes to deepcopy() in CPython.

For instance, here's a simple patch that improves deepcopy performance by 13-29% on 2 out of 3 pyperformance cases, but changes memo structure:

View changes
Subject: [PATCH] Store originals directly in memo
---
Index: Lib/copy.py
<+>UTF-8
===================================================================
diff --git a/Lib/copy.py b/Lib/copy.py
--- a/Lib/copy.py	(revision 93d6e7ed048f5dacf179b31bb37e68f74202e063)
+++ b/Lib/copy.py	(date 1764952431063)
@@ -122,9 +122,9 @@
     if memo is None:
         memo = {}
     else:
-        y = memo.get(d, None)
-        if y is not None:
-            return y
+        hit = memo.get(d, None)
+        if hit is not None:
+            return hit[0]
 
     copier = _deepcopy_dispatch.get(cls)
     if copier is not None:
@@ -158,8 +158,7 @@
 
     # If is its own copy, don't memoize.
     if y is not x:
-        memo[d] = y
-        _keep_alive(x, memo) # Make sure x lives at least as long as d
+        memo[d] = y, x
     return y
 
 _atomic_types = frozenset({types.NoneType, types.EllipsisType, types.NotImplementedType,
@@ -171,7 +170,7 @@
 
 def _deepcopy_list(x, memo, deepcopy=deepcopy):
     y = []
-    memo[id(x)] = y
+    memo[id(x)] = y, x
     append = y.append
     for a in x:
         append(deepcopy(a, memo))
@@ -183,7 +182,7 @@
     # We're not going to put the tuple in the memo, but it's still important we
     # check for it, in case the tuple contains recursive mutable structures.
     try:
-        return memo[id(x)]
+        return memo[id(x)][0]
     except KeyError:
         pass
     for k, j in zip(x, y):
@@ -197,7 +196,7 @@
 
 def _deepcopy_dict(x, memo, deepcopy=deepcopy):
     y = {}
-    memo[id(x)] = y
+    memo[id(x)] = y, x
     for key, value in x.items():
         y[deepcopy(key, memo)] = deepcopy(value, memo)
     return y
@@ -209,21 +208,6 @@
 
 del d
 
-def _keep_alive(x, memo):
-    """Keeps a reference to the object x in the memo.
-
-    Because we remember objects by their id, we have
-    to assure that possibly temporary objects are kept
-    alive by referencing them.
-    We store a reference at the id of the memo, which should
-    normally not be used unless someone tries to deepcopy
-    the memo itself...
-    """
-    try:
-        memo[id(memo)].append(x)
-    except KeyError:
-        # aha, this is the first one :-)
-        memo[id(memo)]=[x]
 
 def _reconstruct(x, memo, func, args,
                  state=None, listiter=None, dictiter=None,
@@ -233,7 +217,7 @@
         args = (deepcopy(arg, memo) for arg in args)
     y = func(*args)
     if deep:
-        memo[id(x)] = y
+        memo[id(x)] = y, x
 
     if state is not None:
         if deep:

python pyperformance/pyperformance/data-files/benchmarks/bm_deepcopy/run_benchmark.py

+-----------------+---------+-----------------------+
| Benchmark       | copy    | copy_new              |
+=================+=========+=======================+
| deepcopy        | 294 us  | 260 us: 1.13x faster  |
+-----------------+---------+-----------------------+
| deepcopy_reduce | 3.60 us | 2.78 us: 1.29x faster |
+-----------------+---------+-----------------------+
| Geometric mean  | (ref)   | 1.13x faster          |
+-----------------+---------+-----------------------+

My point is: by decoupling tests from implementation, we reduce friction for potential implementation improvements.
To be clear, the idea for the patch came only after changes in tests.

@picnixz
Copy link
Copy Markdown
Member

picnixz commented Dec 5, 2025

Considering we need to be pypi specific for this test to work with this additional support.gc_collect() call I think we should:

  • either keep the current implementation as is but mark the test as cpython-only. Other implementations are responsible for their own tests in this case.
  • don't change anything.

However, in both cases, I don't want the test to change in a way that is not entirely agnostic (that is, support.gc_collect() makes assumptions on how the GC works in other implementations).

@serhiy-storchaka
Copy link
Copy Markdown
Member

The current tests are not CPython-only. They are implementation-dependent (meaning the deepcopy implementation, not the interpreter implemetation), but there is no reason to think that the implementation will change in next 30 years. And if it changes, we can change the tests.

@picnixz
Copy link
Copy Markdown
Member

picnixz commented Dec 5, 2025

They are implementation-dependent (meaning the deepcopy implementation, not the interpreter implemetation)

I thought this warranted a CPython-only thing actually (that's how I understood the decorator). But I'm also fine with leaving them as is.

@eendebakpt
Copy link
Copy Markdown
Contributor

For instance, here's a simple patch that improves deepcopy performance by 13-29% on 2 out of 3 pyperformance cases, but changes memo structure:

Part of the performance gain is due to inlining the _keep_alive method, that can be done without changing the structure of the memo. The inlining is done in #123746 (where in addition the structure is changed for another reason).

I agree with the other devs that we can change the test if needed for a different implementation. Adding a decorator to signal this test is implementation specific would be nice.

@picnixz
Copy link
Copy Markdown
Member

picnixz commented Dec 7, 2025

Closing this one as we'll possibly consider the other PR with just the decorator (though I'm really neutral on the change).

@picnixz picnixz closed this Dec 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

awaiting changes skip news tests Tests in the Lib/test dir

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants