
gh-114264: Optimize performance of copy.deepcopy by adding a fast path for atomic types #114266

Merged Jun 7, 2024 (23 commits)

Conversation

@eendebakpt (Contributor) commented Jan 18, 2024:

By adding a fast path for atomic types we eliminate the overhead of checking the memo argument of deepcopy and the overhead of calling the dispatch method.
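The idea can be sketched in isolation as follows. This is a minimal illustration, not the code in this PR: the membership of `_atomic_types` is assumed here, and the real set lives in `Lib/copy.py`.

```python
import copy

# Assumed contents for illustration; the real set is defined in Lib/copy.py.
_atomic_types = frozenset({
    type(None), int, float, bool, complex, bytes, str,
    type, range, type(NotImplemented), type(Ellipsis),
})

def deepcopy_sketch(x, memo=None):
    cls = type(x)
    if cls in _atomic_types:
        # Atomic, immutable objects are their own deep copy: skip the
        # memo lookup and the dispatch-table machinery entirely.
        return x
    # Non-atomic objects fall through to the regular implementation.
    return copy.deepcopy(x, memo)
```

Atomic inputs are returned unchanged (same identity), while containers still get a real deep copy.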

Benchmark:

```
deepcopy dict: Mean +- std dev: [main] 6.88 us +- 0.08 us -> [pr] 4.65 us +- 0.06 us: 1.48x faster
deepcopy dataclass: Mean +- std dev: [main] 6.64 us +- 0.11 us -> [pr] 5.23 us +- 0.07 us: 1.27x faster
deepcopy small tuple: Mean +- std dev: [main] 1.07 us +- 0.01 us -> [pr] 924 ns +- 33 ns: 1.16x faster
```

Geometric mean: 1.30x faster

Benchmark script:

```python
import pyperf

runner = pyperf.Runner()

setup = """
import copy

a = {'list': [1, 2, 3, 43], 't': (1, 2, 3), 'str': 'hello', 'subdict': {'a': True}}

from dataclasses import dataclass

@dataclass
class A:
    a: list
    b: str
    c: bool

dc = A([1, 2, 3], 'hello', True)

small_tuple = (1,)
"""

runner.timeit(name="deepcopy dict", stmt="b = copy.deepcopy(a)", setup=setup)
runner.timeit(name="deepcopy dataclass", stmt="b = copy.deepcopy(dc)", setup=setup)
runner.timeit(name="deepcopy small tuple", stmt="b = copy.deepcopy(small_tuple)", setup=setup)
```

The approach is similar to the one used in #103005.

Comment on lines +124 to +128
```python
cls = type(x)

if cls in _atomic_types:
    return x
```

Member:
Adding a fast path here would cause a potential behavioral change if memo is involved. This should be mentioned in NEWS.

Member:
The safer way might be to simply keep the original behavior and put the fast path after the memo check. This is a user-observable change and it might matter, for example if the user is using the memo dict to keep track of all the objects copied.

It would also be interesting to see the benchmark if we keep the memo as it is: how much of the performance gain comes from not updating the memo?

Contributor Author (@eendebakpt):

Putting the fast path after the memo results in the following (main vs. the variation):

```
deepcopy dict: Mean +- std dev: [main] 6.88 us +- 0.08 us -> [pr_v2] 6.06 us +- 0.08 us: 1.13x faster
deepcopy dataclass: Mean +- std dev: [main] 6.64 us +- 0.11 us -> [pr_v2] 6.43 us +- 0.17 us: 1.03x faster
deepcopy small tuple: Mean +- std dev: [main] 1.07 us +- 0.01 us -> [pr_v2] 1.01 us +- 0.02 us: 1.06x faster
```

Geometric mean: 1.08x faster

Still worthwhile, but a smaller improvement.

I am trying to find a case where the behavior changes (so I can also add a test), but the atomic types are not added to the memo:

```python
    ...
    # If is its own copy, don't memoize.
    if y is not x:
        memo[d] = y
        _keep_alive(x, memo)  # Make sure x lives at least as long as d
    ...
```

The only case I could think of where the behavior changes is when users supply their own memo; in that case the change (a different id for the same string value) could be considered an implementation detail:

```python
import copy

# large str, so it is not interned
s = 'sds' * 12312312
s2 = 'sds' * 12312312

print(id(s), id(s2))  # different ids

memo = {id(s2): s}
t = copy.deepcopy(s2, memo)
# with this PR, id(s) and id(t) are not equal, although s and t are
# equal as strings
print(id(t), id(s), id(s) == id(t), id(s2))
```

Are there any cases where behavior changes that I am missing?

Member:
I wonder how the times change if you use `if copier is _deepcopy_atomic` instead of `if cls in _atomic_types`. You could also try a different special value, for example `...` instead of `_deepcopy_atomic`, to avoid the lookup in globals.
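The suggestion above can be sketched standalone (hypothetical names, not the code in `Lib/copy.py`): registering a sentinel as the "copier" for atomic types lets a single dict lookup serve as both the type check and the dispatch.

```python
import copy

_ATOMIC = object()  # sentinel used as the "copier" for atomic types

# Hypothetical dispatch table for illustration; the real table in
# Lib/copy.py maps types to copier functions.
_dispatch = {
    int: _ATOMIC, float: _ATOMIC, str: _ATOMIC,
    bool: _ATOMIC, bytes: _ATOMIC, type(None): _ATOMIC,
}

def deepcopy_sketch(x, memo=None):
    copier = _dispatch.get(type(x))
    if copier is _ATOMIC:
        # One dict lookup doubled as the atomic-type check.
        return x
    # Non-atomic objects fall through to the regular implementation.
    return copy.deepcopy(x, memo)
```

Comparing the sentinel against a local constant avoids the extra `in`-check on `_atomic_types` at the cost of a slightly slower `dict.get`.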

Contributor Author (@eendebakpt):

@serhiy-storchaka That is an interesting suggestion. A microbenchmark shows that getting the copier is not much more expensive than the `cls in _atomic_types` check:

```python
%timeit cls in _atomic_types
43.7 ns ± 0.0706 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
%timeit _deepcopy_dispatch.get(cls)
71.5 ns ± 0.0873 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
```

Using a special value for the copier can be done after the memo check. This is quite close to current main and has the following performance:

```
deepcopy dict: Mean +- std dev: [main] 6.88 us +- 0.08 us -> [v3] 6.36 us +- 0.11 us: 1.08x faster
deepcopy dataclass: Mean +- std dev: [main] 6.64 us +- 0.11 us -> [v3] 6.26 us +- 0.05 us: 1.06x faster
deepcopy small tuple: Mean +- std dev: [main] 1.07 us +- 0.01 us -> [v3] 1.02 us +- 0.01 us: 1.05x faster
```

Geometric mean: 1.06x faster

(implementation is: main...eendebakpt:deepcopy_atomic_types_v3)

Using the same approach before the memo check has performance:

```
deepcopy dict: Mean +- std dev: [main] 6.88 us +- 0.08 us -> [v4] 4.96 us +- 0.08 us: 1.39x faster
deepcopy dataclass: Mean +- std dev: [main] 6.64 us +- 0.11 us -> [v4] 5.38 us +- 0.06 us: 1.23x faster
deepcopy small tuple: Mean +- std dev: [main] 1.07 us +- 0.01 us -> [v4] 931 ns +- 10 ns: 1.15x faster
```

Geometric mean: 1.25x faster

(implementation: main...eendebakpt:deepcopy_atomic_types_v4)

Member:
> But the atomic types are not added to the memo

Ah, you are right, I missed that. In that case the impact is much smaller than I thought, and I think it's okay to skip the memo part.

Member:
I am okay with both solutions, provided the memo change is mentioned (if there is one). Users can use this function in surprising ways, and it's a de facto public API.

Member:
Thank you for satisfying my curiosity. I see that there is a non-trivial benefit from using _atomic_types.

Could you please run benchmarks for a collection containing a large number of identical non-atomic objects, e.g. `[[1]] * 100`, for the different variants of the code? I expect that some variants may show a regression, but if it is not great, we can accept this.

I also wonder whether the same approach (with an _immutable_types set) should be used in copy.copy(). Even if it does not use memo, there may be some benefit, and it could be better for uniformity of the code.

Contributor Author (@eendebakpt):

A more extensive benchmark script:

```python
import pyperf

runner = pyperf.Runner()

setup = """
import copy

a = {'list': [1, 2, 3, 43], 't': (1, 2, 3), 'str': 'hello', 'subdict': {'a': True}}

from dataclasses import dataclass

@dataclass
class A:
    a: list
    b: str
    c: bool

dc = A([1, 2, 3], 'hello', True)

@dataclass
class A:
    a: int

dc_small = A(123)

small_tuple = (1,)

l = {'hi': 100}
repeating_atomic = [[1] * 100]
repeating = [dc_small] * 100
"""

runner.timeit(name="deepcopy dict", stmt="b = copy.deepcopy(a)", setup=setup)
runner.timeit(name="deepcopy dataclass", stmt="b = copy.deepcopy(dc)", setup=setup)
runner.timeit(name="deepcopy small dataclass", stmt="b = copy.deepcopy(dc_small)", setup=setup)
runner.timeit(name="deepcopy small tuple", stmt="b = copy.deepcopy(small_tuple)", setup=setup)
runner.timeit(name="deepcopy repeating", stmt="b = copy.deepcopy(repeating)", setup=setup)
runner.timeit(name="deepcopy repeating_atomic", stmt="b = copy.deepcopy(repeating_atomic)", setup=setup)
```

Comparison of the three alternative versions to main. This PR:

```
deepcopy dict: Mean +- std dev: [main] 6.82 us +- 0.07 us -> [pr] 4.68 us +- 0.08 us: 1.46x faster
deepcopy dataclass: Mean +- std dev: [main] 6.59 us +- 0.07 us -> [pr] 5.23 us +- 0.07 us: 1.26x faster
deepcopy small dataclass: Mean +- std dev: [main] 4.30 us +- 0.07 us -> [pr] 3.88 us +- 0.08 us: 1.11x faster
deepcopy small tuple: Mean +- std dev: [main] 1.07 us +- 0.01 us -> [pr] 925 ns +- 12 ns: 1.16x faster
deepcopy repeating: Mean +- std dev: [main] 21.8 us +- 0.2 us -> [pr] 24.9 us +- 0.4 us: 1.15x slower
deepcopy repeating_atomic: Mean +- std dev: [main] 24.6 us +- 0.5 us -> [pr] 10.1 us +- 0.2 us: 2.44x faster
```

Geometric mean: 1.31x faster

main...eendebakpt:deepcopy_atomic_types_v3

```
deepcopy dict: Mean +- std dev: [main] 6.82 us +- 0.07 us -> [v3] 6.34 us +- 0.22 us: 1.08x faster
deepcopy dataclass: Mean +- std dev: [main] 6.59 us +- 0.07 us -> [v3] 6.21 us +- 0.11 us: 1.06x faster
deepcopy small dataclass: Mean +- std dev: [main] 4.30 us +- 0.07 us -> [v3] 4.16 us +- 0.04 us: 1.03x faster
deepcopy small tuple: Mean +- std dev: [main] 1.07 us +- 0.01 us -> [v3] 1.03 us +- 0.02 us: 1.03x faster
deepcopy repeating: Mean +- std dev: [main] 21.8 us +- 0.2 us -> [v3] 22.0 us +- 0.6 us: 1.01x slower
deepcopy repeating_atomic: Mean +- std dev: [main] 24.6 us +- 0.5 us -> [v3] 21.2 us +- 0.6 us: 1.16x faster
```

Geometric mean: 1.06x faster

main...eendebakpt:deepcopy_atomic_types_v4

```
deepcopy dict: Mean +- std dev: [main] 6.82 us +- 0.07 us -> [v4] 4.93 us +- 0.05 us: 1.38x faster
deepcopy dataclass: Mean +- std dev: [main] 6.59 us +- 0.07 us -> [v4] 5.35 us +- 0.06 us: 1.23x faster
deepcopy small dataclass: Mean +- std dev: [main] 4.30 us +- 0.07 us -> [v4] 3.89 us +- 0.05 us: 1.11x faster
deepcopy small tuple: Mean +- std dev: [main] 1.07 us +- 0.01 us -> [v4] 938 ns +- 25 ns: 1.14x faster
deepcopy repeating: Mean +- std dev: [main] 21.8 us +- 0.2 us -> [v4] 26.4 us +- 0.5 us: 1.21x slower
deepcopy repeating_atomic: Mean +- std dev: [main] 24.6 us +- 0.5 us -> [v4] 12.4 us +- 0.4 us: 1.99x faster
```

Geometric mean: 1.23x faster

Some notes:

  • In this PR we avoid the cost of looking into the memo for atomic types, which provides the main speedup. This comes at a performance degradation for objects containing many identical (same id) values. Based on the numbers above there is no performance reason to pick version v4, though for clarity or uniformity of the code it would not hurt much to replace the PR with v4. I think one could construct benchmarks where v3 is faster than this PR (thinking of non-atomic types that have a custom __copy__ and make recursive calls to deepcopy, such as numpy arrays), but I expect the differences to be small. Version v3 has the same performance as main on the "deepcopy repeating" benchmark (I believe the 1.01x slower is a random fluctuation; on average it will be 1.00x) and a small performance improvement in some other cases.

  • I also considered aligning the implementations of copy.copy and copy.deepcopy (for uniformity of the code), but decided against this initially. My line of thought:

    • There is no memo that can be skipped (which was the main performance benefit)
    • Calls of copy.copy on atomic types should be relatively rare (and cheap anyway), so there is less to gain.
    • For calls on container types (like list, dict or set) there are no recursive calls to the components. This is in contrast with deepcopy, where a call on a dataclass can recurse into atomic types.
  • I have not updated the news entry yet with a description of the behaviour changes, because I think the behavior changes are not really something one can notice with normal use of deepcopy.
    i) The first behavior change is that memo.get is invoked fewer times. But memo.get is a read-only operation (on normal dicts), so this is not visible from the public interface.
    ii) In the example given above, the memo passed is memo = {id(s2): s}. This is not really a valid memo, since we require for each key-value pair in the memo that id(value) == key, which is not true in the example.
    If desired I can add this to the news entry, though.

  • The results above are only micro benchmarks, and which version performs best will depend on the application. Applications where I have found deepcopy to take a significant amount of time are lmfit, qiskit and quantify-scheduler, but I cannot run those with current main (I could with 3.12, though).

  • There is another PR improving the speed of deepcopy (gh-72793: C implementation of parts of copy.deepcopy #91610). That approach converts the Python implementation to C, but it is more complex, will take time to review, and might not be accepted.
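The memo invariant mentioned in note (ii) can be made concrete with plain `copy.deepcopy` usage: a valid memo entry maps `id(obj)` to that object (or its finished copy), so `id(value) == key` holds.

```python
import copy

inner = [1, 2, 3]
outer = [inner, inner]

# A valid memo entry: the key is the id of the value itself.
memo = {id(inner): inner}

result = copy.deepcopy(outer, memo)
assert result is not outer      # outer itself was not in the memo
assert result[0] is inner       # pre-seeded object returned unchanged
assert result[0] is result[1]   # shared identity preserved by the memo
```

The example earlier in the thread (`memo = {id(s2): s}`) breaks this invariant, which is why the resulting identity difference can be treated as an implementation detail.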

@eendebakpt eendebakpt requested a review from sunmy2019 March 18, 2024 22:55
@gaogaotiantian (Member) left a comment:

I think this is pretty straightforward and we can get some easy gain.

@gaogaotiantian (Member):

I think the current solution is simple and effective. The benchmark shows good results, and I did not see a serious downside. All the existing tests passed. You'll need a core dev's approval to merge it in, though.

@erlend-aasland (Contributor) left a comment:

LGTM, but let's await Serhiy's thumbs-up for this, since he had remarks earlier.

@eendebakpt (Contributor Author):

The _nil argument of deepcopy can be removed with this change (None works as a sentinel since the only keys for the memo are of type int):

```diff
diff --git a/Lib/copy.py b/Lib/copy.py
index 7a1907d754..4566e47a0e 100644
--- a/Lib/copy.py
+++ b/Lib/copy.py
@@ -115,7 +115,7 @@ def _copy_immutable(x):
 
 del d, t
 
-def deepcopy(x, memo=None, _nil=[]):
+def deepcopy(x, memo=None):
     """Deep copy operation on arbitrary Python objects.
 
     See the module's __doc__ string for more info.
@@ -130,8 +130,8 @@ def deepcopy(x, memo=None, _nil=[]):
     if memo is None:
         memo = {}
     else:
-        y = memo.get(d, _nil)
-        if y is not _nil:
+        y = memo.get(d, None)
+        if y is not None:
             return y
 
     copier = _deepcopy_dispatch.get(cls)
```

Performance should be a bit better (since there is no assignment to _nil), but the impact is small (to the extent that it is hard to measure). It does remove the _nil argument, which is nice, but it would add a little churn to the PR. Unless everyone is +1 on this change I will leave it out.

@erlend-aasland (Contributor):

@serhiy-storchaka, do you have time for a last look at this? It would be nice with your thumbs-up before I proceed.

@serhiy-storchaka (Member) left a comment:

LGTM.

@serhiy-storchaka merged commit 9d66042 into python:main on Jun 7, 2024. 33 checks passed.
noahbkim pushed a commit to hudson-trading/cpython that referenced this pull request Jul 11, 2024
estyxx pushed a commit to estyxx/cpython that referenced this pull request Jul 17, 2024