gh-112075: Iterating a dict shouldn't require locks #115108

Merged: 7 commits, Feb 22, 2024

Conversation

Contributor

@DinoV DinoV commented Feb 6, 2024

Makes forward iteration over a dict lock-free.

Handles races against concurrent modification of the dict, and allows a single iterator to be used from multiple threads simultaneously.

Includes some of the shared object marking from #115109.

@DinoV DinoV force-pushed the nogil_dict_iter_threadsafe branch 2 times, most recently from 1266799 to 8d3080b Compare February 6, 2024 22:47
@DinoV DinoV changed the title from "gh-112075: Accessing a single element should optimistically avoid locking" to "gh-112075: Iterating a dict shouldn't require locks" Feb 6, 2024
@DinoV DinoV force-pushed the nogil_dict_iter_threadsafe branch from 8d3080b to bf395f6 Compare February 6, 2024 22:54
@DinoV DinoV force-pushed the nogil_dict_iter_threadsafe branch 3 times, most recently from 2b2e75a to 0dd1a06 Compare February 7, 2024 00:14
@DinoV DinoV marked this pull request as ready for review February 7, 2024 00:48
@DinoV DinoV requested a review from colesbury February 7, 2024 20:39
Contributor

@colesbury colesbury left a comment

I think we want to avoid expensive atomic operations, like atomic compare exchange and atomic increments, as well as avoiding locking.

This means giving up some atomicity for list and dict iterators compared to the GIL behavior. We should still avoid crashes/memory corruption, but I think it's okay for concurrent calls to next(it) on the same iterator object to return the same object. These iterators are almost always used by only a single thread and the performance cost of making the next atomic is relatively high.

See #114843 for the list iterator changes.
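
To make the trade-off above concrete, here is a minimal sketch in standalone C11, not the code from this PR. The `relaxed_iter` type and `relaxed_iter_next` function are hypothetical names invented for illustration, and the iterator walks a plain array rather than a dict. It advances its position with a relaxed load followed by a relaxed store instead of an atomic increment or compare-exchange, so two threads racing on the same iterator can receive the same item, but the bounds check on the locally loaded index keeps every read inside the array.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical iterator over a fixed array; not CPython's dict iterator. */
typedef struct {
    const int *items;       /* borrowed array being iterated */
    ptrdiff_t len;          /* number of items in the array */
    _Atomic ptrdiff_t pos;  /* index of the next item to hand out */
} relaxed_iter;

/* Stores the next item in *out and returns 0, or returns -1 when the
 * iterator is exhausted. The load and the store are separate relaxed
 * operations, so concurrent callers may observe the same index and get
 * the same item. The index is validated after it is loaded, so a stale
 * or racing position never causes an out-of-bounds read. */
static int
relaxed_iter_next(relaxed_iter *it, int *out)
{
    assert(it != NULL && out != NULL);
    ptrdiff_t i = atomic_load_explicit(&it->pos, memory_order_relaxed);
    if (i < 0 || i >= it->len) {
        return -1;
    }
    *out = it->items[i];
    atomic_store_explicit(&it->pos, i + 1, memory_order_relaxed);
    return 0;
}
```

An atomic fetch-add would hand each index to exactly one caller, which is the stronger guarantee the comment proposes giving up in exchange for a cheaper single-threaded fast path.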

Contributor Author

DinoV commented Feb 16, 2024

> I think we want to avoid expensive atomic operations, like atomic compare exchange and atomic increments, as well as avoiding locking.
>
> This means giving up some atomicity for list and dict iterators compared to the GIL behavior. We should still avoid crashes/memory corruption, but I think it's okay for concurrent calls to next(it) on the same iterator object to return the same object. These iterators are almost always used by only a single thread and the performance cost of making the next atomic is relatively high.

I was worried some crazy person might be using these things to distribute work across threads :P. I'm happy to relax the guarantees and see how that goes. I suppose if that ever becomes an issue we can deal with it then :)

@DinoV DinoV force-pushed the nogil_dict_iter_threadsafe branch from 0dd1a06 to 3c03084 Compare February 16, 2024 00:43
Comment on lines 4539 to 4541
#endif

#ifndef Py_GIL_DISABLED
Member

Suggested change:
- #endif
- #ifndef Py_GIL_DISABLED
+ #else /* Py_GIL_DISABLED */

@@ -4558,6 +4607,8 @@ dictiter_iternextkey_lock_held(PyDictObject *d, PyObject *self)
return NULL;
}

#endif
Member

Suggested change:
- #endif
+ #endif /* Py_GIL_DISABLED */

Contributor

@colesbury colesbury left a comment

A few code style comments below

Comment on lines 5034 to 5434
#ifdef Py_GIL_DISABLED
if (has_unique_reference(result)) {
#else
if (Py_REFCNT(result) == 1) {
#endif
Contributor

has_unique_reference already has special cases for Py_GIL_DISABLED and the default build:

Suggested change:
- #ifdef Py_GIL_DISABLED
- if (has_unique_reference(result)) {
- #else
- if (Py_REFCNT(result) == 1) {
- #endif
+ if (has_unique_reference(result)) {
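
The point of the suggestion is that the build-specific check belongs inside one helper, so call sites stay free of `#ifdef`. The following is a minimal standalone sketch of that pattern, not CPython's actual `has_unique_reference`: `toy_object`, `toy_has_unique_reference`, `toy_can_reuse`, and the `TOY_GIL_DISABLED` macro are all hypothetical names, and the split reference count is a simplified assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted object; a simplified stand-in, not PyObject. */
typedef struct {
    ptrdiff_t ref_local;   /* references held by the owning thread */
    ptrdiff_t ref_shared;  /* references held by other threads */
} toy_object;

/* The only place that knows which build is being compiled. */
static inline int
toy_has_unique_reference(const toy_object *op)
{
    assert(op != NULL);
#ifdef TOY_GIL_DISABLED
    return op->ref_local == 1 && op->ref_shared == 0;
#else
    return op->ref_local + op->ref_shared == 1;
#endif /* TOY_GIL_DISABLED */
}

/* Call site: a single condition, identical in both builds. */
static int
toy_can_reuse(const toy_object *result)
{
    if (toy_has_unique_reference(result)) {
        return 1;
    }
    return 0;
}
```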

Comment on lines 4936 to 5333
if (values == NULL)
goto concurrent_modification;
Contributor

I think the preferred style is to always include braces in new C code

Comment on lines 4940 to 5338
if (i >= used)
goto fail;
Contributor

Same here
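
For reference, this is roughly what the two checks above look like with braces added. It is a self-contained sketch, not the PR's code: the function `toy_value_at`, its parameters, and the `-2` return code are hypothetical and exist only so the fragment compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lookup that mirrors the shape of the reviewed checks. */
static int
toy_value_at(const int *values, ptrdiff_t i, ptrdiff_t used, int *out)
{
    assert(out != NULL);
    if (values == NULL) {
        goto concurrent_modification;
    }
    if (i < 0 || i >= used) {
        goto fail;
    }
    *out = values[i];
    return 0;

concurrent_modification:
    return -2;  /* hypothetical code: the backing array went away */
fail:
    return -1;  /* index is past the last used entry */
}
```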

// Even though we hold the lock here we may still lose a race against
// a lock-free iterator, therefore we may end up retrying our iteration.
retry:
start_pos = i = _Py_atomic_load_ssize_relaxed(&di->di_pos);
Contributor

Maybe use the wrappers from pycore_pyatomic_ft_wrappers.h to reduce the number of #ifdef statements.
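
As an illustration of the wrapper idea, here is a minimal standalone sketch. It does not reproduce the macros in pycore_pyatomic_ft_wrappers.h: `TOY_GIL_DISABLED`, `TOY_LOAD_SSIZE_RELAXED`, `TOY_STORE_SSIZE_RELAXED`, `toy_ssize`, and `toy_iter` are hypothetical names. The macros expand to relaxed C11 atomics in one build and to plain reads and writes in the other, so the function body itself needs no `#ifdef`.

```c
#include <assert.h>
#include <stddef.h>

#ifdef TOY_GIL_DISABLED
#  include <stdatomic.h>
typedef _Atomic ptrdiff_t toy_ssize;
#  define TOY_LOAD_SSIZE_RELAXED(v) \
       atomic_load_explicit(&(v), memory_order_relaxed)
#  define TOY_STORE_SSIZE_RELAXED(v, n) \
       atomic_store_explicit(&(v), (n), memory_order_relaxed)
#else
typedef ptrdiff_t toy_ssize;
#  define TOY_LOAD_SSIZE_RELAXED(v) (v)
#  define TOY_STORE_SSIZE_RELAXED(v, n) ((v) = (n))
#endif /* TOY_GIL_DISABLED */

/* Hypothetical iterator state holding only a position. */
typedef struct {
    toy_ssize pos;
} toy_iter;

/* Returns the current position and advances it by one. The same source
 * compiles to atomic or plain accesses depending on the build. */
static ptrdiff_t
toy_iter_advance(toy_iter *it)
{
    assert(it != NULL);
    ptrdiff_t i = TOY_LOAD_SSIZE_RELAXED(it->pos);
    TOY_STORE_SSIZE_RELAXED(it->pos, i + 1);
    return i;
}
```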

@DinoV DinoV force-pushed the nogil_dict_iter_threadsafe branch 3 times, most recently from e2a13a2 to 0941e62 Compare February 21, 2024 22:06
Contributor

@colesbury colesbury left a comment

This looks good, but I think it'd be better if acquire_key_value follows the -1/0 convention. The comment above acquire_key_value would need to be updated too.

Comment on lines +5278 to +5279
static int
acquire_key_value(PyObject **key_loc, PyObject *value, PyObject **value_loc,
Contributor

Also here: -1 for error and 0 for success
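
A minimal sketch of the -1/0 convention follows, using hypothetical helpers (`toy_acquire`, `toy_sum_pair`) rather than the real `acquire_key_value`, whose full signature is not shown here. The helper returns 0 on success and -1 on failure, and the caller tests for `< 0` and propagates the error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies *slot into *out.
 * Returns 0 on success, -1 if the slot is missing. */
static int
toy_acquire(const int *slot, int *out)
{
    assert(out != NULL);
    if (slot == NULL) {
        return -1;
    }
    *out = *slot;
    return 0;
}

/* Caller: checks each result with `< 0` and passes the failure along. */
static int
toy_sum_pair(const int *a, const int *b, int *sum)
{
    int x = 0;
    int y = 0;
    assert(sum != NULL);
    if (toy_acquire(a, &x) < 0 || toy_acquire(b, &y) < 0) {
        return -1;
    }
    *sum = x + y;
    return 0;
}
```

With this shape, a helper that previously returned a true/false success flag reads the same way as the rest of the error-returning functions around it.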

@DinoV DinoV force-pushed the nogil_dict_iter_threadsafe branch from 0941e62 to a1d7718 Compare February 22, 2024 18:32
@DinoV DinoV merged commit 1002fbe into python:main Feb 22, 2024
33 checks passed
woodruffw pushed a commit to woodruffw-forks/cpython that referenced this pull request Mar 4, 2024
diegorusso pushed a commit to diegorusso/cpython that referenced this pull request Apr 17, 2024
LukasWoodtli pushed a commit to LukasWoodtli/cpython that referenced this pull request Jan 22, 2025
3 participants