Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-96071: fix deadlock in PyGILState_Ensure #96107

Closed
wants to merge 2 commits into from

Conversation

kumaraditya303
Copy link
Contributor

@kumaraditya303 kumaraditya303 commented Aug 19, 2022

@kumaraditya303 kumaraditya303 added type-bug An unexpected behavior, bug, or error interpreter-core (Objects, Python, Grammar, and Parser dirs) release-blocker needs backport to 3.11 only security fixes labels Aug 19, 2022
@tom-pytel
Copy link
Contributor

Is there no way to move the alloc and PyThreadState init stuff to before the HEAD_LOCK() and only do critical stuff in a small block as was the case in previous versions - having only one critical section at the end of the function?

@kumaraditya303
Copy link
Contributor Author

Is there no way to move the alloc and PyThreadState init stuff to before the HEAD_LOCK() and only do critical stuff in a small block as was the case in previous versions - having only one critical section at the end of the function?

The function needs to be thread safe and safe against concurrent mutations of calling PyGILState_Ensure in other threads so it needs to be serialized.

@pablogsal
Copy link
Member

The function needs to be thread safe and safe against concurrent mutations of calling PyGILState_Ensure in other threads so it needs to be serialized.

I don't fully understand what you mean here. If you move the allocation to the beginning of the function before any of the critical sections is entered, what will happen is that the allocation will (may) call tracemalloc that in turn will call new_threadstate that in turn will allocate outside the critical section but without recursing into tracemalloc and then it will block the entire critical section. Because the critical section is what needs to be serialized, there is no problem.

@@ -829,11 +829,16 @@ new_threadstate(PyInterpreterState *interp)
// Every valid interpreter must have at least one thread.
assert(id > 1);
assert(old_head->prev == NULL);

// Unlock before allocating memory.
HEAD_UNLOCK(runtime);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible, I prefer to move the allocation at the beginning as @tom-pytel is suggesting because is much easier to reason about. Here you are interrupting the critical section in the middle which is potentially incorrect depending on what tracemalloc does (currently it done not do anything fancy like spawning threads but you are implicitly relyi ng on that contract).

Copy link
Contributor Author

@kumaraditya303 kumaraditya303 Aug 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whatever tracemalloc or any other memory allocator may do (even spawn threads), the thread unique counter is protected above by the mutex and we only unlock for memory allocation, after that we read with mutex locked so this is re-entrant safe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you concerned about this I can change it to always allocate memory and free it if unused but I would like to avoid it unless necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean and is great that you want to ensure maximum performance but I personally think that is premature optimization. Creating thread states is not a performance-critical part of the code and this problem with the allocation being unused only happens if tracemalloc is in use, which impacts performance by a ton already, so this is a negligible advantage at the cost of making reasoning about the correctness of the code harder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating thread states is not a performance-critical part of the code and this problem with the allocation being unused only happens if tracemalloc is in use,

Just a clarification that the allocation can be unused even without tracemalloc if interp->threads.head is NULL which is the common case of this function. But I see your point and will change. Thanks for your feedback.

@kumaraditya303
Copy link
Contributor Author

I don't fully understand what you mean here. If you move the allocation to the beginning of the function before any of the critical sections is entered, what will happen is that the allocation will (may) call tracemalloc that in turn will call new_threadstate that in turn will allocate outside the critical section but without recursing into tracemalloc and then it will block the entire critical section. Because the critical section is what needs to be serialized, there is no problem.

If you move the allocation out of the critical section and then if you are re-using the initial thread _initial_thread state then the memory which was allocated is of no use. I didn't want to slow down the function with a useless memory allocation since this function is used widely in C extensions with callbacks. The current implementation avoids this allocation unless necessary.

@kumaraditya303
Copy link
Contributor Author

kumaraditya303 commented Aug 19, 2022

Is there no way to move the alloc and PyThreadState init stuff to before the HEAD_LOCK()

You can only allocate without lock if we don't care about the useless memory allocation not "init" since the initializaton requires reading and writing the threads linked list and that can be mutated by other threads so it requires locking.

@pablogsal
Copy link
Member

I didn't want to slow down the function with a useless memory allocation since this function is used widely in C extensions with callbacks.

That's great and is cool that you have an eye on not making performance regressions but maintenance and ease of reason (especially for locks and multi-threaded code) are very important as well.

The slowdown only happens if tracemalloc (or any other memory hook) is active, which already is affecting performance. Also, thread state creation is not a hot path so is not that important to ensure we don't regress in those cases.

@pablogsal
Copy link
Member

Being said that I don't feel very strongly about this matter, I am happy as long as the issue is fixed :)

@kumaraditya303
Copy link
Contributor Author

The slowdown only happens if tracemalloc (or any other memory hook) is active, which already is affecting performance. Also, thread state creation is not a hot path so is not that important to ensure we don't regress in those cases.

No, the slowdown will happen not only with tracemalloc but even without it since we always try to use the initial thread state and the allocation would be unused.

@pablogsal
Copy link
Member

pablogsal commented Aug 19, 2022

No, the slowdown will happen not only with tracemalloc but even without it since we always try to use the initial thread state and the allocation would be unused.

Ah, I see what you mean now 👍

@kumaraditya303 Give the alternative a go, if we see that actually is uglier because we need to free memory in many places or you are not very convinced, I am happy to consider the current version 😉

But let's explore the other possibility just in case :)

@tom-pytel
Copy link
Contributor

tom-pytel commented Aug 19, 2022

Just to clarify, I believe the allocation and even most of the initialization of the PyThreadState structure can occur outside of the critical section, but that is not the biggest thing. My main contention is that splitting a critical section in two can cause unexpected side-effects down the road regardless of how harmless you believe the split to be at this moment.

EDIT: Also, generally, the smaller and simpler a critical section the better.

@kumaraditya303
Copy link
Contributor Author

Closing in favor of #96124

@kumaraditya303 kumaraditya303 deleted the pystate branch August 19, 2022 17:22
miss-islington pushed a commit that referenced this pull request Aug 19, 2022
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Aug 19, 2022
Alternative of pythonGH-96107
(cherry picked from commit e0d54a4)

Co-authored-by: Kumar Aditya <[email protected]>
pablogsal pushed a commit that referenced this pull request Aug 19, 2022
Alternative of GH-96107
(cherry picked from commit e0d54a4)

Co-authored-by: Kumar Aditya <[email protected]>

Co-authored-by: Kumar Aditya <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting review interpreter-core (Objects, Python, Grammar, and Parser dirs) needs backport to 3.11 only security fixes release-blocker type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants