-
-
Notifications
You must be signed in to change notification settings - Fork 31.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-96071: fix deadlock in PyGILState_Ensure #96107
Conversation
kumaraditya303
commented
Aug 19, 2022
•
edited by bedevere-bot
Loading
edited by bedevere-bot
- Issue: Deadlock when calling PyGILState_Ensure() from a fresh C thread #96071
a8f821e
to
4eb4ae6
Compare
Is there no way to move the alloc and PyThreadState init stuff to before the HEAD_LOCK() and only do critical stuff in a small block as was the case in previous versions - having only one critical section at the end of the function? |
The function needs to be thread safe and safe against concurrent mutations of calling |
I don't fully understand what you mean here. If you move the allocation to the beginning of the function before any of the critical sections is entered, what will happen is that the allocation will (may) call tracemalloc that in turn will call |
@@ -829,11 +829,16 @@ new_threadstate(PyInterpreterState *interp) | |||
// Every valid interpreter must have at least one thread. | |||
assert(id > 1); | |||
assert(old_head->prev == NULL); | |||
|
|||
// Unlock before allocating memory. | |||
HEAD_UNLOCK(runtime); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If possible, I prefer to move the allocation at the beginning as @tom-pytel is suggesting because is much easier to reason about. Here you are interrupting the critical section in the middle which is potentially incorrect depending on what tracemalloc does (currently it done not do anything fancy like spawning threads but you are implicitly relyi ng on that contract).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whatever tracemalloc or any other memory allocator may do (even spawn threads), the thread unique counter is protected above by the mutex and we only unlock for memory allocation, after that we read with mutex locked so this is re-entrant safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you concerned about this I can change it to always allocate memory and free it if unused but I would like to avoid it unless necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean and is great that you want to ensure maximum performance but I personally think that is premature optimization. Creating thread states is not a performance-critical part of the code and this problem with the allocation being unused only happens if tracemalloc is in use, which impacts performance by a ton already, so this is a negligible advantage at the cost of making reasoning about the correctness of the code harder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Creating thread states is not a performance-critical part of the code and this problem with the allocation being unused only happens if tracemalloc is in use,
Just a clarification that the allocation can be unused even without tracemalloc if interp->threads.head is NULL
which is the common case of this function. But I see your point and will change. Thanks for your feedback.
If you move the allocation out of the critical section and then if you are re-using the initial thread |
You can only allocate without lock if we don't care about the useless memory allocation not "init" since the initializaton requires reading and writing the threads linked list and that can be mutated by other threads so it requires locking. |
That's great and is cool that you have an eye on not making performance regressions but maintenance and ease of reason (especially for locks and multi-threaded code) are very important as well. The slowdown only happens if tracemalloc (or any other memory hook) is active, which already is affecting performance. Also, thread state creation is not a hot path so is not that important to ensure we don't regress in those cases. |
Being said that I don't feel very strongly about this matter, I am happy as long as the issue is fixed :) |
No, the slowdown will happen not only with tracemalloc but even without it since we always try to use the initial thread state and the allocation would be unused. |
Ah, I see what you mean now 👍 @kumaraditya303 Give the alternative a go, if we see that actually is uglier because we need to free memory in many places or you are not very convinced, I am happy to consider the current version 😉 But let's explore the other possibility just in case :) |
Just to clarify, I believe the allocation and even most of the initialization of the EDIT: Also, generally, the smaller and simpler a critical section the better. |
Closing in favor of #96124 |
Alternative of pythonGH-96107 (cherry picked from commit e0d54a4) Co-authored-by: Kumar Aditya <[email protected]>
Alternative of GH-96107 (cherry picked from commit e0d54a4) Co-authored-by: Kumar Aditya <[email protected]> Co-authored-by: Kumar Aditya <[email protected]>