Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock when calling PyGILState_Ensure() from a fresh C thread #96071

Closed
tom-pytel opened this issue Aug 18, 2022 · 17 comments
Closed

Deadlock when calling PyGILState_Ensure() from a fresh C thread #96071

tom-pytel opened this issue Aug 18, 2022 · 17 comments
Assignees
Labels
3.11 only security fixes 3.12 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@tom-pytel
Copy link
Contributor

tom-pytel commented Aug 18, 2022

Bug report

This used to work with py 3.8, 3.9 and 3.10 but now fails. The failure is due to a deadlock condition due to the move of an alloc from before a HEAD_LOCK() to after causing a recursive call to PyGILState_Ensure() which eventually deadlocks on the HEAD_LOCK().

The following is a trace of calls from the initial call to PyGILState_Ensure() to the eventual deadlock:

pystate:PyGILState_Ensure()
pystate:PyThreadState_New()
pystate:new_threadstate()
  HEAD_LOCK()  <-- Initial head locked, the alloc_threadstate() call below used to happen before this in previous versions of Python.
pystate:alloc_threadstate()
obmalloc:PyMem_RawCalloc()
_tracemalloc:_PyMem_Raw.calloc()
_tracemalloc:tracemalloc_raw_calloc()
  get_reentrant() = 0
  set_reenterant(1)
pystate:PyGILState_Ensure()
pystate:PyThreadState_New()
pystate:new_threadstate()
  HEAD_LOCK()  <--   DEADLOCK HERE!

Steps to reproduce: In a Python C extension module, create a C thread then try to call PyGILState_Ensure() in that thread with no previous existing Python threadstate.

Your environment

Python 3.11.0rc1
Linux tom-VirtualBox 5.15.0-46-generic #49~20.04.1-Ubuntu SMP Thu Aug 4 19:15:44 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

@tom-pytel tom-pytel added the type-bug An unexpected behavior, bug, or error label Aug 18, 2022
@ronaldoussoren ronaldoussoren added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Aug 18, 2022
@ronaldoussoren
Copy link
Contributor

Do you have a self-contained example that demonstrates the problem? I quickly checked using PyObjC on macOS (which basically does this when starting threads using the Cocoa interface), and that code did not deadlock.

@tom-pytel
Copy link
Contributor Author

Do you have a self-contained example that demonstrates the problem? I quickly checked using PyObjC on macOS (which basically does this when starting threads using the Cocoa interface), and that code did not deadlock.

Ok, I investigated a little deeper and admittedly I have a bit more going on in the environment where I found this. Tried duplicating until I found the difference: tracemalloc. When tracemalloc is imported and started the function flow is as shown in the header of this issue. When it is not, then after obmalloc:PyMem_RawCalloc() it goes to obmalloc:_PyMem_DebugRawCalloc() instead of _tracemalloc:_PyMem_Raw.calloc() which avoids the recursive call to PyGILState_Ensure() and works fine. I will see about getting some poc module code here tomorrow or after.

@ronaldoussoren
Copy link
Contributor

I can reproduce using PyObjC by using python3.11 -X tracemalloc script.py to start the PyObjC using script I mentioned earlier.

@pablogsal is this a release blocker?

@pablogsal
Copy link
Member

@pablogsal is this a release blocker?

Indeed it is. Thanks for pinging me.

@pablogsal
Copy link
Member

@ronaldoussoren Can you share the reproducer?

@pablogsal
Copy link
Member

Or alternatively, could you bisect the issue?

@tom-pytel
Copy link
Contributor Author

Or alternatively, could you bisect the issue?

In previous versions the memory for the new PyThreadState was allocated before the HEAD_LOCK() in pystate:new_threadstate(), if you revert to that behavior the bug is fixed.

@ronaldoussoren
Copy link
Contributor

The reproducer needs PyObjC (from v8.5-branch in https://github.com/ronaldoussoren/pyobjc because the latest release does not work with Python 3.11 yet).

The reproducer script:

from Cocoa import NSObject, NSThread
import time

class Runner(NSObject):
   def doit_(self, arg):
       print("hello world")
       print(NSThread.currentThread().isMainThread())


runner = Runner.alloc().init()

thread = NSThread.alloc().initWithTarget_selector_object_(runner, b"doit:", None)
thread.start()
print(thread)
time.sleep(1)
while thread.isExecuting():
    time.sleep(1)

This script hangs when using "python3.11 -X tracemalloc script.pyand works fine when using an older version of Python or leaving out-X tracemalloc``.

It is probably easier to reproduce this using a small C extension that uses pthread_create to launch a thread that call PyGILState_Ensure.

I can look into this during the weekend at the earliest, but also need to get a release of PyObjC out to fix Python 3.11 support (which is needed due to documented API changes, nothing to worry about there).

@pablogsal
Copy link
Member

pablogsal commented Aug 18, 2022

I will prepare a patch tomorrow. I have enough information to produce a fix.

@kumaraditya303
Copy link
Contributor

Minimal reproducer:

#include "Python.h"
#include "pthread.h"

static struct PyModuleDef deadlockmodule = {
    PyModuleDef_HEAD_INIT,
    "deadlock",
    NULL,
    -1,
    NULL,
    NULL,
    NULL,
    NULL,
    NULL
};

static void *func(void *arg)
{
    printf("in func\n");
    PyGILState_STATE state = PyGILState_Ensure();
    printf("called PyGILState_Ensure\n");
    PyGILState_Release(state);
    printf("called PyGILState_Release\n");
    return NULL;
}

PyMODINIT_FUNC
PyInit_deadlock(void)
{
    Py_BEGIN_ALLOW_THREADS
    pthread_t thread;
    pthread_create(&thread, NULL, func, NULL);
    pthread_join(thread, NULL);
    Py_END_ALLOW_THREADS
    return PyModule_Create(&deadlockmodule);
}

@kumaraditya303
Copy link
Contributor

kumaraditya303 commented Aug 19, 2022

@pablogsal I have a fix for this #96107

@kumaraditya303 kumaraditya303 added 3.11 only security fixes 3.12 bugs and security fixes labels Aug 19, 2022
@kumaraditya303 kumaraditya303 moved this from Todo to In Progress in Release and Deferred blockers 🚫 Aug 19, 2022
@ericsnowcurrently
Copy link
Member

Thanks for the eyes on this, everyone, and for the fixes.

miss-islington pushed a commit that referenced this issue Aug 19, 2022
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Aug 19, 2022
Alternative of pythonGH-96107
(cherry picked from commit e0d54a4)

Co-authored-by: Kumar Aditya <[email protected]>
@ericsnowcurrently
Copy link
Member

I've merged the fix into main. Now I'm wondering if it would make sense to also add a test that would have caught the bug. That wouldn't be too hard, would it @kumaraditya303?

pablogsal pushed a commit that referenced this issue Aug 19, 2022
Alternative of GH-96107
(cherry picked from commit e0d54a4)

Co-authored-by: Kumar Aditya <[email protected]>

Co-authored-by: Kumar Aditya <[email protected]>
@kumaraditya303
Copy link
Contributor

PR #96137 adds a test for this.

miss-islington pushed a commit to miss-islington/cpython that referenced this issue Aug 22, 2022
Automerge-Triggered-By: GH:ericsnowcurrently
(cherry picked from commit 079baee)

Co-authored-by: Kumar Aditya <[email protected]>
miss-islington pushed a commit that referenced this issue Aug 22, 2022
Automerge-Triggered-By: GH:ericsnowcurrently
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Aug 22, 2022
Automerge-Triggered-By: GH:ericsnowcurrently
(cherry picked from commit 079baee)

Co-authored-by: Kumar Aditya <[email protected]>
@ericsnowcurrently
Copy link
Member

We can close this once gh-96184 and gh-96185 are merged.

kumaraditya303 added a commit to kumaraditya303/cpython that referenced this issue Aug 23, 2022
…GH-96137)

Automerge-Triggered-By: GH:ericsnowcurrently
(cherry picked from commit 079baee)

Co-authored-by: Kumar Aditya <[email protected]>
kumaraditya303 added a commit to kumaraditya303/cpython that referenced this issue Aug 23, 2022
…GH-96137)

Automerge-Triggered-By: GH:ericsnowcurrently
(cherry picked from commit 079baee)

Co-authored-by: Kumar Aditya <[email protected]>
miss-islington added a commit that referenced this issue Aug 23, 2022
Automerge-Triggered-By: GH:ericsnowcurrently
(cherry picked from commit 079baee)

Co-authored-by: Kumar Aditya <[email protected]>
pablogsal pushed a commit that referenced this issue Aug 23, 2022
@kumaraditya303
Copy link
Contributor

Closing as I have fixed both the issue and added tests to ensure no further regression. Thanks!

Repository owner moved this from In Progress to Done in Release and Deferred blockers 🚫 Aug 24, 2022
@ericsnowcurrently
Copy link
Member

Thanks, @kumaraditya303!

mdboom pushed a commit to mdboom/cpython that referenced this issue Aug 24, 2022
Automerge-Triggered-By: GH:ericsnowcurrently
JanWielemaker added a commit to SWI-Prolog/packages-swipy that referenced this issue Jul 25, 2023
Does deadlock.  Seems this could be related to python/cpython#96071
ericsnowcurrently added a commit that referenced this issue Nov 19, 2024
…preterState Field (gh-126989)

This approach eliminates the originally reported race. It also gets rid of the deadlock reported in gh-96071, so we can remove the workaround added then.
ericsnowcurrently added a commit to ericsnowcurrently/cpython that referenced this issue Nov 21, 2024
…yInterpreterState Field (pythongh-126989)

This approach eliminates the originally reported race. It also gets rid of the deadlock reported in pythongh-96071, so we can remove the workaround added then.
ericsnowcurrently added a commit that referenced this issue Dec 2, 2024
…PyInterpreterState Field (gh-127114)

This approach eliminates the originally reported race.  It also gets rid of the deadlock reported in gh-96071, so we can remove the workaround added then.

This is mostly a cherry-pick of 1c0a104 (AKA gh-126989).  The difference is we add PyInterpreterState.threads_preallocated at the end of PyInterpreterState, instead of adding PyInterpreterState.threads.preallocated.  That avoids ABI disruption.
ebonnal pushed a commit to ebonnal/cpython that referenced this issue Jan 12, 2025
…yInterpreterState Field (pythongh-126989)

This approach eliminates the originally reported race. It also gets rid of the deadlock reported in pythongh-96071, so we can remove the workaround added then.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.11 only security fixes 3.12 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
Projects
Development

No branches or pull requests

5 participants