gh-127421: Fix race in test_start_new_thread_failed #127549

Merged
merged 2 commits into from
Dec 3, 2024


mpage
Contributor

@mpage mpage commented Dec 3, 2024

When we succeed in starting a new thread, for example if setrlimit was ineffective, we must wait for the newly spawned thread to exit. Otherwise, we run the risk that the newly spawned thread will race with runtime finalization and access memory that has already been freed. In this case, this assertion tends to fire:

assert(tstate_is_alive(tstate));

when the newly created thread attempts to bind the state after the runtime has been finalized, because all thread states are cleared and deleted as part of finalization.

The use of _thread.start_new_thread() is problematic here. It only spawns daemon threads, which the runtime does not wait for during finalization, and does not return a joinable handle. Instead, use _thread.start_joinable_thread() and join the resulting handle when the thread is started successfully.

When we succeed in starting a new thread, for example if setrlimit
was ineffective, we must wait for the newly spawned thread to exit.
Otherwise, we run the risk that the newly spawned thread will race
with runtime finalization and access memory that has already been
clobbered/freed.

`_thread.start_new_thread()` only spawns daemon threads, which the runtime
does not wait for at shutdown, and does not return a handle. Use
`_thread.start_joinable_thread()` and join the resulting handle when
the thread is started successfully.
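
The pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper name `start_and_join` is hypothetical, and the fallback to `threading.Thread` on interpreters that lack `_thread.start_joinable_thread()` is an assumption made only so the sketch runs on older versions.

```python
import _thread
import threading


def start_and_join(func):
    """Try to start func in a new thread; if it starts, wait for it to exit.

    Returns True if the thread was started and joined, False if the
    thread could not be started (RuntimeError).
    """
    start_joinable = getattr(_thread, "start_joinable_thread", None)
    try:
        if start_joinable is not None:
            # Newer interpreters: returns a handle that can be joined.
            handle = start_joinable(func)
        else:
            # Assumption for this sketch: use a threading.Thread as the
            # joinable stand-in when the function is unavailable.
            handle = threading.Thread(target=func)
            handle.start()
    except RuntimeError:
        # Thread creation failed, so there is nothing to wait for.
        return False
    # The thread exists: wait for it so it cannot outlive the caller and
    # race with interpreter finalization.
    handle.join()
    return True


results = []
started = start_and_join(lambda: results.append("done"))
```

The key point is the `join()` on the success path: with `_thread.start_new_thread()` there is no handle to join, so a thread that unexpectedly starts may still be running when the runtime shuts down.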
@mpage
Contributor Author

mpage commented Dec 3, 2024

I was able to reproduce the failure on my MacBook Pro. With the fix in this PR, I could no longer reproduce it when running:

./python.exe -m test -F test_threading --match test_start_new_thread_failed

@mpage mpage requested a review from colesbury December 3, 2024 06:12
@mpage mpage marked this pull request as ready for review December 3, 2024 06:26
@serhiy-storchaka
Member

Are you sure that the original issue, the fix of which is tested here, was reproducible with start_joinable_thread?

@colesbury
Contributor

I can reproduce the original issue with start_joinable_thread.

@mpage
Contributor Author

mpage commented Dec 3, 2024

Does this need to be backported?

@colesbury
Contributor

Yes, I can reproduce the issue in 3.13 and 3.12 as well.

@mpage mpage added the needs backport to 3.13 bugs and security fixes label Dec 3, 2024
@mpage
Contributor Author

mpage commented Dec 3, 2024

Ok, 3.12 will need a different fix, because start_joinable_thread was only added in 3.13.

@mpage mpage merged commit 13b68e1 into python:main Dec 3, 2024
38 checks passed
@miss-islington-app

Thanks @mpage for the PR 🌮🎉.. I'm working now to backport this PR to: 3.13.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Dec 3, 2024
…27549)

(cherry picked from commit 13b68e1)

Co-authored-by: mpage <[email protected]>
@bedevere-app

bedevere-app bot commented Dec 3, 2024

GH-127574 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Dec 3, 2024
@mpage mpage deleted the gh-127421-flaky-test branch December 3, 2024 17:54
mpage added a commit that referenced this pull request Dec 3, 2024
#127574)

gh-127421: Fix race in test_start_new_thread_failed (GH-127549)

(cherry picked from commit 13b68e1)

Co-authored-by: mpage <[email protected]>
@bedevere-bot

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot aarch64 RHEL8 LTO 3.13 has failed when building commit 0b266aa.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/#/builders/1393/builds/387) and take a look at the build logs.
  4. Check if the failure is related to this commit (0b266aa) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/#/builders/1393/builds/387

Summary of the results of the build (if available):

==

Click to see traceback logs
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.13.cstratak-RHEL8-aarch64.lto/build/Lib/threading.py", line 1041, in _bootstrap_inner
    self.run()
    ~~~~~~~~^^
  File "/home/buildbot/buildarea/3.13.cstratak-RHEL8-aarch64.lto/build/Lib/threading.py", line 992, in run
    self._target(*self._args, **self._kwargs)
    ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.13.cstratak-RHEL8-aarch64.lto/build/Lib/test/test_interpreters/test_stress.py", line 47, in run
    interp = interpreters.create()
  File "/home/buildbot/buildarea/3.13.cstratak-RHEL8-aarch64.lto/build/Lib/test/support/interpreters/__init__.py", line 76, in create
    id = _interpreters.create(reqrefs=True)
interpreters.InterpreterError: interpreter creation failed


Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.13.cstratak-RHEL8-aarch64.lto/build/Lib/threading.py", line 1041, in _bootstrap_inner
    self.run()
    ~~~~~~~~^^
  File "/home/buildbot/buildarea/3.13.cstratak-RHEL8-aarch64.lto/build/Lib/threading.py", line 992, in run
    self._target(*self._args, **self._kwargs)
    ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.13.cstratak-RHEL8-aarch64.lto/build/Lib/test/test_interpreters/test_stress.py", line 30, in task
    interp = interpreters.create()
  File "/home/buildbot/buildarea/3.13.cstratak-RHEL8-aarch64.lto/build/Lib/test/support/interpreters/__init__.py", line 76, in create
    id = _interpreters.create(reqrefs=True)
interpreters.InterpreterError: interpreter creation failed

@mpage
Contributor Author

mpage commented Dec 3, 2024

The buildbot failure looks unrelated to this change.

srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this pull request Jan 8, 2025
)

ebonnal pushed a commit to ebonnal/cpython that referenced this pull request Jan 12, 2025
)

Labels
skip news tests Tests in the Lib/test dir