Frame teardown can create frame objects #99729
see also bloomberg/memray#260 cc @pablogsal |
This looks like another frame lifetime issue. Here is the stack:
|
I can confirm that this still happens on the current 3.11 branch. |
Thanks, I'll look at this today (Mark's out right now). |
I've now eliminated pytest from my reproducer: graingert/segfault-repro@34b5962 |
Oh thank God. You have no idea how helpful this is. |
@graingert, how should I run the new reproducer? I'm unable to get 3.11 head to crash with |
I see the segfault intermittently when running |
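One way to put a number on "intermittently" (the crash report says about 1 in 3 runs) is to rerun the reproducer in a loop and count how many runs are killed by a signal. This is a minimal sketch that is not part of the original thread. It assumes a POSIX system, where subprocess reports a child killed by signal N as return code -N. The helper name count_signal_deaths and the DEFAULT_CMD placeholder are hypothetical.

```python
import signal
import subprocess
import sys

# Hypothetical default: rerun the test suite with the current interpreter.
DEFAULT_CMD = [sys.executable, "-m", "pytest", "-q"]


def count_signal_deaths(cmd, runs, sig=signal.SIGSEGV):
    """Run `cmd` `runs` times and count the runs killed by signal `sig`."""
    deaths = 0
    for _ in range(runs):
        proc = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        # On POSIX, a negative return code -N means the child died from signal N.
        if proc.returncode == -sig:
            deaths += 1
    return deaths
```

For example, count_signal_deaths(DEFAULT_CMD, 30) run from the reproducer's checkout would report how many of 30 runs segfaulted.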
In the meantime, I'm using:

```toml
[tool.pytest.ini_options]
filterwarnings = ["error"]
```
|
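The filterwarnings = ["error"] setting turns warnings emitted during the test run into exceptions. A rough standard-library analogue, scoped to a single call, is a catch_warnings block with simplefilter("error"). This sketch is not pytest's implementation, and the helper name call_with_warnings_as_errors is hypothetical.

```python
import warnings


def call_with_warnings_as_errors(fn):
    """Call fn() with every warning escalated to an exception."""
    with warnings.catch_warnings():
        # Same action as the "error" entry in pytest's filterwarnings list.
        warnings.simplefilter("error")
        return fn()


def emits_warning():
    warnings.warn("deprecated path", DeprecationWarning)
    return "no warning raised"
```

Calling call_with_warnings_as_errors(emits_warning) raises DeprecationWarning instead of returning the string.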
Let me know if you want me to open a PR with the simplified reproducers (or, even better, if you want me to just push them). |
You need to run it with |
Thanks, I can reproduce using my minimized file now. Continuing to rip stuff out bit by bit... |
Down to single file, 100 lines, no context managers or generators (except for any in |
This reproduces consistently:

```python
# python crasher.py
import sys
import threading


class DelRaises:
    def __del__(self):
        assert False


class MyThread(threading.Thread):
    def run(self):
        _ = DelRaises()


EXCEPTION = None


def capture_error(unraisable):
    global EXCEPTION
    EXCEPTION = unraisable.exc_value


sys.unraisablehook = capture_error
print("Running...")
MyThread().start()
print("Crashing...")
```

If I remove the assignment to |
Hmm, this seems to be a double free:
|
I think it is not about |
Where are the generators, though? In |
The segfaults happen when destroying the exception in |
I am using your reproducer |
Oh wait, this may not be generators. I see. |
This is the stack:
|
I just updated the reproducer a minute ago, not sure if you saw. |
So the frame referenced by the exception passed to the unraisable hook has already been destroyed. |
Wanna move this to discord? I feel like we're chatting in real-time now. |
No, but same thing:
and stack:
|
Let's do discord |
No unraisablehook:

```python
import sys
import threading


class DelRaises:
    def __del__(self):
        global sneaky
        sneaky = sys._getframe()


class MyThread(threading.Thread):
    def run(self):
        _ = DelRaises()


print("Running...")
MyThread().start()
print("Crashing...")
```
|
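The reproducer works because clearing a frame's locals can run arbitrary Python code: the DelRaises instance is finalized while run()'s frame is being torn down, and sys._getframe() inside __del__ creates a new frame object at that moment. The single-threaded sketch below only records the order of events. It reads one attribute from the frame object and drops it instead of storing it globally as sneaky does. The names Tracer, owner and events are hypothetical.

```python
import sys

events = []


class Tracer:
    def __del__(self):
        # Runs while owner()'s locals are being cleared. sys._getframe()
        # materializes a frame object for __del__ itself; it is not kept.
        events.append(("__del__ ran", sys._getframe().f_code.co_name))


def owner():
    _ = Tracer()
    events.append(("owner body finished", None))


owner()
events.append(("owner() returned", None))
print(events)
```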
To recap, the issue is that we clear the frame object and transfer ownership of an exiting frame before clearing the locals... but, as we've seen, clearing the locals can make a new frame object. One solution is to pop the frame off of the thread state (by setting

```diff
diff --git a/Python/ceval.c b/Python/ceval.c
index 8cbe838ddf..3be38934c1 100644
--- a/Python/ceval.c
+++ b/Python/ceval.c
@@ -1617,14 +1617,6 @@ trace_function_exit(PyThreadState *tstate, _PyInterpreterFrame *frame, PyObject
     return 0;
 }
 
-static _PyInterpreterFrame *
-pop_frame(PyThreadState *tstate, _PyInterpreterFrame *frame)
-{
-    _PyInterpreterFrame *prev_frame = frame->previous;
-    _PyEvalFrameClearAndPop(tstate, frame);
-    return prev_frame;
-}
-
 /* It is only between the PRECALL instruction and the following CALL,
  * that this has any meaning.
  */
@@ -2441,7 +2433,9 @@ _PyEval_EvalFrameDefault(PyThreadState *tstate, _PyInterpreterFrame *frame, int
             DTRACE_FUNCTION_EXIT();
             _Py_LeaveRecursiveCallTstate(tstate);
             if (!frame->is_entry) {
-                frame = cframe.current_frame = pop_frame(tstate, frame);
+                cframe.current_frame = frame->previous;
+                _PyEvalFrameClearAndPop(tstate, frame);
+                frame = cframe.current_frame;
                 _PyFrame_StackPush(frame, retval);
                 goto resume_frame;
             }
@@ -5833,7 +5827,9 @@ _PyEval_EvalFrameDefault(PyThreadState *tstate, _PyInterpreterFrame *frame, int
             assert(tstate->cframe->current_frame == frame->previous);
             return NULL;
         }
-        frame = cframe.current_frame = pop_frame(tstate, frame);
+        cframe.current_frame = frame->previous;
+        _PyEvalFrameClearAndPop(tstate, frame);
+        frame = cframe.current_frame;
 
 resume_with_error:
         SET_LOCALS_FROM_FRAME();
diff --git a/Python/frame.c b/Python/frame.c
index d8f2f801f3..792974b50d 100644
--- a/Python/frame.c
+++ b/Python/frame.c
@@ -137,7 +137,7 @@ _PyFrame_Clear(_PyInterpreterFrame *frame)
     for (int i = 0; i < frame->stacktop; i++) {
         Py_XDECREF(frame->localsplus[i]);
     }
-    Py_XDECREF(frame->frame_obj);
+    assert(frame->frame_obj == NULL);
     Py_XDECREF(frame->f_locals);
     Py_DECREF(frame->f_func);
     Py_DECREF(frame->f_code);
```

However, it's unclear to me to what extent generators are affected (if at all), and also how multiple threads are interacting here. It might make more sense to modify

Either way, I need to take a break for a bit. I might be back on this in a few hours, or at least by next Tuesday. Hopefully this gives Mark and @pablogsal enough breadcrumbs to move forward. |
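The ordering problem in the recap can be illustrated with a toy model. Everything here (ToyFrame, ToyThreadState, toy_pop_frame and its unlink_first flag, the stale list) is hypothetical and mirrors only the order of operations described above, not CPython's real structures. In the original order the exiting frame is still the thread's current frame while its locals are cleared, so a finalizer that materializes a frame object attaches it to the dying frame. Unlinking first, as in the diff, makes the finalizer see the caller instead.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ToyFrame:
    """Hypothetical stand-in for an interpreter frame."""
    name: str
    previous: Optional[ToyFrame] = None
    finalizers: List[Callable[[], None]] = field(default_factory=list)
    frame_obj: Optional[str] = None
    dying: bool = False


@dataclass
class ToyThreadState:
    current: Optional[ToyFrame] = None
    # Frame objects created for frames that were already being torn down.
    stale: List[str] = field(default_factory=list)

    def materialize_current(self) -> None:
        """Model of a __del__ lazily creating a frame object for the current frame."""
        frame = self.current
        if frame is not None and frame.frame_obj is None:
            frame.frame_obj = f"<frame {frame.name}>"
            if frame.dying:
                self.stale.append(frame.frame_obj)


def toy_pop_frame(ts: ToyThreadState, frame: ToyFrame, unlink_first: bool) -> None:
    if unlink_first:
        ts.current = frame.previous      # patched order: unlink, then clear
    frame.frame_obj = None               # existing frame object handled first...
    frame.dying = True
    for finalizer in frame.finalizers:   # ...then clearing locals runs __del__
        finalizer()
    if not unlink_first:
        ts.current = frame.previous      # original order: unlink after clearing


def simulate(unlink_first: bool) -> List[str]:
    ts = ToyThreadState()
    caller = ToyFrame("caller")
    run = ToyFrame("run", previous=caller)
    run.finalizers.append(ts.materialize_current)
    ts.current = run
    toy_pop_frame(ts, run, unlink_first)
    return ts.stale


print(simulate(unlink_first=False))  # ['<frame run>']
print(simulate(unlink_first=True))   # []
```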
I still don't understand why this doesn't crash:
Unfortunately, I am currently dealing with a health issue, so I haven't been able to find time to look at this in detail :( |
Sorry to hear that. :( I was able to figure this out yesterday, but was waiting to share it in the meeting. I’ll post here in case you can’t come:

I was able to get a single-threaded version to crash, too. It’s just easier to do with multiple threads, since the frame object has a stale pointer into the thread’s frame stack. In our threaded example, the frame stack ceases to exist after the thread finishes, so things can go south pretty quickly. When single-threaded, we just have a pointer into an old part of the current chunk, so things appear valid for longer if you don’t try to do much. I’m on my phone now, but if I remember correctly, doing something like

Basically, we should mark this frame as “incomplete” (or just unlink it; I’m not sure yet which is easier) after checking for existing frame objects, but before clearing stuff. This restores the behavior of 3.10, which made it look like

I’ll have a PR up for review this week. |
Marking this as a 3.11 release blocker |
PRs merged so I am closing the issue. Thanks a lot to everyone that participated in this bug, from identifying it to fixing it. You all rock 🤘 |
I've confirmed that the 3.11 patch fixes the full original reproducer, as well. |
Crash report

Using https://github.com/graingert/segfault-repro, running

pytest

yields a segfault in about 1 in 3 runs.

Error messages
Your environment
Python 3.11.0 (main, Oct 24 2022, 19:56:13) [GCC 11.2.0] on linux
5.15.0-53-generic #59-Ubuntu SMP Mon Oct 17 18:53:30 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux