Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Frame teardown can create frame objects #99729

Closed
graingert opened this issue Nov 23, 2022 · 34 comments
Closed

Frame teardown can create frame objects #99729

graingert opened this issue Nov 23, 2022 · 34 comments
Assignees
Labels
3.11 only security fixes 3.12 bugs and security fixes type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@graingert
Copy link
Contributor

graingert commented Nov 23, 2022

Crash report

using https://github.com/graingert/segfault-repro running pytest yields a segfault in about 1 in 3 runs

================================================================================================================================== test session starts ===================================================================================================================================
platform linux -- Python 3.11.0, pytest-7.2.0, pluggy-1.0.0
rootdir: /home/graingert/projects/segfault-repro, configfile: pyproject.toml
collected 1 item                                                                                                                                                                                                                                                                         

test_ssltransport.py Fatal Python error: Segmentation fault

Current thread 0x00007f874b2df000 (most recent call first):
  File "/home/graingert/.virtualenvs/segfault-repro/lib/python3.11/site-packages/_pytest/unraisableexception.py", line 43 in _hook
  File "/home/graingert/.virtualenvs/segfault-repro/lib/python3.11/site-packages/_pytest/python.py", line 195 in pytest_pyfunc_call
  File "/home/graingert/.virtualenvs/segfault-repro/lib/python3.11/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/home/graingert/.virtualenvs/segfault-repro/lib/python3.11/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/home/graingert/.virtualenvs/segfault-repro/lib/python3.11/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/home/graingert/.virtualenvs/segfault-repro/lib/python3.11/site-packages/_pytest/python.py", line 1789 in runtest
  File "/home/graingert/.virtualenvs/segfault-repro/lib/python3.11/site-packages/_pytest/runner.py", line 167 in pytest_runtest_call
  File "/home/graingert/.virtualenvs/segfault-repro/lib/python3.11/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/home/graingert/.virtualenvs/segfault-repro/lib/python3.11/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/home/graingert/.virtualenvs/segfault-repro/lib/python3.11/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/home/graingert/.virtualenvs/segfault-repro/lib/python3.11/site-packages/_pytest/runner.py", line 260 in <lambda>
  File "/home/graingert/.virtualenvs/segfault-repro/lib/python3.11/site-packages/_pytest/runner.py", line 339 in from_call
  File "/home/graingert/.virtualenvs/segfault-repro/lib/python3.11/site-packages/_pytest/runner.py", line 259 in call_runtest_hook
  File "/home/graingert/.virtualenvs/segfault-repro/lib/python3.11/site-packages/_pytest/runner.py", line 220 in call_and_report
  File "/home/graingert/.virtualenvs/segfault-repro/lib/python3.11/site-packages/_pytest/runner.py", line 131 in runtestprotocol
  File "/home/graingert/.virtualenvs/segfault-repro/lib/python3.11/site-packages/_pytest/runner.py", line 112 in pytest_runtest_protocol
  File "/home/graingert/.virtualenvs/segfault-repro/lib/python3.11/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/home/graingert/.virtualenvs/segfault-repro/lib/python3.11/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/home/graingert/.virtualenvs/segfault-repro/lib/python3.11/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/home/graingert/.virtualenvs/segfault-repro/lib/python3.11/site-packages/_pytest/main.py", line 349 in pytest_runtestloop
  File "/home/graingert/.virtualenvs/segfault-repro/lib/python3.11/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/home/graingert/.virtualenvs/segfault-repro/lib/python3.11/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/home/graingert/.virtualenvs/segfault-repro/lib/python3.11/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/home/graingert/.virtualenvs/segfault-repro/lib/python3.11/site-packages/_pytest/main.py", line 324 in _main
  File "/home/graingert/.virtualenvs/segfault-repro/lib/python3.11/site-packages/_pytest/main.py", line 270 in wrap_session
  File "/home/graingert/.virtualenvs/segfault-repro/lib/python3.11/site-packages/_pytest/main.py", line 317 in pytest_cmdline_main
  File "/home/graingert/.virtualenvs/segfault-repro/lib/python3.11/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/home/graingert/.virtualenvs/segfault-repro/lib/python3.11/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/home/graingert/.virtualenvs/segfault-repro/lib/python3.11/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/home/graingert/.virtualenvs/segfault-repro/lib/python3.11/site-packages/_pytest/config/__init__.py", line 167 in main
  File "/home/graingert/.virtualenvs/segfault-repro/lib/python3.11/site-packages/_pytest/config/__init__.py", line 190 in console_main
  File "/home/graingert/.virtualenvs/segfault-repro/bin/pytest", line 8 in <module>
[1]    50315 segmentation fault (core dumped)  pytest

Error messages

Enter any relevant error message caused by the crash, including a core dump if there is one.

Your environment

  • CPython versions tested on: Python 3.11.0 (main, Oct 24 2022, 19:56:13) [GCC 11.2.0] on linux
  • Operating system and architecture: 5.15.0-53-generic #59-Ubuntu SMP Mon Oct 17 18:53:30 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Linked PRs

@graingert graingert added the type-crash A hard crash of the interpreter, possibly with a core dump label Nov 23, 2022
@graingert
Copy link
Contributor Author

graingert commented Nov 23, 2022

see also bloomberg/memray#260 cc @pablogsal

@pablogsal
Copy link
Member

This looks like another frame lifetime issue. Here is the stack:

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7cc5740 (LWP 223005)]
0x00005555556cbc8b in PyFrame_GetCode (frame=<error reading variable: Cannot access memory at address 0x7ffff5e4c210>) at Objects/frameobject.c:1303
1303    Objects/frameobject.c: Directory not empty.
(gdb) up
#1  frame_getcode (f=<error reading variable: Cannot access memory at address 0x7ffff5e4c210>, closure=<optimized out>)
    at Objects/frameobject.c:97
97      in Objects/frameobject.c
(gdb)
#2  0x0000555555702309 in _PyObject_GenericGetAttrWithDict (Python Exception <class 'gdb.error'>: There is no member named f_frame.
obj=, name='f_code', dict=0x0, suppress=0) at ./Include/object.h:133
133     ./Include/object.h: Directory not empty.
(gdb)
#3  0x0000555555701836 in PyObject_GetAttr (Python Exception <class 'gdb.error'>: There is no member named f_frame.
v=v@entry=, name='f_code') at Objects/object.c:916
916     Objects/object.c: Directory not empty.
(gdb)
#4  0x00005555556440a1 in _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>, throwflag=<optimized out>)
    at Python/ceval.c:3466
3466    Python/ceval.c: Directory not empty.
(gdb)
#5  0x00005555556c1bd5 in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff4840330, tstate=0x555555adfa18 <_PyRuntime+166328>)
    at ./Include/internal/pycore_ceval.h:73
73      ./Include/internal/pycore_ceval.h: Directory not empty.
(gdb)
#6  gen_send_ex2 (closing=0, exc=0, presult=<synthetic pointer>, arg=0x0, gen=0x7ffff48402e0) at Objects/genobject.c:219
219     Objects/genobject.c: Directory not empty.
(gdb)
#7  gen_iternext (gen=0x7ffff48402e0) at Objects/genobject.c:594

@pablogsal
Copy link
Member

CC: @markshannon @brandtbucher

@pablogsal
Copy link
Member

I can confirm that still happens with 3.11 current.

@brandtbucher
Copy link
Member

Thanks, I'll look at this today (Mark's out right now).

@brandtbucher brandtbucher self-assigned this Nov 23, 2022
@graingert
Copy link
Contributor Author

I've now eliminated pytest from my reproducer: graingert/segfault-repro@34b5962

@brandtbucher
Copy link
Member

I've now eliminated pytest from my reproducer: graingert/segfault-repro@34b5962

Oh thank God. You have no idea how helpful this is.

@brandtbucher
Copy link
Member

brandtbucher commented Nov 23, 2022

@graingert, how should I run the new reproducer? I'm unable to get 3.11 head to crash with pip install -r requirements.txt; python test_ssltransport.py.

@brandtbucher
Copy link
Member

I see the segfault intermittently when running pytest, though.

@brandtbucher
Copy link
Member

In the meantime, I'm using pytest to reproduce. I've gotten test_ssltransport.py down to 270 lines, removed __init__.py entirely, and gotten pyproject.toml down to just:

[tool.pytest.ini_options]
filterwarnings = ["error"]

@brandtbucher
Copy link
Member

Let me know if you want me to open a PR with the simplified reproducers (or, even better, if you want me to just push them).

@graingert
Copy link
Contributor Author

@graingert, how should I run the new reproducer? I'm unable to get 3.11 head to crash with pip install -r requirements.txt; python test_ssltransport.py.

You need to run it with python -W error

@brandtbucher
Copy link
Member

Thanks, I can reproduce using my minimized file now. Continuing to rip stuff out bit by bit...

@brandtbucher
Copy link
Member

Down to single file, 100 lines, no context managers or generators (except for any in SSLTransport of course).

@brandtbucher
Copy link
Member

brandtbucher commented Nov 23, 2022

This reproduces consistently:

# python crasher.py

import sys
import threading

class DelRaises:
    def __del__(self):
        assert False

class MyThread(threading.Thread):
    def run(self):
        _ = DelRaises()

EXCEPTION = None
def capture_error(unraisable):
    global EXCEPTION
    EXCEPTION = unraisable.exc_value

sys.unraisablehook = capture_error
print("Running...")
MyThread().start()
print("Crashing...")

If I remove the assignment to _, the segfault goes away. So I think this is happening during the teardown of the run frame, when DelRaises is deallocated.

@pablogsal
Copy link
Member

Humm, seems that this is a double free:

frame #0: 0x000000010009e57c python.exe`frame_dealloc(f=0x0000000100e6cad0) at frameobject.c:854:5 [opt]
   851 	    /* It is the responsibility of the owning generator/coroutine
   852 	     * to have cleared the generator pointer */
   853
-> 854 	    assert(f->f_frame->owner != FRAME_OWNED_BY_GENERATOR ||
   855 	        _PyFrame_GetGenerator(f->f_frame)->gi_frame_state == FRAME_CLEARED);
   856
   857 	    if (_PyObject_GC_IS_TRACKED(f)) {
(lldb) p *f
(PyFrameObject) $4 = {
  ob_base = {
    ob_refcnt = 0
    ob_type = 0x0000000100346a80
  }
  f_back = NULL
  f_frame = 0x0000000100978110
  f_trace = NULL
  f_lineno = 0
  f_trace_lines = '\x01'
  f_trace_opcodes = '\0'
  f_fast_as_locals = '\0'
  _f_frame_data = {
    [0] = 0xcdcdcdcdcdcdcdcd
  }
}

@pablogsal
Copy link
Member

I think is not about f->f_frame->owner but about a missing incref somewhere

@brandtbucher
Copy link
Member

Where are the generators, though? In threading.py? See my new reproducer above.

@pablogsal
Copy link
Member

The segfaults happen when destroying the exception in BaseException_clear

@pablogsal
Copy link
Member

Where are the generators, though? In threading.py? See my new reproducer above.

I am using your reproducer

@brandtbucher
Copy link
Member

Oh wait, this may not be generators. I see.

@pablogsal
Copy link
Member

This is the stack:

    frame #0: 0x000000010009e57c python.exe`frame_dealloc(f=0x0000000100e6cad0) at frameobject.c:854:5 [opt]
    frame #1: 0x00000001000cb998 python.exe`_Py_Dealloc(op=0x0000000100e6cad0) at object.c:2389:5 [opt]
    frame #2: 0x00000001001a8bdc python.exe`Py_DECREF(filename=<unavailable>, lineno=<unavailable>, op=<unavailable>) at object.h:527:9 [opt]
    frame #3: 0x00000001001a8a6c python.exe`Py_XDECREF(op=<unavailable>) at object.h:602:9 [opt]
    frame #4: 0x00000001001a8628 python.exe`tb_dealloc(tb=0x0000000100c95b30) at traceback.c:174:5 [opt]
    frame #5: 0x00000001000cb998 python.exe`_Py_Dealloc(op=0x0000000100c95b30) at object.c:2389:5 [opt]
    frame #6: 0x000000010008ef28 python.exe`Py_DECREF(filename=<unavailable>, lineno=<unavailable>, op=<unavailable>) at object.h:527:9 [opt]
    frame #7: 0x0000000100090ba8 python.exe`BaseException_clear(self=0x0000000100e891d0) at exceptions.c:88:5 [opt]
  * frame #8: 0x0000000100090928 python.exe`BaseException_dealloc(self=0x0000000100e891d0) at exceptions.c:102:5 [opt]
    frame #9: 0x00000001000cb998 python.exe`_Py_Dealloc(op=0x0000000100e891d0) at object.c:2389:5 [opt]
    frame #10: 0x00000001000b4fb0 python.exe`Py_DECREF(filename=<unavailable>, lineno=<unavailable>, op=<unavailable>) at object.h:527:9 [opt]
    frame #11: 0x00000001000b73dc python.exe`Py_XDECREF(op=<unavailable>) at object.h:602:9 [opt]
    frame #12: 0x00000001000bb470 python.exe`free_keys_object(keys=0x0000000100c7db40) at dictobject.c:664:13 [opt]
    frame #13: 0x00000001000b60f4 python.exe`dictkeys_decref(dk=<unavailable>) at dictobject.c:324:9 [opt]
    frame #14: 0x00000001000b6070 python.exe`PyDict_Clear(op=0x0000000100c34950) at dictobject.c:0 [opt]
    frame #15: 0x00000001000b8a90 python.exe`dict_tp_clear(op=<unavailable>) at dictobject.c:3566:5 [opt]
    frame #16: 0x00000001001b7ff4 python.exe`delete_garbage(tstate=0x0000000100456220, gcstate=0x000000010043c128, collectable=0x000000016fdfeaf0, old=0x000000010043c170) at gcmodule.c:1013:24 [opt]
    frame #17: 0x00000001001b5a74 python.exe`gc_collect_main(tstate=0x0000000100456220, generation=2, n_collected=0x0000000000000000, n_uncollectable=0x0000000000000000, nofail=1) at gcmodule.c:1287:5 [opt]
    frame #18: 0x00000001001b5814 python.exe`_PyGC_CollectNoFail(tstate=<unavailable>) at gcmodule.c:2110:9 [opt]
    frame #19: 0x0000000100190868 python.exe`finalize_modules(tstate=0x0000000100456220) at pylifecycle.c:1598:5 [opt]
    frame #20: 0x000000010019059c python.exe`Py_FinalizeEx at pylifecycle.c:1831:5 [opt]
    frame #21: 0x00000001001b4314 python.exe`Py_RunMain at main.c:682:9 [opt]
    frame #22: 0x00000001001b4588 python.exe`pymain_main(args=0x000000016fdfec80) at main.c:710:12 [opt]
    frame #23: 0x00000001001b45cc python.exe`Py_BytesMain(argc=<unavailable>, argv=<unavailable>) at main.c:734:12 [opt]
    frame #24: 0x00000001000023b8 python.exe`main(argc=<unavailable>, argv=<unavailable>) at python.c:15:12 [opt]
    frame #25: 0x000000010077108c dyld`start + 520

@brandtbucher
Copy link
Member

I just updated the reproducer a minute ago, not sure if you saw.

@pablogsal
Copy link
Member

So the frame in the exception in the unraisable hook has been destroyed already

@brandtbucher
Copy link
Member

Wanna move this to discord? I feel like we're chatting in real-time now.

@pablogsal
Copy link
Member

No, but same thing:

(lldb) r
Process 9870 launched: '/Users/pgalindo3/github/python/3.11/python.exe' (arm64)
Running...
Crashing...
Process 9870 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x100978155)
    frame #0: 0x000000010009e57c python.exe`frame_dealloc(f=0x0000000100e66f90) at frameobject.c:854:5 [opt]
   851 	    /* It is the responsibility of the owning generator/coroutine
   852 	     * to have cleared the generator pointer */
   853
-> 854 	    assert(f->f_frame->owner != FRAME_OWNED_BY_GENERATOR ||
   855 	        _PyFrame_GetGenerator(f->f_frame)->gi_frame_state == FRAME_CLEARED);
   856
   857 	    if (_PyObject_GC_IS_TRACKED(f)) {

and stack:

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x100978155)
  * frame #0: 0x000000010009e57c python.exe`frame_dealloc(f=0x0000000100e66f90) at frameobject.c:854:5 [opt]
    frame #1: 0x00000001000cb998 python.exe`_Py_Dealloc(op=0x0000000100e66f90) at object.c:2389:5 [opt]
    frame #2: 0x000000010009ea98 python.exe`Py_DECREF(filename=<unavailable>, lineno=<unavailable>, op=<unavailable>) at object.h:527:9 [opt]
    frame #3: 0x000000010009e6a8 python.exe`frame_dealloc(f=0x0000000100f2c890) at frameobject.c:878:5 [opt]
    frame #4: 0x00000001000cb998 python.exe`_Py_Dealloc(op=0x0000000100f2c890) at object.c:2389:5 [opt]
    frame #5: 0x00000001001a8bdc python.exe`Py_DECREF(filename=<unavailable>, lineno=<unavailable>, op=<unavailable>) at object.h:527:9 [opt]
    frame #6: 0x00000001001a8a6c python.exe`Py_XDECREF(op=<unavailable>) at object.h:602:9 [opt]
    frame #7: 0x00000001001a8628 python.exe`tb_dealloc(tb=0x0000000100eff930) at traceback.c:174:5 [opt]
    frame #8: 0x00000001000cb998 python.exe`_Py_Dealloc(op=0x0000000100eff930) at object.c:2389:5 [opt]
    frame #9: 0x000000010008ef28 python.exe`Py_DECREF(filename=<unavailable>, lineno=<unavailable>, op=<unavailable>) at object.h:527:9 [opt]
    frame #10: 0x0000000100090ba8 python.exe`BaseException_clear(self=0x0000000100f1d780) at exceptions.c:88:5 [opt]
    frame #11: 0x0000000100090928 python.exe`BaseException_dealloc(self=0x0000000100f1d780) at exceptions.c:102:5 [opt]
    frame #12: 0x00000001000cb998 python.exe`_Py_Dealloc(op=0x0000000100f1d780) at object.c:2389:5 [opt]
    frame #13: 0x00000001000b4fb0 python.exe`Py_DECREF(filename=<unavailable>, lineno=<unavailable>, op=<unavailable>) at object.h:527:9 [opt]
    frame #14: 0x00000001000b73dc python.exe`Py_XDECREF(op=<unavailable>) at object.h:602:9 [opt]
    frame #15: 0x00000001000bb470 python.exe`free_keys_object(keys=0x0000000100e7fb50) at dictobject.c:664:13 [opt]
    frame #16: 0x00000001000b60f4 python.exe`dictkeys_decref(dk=<unavailable>) at dictobject.c:324:9 [opt]
    frame #17: 0x00000001000b6070 python.exe`PyDict_Clear(op=0x0000000100e34950) at dictobject.c:0 [opt]
    frame #18: 0x00000001000b8a90 python.exe`dict_tp_clear(op=<unavailable>) at dictobject.c:3566:5 [opt]
    frame #19: 0x00000001001b7ff4 python.exe`delete_garbage(tstate=0x0000000100456220, gcstate=0x000000010043c128, collectable=0x000000016fdfeaf0, old=0x000000010043c170) at gcmodule.c:1013:24 [opt]
    frame #20: 0x00000001001b5a74 python.exe`gc_collect_main(tstate=0x0000000100456220, generation=2, n_collected=0x0000000000000000, n_uncollectable=0x0000000000000000, nofail=1) at gcmodule.c:1287:5 [opt]
    frame #21: 0x00000001001b5814 python.exe`_PyGC_CollectNoFail(tstate=<unavailable>) at gcmodule.c:2110:9 [opt]
    frame #22: 0x0000000100190868 python.exe`finalize_modules(tstate=0x0000000100456220) at pylifecycle.c:1598:5 [opt]
    frame #23: 0x000000010019059c python.exe`Py_FinalizeEx at pylifecycle.c:1831:5 [opt]
    frame #24: 0x00000001001b4314 python.exe`Py_RunMain at main.c:682:9 [opt]
    frame #25: 0x00000001001b4588 python.exe`pymain_main(args=0x000000016fdfec80) at main.c:710:12 [opt]
    frame #26: 0x00000001001b45cc python.exe`Py_BytesMain(argc=<unavailable>, argv=<unavailable>) at main.c:734:12 [opt]
    frame #27: 0x00000001000023b8 python.exe`main(argc=<unavailable>, argv=<unavailable>) at python.c:15:12 [opt]
    frame #28: 0x000000010077108c dyld`start + 520

@pablogsal
Copy link
Member

Wanna move this to discord? I feel like we're chatting in real-time now.

Let's do discord

@brandtbucher
Copy link
Member

brandtbucher commented Nov 23, 2022

No unraisablehook:

import sys
import threading

class DelRaises:
    def __del__(self):
        global sneaky
        sneaky = sys._getframe()

class MyThread(threading.Thread):
    def run(self):
        _ = DelRaises()

print("Running...")
MyThread().start()
print("Crashing...")

@brandtbucher brandtbucher changed the title segfault in unraisablehook Frame teardown can create frame objects Nov 23, 2022
@brandtbucher
Copy link
Member

brandtbucher commented Nov 23, 2022

To recap, the issue is that we clear the frame object and transfer ownership of an exiting frame before clearing the locals... but as we've seen, clearing the locals can make a new frame object.

One solution is to pop the frame off of the thread state (by setting cframe.current_frame = cframe.current_frame->previous) before clearing it. That way a new frame object cannot be created during teardown. This patch (on the 3.11 branch) fixes the issue:

diff --git a/Python/ceval.c b/Python/ceval.c
index 8cbe838ddf..3be38934c1 100644
--- a/Python/ceval.c
+++ b/Python/ceval.c
@@ -1617,14 +1617,6 @@ trace_function_exit(PyThreadState *tstate, _PyInterpreterFrame *frame, PyObject
     return 0;
 }
 
-static _PyInterpreterFrame *
-pop_frame(PyThreadState *tstate, _PyInterpreterFrame *frame)
-{
-    _PyInterpreterFrame *prev_frame = frame->previous;
-    _PyEvalFrameClearAndPop(tstate, frame);
-    return prev_frame;
-}
-
 /* It is only between the PRECALL instruction and the following CALL,
  * that this has any meaning.
  */
@@ -2441,7 +2433,9 @@ _PyEval_EvalFrameDefault(PyThreadState *tstate, _PyInterpreterFrame *frame, int
             DTRACE_FUNCTION_EXIT();
             _Py_LeaveRecursiveCallTstate(tstate);
             if (!frame->is_entry) {
-                frame = cframe.current_frame = pop_frame(tstate, frame);
+                cframe.current_frame = frame->previous;
+                _PyEvalFrameClearAndPop(tstate, frame);
+                frame = cframe.current_frame;
                 _PyFrame_StackPush(frame, retval);
                 goto resume_frame;
             }
@@ -5833,7 +5827,9 @@ _PyEval_EvalFrameDefault(PyThreadState *tstate, _PyInterpreterFrame *frame, int
         assert(tstate->cframe->current_frame == frame->previous);
         return NULL;
     }
-    frame = cframe.current_frame = pop_frame(tstate, frame);
+    cframe.current_frame = frame->previous;
+    _PyEvalFrameClearAndPop(tstate, frame);
+    frame = cframe.current_frame;
 
 resume_with_error:
     SET_LOCALS_FROM_FRAME();
diff --git a/Python/frame.c b/Python/frame.c
index d8f2f801f3..792974b50d 100644
--- a/Python/frame.c
+++ b/Python/frame.c
@@ -137,7 +137,7 @@ _PyFrame_Clear(_PyInterpreterFrame *frame)
     for (int i = 0; i < frame->stacktop; i++) {
         Py_XDECREF(frame->localsplus[i]);
     }
-    Py_XDECREF(frame->frame_obj);
+    assert(frame->frame_obj == NULL);
     Py_XDECREF(frame->f_locals);
     Py_DECREF(frame->f_func);
     Py_DECREF(frame->f_code)

However, it's unclear to me to what extent generators are affected (if at all), and also how multiple threads are interacting here. It might make more sense to modify _PyFrame_Clear to clear the frame_obj and possibly transfer ownership of itself after clearing its locals... but I'm not sure if that would actually work, since clearing other parts of the frame (like the function) can have the same effect, and by that point the frame is in a totally invalid state.

Either way, I need to take a break for a bit. I might be back on this in a few hours, or at least by next Tuesday. Hopefully this gives Mark and @pablogsal enough breadcrumbs to move forward.

@pablogsal
Copy link
Member

pablogsal commented Nov 30, 2022

I still don't undetstand why this doesn't crash:

import sys

class DelRaises:
    def __del__(self):
        global sneaky
        sneaky = sys._getframe()

def run():
    _ = DelRaises()

print("Running...")
run()
print("Crashing...")

Unfortunately, I am currently dealing with a health issue so I could not find time to deal with this in detail :(

@brandtbucher
Copy link
Member

Sorry, to hear that. :(

I was able to figure this out yesterday, but was waiting to share in the meeting. I’ll post here in case you can’t come:

I was able to get a single-threaded version to crash, too. It’s just easier to do with multiple threads since the frame object has a stale pointer into the thread’s frame stack. In our threaded example, after the thread finishes, the frame stack ceases to exist, so things can go south pretty quickly. When single-threaded, we just have a pointer into an old part of the current chunk, so things appear valid for longer if you don’t try to do much.

I’m on my phone now, but if I remember correctly, doing something like sneaky.f_back.f_locals after the run() makes the single-threaded version crash.

Basically, we should mark this frame as “incomplete” (or just unlink it, not sure yet which is easier) after checking for existing frame objects, but before clearing stuff. This restores the behavior of 3.10, which made it look like __del__ was called from run’s caller, not from run itself.

I’ll have a PR up for review this week.

@pablogsal
Copy link
Member

Marking this as 3.11 release blocker

@pablogsal
Copy link
Member

pablogsal commented Dec 6, 2022

PRs merged so I am closing the issue. Thanks a lot to everyone that participated in this bug, from identifying it to fixing it. You all rock 🤘

Repository owner moved this from Todo to Done in Release and Deferred blockers 🚫 Dec 6, 2022
@brandtbucher
Copy link
Member

I've confirmed that the 3.11 patch fixes the full original reproducer, as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.11 only security fixes 3.12 bugs and security fixes type-crash A hard crash of the interpreter, possibly with a core dump
Projects
Development

No branches or pull requests

3 participants