Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: 2.11.0 regression on PyPy3: pybind11::handle::dec_ref() is being called while the GIL is either not held or invalid. while running tests #4748

Closed
2 of 3 tasks
mgorny opened this issue Jul 15, 2023 · 5 comments · Fixed by #4751
Labels
triage New bug, unverified

Comments

@mgorny
Copy link
Contributor

mgorny commented Jul 15, 2023

Required prerequisites

What version (or hash if on master) of pybind11 are you using?

2.11.0

Problem description

At the very end of pybind11's test suite I'm getting the following exception:

pybind11::handle::dec_ref() is being called while the GIL is either not held or invalid. Please see https://pybind11.readthedocs.io/en/stable/advanced/misc.html#common-sources-of-global-interpreter-lock-errors for debugging advice.
The failing pybind11::handle::dec_ref() call was triggered on a type object.
terminate called after throwing an instance of 'std::runtime_error'
  what():  pybind11::handle::dec_ref() PyGILState_Check() failure.

This is with PyPy3.10 7.3.12. In pybind11 2.10.4, the test suite passes without any exceptions.

Full build/test log: dev-python:pybind11-2.11.0:20230715-024418.log

I have to leave now,, I can investigate further later.

Reproducible example code

mkdir build
cd build
pypy3 -m venv .venv
. .venv/bin/activate
pip install -r ../tests/requirements.txt
cmake -DPYBIND11_LTO_CXX_FLAGS= -DPYBIND11_TEST=ON -DCMAKE_BUILD_TYPE=Debug -G Ninja ..
ninja
ninja check

Is this a regression? Put the last known working version here if it is.

2.10.4

@mgorny mgorny added the triage New bug, unverified label Jul 15, 2023
@mgorny
Copy link
Contributor Author

mgorny commented Jul 15, 2023

bisect blames the following commit:

commit 65374c8
Author: Ralf W. Grosse-Kunstleve [email protected]
Date: Thu Dec 8 22:06:51 2022 -0800

`pybind11::handle` `inc_ref()` & `dec_ref()` `PyGILState_Check()` **excluding** `nullptr` (#4246)

* pybind11/pytypes.h `inc_ref()`, `dec_ref()` `PyGILState_Check()` **excluding** `nullptr`

Guarded by `PYBIND11_ASSERT_GIL_HELD_INCREF_DECREF`

* Disable `PYBIND11_ASSERT_GIL_HELD_INCREF_DECREF` for PyPy under Windows.

* Add reference to PR #4268 (PyPy Windows)

include/pybind11/detail/common.h | 9 +++++++++
include/pybind11/pytypes.h | 10 ++++++++++
2 files changed, 19 insertions(+)

@mgorny
Copy link
Contributor Author

mgorny commented Jul 15, 2023

FWICS you don't have to actually run any tests to trigger it — it's sufficient to import pybind11_tests extension:

$ PYTHONPATH=tests pypy3 -c 'import pybind11_tests'
pybind11::handle::dec_ref() is being called while the GIL is either not held or invalid. Please see https://pybind11.readthedocs.io/en/stable/advanced/misc.html#common-sources-of-global-interpreter-lock-errors for debugging advice.
The failing pybind11::handle::dec_ref() call was triggered on a type object.
terminate called after throwing an instance of 'std::runtime_error'
  what():  pybind11::handle::dec_ref() PyGILState_Check() failure.
Aborted (core dumped)

Apparently the problem occurs when destructors are called while the interpreter is exiting:

(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007f4c84a8f00f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2  0x00007f4c84a3fef2 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007f4c84a294ad in __GI_abort () at abort.c:79
#4  0x00007f4c828a0c53 in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/13/libstdc++.so.6
#5  0x00007f4c828b3306 in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/13/libstdc++.so.6
#6  0x00007f4c828b2289 in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/13/libstdc++.so.6
#7  0x00007f4c828b2a42 in __gxx_personality_v0 () from /usr/lib/gcc/x86_64-pc-linux-gnu/13/libstdc++.so.6
#8  0x00007f4c84c04b6b in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/13/libgcc_s.so.1
#9  0x00007f4c84c05464 in _Unwind_Resume () from /usr/lib/gcc/x86_64-pc-linux-gnu/13/libgcc_s.so.1
#10 0x00007f4c82d4b913 in std::allocator<char>::~allocator (this=<optimized out>, __in_chrg=<optimized out>)
    at /usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/allocator.h:184
#11 pybind11::handle::dec_ref() const & (this=0x7f4c83afce88 <pybind11::detail::get_exception_object<MyException5_1>()::ex>)
    at /tmp/pybind11/include/pybind11/detail/../detail/../pytypes.h:270
#12 0x00007f4c82d4bb6e in pybind11::object::~object (
    this=0x7f4c83afce88 <pybind11::detail::get_exception_object<MyException5_1>()::ex>, __in_chrg=<optimized out>)
    at /tmp/pybind11/include/pybind11/detail/../detail/../pytypes.h:356
#13 0x00007f4c830127c4 in pybind11::exception<MyException5_1>::~exception (
    this=0x7f4c83afce88 <pybind11::detail::get_exception_object<MyException5_1>()::ex>, __in_chrg=<optimized out>)
    at /tmp/pybind11/include/pybind11/pybind11.h:2536
#14 0x00007f4c84a425a5 in __run_exit_handlers (status=0, listp=0x7f4c84bdc840 <__exit_funcs>, 
    run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:111
#15 0x00007f4c84a426fa in __GI_exit (status=<optimized out>) at exit.c:141
#16 0x00007f4c84a2a991 in __libc_start_call_main (main=main@entry=0x55dc073144d0 <main>, argc=argc@entry=3, 
    argv=argv@entry=0x7ffe2ee89aa8) at ../sysdeps/nptl/libc_start_call_main.h:74
#17 0x00007f4c84a2aa45 in __libc_start_main_impl (main=0x55dc073144d0 <main>, argc=3, argv=0x7ffe2ee89aa8, init=<optimized out>, 
    fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe2ee89a98) at ../csu/libc-start.c:360
#18 0x000055dc07314501 in _start ()

@mgorny mgorny changed the title [BUG]: 2.11.0 regression on PyPy3.10: pybind11::handle::dec_ref() is being called while the GIL is either not held or invalid. while running tests [BUG]: 2.11.0 regression on PyPy3: pybind11::handle::dec_ref() is being called while the GIL is either not held or invalid. while running tests Jul 15, 2023
@mgorny
Copy link
Contributor Author

mgorny commented Jul 15, 2023

I have also reproduced this with PyPy3.9 (7.3.12 also).

@FindDefinition
Copy link

FindDefinition commented Jul 18, 2023

This bug also exists in cpython 3.8 with pybind 2.11.1, with following bind functions:

void BoxOps::transform_pc(pybind11::array_t<float> out_points, pybind11::array_t<float> points, pybind11::array_t<float> mat)   {
  
  using DType = float;
  auto N = points.shape(0);
  auto out_points_data = out_points.mutable_data();
  auto points_data = points.mutable_data();
  auto stride = points.shape(1);
  DType* out_ptr, *ptr;
  DType x, y, z;
  DType m00 = mat.at(0, 0);
  DType m01 = mat.at(0, 1);
  DType m02 = mat.at(0, 2);
  DType m10 = mat.at(1, 0);
  DType m11 = mat.at(1, 1);
  DType m12 = mat.at(1, 2);
  DType m20 = mat.at(2, 0);
  DType m21 = mat.at(2, 1);
  DType m22 = mat.at(2, 2);
  if (mat.shape(0) == 3 && mat.shape(1) == 3){
      for (size_t i = 0; i < N; ++i){
          out_ptr = out_points_data + i * stride;
          ptr = points_data + i * stride;
          x = ptr[0];
          y = ptr[1];
          z = ptr[2];
          out_ptr[0] = m00 * x + m01 * y + m02 * z;
          out_ptr[1] = m10 * x + m11 * y + m12 * z;
          out_ptr[2] = m20 * x + m21 * y + m22 * z;
      }
  }else{
      DType m03 = mat.at(0, 3);
      DType m13 = mat.at(1, 3);
      DType m23 = mat.at(2, 3);
      for (size_t i = 0; i < N; ++i){
          out_ptr = out_points_data + i * stride;
          ptr = points_data + i * stride;
          x = ptr[0];
          y = ptr[1];
          z = ptr[2];
          out_ptr[0] = m00 * x + m01 * y + m02 * z + m03;
          out_ptr[1] = m10 * x + m11 * y + m12 * z + m13;
          out_ptr[2] = m20 * x + m21 * y + m22 * z + m23;
      }
  }
}
  module_BoxOps.def_static("transform_pc", &csrc::boxops::BoxOps::transform_pc, pybind11::arg("out_points"), pybind11::arg("points"), pybind11::arg("mat"), pybind11::return_value_policy::automatic, pybind11::call_guard<pybind11::gil_scoped_release>());

@rwgk
Is this a expected behavior? We can't release GIL with all functions include array_t argument?

@rwgk
Copy link
Collaborator

rwgk commented Jul 18, 2023

Is this a expected behavior?

Yes, almost certainly.

We can't release GIL with all functions include array_t argument?

Not like your doing it right now (assuming you don't want undefined behavior). You have to be more careful: you have to ensure you're holding the GIL again before the destructors run. This is a classical trap that is very easy to fall into (I had a couple myself before I added the guards). The fix is probably subtle. Maybe (!), don't release the GIL with pybind11::call_guard<>, but divide the body of your transform_pc() function into a first part that constructs the temporary py::object (of some sort), then a subscope (simply { ... }) with py::gil_scoped_release release; at the top. That way, when the subscope goes of of scope, the py::object destructors don't run before the GIL is acquired again.

Inside the subscope you cannot have any code calling back into the Python C API. The simple guards we have just cover the most common oversights.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
triage New bug, unverified
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants