Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change implementation of the __init__() must be called when overriding __init__ safety feature to work for any metaclass. #30095

Merged
merged 16 commits into from
Feb 1, 2024

Conversation

rwgk
Copy link
Contributor

@rwgk rwgk commented Jan 26, 2024

Description

The safety feature introduced with pybind/pybind11#2152 (__init__() must be called when overriding __init__ ) only works for the default py::metaclass(). This PR changes the implementation of the safety feature so that it works for any metaclass (except when using PyPy; see below). (This has already uncovered missing super.__init__() calls in jax.)

The main driving force for this PR is that PyCLIF-pybind11 needs to use py::metaclass((PyObject *) &PyType_Type). Without this PR, the equivalent safety feature in PyCLIF would be lost. Note that it was a significant effort to clean up the Google codebase before the PyCLIF safety feature could be enabled, with this PyCLIF commit; i.e. it would be a very concerning loss.

This PR is based on a mechanism originally introduced with that PyCLIF commit. However, the original PyCLIF mechanism lacks the weakref-based cleanup of the internal registry backing the mechanism (derived_tp_init_registry). While this has not been a problem in practice, it is easy enough to add the cleanup feature here. The corresponding code is very similar to existing code here, which is a critical part of the core pybind11 functionality; in other words, it is a heavily tested and time-tested approach.


Details, based on the original PyCLIF commit message linked above:

Situation:

CppBase is a py::class_-wrapped C++ object.

What happens when the Python interpreter processes the following code (usually at import time)?

class PC(CppBase):
  pass

When the native Python PC class is built:

  • PC tp_new is set to use CppBase tp_new, but

  • PC tp_init does NOT in any way involve CppBase tp_init.

It is the responsibility of PC.__init__ to call CppBase.__init__, but this is not checked.

The approach used in this PR is:

  • PC tp_init is replaced with an "intercept" function.

  • The intercept function calls the original PC tp_init.

  • After that call finishes (and if it was successful), the intercept function checks if the CppBase wrapped C++ object was initialized.

Note that the derived_tp_init_registry()->count(type) == 0 condition in pybind11_object_new() enables daisy-chaining similar intercept functions, e.g. in other pybind11 extensions built with hidden visibility, or potentially other binding systems.

The intercept mechanism turns out to not be compatible with PyPy, therefore the safety feature cannot be enabled for metaclasses other than the default py::metaclass().

Suggested changelog entry:

@rwgk rwgk force-pushed the tp_init_intercepted_pywrapcc branch from 7084009 to 7b66ebe Compare January 28, 2024 19:10
Ralf W. Grosse-Kunstleve added 7 commits January 28, 2024 11:40
…in).

```
==6380==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x5611589c9a58 in Py_DECREF third_party/python_runtime/v3_11/Include/object.h:537:9
...

  Uninitialized value was created by a heap deallocation
    #0 0x5611552757b0 in free third_party/llvm/llvm-project/compiler-rt/lib/msan/msan_interceptors.cpp:218:3
    google#1 0x56115898e06b in _PyMem_RawFree third_party/python_runtime/v3_11/Objects/obmalloc.c:154:5
    google#2 0x56115898f6ad in PyObject_Free third_party/python_runtime/v3_11/Objects/obmalloc.c:769:5
    google#3 0x561158271bcc in PyObject_GC_Del third_party/python_runtime/v3_11/Modules/gcmodule.c:2407:5
    google#4 0x7f21224b070c in pybind11_object_dealloc third_party/pybind11/include/pybind11/detail/class.h:483:5
    google#5 0x5611589c2ed0 in subtype_dealloc third_party/python_runtime/v3_11/Objects/typeobject.c:1463:5
...
```
copybara-service bot pushed a commit to jax-ml/jax that referenced this pull request Jan 30, 2024
…n/sharding.cc

This change is to unblock google/pybind11clif#30095.

Leaving wrapped C++ types uninitialized creates a potential for triggering undefined behavior from Python.

PiperOrigin-RevId: 602787434
copybara-service bot pushed a commit to openxla/xla that referenced this pull request Jan 30, 2024
…n/sharding.cc

This change is to unblock google/pybind11clif#30095.

Leaving wrapped C++ types uninitialized creates a potential for triggering undefined behavior from Python.

PiperOrigin-RevId: 602787434
copybara-service bot pushed a commit to jax-ml/jax that referenced this pull request Jan 30, 2024
…n/sharding.cc

This change is to unblock google/pybind11clif#30095.

Leaving wrapped C++ types uninitialized creates a potential for triggering undefined behavior from Python.

PiperOrigin-RevId: 602787434
copybara-service bot pushed a commit to openxla/xla that referenced this pull request Jan 30, 2024
…n/sharding.cc

This change is to unblock google/pybind11clif#30095.

Leaving wrapped C++ types uninitialized creates a potential for triggering undefined behavior from Python.

PiperOrigin-RevId: 602787434
copybara-service bot pushed a commit to jax-ml/jax that referenced this pull request Jan 30, 2024
…n/sharding.cc

This change is to unblock google/pybind11clif#30095.

Leaving wrapped C++ types uninitialized creates a potential for triggering undefined behavior from Python.

PiperOrigin-RevId: 602787434
copybara-service bot pushed a commit to openxla/xla that referenced this pull request Jan 30, 2024
…n/sharding.cc

This change is to unblock google/pybind11clif#30095.

Leaving wrapped C++ types uninitialized creates a potential for triggering undefined behavior from Python.

PiperOrigin-RevId: 602787434
copybara-service bot pushed a commit to jax-ml/jax that referenced this pull request Jan 30, 2024
…n/sharding.cc

This change is to unblock google/pybind11clif#30095.

Leaving wrapped C++ types uninitialized creates a potential for triggering undefined behavior from Python.

PiperOrigin-RevId: 602787434
copybara-service bot pushed a commit to openxla/xla that referenced this pull request Jan 30, 2024
…n/sharding.cc

This change is to unblock google/pybind11clif#30095.

Leaving wrapped C++ types uninitialized creates a potential for triggering undefined behavior from Python.

PiperOrigin-RevId: 602787434
copybara-service bot pushed a commit to jax-ml/jax that referenced this pull request Jan 31, 2024
…n/sharding.cc

This change is to unblock google/pybind11clif#30095.

Leaving wrapped C++ types uninitialized creates a potential for triggering undefined behavior from Python.

PiperOrigin-RevId: 602787434
copybara-service bot pushed a commit to jax-ml/jax that referenced this pull request Jan 31, 2024
…n/sharding.cc

This change is to unblock google/pybind11clif#30095.

Leaving wrapped C++ types uninitialized creates a potential for triggering undefined behavior from Python.

PiperOrigin-RevId: 602884828
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Jan 31, 2024
…n/sharding.cc

This change is to unblock google/pybind11clif#30095.

Leaving wrapped C++ types uninitialized creates a potential for triggering undefined behavior from Python.

PiperOrigin-RevId: 602884828
copybara-service bot pushed a commit to openxla/xla that referenced this pull request Jan 31, 2024
…n/sharding.cc

This change is to unblock google/pybind11clif#30095.

Leaving wrapped C++ types uninitialized creates a potential for triggering undefined behavior from Python.

PiperOrigin-RevId: 602884828
@rwgk rwgk changed the title WIP: tp_init_intercepted_pywrapcc Change the implementation of the safety feature introduced with pybind/pybind11#2152 to work for any metaclass. Jan 31, 2024
@rwgk rwgk marked this pull request as ready for review January 31, 2024 18:23
@rwgk rwgk changed the title Change the implementation of the safety feature introduced with pybind/pybind11#2152 to work for any metaclass. Change implementation of the __init__() must be called when overriding __init__ safety feature to work for any metaclass. Jan 31, 2024
@@ -268,7 +282,9 @@ inline PyTypeObject *make_default_metaclass() {
type->tp_base = type_incref(&PyType_Type);
type->tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE | Py_TPFLAGS_HEAPTYPE;

#if defined(PYPY_VERSION)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new mechanism does not need to override tp_call?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what is missing in the context is that pybind11_meta_call only conducts protection for base class init, and can thus be skipped on CPYTHON.

Maybe defining a few secondary compilation conditions,

#if defined(PYPY_VERSION)
#define PYBIND11_ENABLE_TP_META_CALL_INIT_PROTECTION
#else
#define PYBIND11_ENABLE_SAFE_TP_INIT_INIT_PROTECTION
#endif

and wrap the two protection mechanisms in their corresponding conditions??

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No: Basically, the starting point for the original change in PyCLIF was exactly to find a trick that doesn't involve manipulating the metaclass.

I also thought a little bit about "intercepting" PyType_Type.tp_call, but that would be very intrusive and have a runtime impact for pretty much all calls creating a Python object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe defining a few secondary compilation conditions,

Done, thanks for the suggestion! I chose different names for the macros and made it so that the choice of implementation can be overridden externally.

@@ -340,6 +356,32 @@ inline bool deregister_instance(instance *self, void *valptr, const type_info *t
return ret;
}

using derived_tp_init_registry_type
= std::unordered_map<PyTypeObject *, int (*)(PyObject *, PyObject *, PyObject *)>;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about initproc as the value type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much nicer, thanks! (I totally didn't think of that)

from pybind11_tests import python_multiple_inheritance as m

#
# Using default py::metaclass():

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# CppBase0 Uses default py::metaclass()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.



#
# Using py::metaclass((PyObject *) &PyType_Type):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CppBase1 uses ....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// This mechanism was originally developed here:
// https://github.com/google/clif/commit/7cba87dd8385ab707c98e814ce742eeca877eb9e
extern "C" inline int tp_init_intercepted(PyObject *self, PyObject *args, PyObject *kw) {
assert(PyType_Check(self) == 0);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this condition guaranteed? Looks like we don't ensure it when intercepting tp_init, based on my initial read of the pybind11object_new method.

PS: if it was not for consisitency with pyclif naming, I would recommend calling this safe_tp_init.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this condition guaranteed?

It's not. I put the assert() here only to be sure my code isn't doing something unexpected. This PR is already TGP tested, which makes me believe it is very unlikely that the assert() will ever fire, but if it does, I'll have a concrete situation to fix (as opposed to anticipating/guessing).

PS: if it was not for consisitency with pyclif naming,

Name changed to tp_init_with_safety_checks.

(After this PR is merged I'll go back and change the PyCLIF code accordingly. I also want to backport the weakref-based cleanup.)

/// Instance creation function for all pybind11 types. It only allocates space for the
/// C++ object, but doesn't call the constructor -- an `__init__` function must do that.
extern "C" inline PyObject *pybind11_object_new(PyTypeObject *type, PyObject *, PyObject *) {
#if defined(PYBIND11_INIT_SAFETY_CHECKS_VIA_INTERCEPTING_TP_INIT)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe the ifdef for pybind11_meta_call should also be in the body like here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decided on chat: leaving as is.

@rwgk rwgk merged commit 54f8341 into google:main Feb 1, 2024
147 checks passed
@rwgk rwgk deleted the tp_init_intercepted_pywrapcc branch February 1, 2024 19:02
@rwgk rwgk added the could be backported Could be backported to pybind11 master label Feb 1, 2024
rwgk pushed a commit to google/clif that referenced this pull request Feb 5, 2024
…ges.

In order of importance:

* Bug fix (moderately important): Missing `Py_DECREF(type_self)` in `tp_dealloc_impl`.

  * This bug was triggered with the upgrade to Python to 3.9. It was discovered coincidentally when adding the new `testDerivedTpInitRegistryWeakrefBasedCleanup` (see below). When running the test in a `while True` loop, the Resident Memory Size (`RES` in `top`) exceeded 1 GB after just a few seconds.

  * Root cause: python/cpython#79991

  * Critical clue leading to the fix: https://github.com/pybind/pybind11/blob/768cebe17e65c2a0a64ed067510729efc3c7ff6c/include/pybind11/detail/class.h#L467-L469

* Bug fix (very minor): Incorrect `Py_DECREF(self)` in `tp_init_with_safety_checks`.

  * This bug was introduced with cl/559501787. The `Py_DECREF(self)` was accidentally adopted along with the corresponding pybind11 error message (see link in description of cl/559501787). It was discovered coincidentally by MSAN (heap use after free) while testing [google/pybind11clif#30095](google/pybind11clif#30095).

  * After inspecting https://github.com/python/cpython/blob/b833b00482234cb0f949e6c277ee1bce1a2cbb85/Objects/typeobject.c#L1103-L1106 it became clear that the `Py_DECREF(self)` is indeed incorrect in this situation (it is correct in the original pybind11 sources).

* The weakref-based cleanup added in [google/pybind11clif#30095](google/pybind11clif#30095) is ported back to PyCLIF. — This was the original purpose of this CL.

* `tp_init_intercepted` is renamed to `tp_init_with_safety_checks` for for compatibility with google/pybind11clif#30095.

GitHub testing: #90

PiperOrigin-RevId: 604337502
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
could be backported Could be backported to pybind11 master
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants