-
-
Notifications
You must be signed in to change notification settings - Fork 31.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-113055: Use pointer for interp->obmalloc state. #113412
Conversation
For interpreters that share state with the main interpreter, this points to the same static memory structure. For interpreters with their own obmalloc state, it is heap allocated. Add free_obmalloc_arenas() which will free the obmalloc arenas and radix tree structures for interpreters with their own obmalloc state.
This should improve the situtation for gh-113055. There will still be memory leaks if interpreters don't free their memory (or load extensions who don't free their memory). However, when the Py_RTFLAGS_USE_MAIN_OBMALLOC flag is set (the default, I think), the behaviour will be much like it was in 3.11. I.e. interpreters use the main interpreter obmalloc state. The obmalloc state for the main interpreter is back to being a static structure in the BSS, as it was in 3.11. My guess is the |
The obmalloc_state_main global is okay and should be ignored.
It looks like the performance different for And after: The counts in the left hand column are then number of samples that hit that instruction. My quick-and-dirty benchmark was: ./Programs/_testembed test_repeated_init_exec "$(cat bench_embed.py)" with
I didn't build with LTO and PGO, that would change things a bit I would guess. |
With PGO and LTO turned on, the code becomes quite jumbled up and hard to follow. |
Non-debug builds have memory leaks on Windows even using 33e0adb. At least, The same goes for Python 3.12.1 with/without this patch applied. EDIT: Observed with |
Bisected to: ea2c001, which this PR does not fully affect on my non-debug build. |
I think this PR should get merged. It should be safe and it reduces memory used by sub-interpreters that share the main interpreter obmalloc state. It also eliminates a memory leak in the case of sub-interpreters that don't share the state. See |
@Yhg1s Any thoughts? |
@ericsnowcurrently Have you reviewed this? Do you feel it's ready to be merged? |
I'm planning on taking a look in the next day or two. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this! I think this is basically right. I didn't see any correctness issues.
Clearly I did leave a bunch of comments, but it's mostly little things like alternate spellings, whether or not to statically initialize obmalloc_state_main
, and whether or not we really have to make PyInterpreterState.obmalloc
a pointer.
When you're done making the requested changes, leave the comment: |
Also, weren't you considering freeing the blocks that don't have any objects in them any more, to reduce how much leaks? |
Also, what do you think our options are for backporting this to 3.12? |
I would prefer to use a pointer. I think it's cleaner/faster and I think the memory savings, while not huge, are non-trivial. On 64-bit platforms, |
Possible but a little scary. It's not a lot of code but it changes behavior quite a bit. Up to the release manager, I guess. The number of people affected by the memory usage/leaks is likely to be really small. At least, that's my guess. So maybe they need to skip 3.12 and go from 3.11 to 3.13 it this is problem for them. @neonene might have another perspective. |
Sounds good. FWIW, subinterpreters that use the main obmalloc state supported for backward compatibility, are already uncommon, and I expect they will be more so as we move forward. So I wouldn't worry about factoring them into the decision very much. |
That's an idea. A bit of code to do that but probably not horrible. Another idea would be to add an If it is safe to enable gh-113601 then maybe it is safe to just free the arenas as well. I think the same logic applies although it seems more likely freeing the arenas would be more likely to trigger a use-after-free bug. |
Co-authored-by: Eric Snow <[email protected]>
- Remove 'runtime' argument. - Remove 'initialized' and 'heap_allocated' members. - Use _Py_IsMainInterpreter() - Remove spurious whitespace change. - Return -1 on error and 0 on success, as normal convention.
Thanks for the detailed review feedback. I think I've addressed all of your suggestions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks for doing this! |
Co-authored-by: Eric Snow <[email protected]>
@Yhg1s, what are your thoughts on backporting this to 3.12? |
) For interpreters that share state with the main interpreter, this points to the same static memory structure. For interpreters with their own obmalloc state, it is heap allocated. Add free_obmalloc_arenas() which will free the obmalloc arenas and radix tree structures for interpreters with their own obmalloc state. Co-authored-by: Eric Snow <[email protected]>
This fixes a segfault happening with 3.12, so it would be great if this could be backported to 3.12. I posted more details about the crash in #116510. |
Just wanted to offer that this issue appears to impact Kodi, which has a considerable user base in the millions. The number of users affected by the problem could be quite substantial. |
ping @Yhg1s |
) For interpreters that share state with the main interpreter, this points to the same static memory structure. For interpreters with their own obmalloc state, it is heap allocated. Add free_obmalloc_arenas() which will free the obmalloc arenas and radix tree structures for interpreters with their own obmalloc state. Co-authored-by: Eric Snow <[email protected]>
For interpreters that share state with the main interpreter, this points to the same static memory structure. For interpreters with their own obmalloc state, it is heap allocated. Add free_obmalloc_arenas() which will free the obmalloc arenas and radix tree structures for interpreters with their own obmalloc state.
Note that the free is only done if obmalloc used blocks is zero. If some blocks are used, it's not safe to free the memory since some extension might still be using it.