Allow efficient read access to PyConfig options #99872

Open
scoder opened this issue Nov 29, 2022 · 11 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@scoder
Contributor

scoder commented Nov 29, 2022

Documentation

The documentation seems to be geared towards embedding scenarios and modifying the configuration, and I couldn't find an example for simply reading a PyConfig field.

PEP 587 seems similarly unhelpful.

Looking through the CPython sources, it appears that what I want to do is this:

    PyThreadState *tstate = _PyThreadState_GET();
    PyConfig* config = &tstate->interp->config;
    int optimize_flag = config->optimization_level;  // formerly known as Py_OptimizeFlag

I'm pretty sure that this is not the official API way to do it, but it's unclear to me what the right way is, given that PyInterpreterState is an opaque struct and the previously simple access to Py_OptimizeFlag has been deprecated as of Py3.12.

Also, it's unclear what way of access should be used in the limited API.

@scoder scoder added the docs Documentation in the Doc dir label Nov 29, 2022
@scoder
Contributor Author

scoder commented Nov 29, 2022

@encukou @vstinner @ncoghlan

@vstinner
Member

So far, nobody asked for a public API for that. The latest discussion was in issue #93103, in which no concrete API was proposed. _Py_GetConfig(), _PyInterpreterState_GetConfig() and _PyInterpreterState_GetConfigCopy() are private APIs.

The thing is that most values are only used to initialize Python module attributes, and then the config is no longer updated. A good example is PyConfig.module_search_paths, which is used to initialize sys.path: if sys.path is modified later, PyConfig.module_search_paths is not updated.

An older discussion was issue #78100 "Expose _PyCoreConfig structure to Python". That issue was resolved by using the PYTHONHASHSEED environment variable; exposing PyConfig wouldn't have helped there.

> given that PyInterpreterState is an opaque struct

I added _PyInterpreterState_GetConfig() and _PyInterpreterState_GetConfigCopy() for that, but so far, these functions are private.

> int optimize_flag = config->optimization_level;  // formerly known as Py_OptimizeFlag

Why not use sys.flags.optimize to access PyConfig.optimization_level?
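For reference, a minimal sketch of that Python-level read (not part of the original comment). The helper name `optimization_level` is made up for illustration; the sketch relies only on the documented `sys.flags.optimize` attribute and the built-in `__debug__` constant.

```python
import sys


def optimization_level() -> int:
    """Return the interpreter's optimization level.

    The value is 0 by default, 1 with -O and 2 with -OO.
    """
    # sys.flags.optimize mirrors the -O / -OO command-line options
    # (and the PYTHONOPTIMIZE environment variable).
    return sys.flags.optimize


if __name__ == "__main__":
    level = optimization_level()
    # __debug__ is True exactly when the optimization level is 0,
    # which is also when `assert` statements are executed.
    print(f"optimize={level}, asserts enabled={__debug__}")
```

From C, the same attribute would have to be fetched through the generic object API, which is the cost discussed in the replies that follow.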

@vstinner
Member

By the way, Py_OptimizeFlag still exists in Python 3.12 and its value should be the expected value, but this API is deprecated in Python 3.12. Also using it should emit a deprecation warning.

@scoder scoder changed the title Document read access to PyConfig options Allow efficient read access to PyConfig options Nov 30, 2022
@scoder scoder removed the docs Documentation in the Doc dir label Nov 30, 2022
@scoder
Contributor Author

scoder commented Nov 30, 2022

> By the way, Py_OptimizeFlag still exists in Python 3.12 and its value should be the expected value, but this API is deprecated in Python 3.12. Also using it should emit a deprecation warning.

I understand that it was deprecated in the context of subinterpreters and local configuration. It seems reasonable to go through the thread state instead to read the config value.

If I were to read it from something as complex as sys.flags.optimize, I'd end up reading it once at module init time and storing it in some global variable, just like Py_OptimizeFlag before. Otherwise, guarding assertions would become too expensive. If there was an API function that gave me &(tstate->interp->config) directly, I'd be happy to use it.

I also second the interest in reading the up-to-date configuration, not just something that was initially passed into Python's setup and was then subject to dynamic changes. But it seems to me that PyInterpreterState.config should be the place where that live configuration lives. So, why not have a PyThreadState_GetInterpreterConfig() function? (Also in the limited API.)

It seems really funny that during the time when PEP 587 decided to make all public global configuration variables obsolete, no-one came and said "wait a minute, they used to be public and might still be of interest to someone – shouldn't we provide users with a way to read the current interpreter configuration?"

> > given that PyInterpreterState is an opaque struct
>
> I added _PyInterpreterState_GetConfig() and _PyInterpreterState_GetConfigCopy() for that, but so far, these functions are private.

I think I'll use them in CPython >= 3.9 then at least. If Py3.12 gains an official way to read the config, I'll switch to that.

scoder added a commit to scoder/cython that referenced this issue Nov 30, 2022
@scoder
Contributor Author

scoder commented Nov 30, 2022

> > > given that PyInterpreterState is an opaque struct
> >
> > I added _PyInterpreterState_GetConfig() and _PyInterpreterState_GetConfigCopy() for that, but so far, these functions are private.
>
> I think I'll use them in CPython >= 3.9 then at least. If Py3.12 gains an official way to read the config, I'll switch to that.

I just noticed that they are not a complete replacement, because it was previously possible (and reasonable) to read the global variable without holding the GIL. This is no longer possible when going through the PyThreadState. But that seems hardly avoidable if the goal is to get per-interpreter config options.

@vstinner
Member

vstinner commented Dec 1, 2022

@gpshead @ericsnowcurrently @corona10: Do you have an opinion on which API should be added?

PEP 587 makes the PyConfig structure public, so maybe _Py_GetConfig() and _PyInterpreterState_GetConfig(interp), which give read-only access to PyConfig, are good enough?

Note: "read-only" in C is not well defined with types. There are ways to modify some configuration parameters using _Py_GetConfig(), since const PyConfig* doesn't make everything "read-only". _PyInterpreterState_GetConfigCopy() avoids this problem, but having to copy lists of strings just to get the PyConfig.optimization_level integer sounds like overkill to me...

> It seems really funny that during the time when PEP 587 decided to make all public global configuration variables obsolete, no-one came and said "wait a minute, they used to be public and might still be of interest to someone – shouldn't we provide users with a way to read the current interpreter configuration?"

PyConfig is a C API, but most people use Python with the Python API. In Python, reading sys.flags.optimize is trivial. Object type doesn't matter (no boxing/unboxing performance issue), importing a module is not an issue, there is no question of reading values without holding the GIL, etc.

PyConfig is an API to initialize Python. As I explained, after initialization you should read the "current" configuration (without PyConfig), since many configuration options can be changed, and so PyConfig becomes outdated.

I think it was @zooba who recently proposed even removing PyInterpreterState.config and only using current configuration values, like sys.flags.optimize. But there are some practical issues with removing it. I added PyInterpreterState.config because it just makes the C code simpler.

> read the global variable without holding the GIL

What's your use case? Which configuration variable do you want to read without holding the GIL?


To avoid inconsistencies, maybe functions like _Py_GetConfig() should update PyConfig from the current configuration (ex: sys.path). It would avoid the annoying problem that _testinternalcapi.set_config(_testinternalcapi.get_config()) resets many configuration options to the initial Python state (ex: sys.path).

I'm talking about this problem:

$ python3.11
Python 3.11.0 (main, Oct 24 2022, 00:00:00) [GCC 12.2.1 20220819 (Red Hat 12.2.1-2)] on linux
>>> import sys, _testinternalcapi
>>> before=list(sys.path)
>>> _testinternalcapi.set_config(_testinternalcapi.get_config())
>>> after=list(sys.path)

>>> len(before)
7
>>> len(after)
3

>>> import pprint
>>> pprint.pprint(before)
['',
 '/usr/lib64/python311.zip',
 '/usr/lib64/python3.11',
 '/usr/lib64/python3.11/lib-dynload',
 '/home/vstinner/.local/lib/python3.11/site-packages',
 '/usr/lib64/python3.11/site-packages',
 '/usr/lib/python3.11/site-packages']
>>> pprint.pprint(after)
['/usr/lib64/python311.zip',
 '/usr/lib64/python3.11',
 '/usr/lib64/python3.11/lib-dynload']

@scoder
Contributor Author

scoder commented Dec 1, 2022

> PyConfig is a C API, but most people use Python with the Python API.

I'm not sure if that is even true. If you consider the code that runs on user machines, then a lot of that is in extension modules that use the C-API. ISTM that Python's C-API is just as important as its Python APIs.

> > read the global variable without holding the GIL
>
> What's your use case? Which configuration variable do you want to read without holding the GIL?

Cython reads the Py_OptimizeFlag status when it executes a Python assert statement in user code. You can compile out assertions with C defines, but if you don't, then Py_OptimizeFlag used to be the fastest way to decide whether they should be evaluated or not. And if the condition required the GIL for evaluation, but the GIL is not currently owned by the running code, then reading the config flag first would avoid needlessly acquiring the GIL if assertions are disabled.

Now, as you suggested, the concept of "read-only" is not well established in C, which makes it unclear whether Py_OptimizeFlag and friends were meant to be read-only before or not. The documentation is also not conclusive, and although I never tried, these global variables were probably writable in the past. Which is why it used to be reasonable for Cython to read the variable itself and not a local copy. If it now turns out that reading the value becomes more expensive and cannot be done without the GIL any more, and at the same time, we decide that the config values are not supposed to be modified after initialisation (sys.flags.* are read-only, for instance), then Cython could instead make a local copy of the value at module init time and keep that around, instead of reading the value at need.
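The read-only nature of `sys.flags` mentioned above can be checked from Python. This is a small sketch that assumes CPython behaviour; `flag_is_read_only` is a hypothetical helper name, not an existing API.

```python
import sys


def flag_is_read_only(name: str = "optimize") -> bool:
    """Return True if assigning to sys.flags.<name> is rejected."""
    original = getattr(sys.flags, name)
    try:
        # Assign the value the flag already has, so that nothing would
        # change even if the assignment were unexpectedly accepted.
        setattr(sys.flags, name, original)
    except (AttributeError, TypeError):
        # CPython refuses to assign to the fields of sys.flags.
        return True
    return False
```

This only shows that the Python-level view cannot be assigned to. It says nothing about whether C code could still write to the deprecated global variables.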

I would guess that other candidates are Py_DebugFlag and Py_HashRandomizationFlag, not sure about the others. Can't think of a use case for them right now, at least.

Generally speaking, there should be one place of truth for these values. That used to be the global variables, but now PyInterpreterState.config seems very reasonable. We just need a way to read that, efficiently.

@vstinner
Member

vstinner commented Dec 1, 2022

> then Py_OptimizeFlag used to be the fastest way to decide whether they should be evaluated or not

Currently, it's no longer possible to change this value at runtime. I suggest that you read the value at startup (ex: when an extension is loaded, in its "init" function) and store a copy of the value to avoid any GIL or complicated function calls.

Before PyConfig (Python 3.7), it wasn't possible to set sys.flags.optimize in Python either, but it was possible to set Py_OptimizeFlag in C.
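The "read once at startup and keep a copy" suggestion could look roughly like the following, sketched in Python rather than C to keep it short. The names `ASSERTIONS_ENABLED` and `guarded_assert` are hypothetical, and the sketch assumes, as stated above, that the optimization level does not change after interpreter start-up.

```python
import sys

# Read once at import time. Assumption taken from the discussion: the value
# cannot change at runtime, so a module-level copy stays valid.
ASSERTIONS_ENABLED = sys.flags.optimize == 0


def guarded_assert(condition, message="", enabled=ASSERTIONS_ENABLED):
    """Evaluate `condition()` only when assertions are enabled.

    `condition` is a zero-argument callable, so a disabled check costs a
    single boolean test and never runs the (possibly expensive) condition.
    The `enabled` parameter exists only to make the sketch easy to test.
    """
    if not enabled:
        return
    if not condition():
        raise AssertionError(message)
```

A call such as `guarded_assert(lambda: x > 0, "x must be positive")` skips the lambda entirely under `-O`. In a C extension the analogous copy would be a static variable filled in by the module's init function, so the check needs neither the GIL nor an attribute lookup.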

@zooba
Member

zooba commented Dec 2, 2022

The most up to date configuration is in the Python API, where it always has been. Most of this is the sys module members, which are very accessible from native code.

To make the canonical source of configuration be a different structure would be a massive internal restructure, and probably a performance regression for users in Python. It's doable, but it's a ton of work.

The point of the PyConfig APIs was to streamline setting these values, because the old approach made it pretty difficult to achieve the outcome you wanted. But they should still be going into the same locations they were in before for anyone relying on them.

@gpshead
Member

gpshead commented Dec 3, 2022

I agree, use the sys.flags or other sys APIs and just cache yourself a local copy (during your module init?) of the value you found knowing that PyConfig values in general are initial state, not modified.

If you do find some config value that is modified at runtime and you regularly need to get the current version of that efficiently via a public C API, we can consider a specific API for that. Most of PyConfig is read only. All of it probably should be, so finding things that do change at runtime is worthy of us considering if we're even handling them internally in the right manner.

scoder added a commit to scoder/cython that referenced this issue Mar 28, 2023
scoder added a commit to scoder/cython that referenced this issue Mar 28, 2023
…the old Py_OptimizeFlag. The flag was never meant to be modifiable and thus can be read once at module import time.

See python/cpython#99872 (comment)
scoder added a commit to cython/cython that referenced this issue Mar 31, 2023
* Work around the deprecation of Py_OptimizeFlag in Py3.12 by reading the value from the interpreter's current PyConfig.

See python/cpython#99872

* Avoid access to PyConfig without holding the GIL when trying to read the old Py_OptimizeFlag. The flag was never meant to be modifiable and thus can be read once at module import time.

See python/cpython#99872 (comment)
scoder added a commit to cython/cython that referenced this issue May 24, 2023
* Work around the deprecation of Py_OptimizeFlag in Py3.12 by reading the value from the interpreter's current PyConfig.

See python/cpython#99872

* Avoid access to PyConfig without holding the GIL when trying to read the old Py_OptimizeFlag. The flag was never meant to be modifiable and thus can be read once at module import time.

See python/cpython#99872 (comment)
@vstinner
Member

Update: right now, Cython 3.0 fails to build on Python 3.13 since I removed private _PyInterpreterState_GetConfig() and Cython uses it: see issue #107076.

We should provide an efficient replacement for this Cython code:

_PyInterpreterState_GetConfig(__Pyx_PyThreadState_Current->interp)->optimization_level

By the way, I'm not fully comfortable with Parser/pegen.c and Parser/tokenizer.c which use a private API:

#ifdef Py_DEBUG
p->debug = _Py_GetConfig()->parser_debug;
#endif
