Add a ctypes.util function to list loaded shared libraries #119349

Closed
WardBrian opened this issue May 21, 2024 · 16 comments
Labels
topic-ctypes type-feature A feature request or enhancement

Comments

@WardBrian
Contributor

WardBrian commented May 21, 2024

Feature or enhancement

Proposal:

When writing code which loads dynamic libraries, it is often very useful to be able to query which shared libraries are already in use by the current process. There are a few well-known tricks to do this, but they all end up being platform dependent.

For example, on Linux, you can use dl_iterate_phdr, and it seems that quite a bit of Python code does. This won’t work on macOS or Windows, which provide other functions for this same functionality.
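To illustrate the Linux-only trick mentioned above, here is a minimal ctypes sketch around dl_iterate_phdr. The structure declares only the leading fields documented in the dl_iterate_phdr(3) man page; the names DlPhdrInfo, PhdrCallback and linux_loaded_libraries are hypothetical, and this is not necessarily how any particular project implements it.

```python
import ctypes
import os
import sys


class DlPhdrInfo(ctypes.Structure):
    # Leading fields of struct dl_phdr_info, per dl_iterate_phdr(3).
    _fields_ = [
        ("dlpi_addr", ctypes.c_void_p),
        ("dlpi_name", ctypes.c_char_p),
        ("dlpi_phdr", ctypes.c_void_p),
        ("dlpi_phnum", ctypes.c_ushort),
    ]


# int callback(struct dl_phdr_info *info, size_t size, void *data)
PhdrCallback = ctypes.CFUNCTYPE(
    ctypes.c_int, ctypes.POINTER(DlPhdrInfo), ctypes.c_size_t, ctypes.c_void_p
)


def linux_loaded_libraries():
    """Return the names reported by dl_iterate_phdr (Linux only)."""
    if not sys.platform.startswith("linux"):
        raise NotImplementedError("this sketch only covers Linux")

    libc = ctypes.CDLL(None)  # handle to the running process, which exposes libc
    names = []

    @PhdrCallback
    def collect(info, size, data):
        # dlpi_name can be empty (typically for the main executable).
        names.append(os.fsdecode(info.contents.dlpi_name or b""))
        return 0  # returning zero continues the iteration

    iterate = libc.dl_iterate_phdr
    iterate.argtypes = [PhdrCallback, ctypes.c_void_p]
    iterate.restype = ctypes.c_int
    iterate(collect, None)
    return names
```

On macOS or Windows this sketch raises NotImplementedError, which is exactly the portability gap the proposal is about.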

Julia provides this function in the standard library under Libdl.dllist.
A Python re-implementation of the same platform-specific code can be found at WardBrian/dllist ("List DLLs loaded by the current process"). It essentially wraps the platform-specific code in an if-else on the runtime platform:

import dllist
print(dllist.dllist())
# ['linux-vdso.so.1', '/lib/x86_64-linux-gnu/libpthread.so.0', '/lib/x86_64-linux-gnu/libdl.so.2', ...

I would like to take the next step toward adding a similar function to the standard library.

Has this already been discussed elsewhere?

I have already discussed this feature proposal on Discourse

Links to previous discussion of this feature:

https://discuss.python.org/t/a-ctypes-function-to-list-all-loaded-shared-libraries/36370

Linked PRs

@WardBrian
Contributor Author

If anyone is subscribed to this issue, #122946 is still awaiting a review. I'd appreciate anyone taking a look!

@devinrsmith

This would be a great addition!

devinrsmith added a commit to devinrsmith/jpy that referenced this issue Jan 6, 2025
This should be considered a partial improvement related to jpy-consortium#75.

A fuller solution might involve listing which shared objects were actually loaded via [`dl_iterate_phdr`](https://man7.org/linux/man-pages/man3/dl_iterate_phdr.3.html).

python/cpython#119349
@ZeroIntensity
Member

@encukou, this has been open for a while and I think we need someone to just make a decision on the API.

The main point of contention has been what to do for errors; I'm in favor of raising an exception, but there's been concern that an exception wouldn't be particularly useful to the user, so a None return has been favored instead (possibly emitting a warning so the error doesn't pass silently). Do you have any preferences?

@encukou
Member

encukou commented Jan 28, 2025

Thanks for the ping! Sorry that I missed the discussion here.
I'll find some time to go over the PR. A chunk of new platform-specific code doesn't make me happy as a maintainer, but it doesn't look like a terribly complicated chunk.

The fact that a None return value is considered raises a red flag for me. What are the use cases for this info? How do you use it after the dllist() call? Is it just for debugging?

@WardBrian
Contributor Author

What are the use cases for this info? How do you use it after the dllist() call? Is it just for debugging?

I think debugging is a valid use case, but there are a few others:

  • My original use case involved a package that was creating and loading a lot of dynamic libraries. Users in interactive environments reported that they would often edit the source of these plugins, ask the package to load them again, and find that their changes were not reflected. This is a limitation of some systems' dlopen implementations, so the best I could do was raise a warning if an already-loaded plugin (e.g., one in the return value of dllist) was requested again.

I also searched through GitHub for existing Python code that uses this functionality (usually implemented in an ad-hoc way or supporting only one platform):

  • threadpoolctl has similar code which is used for runtime lookup of specific BLAS implementations. This is by far the "biggest" user of such code that I know of
  • NVIDIA has a package that uses Linux-specific code to look up the path of sub-dependencies that were loaded as a side effect of loading another library (in general, a lot of other scattered usages also seem to be around CUDA dependencies)
  • several packages do (or want to, in the case of @devinrsmith above) use these techniques for locating the path to the currently used libpython
  • I can't find it any longer, but I recall at least one package using something similar in their setup.py to determine if an optional library was available. Perhaps they moved away from this since I originally searched.
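Several of the use cases above reduce to searching the returned list for a library by name. A minimal sketch, assuming the list contains path strings like the dllist.dllist() output shown earlier; the helper name find_loaded and the sample paths are made up for illustration:

```python
import os


def find_loaded(fragment, libraries):
    """Return entries of `libraries` whose file name contains `fragment`."""
    return [p for p in libraries if fragment in os.path.basename(p)]


# Hypothetical listing; real output depends on the platform and process.
sample = [
    "linux-vdso.so.1",
    "/lib/x86_64-linux-gnu/libpthread.so.0",
    "/usr/lib/x86_64-linux-gnu/libopenblas.so.0",
    "/usr/lib/x86_64-linux-gnu/libpython3.12.so.1.0",
]

print(find_loaded("openblas", sample))   # BLAS lookup, threadpoolctl-style
print(find_loaded("libpython", sample))  # locating the libpython in use
```

An empty result is the "not found" case that each caller has to handle anyway.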

My personal sense is that all of these use cases must already handle the case where the library they are looking for is not found, and in many cases that handling is identical to the desired behavior if the lookup fails entirely. E.g., in my warning case, I just omit the warning, so I could write the code as something like if plugin in (dllist() or []):
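A slightly fuller sketch of that warning case, assuming the listing may be None on failure as discussed here. The helper name warn_if_already_loaded and the realpath-based comparison are illustrative assumptions, not the code of the package in question:

```python
import os
import warnings


def warn_if_already_loaded(plugin_path, loaded):
    """Warn if `plugin_path` is already in `loaded`; return whether it was.

    `loaded` may be None (lookup failed), in which case no warning is issued.
    """
    target = os.path.realpath(plugin_path)
    # Skip empty entries, which some platforms report for the executable.
    already = any(os.path.realpath(p) == target for p in (loaded or []) if p)
    if already:
        warnings.warn(
            f"{plugin_path} is already loaded; edits may not take effect",
            RuntimeWarning,
            stacklevel=2,
        )
    return already
```

A failed lookup and a plugin that simply is not loaded take the same path: no warning.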

@ZeroIntensity
Member

From the zen:

Errors should never pass silently.

I think this falls under that category.

In userland, it's a lot easier to implement a wrapper that returns None as opposed to a wrapper that raises an exception (because there's no indication as to what failed). The precedent dllist function in Julia also appears to not return a null value on failure, but I'm not familiar with Julia's type system or documentation, so I can't be sure (it might be like Java's type system where null is implicitly applicable to nearly everything).

A reasonable compromise could be to add two functions: dllist, which raises, and something like dllist_wrapped, which returns None, but I'm thinking out loud.
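To make the "wrapper in userland" point concrete, here is a minimal sketch of turning a raising function into one that returns None with a warning. The name or_none is hypothetical, and the function to wrap is passed in rather than assumed to exist:

```python
import warnings


def or_none(func):
    """Call `func()`; on any exception, emit a warning and return None."""
    try:
        return func()
    except Exception as exc:
        warnings.warn(
            f"could not list loaded libraries: {exc!r}",
            RuntimeWarning,
            stacklevel=2,
        )
        return None
```

Going the other way is harder: a wrapper that only ever sees None has no information about what failed, so it cannot raise a meaningful exception.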

@WardBrian
Contributor Author

The precedent dllist function in Julia also appears to not return a null value on failure, but I'm not familiar with Julia's type system or documentation, so I can't be sure (it might be like Java's type system where null is implicitly applicable to nearly everything).

Julia's dllist returns an empty list on failure. I originally did the same, but I'm convinced now that this is basically the worst option, since it is theoretically also a valid return on success (though having 0 libraries loaded would certainly be... unusual)


At any rate, I believe errors would be relatively rare. The following is basically a complete list of what the function could raise, if modified to do so:

  • NotImplementedError on unsupported platforms
  • ctypes.WinError if the Windows APIs used fail
  • UnicodeError is possible from os.fsdecode if "The filesystem encoding must guarantee to successfully decode all bytes below 128" does not hold (my understanding is this is not possible on the "big three" OSes, but I may be wrong -- the implementation as of ab739d2 does not catch this)

I think many users would more or less be fine letting these percolate, and others could write a helper or just the more verbose version of my if above, like:

try:
    loaded = dllist()
except Exception:
    loaded = []
if plugin in loaded:
    ...

So, where I'm at right now is having a slight preference towards returning None, but I'm not married to it. If one choice or the other makes the PR more appealing to merge, I'll jump.

@encukou
Member

encukou commented Jan 31, 2025

Please raise an exception if the list can't be retrieved. Julia is a great inspiration, but in Python let's raise :)

On a similar theme: the platform APIs include the executable as the first item in the list (sometimes as an empty string). The current PR removes it.
I'd prefer modeling ctypes.util.dllist as a rather thin wrapper around the platform APIs, which would mean retaining that entry. (And mentioning it in the docs.)
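If the executable entry is retained, callers who only want libraries can split it off themselves. A minimal sketch, assuming the executable is the first item as described above; the helper name split_executable is hypothetical:

```python
def split_executable(libraries):
    """Split a listing into (executable_entry, other_libraries).

    Assumes the first entry is the executable, possibly an empty string.
    """
    if not libraries:
        return None, []
    return libraries[0], list(libraries[1:])
```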

NotImplementedError on unsupported platforms

In that case, don't define the dllist function at all. (Unless you can only tell after the function is called.)

@WardBrian
Contributor Author

Thanks for taking a look @encukou. I've made those changes in my latest push, along with the more granular comments you left in the PR.

In that case, don't define the dllist function at all. (Unless you can only tell after the function is called.)

We can tell during import whether the libc API we need is available (the previous implementation used this check to define a stub), so not defining the function is easy.
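For callers, a function that may be absent means feature-testing before use. A minimal sketch, assuming the function ends up at ctypes.util.dllist as proposed in this issue; the wrapper name loaded_libraries_or_none is hypothetical:

```python
import ctypes.util


def loaded_libraries_or_none():
    """Return the loaded-library list, or None where dllist is not defined."""
    # Assumption: the function lives at ctypes.util.dllist, as proposed here.
    dllist = getattr(ctypes.util, "dllist", None)
    if dllist is None:
        return None
    return dllist()
```

The same check also covers Python versions that predate the function.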

@encukou
Member

encukou commented Feb 6, 2025

Pinging maintainers of platforms we don't officially support:
PRs/tests to add ctypes.util.dllist (list all loaded shared libraries) on your platform would be welcome, if you'd like to put them upstream.

AIX: @edelsohn, @ayappanec
Solaris: @kulikjak (btw, could you add yourself to the list?), @jcea

@ayappanec
Contributor

Thanks for the notification. We will check whether this capability can be brought to the AIX platform as well.

@WardBrian
Contributor Author

Additional platform notes:

Pyodide-based wasm targets could use some variant of from pyodide_js._module import LDSO; return LDSO.loadedLibsByName.as_object_map().keys() (source), at least until emscripten adds support for the dl_iterate_phdr API: emscripten-core/emscripten#21354

@kulikjak
Contributor

kulikjak commented Feb 6, 2025

Hi, I tested the patch on Solaris and everything works out of the box - ctypes.util.dllist lists all the libraries and both newly added tests pass.

(and I will add myself to the list :))

Thank you.

@encukou
Member

encukou commented Feb 6, 2025

Yay for Solaris!

Emscripten: Looks like they're considering adding dl_iterate_phdr; I don't think we should add a workaround to Python. (Definitely don't add it to the current PR :)

encukou pushed a commit that referenced this issue Feb 8, 2025
…-122946)

Add function to list the currently loaded libraries to ctypes.util

The dllist() function calls platform-specific APIs in order to
list the runtime libraries loaded by Python and any imported modules.
On unsupported platforms the function may be missing.


Co-authored-by: Eryk Sun <[email protected]>
Co-authored-by: Peter Bierma <[email protected]>
@encukou
Member

encukou commented Feb 8, 2025

Merged.
Thank you @WardBrian for the feature, and for your patience!

@encukou encukou closed this as completed Feb 8, 2025
@WardBrian
Contributor Author

Thank you all for the discussion and careful review! This was my first time contributing to CPython, but I hope not the last.

7 participants