-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pkg_resources: filter entries before parsing them #1134
Conversation
pkg_resources/__init__.py
Outdated
entries = ( | ||
entry | ||
for entry in os.listdir(path_item) | ||
if (entry.lower().endswith('.egg-info') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to apply this test as an argument to filter
, such that entries = filter(of_interest, os.listdir(path_item))
? That function of_interest
could then have a docstring describing its purpose. My only worry is that the additional n function calls might obviate the performance advantages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering how long is the parsing of version (50% cost of exec time IIRC), I don't think it can be worse. I'll change that.
f5b8d4d
to
cb989a2
Compare
Pull-request updated, HEAD is now cb989a2 |
Thanks for this. I was going to accept this change, but then I went to extract the of_interest function and realized that the 'only' parameter has to be included. This fact makes all of this code quite complicated. I'm not even quite sure what 'only' is for. Blaming the code, I find it's very old (c721485). I think I want to mull over this one a bit more before rolling it in because it does add complication to the already complicated code, adds redundant operations, and it's not clear to me if the benefit outweighs the cost. |
I feel you @jaraco. Run The branch is in conflict for now, I'll update it. |
`_by_version_descending` is quite heavy as it tries to parse versioning and sort ALL files listed. This is one of the heaviest method. Actually, there might not be any need to parse those file names it there will be ignored in the loop just after, so let filter them before sorting. On my system, this brings down the loading of time of pkg_resources by 20%: $ python -m timeit -n 5 -r 3 -c 'import pkg_resources; reload(pkg_resources)' 5 loops, best of 3: 1.11 sec per loop $ python -m timeit -n 5 -r 3 -c 'import pkg_resources; reload(pkg_resources)' 5 loops, best of 3: 889 msec per loop
Pull-request updated, HEAD is now 4481608 |
cb989a2
to
4481608
Compare
@jd Please have a look at the latest commit (referenced above). I believe it accomplishes the same thing you were after. |
@jaraco looks great, thanks for taking care! |
_by_version_descending
is quite heavy as it tries to parse versioning andsort ALL files listed. This is one of the heaviest method.
Actually, there might not be any need to parse those file names it there will
be ignored in the loop just after, so let filter them before sorting.
On my system, this brings down the loading of time of pkg_resources by 20%:
$ python -m timeit -n 5 -r 3 -c 'import pkg_resources; reload(pkg_resources)'
5 loops, best of 3: 1.11 sec per loop
$ python -m timeit -n 5 -r 3 -c 'import pkg_resources; reload(pkg_resources)'
5 loops, best of 3: 889 msec per loop