Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg_resources: filter entries before parsing them #1134

Closed
wants to merge 1 commit into from

Conversation

jd
Copy link
Contributor

@jd jd commented Aug 24, 2017

_by_version_descending is quite heavy as it tries to parse versioning and
sort ALL files listed. This is one of the heaviest method.

Actually, there might not be any need to parse those file names it there will
be ignored in the loop just after, so let filter them before sorting.

On my system, this brings down the loading of time of pkg_resources by 20%:

$ python -m timeit -n 5 -r 3 -c 'import pkg_resources; reload(pkg_resources)'
5 loops, best of 3: 1.11 sec per loop

$ python -m timeit -n 5 -r 3 -c 'import pkg_resources; reload(pkg_resources)'
5 loops, best of 3: 889 msec per loop

entries = (
entry
for entry in os.listdir(path_item)
if (entry.lower().endswith('.egg-info')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to apply this test as an argument to filter, such that entries = filter(of_interest, os.listdir(path_item))? That function of_interest could then have a docstring describing its purpose. My only worry is that the additional n function calls might obviate the performance advantages.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering how long is the parsing of version (50% cost of exec time IIRC), I don't think it can be worse. I'll change that.

@jd jd force-pushed the pre-filter-find-on-path branch from f5b8d4d to cb989a2 Compare August 28, 2017 09:12
@jd
Copy link
Contributor Author

jd commented Aug 28, 2017

Pull-request updated, HEAD is now cb989a2

@jaraco
Copy link
Member

jaraco commented Sep 3, 2017

Thanks for this. I was going to accept this change, but then I went to extract the of_interest function and realized that the 'only' parameter has to be included. This fact makes all of this code quite complicated. I'm not even quite sure what 'only' is for. Blaming the code, I find it's very old (c721485).

I think I want to mull over this one a bit more before rolling it in because it does add complication to the already complicated code, adds redundant operations, and it's not clear to me if the benefit outweighs the cost.

@jd
Copy link
Contributor Author

jd commented Sep 4, 2017

I feel you @jaraco. Run cProfile and you'll see how heavy the parsing is. I've 400 packages installed on my system, so the time pkg_resources lose trying to parse things it does not want in the end has a real cost. That being said, if you have a better way to refactor this, I'm all for it.

The branch is in conflict for now, I'll update it.

`_by_version_descending` is quite heavy as it tries to parse versioning and
sort ALL files listed. This is one of the heaviest method.

Actually, there might not be any need to parse those file names it there will
be ignored in the loop just after, so let filter them before sorting.

On my system, this brings down the loading of time of pkg_resources by 20%:

$ python -m timeit -n 5 -r 3 -c 'import pkg_resources; reload(pkg_resources)'
5 loops, best of 3: 1.11 sec per loop

$ python -m timeit -n 5 -r 3 -c 'import pkg_resources; reload(pkg_resources)'
5 loops, best of 3: 889 msec per loop
@jd
Copy link
Contributor Author

jd commented Sep 4, 2017

Pull-request updated, HEAD is now 4481608

@jd jd force-pushed the pre-filter-find-on-path branch from cb989a2 to 4481608 Compare September 4, 2017 07:47
@jaraco
Copy link
Member

jaraco commented Sep 9, 2017

@jd Please have a look at the latest commit (referenced above). I believe it accomplishes the same thing you were after.

@jaraco jaraco closed this Sep 9, 2017
@jd
Copy link
Contributor Author

jd commented Sep 12, 2017

@jaraco looks great, thanks for taking care!

@jd jd deleted the pre-filter-find-on-path branch September 12, 2017 07:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants