cache __TYPE_MAP and init submodules #1931
OK i tested whether importing all the submodules first before pickling sped up the import further, and it doesn't. That version is here: sneakers-the-rat@0a1514e The profiling results on my computer (obviously variable) are:
The reason that including the imported modules doesn't improve the speed much if any is that the main contributor to the slow imports is that a) there are import side effects, b) those import side effects include parsing yaml c) parsing yaml is slow. The module imports take very little time because the yaml is already parsed by the time they are imported, so imo those should remain uncached because unpickling objects with declaration-time side effects (the registration decorator) is super wonky and fragile, u can see the hack i had to do in that branch. There are three remaining opportunities for future import perf improvements:
I'm marking this as ready for review without adding tests just to see if this approach would be acceptable, and if so then i'll add tests for the submodule cloning, caching, and cache invalidation.
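To make the caching and cache invalidation idea above concrete, here is a minimal sketch, not the code in this PR. It assumes the slow step is some `build` callable that parses the schema YAML, and it keys the pickle on a hash of the schema file contents so that editing the schema invalidates the cache. The names `_schema_fingerprint`, `load_cached`, `build`, and `cache_dir` are all hypothetical.

```python
import hashlib
import pickle
from pathlib import Path


def _schema_fingerprint(schema_files):
    """Hash the schema file contents so any schema edit changes the cache key."""
    digest = hashlib.sha256()
    for path in sorted(str(p) for p in schema_files):
        digest.update(Path(path).read_bytes())
    return digest.hexdigest()


def load_cached(schema_files, build, cache_dir):
    """Return build(schema_files), reusing a pickle while the schema is unchanged.

    `build` stands in for the slow YAML-parsing step (hypothetical).
    """
    key = _schema_fingerprint(schema_files)
    cache_file = Path(cache_dir) / f"typemap-{key}.pkl"
    if cache_file.exists():
        # Fast path: skip parsing entirely.
        with cache_file.open("rb") as f:
            return pickle.load(f)
    # Slow path: parse once, then store the result for the next import.
    result = build(schema_files)
    with cache_file.open("wb") as f:
        pickle.dump(result, f)
    return result
```

Only the parsed data is pickled here. Anything registered through import-time decorators would still be set up by the normal imports, which matches the reasoning above for leaving the submodules uncached.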
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##              dev    #1931      +/-   ##
==========================================
- Coverage   92.20%   91.85%   -0.36%
==========================================
  Files          27       27
  Lines        2656     2689      +33
  Branches      693      701       +8
==========================================
+ Hits         2449     2470      +21
- Misses        134      145      +11
- Partials       73       74       +1
Huh, so the test that's failing is this one
which makes sense because I removed Lmk what would be better, updating that test (i think that none of those dunder attributes should be in there?) or making it so |
…fy error message, fix tests which are improperly accessing dunder methods.
OK moved during testing i discovered that one also needs to cache One more thing is that while linting i noticed that the ruff |
gee for a simple thing this sure does involve a lot of fighting lol OK so i had written some tests for this but since we're talking about import side effects testing them is hard.
fortunately....
so I just scrapped the tests because this is ultimately a very simple change and i did not budget this much time to this very simple change. if that's not gonna cut it feel free to close this, just trying to do a little QoL contribs while i work with this package so not mission critical, fine for me to just have it as a fork.
Hi @sneakers-the-rat, thanks for all the work and profiling! Really appreciate the walkthrough. @rly said he can take a look at this so I'll let him review in detail.
src/pynwb/__init__.py
)
if not final:
    _load_core_namespace(final=True)
_load_core_namespace()
I would prefer to additionally mark these new functions as private with a leading double underscore. Is there any reason not to here, or is it just a stylistic difference?
I have never really been sure how pynwb or hdmf use dunders, esp module-level dunders, so that's the reason lol. No problem with renaming
Co-authored-by: Ryan Ly <[email protected]>
I made a few small changes. Looks good! Thanks so much for this PR and catching the slowdown with
Related to:
fix: #1255
fix: #1487
prior pr: #1050
Motivation
import slow
How to test the behavior?
Checklist
I'VE DONE NONE OF THESE AND I CAN EXPLAIN
flake8 from the source directory.
Narrative
sorry for the incomplete PR, i have to run out the door RIGHT NOW and didn't want to forget. opening as a draft and i'll come back later.
finishing up nwb_linkml and wanted to read through to see if there was any special behavior i was missing and was like why the heck does the import take like an hour.
on first import it's because hdmf imports the whole scipy sparse module and pynwb imports down that far in hdmf. that would be fixable using a TYPE_CHECKING check, but idk if that function signature decorator thing can handle delayed annotations like that. that's nbd it goes away.
most of the time is spent in the initial loading of the schema into __TYPE_MAP (and most of that is from probably unnecessary deepcopys, but that's another day's perf to win), with the remaining ~1/3 from the side effects of importing the other modules.
So i just mostly did what was talked about in the prior issues and pickled the __TYPE_MAP object before importing the rest of the modules bc i wasn't sure what the side effects were there, but if that's fine to also pickle then we could just copy/paste the module-level imports into the load_namespaces function.
I also was like "why the heck don't ya try to clone the submodules if you're gonna tell me i need to do it anyway" when i first imported so i also did that just bc it seems like a kind thing to do.
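For reference, the TYPE_CHECKING idea mentioned above looks roughly like the sketch below. It is the generic Python pattern, not hdmf's code: `describe` is a hypothetical function, and the annotation is written as a string so nothing from scipy is needed at runtime. As noted above, a decorator that inspects real type objects at runtime may not work with annotations deferred this way, so treat this as an illustration only.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Static type checkers follow this branch; at runtime TYPE_CHECKING is
    # False, so the expensive import is never executed on module import.
    from scipy.sparse import csr_matrix


def describe(data: "csr_matrix | list") -> str:
    """Hypothetical helper: report the kind of container passed in.

    The string annotation is never evaluated at runtime, so the name
    csr_matrix does not have to exist when this module is imported.
    """
    return type(data).__name__
```

Code that needs the real class at runtime (for example for an isinstance check) would still have to import it, typically lazily inside the function that uses it.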
anyway more later and i'll test this too xoxo