-
Notifications
You must be signed in to change notification settings - Fork 303
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add caching at the scene level, and handle saving/loading from disk #264
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #264 +/- ##
===========================================
+ Coverage 63.69% 66.01% +2.31%
===========================================
Files 112 114 +2
Lines 13936 15516 +1580
===========================================
+ Hits 8877 10243 +1366
- Misses 5059 5273 +214
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think overall it is a good step forward but as pointed out by my comments some parts seem like they are in the wrong place or seem duplicated.
What if we removed the self.resamplers
cache and had the resample module do all the caching. Like in prepare_resampler
use the cache if the key exists otherwise create the resampler class, and cache it there instead of having the Scene do any resampler caching.
I'll mention this again, what if we had one higher-level (one more layer of abstraction) object like a ResamplerManager
(but a better name) that handles all of the caching and handles the possible complications of resamplers being able to resample multiple datasets at a time versus one at a time.
@@ -703,7 +704,8 @@ def _resampled_scene(cls, datasets, destination, **resample_kwargs): | |||
source_area = dataset.attrs['area'] | |||
if source_area not in resamplers: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this resampler dictionary needed if the resampler module is already caching things? It doesn't take in to account the keyword arguments like the resampler module does so couldn't you get some false positive cache hits (old resampler with different kwargs)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would like to have the scene instance as a main container for the resampling cache. The idea is that when the scene is deleted, the resamplers associated with the scene are probably not needed anymore, and are then free to be garbage-collected.
satpy/scene.py
Outdated
key = (resampler.__class__, | ||
source, destination_area, | ||
hash_dict(resample_kwargs)) | ||
resamplers_cache[key] = resampler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This resamplers_cache
being used in this module but declared (and used) in the other makes this seem like there must be a better way.
satpy/scene.py
Outdated
new_scn, resamplers, destination_area = self._resampled_scene(to_resample, destination, | ||
**resample_kwargs) | ||
for source, (key, resampler) in resamplers.items(): | ||
self.resamplers[key] = resampler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I see now. You can't assign to self.resamplers
in _resampled_scene
because it is a class method. Is there a reason it has to be a class method? Add this for loop to that method would make sense if it wasn't a classmethod. I could see the class method being useful if it was public or if it was a utility function that users may want to use outside of a Scene, but it isn't right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be removed now right?
This PR restructures the caching to store resamplers in the scene, and have a weak global resampler index.
git diff origin/develop **/*py | flake8 --diff