Add MiRS reader #1511
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1511 +/- ##
==========================================
+ Coverage 92.59% 92.73% +0.13%
==========================================
Files 251 253 +2
Lines 36997 37576 +579
==========================================
+ Hits 34258 34845 +587
+ Misses 2739 2731 -8
I've updated the title because we use it in our release notes and didn't want to clutter it with the PR numbers.
This is done because the reader name in polar2grid is mirs, and that has been used for a while, so it should not change for users.
A couple of requests (mostly style stuff), but otherwise it is looking pretty good. Also 👍 for the pytest parametrize usage.
Remove variables that are never used. There seems to be no reason to handle coords again in __getitem__; let xarray handle these.
1) The file registry is not needed before running the pooch retrieve command. 2) Simplify the pooch retrieve code to make it easier to read, and call it when needed rather than at the beginning of the code. The filenames do not need to be in the scene metadata. 3) Use dask's where rather than roundabout numpy logic; this eliminates all the squeeze calls (see the sketch below). 4) Remove the bt_data array slice where it is not needed, but keep using the bt_data attributes for the corrected brightness temperature xarray.
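As a rough illustration of the dask.where point, a minimal sketch; the array names and the surface-type encoding here are made up for the example, not taken from the reader:

import dask.array as da
import numpy as np

# Hypothetical stand-ins for the reader's arrays.
surf_type_mask = da.from_array(np.random.randint(0, 2, (3200, 96)), chunks=(800, 96))
bt_sea = da.random.random((3200, 96), chunks=(800, 96))
bt_land = da.random.random((3200, 96), chunks=(800, 96))

# One lazy element-wise selection replaces the mask/index/squeeze round trips.
new_data = da.where(surf_type_mask == 0, bt_sea, bt_land)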
limb_correct_bt does not need to be in the class. Move code out of class methods and restructure the call to the limb_correction to facilitate this move.
Assigning a map_blocks call to a numpy array 96 times causes dask to compute before it is necessary. Also, I could be wrong, but it does not seem necessary to loop over map_blocks. It makes more sense to loop over the FOV inside the function that map_blocks calls, as sketched below.
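A hedged sketch of that restructuring; the function body is placeholder math and the names (limb_correct_block, coeffs) are illustrative, not the PR's actual apply_atms_limb_correction:

import dask.array as da
import numpy as np

def limb_correct_block(bt_block, coeffs=None):
    # Inside map_blocks the block is a plain numpy array, so numpy
    # (not dask) calls belong here.
    corrected = np.empty_like(bt_block)
    for fov in range(bt_block.shape[-1]):
        # Loop over FOV here, once per block, instead of issuing
        # 96 separate map_blocks calls from the outside.
        corrected[..., fov] = bt_block[..., fov] * coeffs[fov]  # placeholder math
    return corrected

# Fake (scanline, fov) brightness temperatures; the chunks keep the FOV
# dimension whole so per-FOV indexing is valid inside each block.
bt_data = da.random.random((3200, 96), chunks=(800, 96))
coeffs = np.linspace(0.9, 1.1, 96)  # fake per-FOV coefficients
correction = da.map_blocks(limb_correct_block, bt_data, coeffs=coeffs,
                           dtype=bt_data.dtype)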
Merge branch 'master' of https://github.com/pytroll/satpy into mirs
…py arrays. Also, don't hard-code the block size for the arrays; use the block size of the input dask array.
satpy/readers/mirs.py
Outdated
- bt_corrected = xr.DataArray(new_data, dims=bt_data[idx, :, :].dims,
-                             coords=bt_data[idx, :, :].coords,
+ bt_corrected = xr.DataArray(new_data, dims=("y", "x"),
+                             coords=surf_type_mask.coords,
What are the coords this variable has?
I was thinking about this and am not sure of the best way to add the coords. I was considering adjusting the way the new_coords method works at line 211 to make assigning longitude and latitude coordinates more obvious.
I was actually thinking about the same thing for the chunk size for map_blocks. I could add a general self.nc.shape so that I can access the scanlines/FOV without slicing bt_data.
Let's talk about it at the meeting today, BUT you shouldn't need to add lon/lat/x/y coordinates. The base reader class will do that for you based on the coordinates: entry defined in the YAML (or your available_datasets method).
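For reference, a sketch of what that hook can look like. The dict keys follow satpy's ds_info conventions, but the details (self.nc as the opened xarray.Dataset, the coordinate names) are illustrative assumptions rather than this PR's code:

def available_datasets(self, configured_datasets=None):
    # Pass through anything already configured in the YAML.
    for is_avail, ds_info in (configured_datasets or []):
        yield is_avail, ds_info
    # Advertise variables found in the file. Listing 'coordinates'
    # lets the base reader attach longitude/latitude itself, so the
    # file handler never assigns 2D lon/lat coords on its own.
    for var_name in self.nc.data_vars:
        yield True, {
            "file_type": self.filetype_info["file_type"],
            "name": var_name,
            "coordinates": ["longitude", "latitude"],
        }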
1) Remove the extra check for self.nc.coords and just check within the new_coords method. 2) Change all dask calls in the apply_limb_correction method to np calls.
This feature of xarray is not working well with dask: pydata/xarray#3068
1) Removes the new_coords method, which added lat/lon coords through the xarray.Dataset.assign_coords method. 2) Removes coordinate assignment when the bt_corrected xarray.DataArrays are constructed. 3) Restructures available_datasets: hopefully it is easier to understand, and it ensures that lat/lon coordinates have standard_names even though file metadata, which takes precedence over the YAML, may lack them.
otherwise they are dim_0 and dim_1 and are not recognized by the reader. Scanline/FOV are converted to y/x in the reader.
I think it can be True/False/None. I'm not sure I know or remember exactly what you're talking about and am missing some of the context of this comment/discussion. I think you have it right though. Regarding coords, it isn't how they are assigned, it is what is being assigned (a 2D dask array). Until we can get more confidence that xarray won't "accidentally" compute these, it would be best not to include them as coordinates.
I am sorry Dave, I meant to put a link for the available_datasets section and did not realize I hadn't done that. Sorry for the confusion. I am pretty sure that any file metadata in the mirs reader supersedes metadata read from the yaml. Perhaps I have missed a basic concept in building the datasets through the configuration file and dynamic datasets.
I think you based this reader off of the GAASP reader, which probably makes some assumptions about where dataset definitions are coming from. I think you are right, but technically the behavior is probably "undefined" when two datasets are yielded with the same DataID. I think you'll need to keep a "cache" of which DataIDs have already been yielded by the file handler and not re-yield/override them. I think I've used a list/set for this before. Obviously, making it work first is more important.
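Roughly, the cache idea looks like this; the handled set and the dict contents are my own placeholders, shown only to make the de-duplication concrete:

def available_datasets(self, configured_datasets=None):
    handled = set()
    # Configured (YAML) entries win; remember their names.
    for is_avail, ds_info in (configured_datasets or []):
        handled.add(ds_info["name"])
        yield is_avail, ds_info
    for var_name in self.nc.data_vars:
        if var_name in handled:
            # Already yielded once with this DataID; don't override it.
            continue
        yield True, {"file_type": self.filetype_info["file_type"],
                     "name": var_name}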
Correct, I based this on the GAASP reader and it does make assumptions about the location of metadata; mainly, I don't think it expects any of the metadata to be coming from the yaml. Currently, that is also how the mirs reader is working. Initially, I thought I would use the yaml to add metadata that was missing, but then I realized that could be a tricky thing. I would then want to make sure that the file metadata would always override metadata from the yaml, like units. I am comfortable with the idea that all the metadata comes from the file, and adding things like long_name and standard_name within the reader.
Yes, it should be adding coords as it (xarray) sees fit. You could …
satpy/readers/mirs.py
Outdated
bt_data = bt_data.transpose("Channel", "y", "x")
c_size = bt_data[idx, :, :].chunks
correction = da.map_blocks(apply_atms_limb_correction,
                           bt_data.values, idx,
The .values should be removed here, right?
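i.e. something along these lines, keeping the call lazy (a sketch abridged from the diff above; apply_atms_limb_correction and idx come from the surrounding code):

# Hand map_blocks the underlying dask array (.data) instead of .values,
# which would compute the whole array eagerly before the correction runs.
correction = da.map_blocks(apply_atms_limb_correction,
                           bt_data.data, idx,
                           dtype=bt_data.dtype)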
The necessary metadata to get Metop files read into satpy is added in the reader. The information provided in the yaml would be nice as a supplement to information in the file, but I can't think of a reason why it should override that information, except for descriptions, which can be handled elsewhere. Since trying to juggle yaml/file metadata would make the code more involved, I feel it is best to keep the model of the GAASP reader, which assumes the necessary metadata is in the file, and add missing metadata within the reader.
- Add the mocks.
- Create fake coefficient data in the test.
- Update how the coefficients are read in mirs by splitting out the reading of the file from the actual parsing (see the sketch below).
- Check the end of the coefficient file for n_chn and n_fov, removing the need for globals.
- Remove the global N_FOV in the loop which calculates the coefficients on the data; use the dataset size for the loop endpoint instead, since if the dataset for some reason does not have 96 FOV, a fixed endpoint would make this loop fail.
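The read/parse split might look roughly like the following; the function names and the coefficient file layout are placeholder assumptions, shown only to make the mocking strategy concrete:

def read_coeff_file(coeff_fn):
    # Thin I/O layer: the only part a test needs to mock.
    with open(coeff_fn) as coeff_file:
        return coeff_file.readlines()

def parse_coeffs(lines):
    # Pure parsing, exercised directly with fake coefficient text.
    # n_chn and n_fov sit at the end of the file per the commit message;
    # the exact line layout here is a placeholder assumption.
    n_chn, n_fov = (int(val) for val in lines[-1].split())
    coeffs = [[float(val) for val in line.split()] for line in lines[:-1]]
    return n_chn, n_fov, coeffs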
This looks good enough to me. Thanks for the patience on getting things in order and waiting on the aux data download stuff.
This PR is a replacement for PRs #1486 and #1285. This MiRS reader loads the level 2 EDR IMG swath files produced by the Microwave Integrated Retrieval System.
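For context, loading these files in Satpy then looks something like this; the filename and the dataset name below are illustrative examples, not taken from the PR:

from satpy import Scene

# 'mirs' is the reader name discussed above; the filename and dataset
# name here are placeholders.
scn = Scene(reader="mirs", filenames=["mirs_img_example_file.nc"])
print(scn.available_dataset_names())
scn.load(["TPW"])  # e.g. total precipitable water, if present in the file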