Fix lazy fill_value retrieval #2738
Comments
Specifically, this comes up at save time when you want to stream data. Currently, that just means NetCDF. The shimmy around this issue is to write to disk and check that your fill_value is consistent, rather than try to understand your fill_value up front. This can be done with the
Incidentally, I discovered some questionable behaviour in how Iris saves NetCDF files. It comes down to the assumption that the fill value actually represents the mask of a masked_array:
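To illustrate why that assumption is unsafe, here is a minimal NumPy-only sketch (not the snippet originally posted in this comment; the values are made up for illustration). It shows an unmasked data point that happens to equal the fill value, so "equals fill_value" and "is masked" disagree:

```python
import numpy as np

# Illustrative data: index 1 is masked, index 2 is a genuine (unmasked)
# data point that happens to equal the fill value.
data = np.ma.masked_array(
    [1.0, 2.0, -999.0, 4.0],
    mask=[False, True, False, False],
    fill_value=-999.0,
)

# filled() substitutes fill_value at the masked points only.
filled = data.filled()

# Points that *look* missing once the array has been filled.
looks_missing = filled == data.fill_value

# Points that are *actually* masked.
really_missing = np.ma.getmaskarray(data)

# Genuine data points that collide with the fill value.
collisions = looks_missing & ~really_missing
print(collisions)
```

Once the array is filled and written out, the point at index 2 can no longer be told apart from a masked point.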
There are a couple of options here. We could:
What is clear: there is redundancy between the masked_array's |
Summary of key details:
• np.ma -> slice -> np.ma in all situations

My conclusion: stream out our data with a default (overridable) fill value, warning as we go if there are genuine data point collisions.

If we want round-trip fill_values, we need to add that to the data model (i.e. not use the fill_value in a ma, because of the previous comment). Adding fill_value to the data model is possible, but there are lots of details to get right to ensure that we are consistent, and even then we still need to validate the arrays as they get written to ensure there are no data-point collisions. Implementing this has nothing to do with Appendix
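A rough sketch of what "stream with a default, overridable fill value and warn on collisions" could look like, using NumPy only. The names `DEFAULT_FILL`, `fill_chunk` and `stream_chunks` are hypothetical and are not Iris API; the default of 1e20 is an assumption chosen for the example:

```python
import warnings

import numpy as np

# Hypothetical default; a real implementation would pick one per dtype.
DEFAULT_FILL = 1e20


def fill_chunk(chunk, fill_value=DEFAULT_FILL):
    """Return a plain ndarray ready for writing, warning on collisions."""
    mask = np.ma.getmaskarray(chunk)  # all False for a plain ndarray
    raw = np.ma.getdata(chunk)
    # A collision is an unmasked data point equal to the fill value.
    if np.any((raw == fill_value) & ~mask):
        warnings.warn(
            f"Unmasked data equal to fill_value {fill_value!r}; "
            "these points will read back as missing."
        )
    return np.ma.filled(chunk, fill_value)


def stream_chunks(chunks, fill_value=DEFAULT_FILL):
    """Yield filled chunks one at a time, validating each as it goes."""
    for chunk in chunks:
        yield fill_chunk(chunk, fill_value)
```

The check happens per chunk at write time, so nothing needs to be known about the whole array up front.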
A question about the ability to optimise calculations by sometimes just returning an array if there are no masked values. It is my assertion that we can still do this:
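As an illustration of the kind of optimisation in question (a sketch, not the code originally attached to this comment; `simplify_result` is a hypothetical helper name), a result can be downgraded to a plain array only when no point is masked:

```python
import numpy as np


def simplify_result(result):
    """Return a plain ndarray if nothing is masked, else the input unchanged."""
    if isinstance(result, np.ma.MaskedArray) and not np.ma.is_masked(result):
        # No masked points: the underlying data carries all the information.
        return result.data
    return result
```

This only inspects the mask, so it does not rely on the array's fill_value in any way.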
I have posted a query on the dask "masked_arrays" PR: dask/dask#2301 (comment)
I think this is consistent with the previous iris-dask code |
Discussed in offline meeting @pelson @djkirkham @bjlittle @pp-mo
The "smallest slice" method of retrieving the fill value from a dask array doesn't always work. In particular it won't work if that array is formed from the concatenation/stacking of several arrays or proxies which have differing fill values, or combinations of ndarrays and masked arrays:
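The underlying problem can be sketched with plain NumPy standing in for dask (an assumption made to keep the example self-contained; the original example from this issue is not reproduced here). The pieces carry different fill values, or none at all, yet any slice of the combined array can only report one:

```python
import numpy as np

# Three pieces: two masked arrays with different fill values and one
# plain ndarray, which has no fill value at all.
a = np.ma.masked_array([1.0, 2.0], mask=[False, True], fill_value=-1.0)
b = np.ma.masked_array([3.0, 4.0], mask=[True, False], fill_value=-2.0)
c = np.array([5.0, 6.0])

pieces = [a, b, c]
combined = np.ma.concatenate(pieces)

# The fill values actually present on the inputs.
piece_fill_values = {
    float(p.fill_value) for p in pieces if isinstance(p, np.ma.MaskedArray)
}

# A small slice of the combined array reports exactly one fill value,
# so it cannot represent every input.
sliced_fill_value = float(combined[:1].fill_value)

print(sorted(piece_fill_values), sliced_fill_value)
```

Whatever single value the slice reports, it disagrees with at least one of the inputs.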