Which statistics in iris.analysis are lazy?
#4039
Comments
Hi @stefsmeets, thanks for this. I agree we ought to have something in the docstrings that tells us which aggregators are lazy. We have an open issue about doing this for functions generally (#3292), but nobody has got to it yet. For percentiles, I have an open PR at #3901. I’d welcome any feedback on that as I’m pretty new to dask.
It looks like

    import numpy.ma as ma
    import dask.array as da

    arr = ma.array(range(4), mask=[0, 0, 0, 1])
    print(ma.median(arr))
    larr = da.from_array(arr)
    print(da.median(larr, axis=0).compute())

Output:
So I think we would need something extra to make lazy median consistent with our existing median aggregator.
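Hi @stefsmeets, yes it looks like that should work in principle. Something like

    import numpy as np
    import numpy.ma as ma
    import dask.array as da

    def lazy_median(array, axis):
        # Cast to float so that masked points can be represented as NaN.
        # (np.float_ was removed in NumPy 2.0; np.float64 is the same type.)
        array = array.astype(np.float64)
        # Replace masked points with NaN so that nanmedian skips them.
        nan_array = da.ma.filled(array, np.nan)
        median = da.nanmedian(nan_array, axis)
        # Mask any NaN results again (e.g. where every input point was masked).
        return da.ma.fix_invalid(median)

    arr = ma.array(range(4), mask=[0, 0, 0, 1])
    print(ma.median(arr))
    larr = da.from_array(arr)
    print(lazy_median(larr, axis=0).compute())

Output:
Though I am very much not an expert. Maybe @pp-mo has thoughts on this.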
Related: #3292
@fnattino: This issue could be interesting for you.
#5066 updated all the aggregator docstrings to indicate which are lazy. PERCENTILE has been lazy since v3.3. Is there still appetite to work on MEDIAN? Obviously you can just use the 50th percentile, but perhaps there is an advantage to the separate median function.
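To illustrate the point about the 50th percentile, here is a minimal pure-Python sketch. It assumes linear interpolation between the two closest ranks (NumPy's default percentile method); whether PERCENTILE in iris interpolates the same way in every configuration is not confirmed in this thread. The `percentile` helper is hypothetical and exists only for this illustration.

```python
import math
import statistics


def percentile(values, percent):
    """Percentile by linear interpolation between the two closest ranks.

    Hypothetical helper for illustration only; it is not part of iris.
    """
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile() needs at least one value")
    # Fractional position of the requested percentile in the sorted data.
    position = (len(ordered) - 1) * percent / 100.0
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return ordered[lower] * (1.0 - weight) + ordered[upper] * weight


even = [3, 0, 2, 1]
odd = [5, 1, 4, 2, 3]

# Under linear interpolation the 50th percentile matches the median
# for both even-length and odd-length inputs.
print(percentile(even, 50), statistics.median(even))
print(percentile(odd, 50), statistics.median(odd))
```

With a cube, the equivalent would presumably be something like `cube.collapsed("time", iris.analysis.PERCENTILE, percent=50)`, where the `"time"` coordinate name is just a placeholder.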
@SciTools/peloton let's wait until 2023-01 to see whether anyone really wants MEDIAN to be lazy.
📰 Custom Issue
Hi everyone, we are currently working on a feature to make our multimodel calculations lazy in ESMValTool by depending on iris.analysis to perform the calculations (ESMValGroup/ESMValCore#968). The documentation states that MEAN, STD_DEV and VARIANCE already have lazy implementations via dask.

Looking through the code, I have noticed that iris.analysis.MIN and iris.analysis.MAX also have lazy functions associated with them, but these are not mentioned in the documentation as being lazy. I'm wondering if I'm missing something or if this information is not yet available in the documentation.

We would be very interested to also make some of the other statistics lazy on our side, i.e. MEDIAN and PERCENTILE, which also have implementations available via dask.
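One way to check this kind of thing programmatically, rather than reading the source, might be to list the aggregators that carry a lazy function. The sketch below assumes each aggregator object exposes a `lazy_func` attribute that is `None` when no lazy implementation exists; that attribute name is an assumption about iris internals, and `lazy_aggregator_names` is a hypothetical helper. A stand-in namespace is used so the example runs without iris installed.

```python
from types import SimpleNamespace


def lazy_aggregator_names(namespace):
    """Return sorted names of upper-case attributes whose lazy_func is set.

    Assumes (not confirmed here) that aggregators expose ``lazy_func``.
    """
    names = []
    for name in dir(namespace):
        # Aggregators are upper-case module-level names such as MEAN or MAX.
        if not name.isupper():
            continue
        obj = getattr(namespace, name)
        if getattr(obj, "lazy_func", None) is not None:
            names.append(name)
    return sorted(names)


# Stand-in for iris.analysis, purely for illustration.
fake_analysis = SimpleNamespace(
    MEAN=SimpleNamespace(lazy_func=sum),
    MEDIAN=SimpleNamespace(lazy_func=None),
    MAX=SimpleNamespace(lazy_func=max),
    helper=SimpleNamespace(lazy_func=min),  # lower-case, so it is skipped
)

print(lazy_aggregator_names(fake_analysis))
```

With iris installed, passing the real module (`import iris.analysis` followed by `lazy_aggregator_names(iris.analysis)`) should give the corresponding list for that version, provided the assumption about `lazy_func` holds.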