Which statistics in `iris.analysis` are lazy? #4039

stefsmeets · 2021-02-26T15:04:17Z

📰 Custom Issue

Hi everyone, we are currently working on a feature to make our multimodel calculations lazy in ESMValTool by depending on iris.analysis to perform the calculations (ESMValGroup/ESMValCore#968). The documentation states that MEAN, STD_DEV and VARIANCE already have lazy implementations via dask.

Looking through the code, I have noticed that iris.analysis.MIN and iris.analysis.MAX also have lazy functions associated with them, but these are not mentioned in the documentation as being lazy. I'm wondering if I'm missing something or if this information is not yet available in the documentation.

We would be very interested to also make some of the other statistics lazy on our side, i.e. MEDIAN and PERCENTILE, which also have implementations available via dask.


rcomer · 2021-02-27T09:57:09Z

Hi @stefsmeets, thanks for this. I agree we ought to have something in the docstrings that tells us which aggregators are lazy. We have an open issue about doing this for functions generally (#3292), but nobody has got to it yet.

For percentiles, I have an open PR at #3901. I’d welcome any feedback on that as I’m pretty new to dask.

rcomer · 2021-02-27T13:03:07Z

It looks like dask.array.median uses numpy.median under the hood, so it doesn't respect masks:

import numpy.ma as ma
import dask.array as da

arr = ma.array(range(4), mask=[0,0,0,1])
print(ma.median(arr))

larr = da.from_array(arr)
print(da.median(larr, axis=0).compute())

Output:

1.0
1.5

So I think we would need something extra to make lazy median consistent with our existing median aggregator.

stefsmeets · 2021-03-11T15:34:18Z

Hi @rcomer, I just noticed that a nanmedian function exists in dask. Would this be a way to make the median operation lazy in iris?

rcomer · 2021-03-11T17:10:46Z

Hi @stefsmeets, yes, it looks like that should work in principle. Something like:

import numpy as np
import numpy.ma as ma
import dask.array as da

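def lazy_median(array, axis):
    # Cast to float so masked points can be represented as NaN
    # (np.float_ was removed in NumPy 2.0; np.float64 is equivalent).
    array = array.astype(np.float64)
    # Replace masked points with NaN, then take the NaN-aware median.
    nan_array = da.ma.filled(array, np.nan)
    median = da.nanmedian(nan_array, axis)
    # Re-mask any NaN results (e.g. where every input point was masked).
    return da.ma.fix_invalid(median)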

arr = ma.array(range(4), mask=[0,0,0,1])
print(ma.median(arr))

larr = da.from_array(arr)
print(lazy_median(larr, axis=0).compute())

Output:

1.0
1.0

Though I am very much not an expert. Maybe @pp-mo has thoughts on this.
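
The NaN-filling idea above can also be checked eagerly against `ma.median`, including the edge case where every point along the axis is masked. The following is only a NumPy sketch of the same technique, not the Iris or dask implementation; the helper name `nan_median` and the test data are made up for illustration.

```python
import warnings

import numpy as np
import numpy.ma as ma


def nan_median(masked, axis):
    """Hypothetical eager analogue of the lazy_median sketch above."""
    # Masked points become NaN so that nanmedian can skip them.
    filled = ma.filled(masked.astype(np.float64), np.nan)
    with warnings.catch_warnings():
        # An all-NaN slice triggers a RuntimeWarning; the NaN result is
        # re-masked below, so the warning is silenced here.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        result = np.nanmedian(filled, axis=axis)
    # Turn NaN results back into masked points.
    return ma.fix_invalid(result)


# Column 0 has one masked point, column 1 is fully masked, column 2 has none.
data = ma.array(
    [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]],
    mask=[[0, 1, 0], [0, 1, 0], [1, 1, 0]],
)

expected = ma.median(data, axis=0)
actual = nan_median(data, axis=0)

print(expected)
print(actual)
```

Both results should agree on which points are masked and on the values of the unmasked points. Whether the dask version behaves identically for every chunking is not checked here.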

trexfeathers · 2022-10-17T10:31:51Z

Related: #3292

bouweandela · 2022-11-10T08:51:32Z

@fnattino: This issue could be interesting for you

rcomer · 2022-11-23T08:16:57Z

#5066 updated all the aggregator docstrings to indicate which are lazy. PERCENTILE has been lazy since v3.3. Is there still appetite to work on MEDIAN? Obviously you can just use the 50th percentile but perhaps there is an advantage to the separate median function.
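
As a quick illustration of the "50th percentile" remark, the sketch below compares the two on plain NumPy arrays. It assumes NumPy's default linear interpolation and unmasked data; the sample values are made up, and it says nothing about how the Iris aggregators treat masked points.

```python
import numpy as np

# Even-length sample: the median is the mean of the two middle values.
even = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])
# Odd-length sample: the median is the single middle value.
odd = even[:7]

even_median = np.median(even)
even_p50 = np.percentile(even, 50)
odd_median = np.median(odd)
odd_p50 = np.percentile(odd, 50)

print(even_median, even_p50)
print(odd_median, odd_p50)
```

In Iris terms this would correspond to collapsing with `iris.analysis.PERCENTILE` and `percent=50` instead of `iris.analysis.MEDIAN`.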

pp-mo · 2022-11-23T10:55:48Z

@SciTools/peloton let's wait until 2023-01 to see whether anyone really wants MEDIAN to be lazy.
If not, add a doc note to say "it's not lazy, but percentile is".

stefsmeets added the New: Issue label Feb 26, 2021

rcomer added the Feature: ESMValTool label Mar 1, 2021

Peter9192 mentioned this issue Mar 10, 2021

Lazy implementation of multi_model_statistics and ensemble_statistics preprocessors ESMValGroup/ESMValCore#968

Merged

9 tasks

Peter9192 mentioned this issue Mar 12, 2021

Lazy Percentile Aggregator #3901

Merged

trexfeathers removed the New: Issue label Jun 15, 2022

valeriupredoi mentioned this issue Sep 30, 2022

Ordering issues labelled Feature: ESMValTool from iris ESMValGroup/ESMValCore#1738

Open

bjlittle moved this to 📚 Backlog in 🌍 ESMValTool Surgery (Discussion Topics) Oct 10, 2022

bjlittle added this to 🌍 ESMValTool Surgery (Discussion Topics) Oct 10, 2022

rcomer mentioned this issue Jan 4, 2023

Link PERCENTILE from MEDIAN docstring #5128

Merged

trexfeathers closed this as completed in #5128 Jan 4, 2023

github-project-automation bot moved this from 📚 Backlog to 🏁 Done in 🌍 ESMValTool Surgery (Discussion Topics) Jan 4, 2023

trexfeathers moved this to 🆕 New in ESMValTool Feb 20, 2023

trexfeathers added this to ESMValTool Feb 20, 2023

trexfeathers removed this from ESMValTool Feb 20, 2023

fnattino mentioned this issue Oct 8, 2024

Lazy median aggregator #6167

Merged
