
Add option to standardize to anomalies preprocessor function #300

Merged
merged 36 commits into from
Feb 21, 2020

Conversation

bascrezee
Contributor

@bascrezee bascrezee commented Oct 8, 2019

I added a new preprocessor function, making use of two other existing preprocessor functions. This is the description I provided in the docstring:

This function standardizes the input data. It calculates the anomalies
(x - mean(x)) and divides them by the standard deviation, where both the
anomalies and the standard deviation are calculated over the specified period.

As far as I could see, it is not possible to achieve this behaviour by combining existing preprocessor functions in a recipe, so I added it as a new preprocessor function.
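As a rough illustration of the computation (a NumPy-only sketch; the actual preprocessor operates on Iris cubes and can also compute the statistics per period, e.g. per month):

```python
import numpy as np

def standardized_anomalies(data, axis=0):
    """Minimal sketch of (x - mean(x)) / std(x) along one axis.

    The real preprocessor works on Iris cubes and supports per-period
    grouping; this NumPy version only illustrates the formula.
    """
    mean = data.mean(axis=axis, keepdims=True)
    std = data.std(axis=axis, keepdims=True)
    return (data - mean) / std

# A tiny example: 4 time steps at 2 grid points.
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0],
                 [4.0, 40.0]])
z = standardized_anomalies(data)
# The result has zero mean and unit standard deviation along the time axis.
```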

Tasks

  • Add option to standardize to anomalies preprocessor function #299
  • Add unit tests
  • Public functions should have a numpy-style docstring so they appear properly in the API documentation
  • If writing a new/modified preprocessor function, please update the documentation
  • CircleCI tests pass. Status can be seen below your pull request. If the tests are failing, click the link to find out why.
  • Codacy code quality checks pass. Status can be seen below your pull request. If there is an error, click the link to find out why. If you suspect Codacy may be wrong, please ask by commenting.
  • If you make backward incompatible changes to the recipe format, make a new pull request in the ESMValTool repository and add the link below --> there are no backward incompatible changes, as far as I can see

Closes #299

@bascrezee
Contributor Author

Can I have a quick review (by @jvegasbsc or @valeriupredoi or ... ?) before proceeding with writing a unit test and adding it to the documentation?

@jvegreg
Contributor

jvegreg commented Oct 8, 2019

Looks nice; I only think it would be better to have it as an option in the anomalies preprocessor.

Contributor

@valeriupredoi valeriupredoi left a comment


Nice one! A couple of minor comments. Also, it is a bit confusing to me (as a non-scientist) why monthly, month, and mon are all accepted -> maybe detail this in the function docstring; it is definitely needed in the documentation.

@bascrezee
Contributor Author

Nice one! A couple of minor comments. Also, it is a bit confusing to me (as a non-scientist) why monthly, month, and mon are all accepted -> maybe detail this in the function docstring; it is definitely needed in the documentation.

This is equally confusing to me, my function inherits this from the other functions. Maybe @jvegasbsc can explain this?

@bascrezee
Contributor Author

Looks nice; I only think it would be better to have it as an option in the anomalies preprocessor.

Yes, I agree. That would be an option as well. What do others think?

@mattiarighi
Contributor

I also prefer extending existing preprocessors rather than creating new ones.

@jvegreg
Contributor

jvegreg commented Oct 8, 2019

This is equally confusing to me, my function inherits this from the other functions. Maybe @jvegasbsc can explain this?

To support all the common abbreviations used for monthly data.

@mattiarighi mattiarighi added the preprocessor Related to the preprocessor label Oct 11, 2019
@bascrezee bascrezee changed the title Added new preprocessor standardize Add option to standardize to anomalies preprocessor function Oct 22, 2019
@bascrezee
Contributor Author

@jvegasbsc I just included it as an option in the existing anomaly function. Could you test it?
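Once the option is merged, a recipe could enable it roughly like this (a hypothetical snippet; the preprocessor name `standardized_anomalies` is illustrative, and the option name follows the function argument discussed in this thread):

```yaml
preprocessors:
  standardized_anomalies:
    anomalies:
      period: month
      standardize: true
```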

@bascrezee bascrezee marked this pull request as ready for review October 24, 2019 07:45
@bascrezee
Contributor Author

I added a unit test for only one case for the moment, as a proof of concept. I look forward to some feedback, maybe from @bouweandela or @jvegasbsc? The strategy I took is to calculate the expected outcome using numpy functions. Unfortunately this will get rather complicated for the other cases. I also noted that there actually IS quite a difference in the outcome (see the tolerances that I set). I have not yet found out why these differences occur; it is not related to the weighting along the time dimension, I checked that.

I do think that putting more functionality into one function (as suggested by @mattiarighi and @jvegasbsc above) generally makes unit testing much more complex, since the number of cases to cover equals the product of the possible input arguments. Probably something to keep in mind for the future.

@jvegreg
Contributor

jvegreg commented Dec 2, 2019

I added a unit test for only one case, at the moment, as a proof of concept. I look forward to some feedback, maybe from @bouweandela or @jvegasbsc ?

I modified it to simplify it a bit and corrected a couple of flake8 issues. Anyway, my suggestion would be to create a new method for testing the standardized case. I think it would also be a good idea to generate a new data cube that makes it easier to see at a glance what the result should be.

@bouweandela
Member

since the options that need to be covered equal the product between input arguments

Indeed, this is only feasible if the number of possible input values is small. Parametrizing with pytest helps a bit, but not when the list of options/possible values grows very large. Usually people try to test at the very least every code path: have a look at the coverage report in test-reports/coverage_html/index.html and make sure there is no code that is not executed during a test (marked red).
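To make the combinatorics concrete: with `pytest.mark.parametrize` one typically enumerates the cross product of option values, so the case count multiplies with every new argument. A stdlib-only sketch (the option values below are illustrative, not the exact set accepted by the preprocessor):

```python
import itertools

# Illustrative option values; the real preprocessor accepts more aliases.
periods = ["full", "season", "month", "day"]
standardize = [True, False]

# Every combination would need a test case.
cases = list(itertools.product(periods, standardize))
print(len(cases))  # 4 periods x 2 flags -> 8 test cases
```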

@bouweandela bouweandela changed the base branch from development to master January 3, 2020 12:18
@bascrezee
Contributor Author

bascrezee commented Feb 12, 2020

Just a small update (also as a note to self). I got time again to work on this :) I did a first implementation solely for the case period='full'. Interestingly, the differences between the numpy result and the esmvalcore preprocessor result are pretty large (~0.0012 absolute difference). I checked, and the difference cannot be explained by the weighting of the time axis (since all weights are 1 for this test case). I am now trying to track this down further. The difference could be related to the ddof argument, which can be passed to the Iris aggregator function.

@valeriupredoi
Contributor

There is no weighting, since the standard deviation aggregator does not support weights; it seems to me that the difference comes from np.std() vs cube.collapsed('time', iris.analysis.STD_DEV), no? 🍺

@bascrezee
Contributor Author

bascrezee commented Feb 12, 2020

There is no weighting, since the standard deviation aggregator does not support weights; it seems to me that the difference comes from np.std() vs cube.collapsed('time', iris.analysis.STD_DEV), no?

By default, Iris uses ddof=1 in the STD_DEV aggregator, whereas numpy uses ddof=0. After passing this as an argument in the tests, assert_allclose can hopefully be replaced by assert_array_equal. 😄
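The mismatch is easy to reproduce with the standard library alone: `statistics.pstdev` is the population standard deviation (ddof=0, numpy's default), while `statistics.stdev` is the sample standard deviation (ddof=1, the default of Iris's STD_DEV aggregator):

```python
import statistics

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

pop = statistics.pstdev(sample)   # divides by n     (ddof=0), like np.std
samp = statistics.stdev(sample)   # divides by n - 1 (ddof=1), like STD_DEV

print(pop)   # 2.0
print(samp)  # ~2.138; always >= pop, and the gap shrinks as n grows
```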

@valeriupredoi
Contributor

valeriupredoi commented Feb 12, 2020

here try this with a say, temperature cube:

import iris
import iris.analysis
import numpy as np

c = iris.load_cube("t1.nc")
# shape (1800, 96, 192)
c1 = c.collapsed('time', iris.analysis.STD_DEV)  # ddof=1 by default
c2 = np.std(c.data, axis=0)                      # ddof=0 by default
print(np.mean(c1.data - c2))

I get a mean delta of ~10^-3, which is consistent with what you noticed.

Member

@bouweandela bouweandela left a comment


Looks good to me! Just a few minor suggestions on formatting

Member

@bouweandela bouweandela left a comment


Looks good to me. @mattiarighi Could you please test?

@bascrezee
Contributor Author

Can this please be merged? @mattiarighi ?

@mattiarighi mattiarighi merged commit e2404b7 into master Feb 21, 2020
@mattiarighi mattiarighi deleted the add_standardize_preproc branch February 21, 2020 10:52