-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mobt 812 vera threshold interpolation #2079
base: master
Are you sure you want to change the base?
Mobt 812 vera threshold interpolation #2079
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #2079 +/- ##
==========================================
+ Coverage 98.39% 98.42% +0.02%
==========================================
Files 124 136 +12
Lines 12212 13439 +1227
==========================================
+ Hits 12016 13227 +1211
- Misses 196 212 +16 ☔ View full report in Codecov by Sentry. |
A couple of minor thoughts I've had over the last week since this was put into review. May be useful to the reviewer:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for writing this. I've given this a first pass and there is some work to do. If you have any questions, please ask me.
if thresholds is None: | ||
thresholds = forecast_at_thresholds.coord(threshold_coord).points | ||
warnings.warn( | ||
f"No thresholds provided, using existing thresholds. Thresholds being used are: {list(thresholds)}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the point of this? Why not just return the cube unchanged as it already contains all of the required thresholds?
More broadly, why is thresholds
an optional argument rather than a mandatory one? The only outcome without providing this argument is that you get your cube back unchanged, so why would anyone call it in that way?
return result | ||
|
||
|
||
def Threshold_interpolation( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This name is written in an odd style. Not quite CamelCase where the first letter of each word is capitalised which we use for plugins and not quite a function suitable name with all lowercase letters.
Unless there is a good reason, this code ought to be written as a plugin:
class ThresholdInterpolation(BasePlugin):
def __init__(thresholds):
etc.
def _interpolate_thresholds():
etc.
def _create_cube_with_thresholds():
etc.
def process(cube):
do stuff.
This contains all the related code together within a class. The underscores proceeding the method names indicates they are private methods, i.e. they are specific to this class and should not be called for other purposes. The use of a class allows the sharing of variables using self
. More importantly, we can create an instance of a class, e.g.
plugin = ThresholdInterpolation(percentiles=[10,50,90])
result = plugin.process(cube)
This value of this (though not really in this case) is that if there is some complexity in configuring the plugin, i.e. converting the percentile values, checking them, etc. all of that can be done just once when we create the plugin instance. We can then call it over and over again without this overhead.
forecast_at_thresholds, | ||
method="mean", | ||
) | ||
forecast_at_thresholds = collapsed_forecast_at_thresholds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Kat-90 asked about whether realization collapse should occur before the interpolation. This is a good question to which the answer is yes, this is all commutative and we can save some effort by collapsing first. An example with some numbers:
Process each realization then collapse
Realization | Thresh 1 (200) | Thresh k (500) | Thresh 2 (800) |
---|---|---|---|
R0 | 0.6 | 0.4 | 0.2 |
R1 | 1.0 | 0.8 | 0.6 |
R2 | 0.8 | 0.6 | 0.4 |
Mean | 0.8 | 0.6 | 0.4 |
Collapse then process the result
Realization | Thresh 1 (200) | Thresh 2 (800) |
---|---|---|
R0 | 0.6 | 0.2 |
R1 | 1.0 | 0.6 |
R2 | 0.8 | 0.4 |
Mean | 0.8 | 0.4 |
thresh_k = 500: value = [(500 - 200) / (800 - 200)] * (0.4 - 0.8) + 0.8 = 0.6
This is a very simple example, but the result is general for our simple linear interpolation (we're basically just averaging numbers).
def process( | ||
forecast_at_thresholds: cli.inputcube, | ||
*, | ||
thresholds: cli.comma_separated_list = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments in plugin file about this not being optional.
thresholds = ["cat", "dog", "elephant"] | ||
error_msg = "could not convert string to float" | ||
with pytest.raises(ValueError, match=error_msg): | ||
Threshold_interpolation(input_cube, thresholds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not required unless the plugin is doing something to handle this exception itself. Here we are essentially testing a numpy exception and that's (hopefully) covered by numpy's unit tests.
coord.name() for coord in realization_cube.coords(dim_coords=True) | ||
] | ||
expected_dim_coords.remove("realization") | ||
assert dim_coords, expected_dim_coords |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, this is testing the collapse_realization function, which already has unit tests covering this, so this is not required.
realization_cube.remove_coord("visibility_in_air") | ||
error_msg = "No threshold coord found" | ||
with pytest.raises(CoordinateNotFoundError, match=error_msg): | ||
Threshold_interpolation(realization_cube, thresholds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is testing the find_threshold_coordinate function and is not required.
warning_msg = "No thresholds provided, using existing thresholds." | ||
|
||
with pytest.warns(UserWarning, match=warning_msg): | ||
Threshold_interpolation(input_cube) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully this goes away if you get rid of the option to not provide thresholds.
Testing that a Cube is returned when inputting a masked cube. | ||
""" | ||
thresholds = [100, 150, 200, 250, 300] | ||
result = Threshold_interpolation(masked_cube_same, thresholds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should test here that the mask in is the same as the mask out.
for point in self.thresholds: | ||
cube = template_cube.copy() | ||
coord = iris.coords.DimCoord( | ||
np.array([point], dtype="float32"), units=cube.units |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np.array([point], dtype="float32"), units=cube.units | |
np.array([point], dtype="float32"), units=threshold_units |
70c4922
to
e08049d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the work Phoebe. Have a load more comments, but we are getting there.
"""Test that the plugin returns an Iris.cube.Cube with suitable units.""" | ||
thresholds = [100, 150, 200, 250, 300] | ||
result = ThresholdInterpolation(thresholds)(input_cube) | ||
assert result, Cube |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line still doesn't do anything. It's still the same as assert 1, 2
To check the type of something we use isinstance
. So your test should be:
assert isinstance(result, Cube)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still not fixed.
def test_masked_cube(masked_cube): | ||
""" | ||
Testing that a Cube is returned when inputting a masked cube. | ||
""" | ||
thresholds = [100, 150, 200, 250, 300] | ||
result = ThresholdInterpolation(thresholds)(masked_cube) | ||
assert isinstance(result, Cube) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pytest allows us to parameterize our tests so we don't have to write the same test twice we can just vary the inputs. Unfortunately parameterising over fixtures is the least possible friendly introduction to this but oh well.
If you replace your test_basic with what follows and remove the test_masked_cube test you will have exactly the same test coverage. The test below loops over the fixtures given in the brackets of the parametrize
statement, namely ["input_cube", "masked_cube"]
. Annoyingly you reference these as string names which I hate.
@pytest.mark.parametrize("input", ["input_cube", "masked_cube"])
def test_cube_returned(request, input):
"""Test that the plugin returns an Iris.cube.Cube with suitable units."""
cube = request.getfixturevalue(input)
thresholds = [100, 150, 200, 250, 300]
result = ThresholdInterpolation(thresholds)(cube)
assert result, Cube
assert result.units == cube.units
To describe:
@pytest.mark.parametrize("input", ["input_cube", "masked_cube"])
This tells pytest we are looping over parameters. The variable name is "input", i.e. that's what it sets to have a given value, like i
in for i in [1, 2, 3]
. Here the values are the names of the fixtures that we want to pass in, which return our different cubes; the normal one and the masked one.
def test_cube_returned(request, input):
We pass in the "input" variable. As we are looping over fixtures that we reference as string names we also have to pass in the request
magic argument that's part of pytest. This can be called within the code to effectively call the fixture and get the thing it returns, which in this case is one of our input cubes.
cube = request.getfixturevalue(input)
This is us doing exactly that, calling the fixture to get the cube it returns, either a normal cube or one containing masked data.
The rest of the test is as before but using the cube
variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Phoebe. Very little left to do.
Note that your acceptance tests are broken. The precipitation diagnostics you've used for the masked data tests are thresholded as greater_than
the threshold values. Now that we've removed the hard-coding of the spp__relative_to_threshold
coordinate attribute these are being created with the correct threshold when the tests run, but these do not agree with the KGO which have the wrong attribute. These need to be recreated.
@@ -118,11 +113,9 @@ def _interpolate_thresholds( | |||
forecast_at_interpolated_thresholds | |||
) | |||
|
|||
# Reshape forecast_at_percentiles, so the percentiles dimension is | |||
# first, and any other dimension coordinates follow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've deleted this comment rather than rewritten it to explain what is actually happening here. Can you write it again but correct it to describe what this function is doing to your data.
Cube expected to contain a threshold coordinate. | ||
|
||
Returns: | ||
Cube: | ||
Cube with forecast values at the desired set of thresholds. | ||
The threshold coordinate is always the zeroth dimension. | ||
""" | ||
self.threshold_coord = find_threshold_coordinate(forecast_at_thresholds).name() | ||
self.threshold_coord = find_threshold_coordinate(forecast_at_thresholds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 196/197 below:
Code coverage is suggesting you don't have test coverage of this. You'll need a unit test with a cube that has a realization dimension.
"""Test that the plugin returns an Iris.cube.Cube with suitable units.""" | ||
thresholds = [100, 150, 200, 250, 300] | ||
result = ThresholdInterpolation(thresholds)(input_cube) | ||
assert result, Cube |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still not fixed.
Hi Ben, thank you. I have recreated the KGOs and the acceptance tests are running now. |
Addresses #https://github.com/metoppv/mo-blue-team/issues/812
Test data:
metoppv/improver_test_data#73
Description:
Adds plugin and CLI for threshold interpolation step and necessary tests alongside.
Testing: