
Mobt 812 vera threshold interpolation #2079

Open: wants to merge 17 commits into master from mobt_812_vera_threshold_interpolation

Conversation

@lambert-p (Contributor) commented Jan 21, 2025

Addresses https://github.com/metoppv/mo-blue-team/issues/812

Test data:
metoppv/improver_test_data#73

Description:
Adds a plugin and CLI for the threshold interpolation step, along with the necessary tests.

Testing:

  • Ran tests and they passed OK
  • Added new tests for the new feature(s)


codecov bot commented Jan 22, 2025

Codecov Report

Attention: Patch coverage is 98.30508% with 1 line in your changes missing coverage. Please review.

Project coverage is 98.42%. Comparing base (84a8944) to head (c74eb6a).
Report is 74 commits behind head on master.

Files with missing lines                      | Patch % | Lines
improver/utilities/threshold_interpolation.py | 98.30%  | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2079      +/-   ##
==========================================
+ Coverage   98.39%   98.42%   +0.02%     
==========================================
  Files         124      136      +12     
  Lines       12212    13439    +1227     
==========================================
+ Hits        12016    13227    +1211     
- Misses        196      212      +16     


@Kat-90 (Contributor) commented Feb 3, 2025

A couple of minor thoughts I've had over the last week since this was put into review. These may be useful to the reviewer:

  • This branch needs rebasing so that the tests pass.
  • Should we move the realization collapse to the beginning of the process function? That way we only interpolate once, rather than over each realization that we are going to collapse anyway.
  • Should we make the realization collapse an argument? For our case we always want this coordinate collapsed, but since we've made this generic I wonder whether there is a case where it shouldn't be collapsed. This is an option in the threshold step.

@bayliffe (Contributor) left a comment

Thanks for writing this. I've given this a first pass and there is some work to do. If you have any questions, please ask me.

Comment on lines 182 to 119
if thresholds is None:
thresholds = forecast_at_thresholds.coord(threshold_coord).points
warnings.warn(
f"No thresholds provided, using existing thresholds. Thresholds being used are: {list(thresholds)}"
)

What is the point of this? Why not just return the cube unchanged as it already contains all of the required thresholds?

More broadly, why is thresholds an optional argument rather than a mandatory one? The only outcome without providing this argument is that you get your cube back unchanged, so why would anyone call it in that way?

return result


def Threshold_interpolation(

This name is written in an odd style: it is not quite the CamelCase (first letter of each word capitalised) that we use for plugins, and not quite a function-style name with all lowercase letters.

Unless there is a good reason, this code ought to be written as a plugin:

class ThresholdInterpolation(BasePlugin):
    def __init__(thresholds):
        etc.
    def _interpolate_thresholds():
        etc.
    def _create_cube_with_thresholds():
        etc.
    def process(cube):
        do stuff.

This keeps all the related code together within a class. The underscores preceding the method names indicate that they are private methods, i.e. they are specific to this class and should not be called for other purposes. The use of a class allows the sharing of variables using self. More importantly, we can create an instance of the class, e.g.

plugin = ThresholdInterpolation(thresholds=[10, 50, 90])
result = plugin.process(cube)

The value of this (though not really in this case) is that if there is some complexity in configuring the plugin, e.g. converting the threshold values, checking them, etc., all of that can be done just once when we create the plugin instance. We can then call it over and over again without that overhead.
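
To make the configure-once / call-many pattern concrete, here is a minimal, self-contained sketch. It uses bare numpy arrays rather than iris cubes and np.interp for the linear interpolation, so it is purely illustrative and not the implementation this PR should use; the class and argument names are placeholders.

import numpy as np


class ThresholdInterpolationSketch:
    """Illustrative stand-in for a BasePlugin-style class (not the real plugin)."""

    def __init__(self, thresholds):
        # Any validation / type conversion happens once, at construction time.
        self.thresholds = np.array(thresholds, dtype=np.float32)

    def process(self, source_thresholds, probabilities):
        # probabilities has shape (n_source_thresholds, n_points); interpolate
        # each point's values onto the target thresholds, column by column.
        result = np.empty(
            (len(self.thresholds), probabilities.shape[1]), dtype=np.float32
        )
        for index in range(probabilities.shape[1]):
            result[:, index] = np.interp(
                self.thresholds, source_thresholds, probabilities[:, index]
            )
        return result


# Configure once, then call repeatedly on different inputs.
plugin = ThresholdInterpolationSketch(thresholds=[200.0, 500.0, 800.0])
probs = np.array([[1.0, 0.9], [0.4, 0.3]], dtype=np.float32)  # (thresholds, points)
print(plugin.process(np.array([200.0, 800.0]), probs))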

forecast_at_thresholds,
method="mean",
)
forecast_at_thresholds = collapsed_forecast_at_thresholds

@Kat-90 asked about whether realization collapse should occur before the interpolation. This is a good question, and the answer is yes: the operations commute, so we can save some effort by collapsing first. An example with some numbers:

Process each realization then collapse

Realization | Thresh 1 (200) | Thresh k (500) | Thresh 2 (800)
R0          | 0.6            | 0.4            | 0.2
R1          | 1.0            | 0.8            | 0.6
R2          | 0.8            | 0.6            | 0.4
Mean        | 0.8            | 0.6            | 0.4

Collapse then process the result

Realization | Thresh 1 (200) | Thresh 2 (800)
R0          | 0.6            | 0.2
R1          | 1.0            | 0.6
R2          | 0.8            | 0.4
Mean        | 0.8            | 0.4

thresh_k = 500: value = [(500 - 200) / (800 - 200)] * (0.4 - 0.8) + 0.8 = 0.6

This is a very simple example, but the result is general for our simple linear interpolation (we're basically just averaging numbers).
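
A quick numerical check of this, using plain numpy and the numbers from the tables above (illustrative only, not part of the plugin code):

import numpy as np

source_thresholds = np.array([200.0, 800.0])
target_threshold = 500.0
# Rows are realizations R0..R2, columns are thresholds 200 and 800.
probs = np.array([[0.6, 0.2], [1.0, 0.6], [0.8, 0.4]])

# Interpolate each realization to 500, then take the realization mean.
interp_then_mean = np.mean(
    [np.interp(target_threshold, source_thresholds, row) for row in probs]
)

# Take the realization mean first, then interpolate to 500.
mean_then_interp = np.interp(target_threshold, source_thresholds, probs.mean(axis=0))

print(interp_then_mean, mean_then_interp)  # both give 0.6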

def process(
forecast_at_thresholds: cli.inputcube,
*,
thresholds: cli.comma_separated_list = None,

See comments in plugin file about this not being optional.

thresholds = ["cat", "dog", "elephant"]
error_msg = "could not convert string to float"
with pytest.raises(ValueError, match=error_msg):
Threshold_interpolation(input_cube, thresholds)

This is not required unless the plugin is doing something to handle this exception itself. Here we are essentially testing a numpy exception and that's (hopefully) covered by numpy's unit tests.

coord.name() for coord in realization_cube.coords(dim_coords=True)
]
expected_dim_coords.remove("realization")
assert dim_coords, expected_dim_coords

Again, this is testing the collapse_realization function, which already has unit tests covering this, so this is not required.

realization_cube.remove_coord("visibility_in_air")
error_msg = "No threshold coord found"
with pytest.raises(CoordinateNotFoundError, match=error_msg):
Threshold_interpolation(realization_cube, thresholds)

This is testing the find_threshold_coordinate function and is not required.

warning_msg = "No thresholds provided, using existing thresholds."

with pytest.warns(UserWarning, match=warning_msg):
Threshold_interpolation(input_cube)

Hopefully this goes away if you get rid of the option to not provide thresholds.

Testing that a Cube is returned when inputting a masked cube.
"""
thresholds = [100, 150, 200, 250, 300]
result = Threshold_interpolation(masked_cube_same, thresholds)

You should test here that the mask in is the same as the mask out.
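
For example, something along these lines would do it. This is only a sketch: it follows the call style of the quoted test and assumes the masked_cube_same fixture applies the same spatial mask to every input threshold slice (as its name suggests) and that the threshold dimension is the leading one.

import numpy as np


def test_masked_cube_mask_preserved(masked_cube_same):
    """Sketch: check that the input mask is carried through to the output."""
    thresholds = [100, 150, 200, 250, 300]
    result = Threshold_interpolation(masked_cube_same, thresholds)
    # Take the mask of one input threshold slice as the expected mask;
    # every output threshold slice should carry the same mask.
    expected_mask = np.ma.getmaskarray(masked_cube_same.data)[0]
    for output_slice_mask in np.ma.getmaskarray(result.data):
        np.testing.assert_array_equal(output_slice_mask, expected_mask)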

for point in self.thresholds:
cube = template_cube.copy()
coord = iris.coords.DimCoord(
np.array([point], dtype="float32"), units=cube.units

Suggested change
- np.array([point], dtype="float32"), units=cube.units
+ np.array([point], dtype="float32"), units=threshold_units

@lambert-p force-pushed the mobt_812_vera_threshold_interpolation branch from 70c4922 to e08049d on February 19, 2025 at 13:52
@bayliffe (Contributor) left a comment

Thanks for all the work, Phoebe. I have a load more comments, but we are getting there.

"""Test that the plugin returns an Iris.cube.Cube with suitable units."""
thresholds = [100, 150, 200, 250, 300]
result = ThresholdInterpolation(thresholds)(input_cube)
assert result, Cube

This line still doesn't do anything. It's still the same as assert 1, 2

To check the type of something we use isinstance. So your test should be:

assert isinstance(result, Cube)


Still not fixed.

Comment on lines 114 to 120
def test_masked_cube(masked_cube):
"""
Testing that a Cube is returned when inputting a masked cube.
"""
thresholds = [100, 150, 200, 250, 300]
result = ThresholdInterpolation(thresholds)(masked_cube)
assert isinstance(result, Cube)

Pytest allows us to parameterize our tests, so we don't have to write the same test twice; we can just vary the inputs. Unfortunately, parameterising over fixtures is about the least friendly possible introduction to this, but oh well.

If you replace your test_basic with what follows and remove the test_masked_cube test, you will have exactly the same test coverage. The test below loops over the fixtures given in the brackets of the parametrize statement, namely ["input_cube", "masked_cube"]. Annoyingly, you reference these as string names, which I hate.

@pytest.mark.parametrize("input", ["input_cube", "masked_cube"])
def test_cube_returned(request, input):
    """Test that the plugin returns an Iris.cube.Cube with suitable units."""
    cube = request.getfixturevalue(input)
    thresholds = [100, 150, 200, 250, 300]
    result = ThresholdInterpolation(thresholds)(cube)
    assert isinstance(result, Cube)
    assert result.units == cube.units

To describe:
@pytest.mark.parametrize("input", ["input_cube", "masked_cube"])
This tells pytest we are looping over parameters. The variable name is "input", i.e. that is what gets set to each value in turn, like i in for i in [1, 2, 3]. Here the values are the names of the fixtures that we want to pass in, which return our different cubes: the normal one and the masked one.

def test_cube_returned(request, input):
We pass in the "input" variable. As we are looping over fixtures referenced by their string names, we also have to pass in the request magic argument that is part of pytest. This can be used within the test to call the fixture and get the thing it returns, which in this case is one of our input cubes.

cube = request.getfixturevalue(input)
This is us doing exactly that, calling the fixture to get the cube it returns, either a normal cube or one containing masked data.

The rest of the test is as before but using the cube variable.

@bayliffe (Contributor) left a comment

Thanks Phoebe. Very little left to do.

Note that your acceptance tests are broken. The precipitation diagnostics you've used for the masked data tests are thresholded as greater_than the threshold values. Now that we've removed the hard-coding of the spp__relative_to_threshold coordinate attribute, these are being created with the correct attribute when the tests run, but they do not agree with the KGOs, which have the wrong attribute. The KGOs need to be recreated.

@@ -118,11 +113,9 @@ def _interpolate_thresholds(
forecast_at_interpolated_thresholds
)

# Reshape forecast_at_percentiles, so the percentiles dimension is
# first, and any other dimension coordinates follow.

You've deleted this comment rather than rewriting it to explain what is actually happening here. Can you write it again, but corrected to describe what this function is doing to your data?

Cube expected to contain a threshold coordinate.

Returns:
Cube:
Cube with forecast values at the desired set of thresholds.
The threshold coordinate is always the zeroth dimension.
"""
self.threshold_coord = find_threshold_coordinate(forecast_at_thresholds).name()
self.threshold_coord = find_threshold_coordinate(forecast_at_thresholds)

Line 196/197 below:
Code coverage is suggesting you don't have test coverage of this. You'll need a unit test with a cube that has a realization dimension.
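
A sketch of such a test, reusing the realization_cube fixture and the threshold values from the tests quoted elsewhere in this review (the test name and exact assertion are only suggestions):

def test_realization_collapse(realization_cube):
    """Sketch: exercise the branch that collapses the realization coordinate."""
    thresholds = [100, 150, 200, 250, 300]
    result = ThresholdInterpolation(thresholds)(realization_cube)
    # The realization coordinate should have been collapsed away.
    dim_coords = [coord.name() for coord in result.coords(dim_coords=True)]
    assert "realization" not in dim_coords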

"""Test that the plugin returns an Iris.cube.Cube with suitable units."""
thresholds = [100, 150, 200, 250, 300]
result = ThresholdInterpolation(thresholds)(input_cube)
assert result, Cube

Still not fixed.

@lambert-p (Contributor, Author) commented

Hi Ben, thank you. I have recreated the KGOs and the acceptance tests are running now.
