Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhanced testing for Seasonal Water Yield (SWY) to test intermediate rasters #1744

Merged
merged 6 commits into from
Jan 27, 2025

Conversation

claire-simpson
Copy link
Contributor

Description

Improved testing for SWY:

  • Added three new unit tests to validate functions that create intermediate raster outputs (route_baseflow_sum, calculate_local_recharge, and _calculate_curve_number_raster)
  • Added assertion to existing test_local_recharge_undefined_nodata

Fixes #1549

Checklist

  • Updated HISTORY.rst and link to any relevant issue (if these changes are user-facing)
  • Updated the user's guide (if needed)
  • Tested the Workbench UI (if relevant)

Copy link
Contributor

@davemfish davemfish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @claire-simpson , I had a few minor suggestions. We can talk them over of course.

os.path.join(self.workspace_dir, 'target_aet_path.tif'),
os.path.join(self.workspace_dir, 'target_precip_path.tif'))
except Exception as e:
assert False, f"calculate_local_recharge raised an exception: {e}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're only concerned about calculate_local_recharge not raising an exception, then if we did get an exception we will want to make sure we see the full traceback. The simplest way to accomplish that might be to remove the try/except altogether.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep I was struggling with how to assert that there were no errors and landed on this unsatisfying solution.. but I think your note below makes the most sense to improve this test and remove redundancy!

except Exception as e:
assert False, f"calculate_local_recharge raised an exception: {e}"

def test_calculate_local_recharge(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the setup in this test is almost identical to test_local_recharge_undefined_nodata do you think it would make sense to combine them? Basically taking the assertions you added in this test and applying them to the pre-existing test that had no assertions?

assert numpy.allclose(b, expected_b, equal_nan=True), \
f"Baseflow raster values do not match. Expected: {expected_b}, Got: {b}"
assert numpy.allclose(b_sum, expected_b_sum, equal_nan=True), \
f"b_sum raster values do not match. Expected: {expected_b_sum}, Got: {b_sum}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of the builtin assert, unittest and numpy.testing both provide lots of really convenient assertion methods. These are nice because their names tend to be very descriptive in what they're asserting, and they handle the string formatting when the assertion fails. For example, in this case we could use numpy.testing.assert_allclose.

And in other, simpler cases, we tend to use one of these: https://docs.python.org/3/library/unittest.html#assert-methods

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! This is super helpful. I have been seeing different types of assertions and wasn't sure which was best

def test_calculate_curve_number_raster(self):
"""test `_calculate_curve_number_raster`"""
from natcap.invest.seasonal_water_yield import seasonal_water_yield
import pandas
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we move the pandas import to the top of the module? I think it's only important to import the modules we are testing inside the scope of the test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes definitely, I noticed another test importing pandas in this way so I followed suit, but I can change both

soil_array = numpy.zeros((3, 3), dtype=numpy.int32)
for i, row in enumerate(soil_array):
row[:] = i % soil_groups + 1
make_raster_from_array(soil_array, soil_group_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed this module has make_soil_raster and make_lulc_raster and make_biophysical_csv already defined. Does it make sense to use those functions to create the input data we need here? It may not, but I'm curious what you think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I noticed these as well! I chose not to use them as the make_soil_raster and make_lulc_raster both create 100x100 pixel rasters, which seemed a bit unnecessarily large to me as then I'd also be hard-coding a 100x100 pixel raster target.

The issue I had with make_biophysical_csv is that the column names are uppercase vs. the _calculate_curve_number_raster function expected lowercase. I could alternatively change the make_biophysical_csv function output to be lowercase, however the function as a whole does accept uppercase column names (as shown in the sample data) so I elected not to change the make_biophysical_csv function

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, that all makes sense to me, thanks!

seasonal_water_yield._calculate_curve_number_raster(
lulc_raster_path, soil_group_path, biophysical_df, cn_path)

cn = pygeoprocessing.raster_to_numpy_array(cn_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes it can be nice to call this variable actual_cn so it's easy to see that it is the data we are testing.

Copy link
Contributor

@davemfish davemfish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, thanks!

soil_array = numpy.zeros((3, 3), dtype=numpy.int32)
for i, row in enumerate(soil_array):
row[:] = i % soil_groups + 1
make_raster_from_array(soil_array, soil_group_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, that all makes sense to me, thanks!

@davemfish davemfish merged commit b3774ae into natcap:main Jan 27, 2025
29 checks passed
@claire-simpson claire-simpson deleted the feature/swy-tests branch January 27, 2025 20:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SWY tests do not test raster outputs
2 participants