Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate spatial files referenced in a CSV column #1504

Merged
merged 19 commits into from
Mar 29, 2024

Conversation

emlys
Copy link
Member

@emlys emlys commented Jan 18, 2024

Description

Fixes #327
Currently limited to validating rasters and vectors as the issue specified, but it would be simple to expand to all types of columns.

Checklist

  • Updated HISTORY.rst and link to any relevant issue (if these changes are user-facing)
  • Updated the user's guide (if needed)
  • Tested the Workbench UI (if relevant)

@@ -3449,36 +3447,6 @@ def logger_callback(proportion_complete):
return logger_callback


def _validate_habitat_table_paths(habitat_table_path):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now redundant to what's done in validation by default.

@@ -1044,23 +1044,13 @@ def _validate_threat_path(threat_path, lulc_key):
"""
# Checking threat path exists to control custom error messages
# for user readability.
try:
threat_gis_type = pygeoprocessing.get_gis_type(threat_path)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer need to check that the file can be opened by GDAL. Validation already does that

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this model there are three columns expecting raster paths. Only the "cur_path" column is required. The "base_path", and "fut_path" columns are conditionally required. If the user has an empty "base_path" column with the conditional unmet (the column is not required but is present in the table), I think the model says that it's okay for that column to be empty.

Does that still work here with validation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I see that this check is still left in for the model to look for.

So I guess my question is, does validate check conditional spatial files referenced in a CSV?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I see that in validation for this an empty entry is returned with no error.

Copy link
Member Author

@emlys emlys Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, empty entries are passed through and don't raise any validation warning. Note that this applies even when the column is required - NA values don't fail validation because in some cases we do need to allow NA values.

@@ -1548,21 +1548,6 @@ def test_clip_project_vector_on_invalid_geometry(self):
n_actual_features = target_layer.GetFeatureCount()
self.assertTrue(n_actual_features == n_features - 1)

def test_invalid_habitat_table_paths(self):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now covered by regular validation and validation tests.

@emlys emlys changed the title Feature/327 Validate files referenced in a CSV column Jan 24, 2024
@emlys emlys changed the title Validate files referenced in a CSV column Validate spatial files referenced in a CSV column Jan 24, 2024
@emlys emlys requested a review from dcdenu4 January 24, 2024 20:06
@emlys emlys self-assigned this Jan 24, 2024
@emlys emlys marked this pull request as ready for review January 24, 2024 20:07
Copy link
Member

@dcdenu4 dcdenu4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @emlys for this. I had some small comments / suggestions / questions. Looking at the PR checklist reminder, I was also curious if you tested this in the UI at all?

@@ -1044,23 +1044,13 @@ def _validate_threat_path(threat_path, lulc_key):
"""
# Checking threat path exists to control custom error messages
# for user readability.
try:
threat_gis_type = pygeoprocessing.get_gis_type(threat_path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this model there are three columns expecting raster paths. Only the "cur_path" column is required. The "base_path", and "fut_path" columns are conditionally required. If the user has an empty "base_path" column with the conditional unmet (the column is not required but is present in the table), I think the model says that it's okay for that column to be empty.

Does that still work here with validation?

@@ -1044,23 +1044,13 @@ def _validate_threat_path(threat_path, lulc_key):
"""
# Checking threat path exists to control custom error messages
# for user readability.
try:
threat_gis_type = pygeoprocessing.get_gis_type(threat_path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I see that this check is still left in for the model to look for.

So I guess my question is, does validate check conditional spatial files referenced in a CSV?

@@ -80,7 +80,7 @@
"values besides 0 or 1 will be treated as 0.")
}},
"fields": {},
"geometries": spec_utils.POLYGONS,
"geometries": spec_utils.ALL_GEOMS,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is true that multiple geometry types are allowed for this, should the "about" section below be updated as well? It currently states:

For vectors, a polygon indicates...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! I had to update this line because one of our tests used a non-polygon geometry, which would now fail validation.

@@ -1044,23 +1044,13 @@ def _validate_threat_path(threat_path, lulc_key):
"""
# Checking threat path exists to control custom error messages
# for user readability.
try:
threat_gis_type = pygeoprocessing.get_gis_type(threat_path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I see that in validation for this an empty entry is returned with no error.

tests/test_validation.py Outdated Show resolved Hide resolved
tests/test_validation.py Outdated Show resolved Hide resolved
@@ -1352,6 +1342,43 @@ def test_rows(self):
self.assertEqual(df['row2'][1], 3)
self.assertEqual(df['row2'].dtype, float)

def test_recursive_path_validation(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there also be a test case for vector paths?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! I added a couple more tests

@emlys emlys requested a review from dcdenu4 March 12, 2024 21:35
dcdenu4
dcdenu4 previously approved these changes Mar 20, 2024
Copy link
Member

@dcdenu4 dcdenu4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Emily! I think this is looking good!

@dcdenu4
Copy link
Member

dcdenu4 commented Mar 20, 2024

Hey @emlys, there appears to be a Windows Workbench test that fails consistently.

@dcdenu4 dcdenu4 self-requested a review March 20, 2024 17:37
Copy link
Member

@dcdenu4 dcdenu4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just removing approval, so we can check out that test failure.

@dcdenu4 dcdenu4 self-requested a review March 20, 2024 17:39
@dcdenu4 dcdenu4 dismissed their stale review March 20, 2024 17:39

Test failing.

Copy link
Member

@dcdenu4 dcdenu4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tried for the first time, "dismiss review".

@emlys
Copy link
Member Author

emlys commented Mar 28, 2024

@dcdenu4 , I'm not sure what was happening with that test, but it's passing now, so it must've just been flaky. I'll go ahead and merge if you approve!

@emlys emlys requested a review from dcdenu4 March 28, 2024 18:16
@dcdenu4 dcdenu4 enabled auto-merge March 29, 2024 18:14
Copy link
Member

@dcdenu4 dcdenu4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@dcdenu4 dcdenu4 merged commit da55458 into natcap:main Mar 29, 2024
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow CSV validation to validate rasters/vectors referenced in a CSV column
2 participants