Proposal (and bugfix) to implement value-based conditional requirement #1511

phargogh · 2024-01-24T20:12:05Z

This proposal implements value-based conditional requirement expressions in validation.

The use case is that validation is sometimes conditional not only on whether another parameter is provided and has a value (is "sufficient"), but also on what that value is (as in UNA, see #1509). In UNA, different inputs are required depending on the value of search_radius_mode. If search_radius_mode represents uniform search radii, then args['search_radius'] is required. If search_radius_mode represents search radii by population group, then args['population_group_radii_table'] is required.

An alternate approach to this could have been to just write out the required logic in UNA's validate function, but this feature also seemed useful and perhaps worth expanding. I would be happy to replace this implementation with such a model-specific implementation if that is preferred.

Implementation-wise, I thought it might be convenient to be able to use a dot-notation in the MODEL_SPEC expression to represent the argument's value. So for the above example, one might use search_radius_mode.value == "{RADIUS_OPT_UNIFORM}". Behind the scenes, validation (as proposed) will tokenize the expression and rewrite search_radius_mode.value into a new local variable name. This allows backwards compatibility with existing expressions throughout InVEST, while supporting the proposed dot-notation.
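A minimal sketch of the rewriting idea, assuming Python's standard `tokenize` module is used. The function name `mangle_value_refs`, the `__value` suffix, and the `"uniform radius"` option string are hypothetical and are not taken from the InVEST source.

```python
import io
import tokenize

# Token types that carry no expression content.
_SKIP = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}


def mangle_value_refs(expression, suffix="__value"):
    """Rewrite ``key.value`` into a single local name, ``key__value``."""
    tokens = [
        tok for tok in
        tokenize.generate_tokens(io.StringIO(expression).readline)
        if tok.type not in _SKIP]
    pieces = []
    i = 0
    while i < len(tokens):
        is_value_ref = (
            tokens[i].type == tokenize.NAME
            and i + 2 < len(tokens)
            and tokens[i + 1].string == "."
            and tokens[i + 2].string == "value")
        if is_value_ref:
            # Collapse the three tokens NAME, ".", "value" into one name.
            pieces.append(tokens[i].string + suffix)
            i += 3
        else:
            pieces.append(tokens[i].string)
            i += 1
    return " ".join(pieces)


rewritten = mangle_value_refs('search_radius_mode.value == "uniform radius"')
# The mangled name is then supplied as a local when evaluating.
result = eval(rewritten, {"__builtins__": {}},
              {"search_radius_mode__value": "uniform radius"})
```

Expressions that do not use the dot-notation pass through unchanged, which is what would keep existing expressions working.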

Critical feedback is very welcome!

Fixes #1509
Fixes #1165

Checklist

- Updated HISTORY.rst and link to any relevant issue (if these changes are user-facing)
- ~~Updated the user's guide (if needed)~~
- Tested the Workbench UI (if relevant)


emlys

@phargogh I really like where this is going! Our existing validation doesn't yet evaluate conditional requirement of CSV columns or vector fields, so I think that would need to be added. I think we just left it out of the original ARGS_SPEC design to keep things simple, but it would be a great improvement to have.

I found two other models where I left in a placeholder conditional requirement for CSV columns. It would be great to have those working too:

invest/src/natcap/invest/forest_carbon_edge_effect.py

Lines 83 to 106 in 9e50106

    "c_below": {
        "type": "number",
        "units": u.metric_ton/u.hectare,
        "required": "pools_to_calculate == 'all'",
        "about": gettext(
            "Carbon density value for the belowground carbon "
            "pool. Required if calculating all pools.")
    },
    "c_soil": {
        "type": "number",
        "units": u.metric_ton/u.hectare,
        "required": "pools_to_calculate == 'all'",
        "about": gettext(
            "Carbon density value for the soil carbon pool. "
            "Required if calculating all pools.")
    },
    "c_dead": {
        "type": "number",
        "units": u.metric_ton/u.hectare,
        "required": "pools_to_calculate == 'all'",
        "about": gettext(
            "Carbon density value for the dead matter carbon "
            "pool. Required if calculating all pools.")
    },

invest/src/natcap/invest/urban_cooling_model.py

Lines 72 to 96 in 9e50106

    "shade": {
        "type": "ratio",
        "required": "cc_method == factors",
        "about": gettext(
            "The proportion of area in this LULC class that is "
            "covered by tree canopy at least 2 meters high. "
            "Required if the 'factors' option is selected for "
            "the Cooling Capacity Calculation Method.")},
    "albedo": {
        "type": "ratio",
        "required": "cc_method == factors",
        "about": gettext(
            "The proportion of solar radiation that is directly "
            "reflected by this LULC class. Required if the "
            "'factors' option is selected for the Cooling "
            "Capacity Calculation Method.")},
    "building_intensity": {
        "type": "ratio",
        "required": "cc_method == intensity",
        "about": gettext(
            "The ratio of building floor area to footprint "
            "area, with all values in this column normalized "
            "between 0 and 1. Required if the 'intensity' option "
            "is selected for the Cooling Capacity Calculation "
            "Method.")}

It's interesting that we've done this before in a limited way with boolean inputs by lumping in their true/false value with "input sufficiency". That has worked well, but would it make more sense to explicitly check the value of the input like you propose here? For instance:

"required": "do_valuation.value == True" vs
"required": "do_valuation.value" vs
"required": "do_valuation"?


phargogh · 2024-02-14T18:33:09Z

Thanks so much for taking a look at this @emlys ! I made a few changes that ultimately resulted in a larger refactor that really simplified validate. Here's a summary:

Expressions Now Use Arg Values

Symbols in expressions that represent args keys will now contain the user-provided value of the arg. If the key is not present in args, a value of False is assumed in these expressions. The symbol map passed to the expression is therefore {key: args.get(key, False) for key in spec}.
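As a rough illustration of that symbol map, here is a sketch with a hypothetical two-key spec. The helper name `build_symbols` and the `"uniform radius"` option string are assumptions for the example; only the dict comprehension itself comes from the description above. Note that `eval` should only ever see trusted, developer-written expressions.

```python
def build_symbols(args, spec):
    """Map each spec key to the user's value, or False when it is absent."""
    return {key: args.get(key, False) for key in spec}


# Hypothetical spec and args; real MODEL_SPEC entries carry more detail.
spec = {"search_radius_mode": {}, "search_radius": {}}
args = {"search_radius_mode": "uniform radius"}

symbols = build_symbols(args, spec)

# The expression now compares against the arg's actual value.
required = eval(
    'search_radius_mode == "uniform radius"',
    {"__builtins__": {}},  # no builtins are needed for this expression
    symbols)
```

Here `search_radius` is absent from `args`, so it appears in the symbol map as `False` rather than raising a `NameError` when an expression refers to it.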

Even though we have been assuming so far that keys in expressions represent sufficiency (whether a key has a non-empty value in args), sufficiency is really just the truthy evaluation of the arg value ... so why not just use the arg value itself? There was only one place in all of InVEST where the expression needed to be changed (in UCM) because of this change in meaning.

This also gets around your question about the value or sufficiency of checkboxes. By switching to using the values of the checkboxes, we're being more explicit about just using the value of the input (whatever type that is), and allowing the programmer to decide how to handle its type in an expression that evaluates to a bool.

Expression Results are Cast to `bool`

Now that expressions use arg values, the result of the eval'd expression is explicitly cast to bool. This ensures that there are only 2 possible states for conditional validity: valid and invalid. This works because a string with nonzero length is interpreted as True in a truthy context.
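To make the cast concrete, here is a small sketch. The helper name `is_required` and the sample values are hypothetical, and the real `validate` may structure this differently.

```python
def is_required(expression, symbols):
    """Evaluate a requirement expression and collapse the result to a bool."""
    raw = eval(expression, {"__builtins__": {}}, symbols)
    # ``raw`` may be a str, number, etc.; the cast is applied outside the
    # expression so callers only ever see True or False.
    return bool(raw)


# A bare key evaluates to the arg's value, which may be a string.
nonempty = is_required("search_radius_mode",
                       {"search_radius_mode": "uniform radius"})
empty = is_required("search_radius_mode", {"search_radius_mode": ""})

# ``and`` returns one of its operands, not a bool, so the cast matters here.
raw_and = eval("a and b", {"__builtins__": {}}, {"a": "x", "b": "table.csv"})
cast_and = is_required("a and b", {"a": "x", "b": "table.csv"})
```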

Switching to Truthiness = No Need to `tokenize`

Switching to truthiness means that there's no need to do any special parsing or mangling of variable names - we've simplified the meaning of the expression to be closer to how python handles expressions.

emlys

Thanks @phargogh! I really like how this simplifies validation. I had some questions but nothing major.

src/natcap/invest/validation.py

emlys · 2024-02-20T21:50:09Z

src/natcap/invest/validation.py

+    insufficient_keys = set()
+    for key in spec.keys():
+        try:
+            sufficient_inputs[key] = bool(args[key])


Would an arg value of 0 be labeled insufficient here?

This specific line was removed in an earlier edit, but it is a case that authors of expressions will need to be aware of. One cost of the proposed simplification is that a falsey value (like 0), which was previously treated as "sufficient", is now effectively insufficient. This can be mitigated by writing better, context-aware expressions in cases where this may be an issue, so I think we'll largely be OK. In practice, I don't think there are many places where we are likely to encounter a value of 0.
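A sketch of what such a context-aware expression could look like. The key name `threshold` is hypothetical, and the example assumes, as described earlier in this thread, that a missing key is represented as `False` in the symbol map.

```python
def is_truthy(expression, symbols):
    """Evaluate an expression against arg values and cast to bool."""
    return bool(eval(expression, {"__builtins__": {}}, symbols))


provided_zero = {"threshold": 0}   # user explicitly entered 0
missing = {"threshold": False}     # key absent, so it defaults to False

# A bare key treats 0 the same as a missing value.
plain = "threshold"
# Identity comparison separates 0 from the False placeholder
# (``0 == False`` is True in Python, so ``!=`` would not work here).
aware = "threshold is not False"

zero_plain = is_truthy(plain, provided_zero)
zero_aware = is_truthy(aware, provided_zero)
missing_aware = is_truthy(aware, missing)
```

An empty string would also pass `is not False`, so an expression like this only makes sense where the input cannot be blank or where blanks are handled separately.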

emlys · 2024-02-22T19:10:35Z

src/natcap/invest/validation.py

-    sufficient_keys = set(args.keys()).difference(insufficient_keys)
-    for key in sufficient_keys.difference(excluded_keys):
+    insufficient_keys = (missing_keys | keys_with_no_value)
+    for key in set(args.keys()) - insufficient_keys:


If I'm understanding, there is a subtle change here: before, we removed excluded_keys from the set to validate. Now, any keys that have a value will be validated, even if they're not required. I'm not sure what the correct answer is, but it seems worth noting the change.

Yes, this exclusion was definitely something I needed to address, and the fix now includes a better variable name for excluded_keys to clarify how they were being used.

@phargogh might there still be a difference if a non-required key has a real value? Suppose you have a valuation_table that's required only if do_valuation is True. Then if we provide args where do_valuation is False and valuation_table is a valid path, should we still validate valuation_table?

Great question, and thank you for thinking critically about this!

This specific snippet we're discussing here isn't updating for some reason, but there is now logic to cover the case where a non-required parameter has a truthy value. So yes, in the latest revision (at least as of rev 617e76d) I would expect the validation you describe to take place.

Ah okay, thanks for clarifying. I think that validation would not take place before, because excluded_keys were removed from the set to validate:

invest/src/natcap/invest/validation.py

Lines 1003 to 1004 in 9a70648

    sufficient_keys = set(args.keys()).difference(insufficient_keys)
    for key in sufficient_keys.difference(excluded_keys):

Is that a change worth noting in history?

Oh! I see what you mean. Sorry for my confusion about this! Yes, this is indeed a change in behavior that I think is for the best and I have noted the changes in HISTORY. Once we figure out the code-signing issue, I'll share a dev build with Stacie to have her try it out.
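For readers following the thread, the behavior change can be sketched with plain sets. The two helper functions are hypothetical reconstructions built from the quoted snippets. In particular, how `excluded_keys` and `keys_with_no_value` are computed is an assumption here, not the actual `validate` code.

```python
def keys_validated_before(args, insufficient_keys, excluded_keys):
    """Old selection: drop insufficient keys, then drop excluded keys."""
    sufficient_keys = set(args.keys()).difference(insufficient_keys)
    return sufficient_keys.difference(excluded_keys)


def keys_validated_after(args, spec):
    """New selection: validate every key that has a truthy value."""
    missing_keys = set(spec) - set(args)
    # Assumption: "no value" means the provided value is falsey.
    keys_with_no_value = {key for key, value in args.items() if not value}
    insufficient_keys = missing_keys | keys_with_no_value
    return set(args.keys()) - insufficient_keys


spec = {"do_valuation": {}, "valuation_table": {}}
args = {"do_valuation": False, "valuation_table": "prices.csv"}

# Assumption: valuation_table was excluded because its requirement
# expression ("do_valuation") evaluated to False.
before = keys_validated_before(
    args,
    insufficient_keys={"do_valuation"},
    excluded_keys={"valuation_table"})
after = keys_validated_after(args, spec)
```

Under these assumptions, `valuation_table` was skipped before and is now checked because it has a real value, which matches the scenario raised in the review.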

emlys

@phargogh I just had a couple more comments. The test suite also failed; I'm not sure whether it's related.

phargogh · 2024-02-23T00:17:36Z

Thanks @emlys ! Yes, the test failures were definitely related and I believe I have addressed everything. Let me know what you think!

emlys

Awesome, thanks @phargogh!

phargogh added 4 commits January 24, 2024 11:55

Varname mangling for value-based conditional req.

845c49f

RE:natcap#1503

Forgot a few docstrings. RE:natcap#1503

969ef23

Merge branch 'main' of https://github.com/natcap/invest into bugfix/1…

273ef3f

…509-una-validation-missing-uniform-search-radius

Noting change in HISTORY. RE:natcap#1509

6096ad6

phargogh added the proposal Internal software team proposal label Jan 24, 2024

phargogh requested a review from emlys January 24, 2024 20:12

phargogh assigned phargogh and emlys Jan 24, 2024

emlys reviewed Jan 25, 2024

src/natcap/invest/urban_nature_access.py (outdated)

emlys reviewed Jan 25, 2024

src/natcap/invest/urban_nature_access.py (outdated)

emlys reviewed Jan 25, 2024

src/natcap/invest/urban_nature_access.py (outdated)

emlys reviewed Jan 25, 2024

phargogh added 17 commits February 13, 2024 15:42

Merge branch 'main' into bugfix/1509-una-validation-missing-uniform-s…

85d8028

…earch-radius

Reworking sufficiency for clarity. RE:natcap#1509

66f1490

Reworking comprehensions into standard loop for readability.

37a0338

RE:natcap#1509

Reworking conditional requirement.

e1bef80

Conditional requirement now evaluates the truthiness of the expression based on the value of the parameter. If the parameter is missing, that parameter's value is Falsy. RE:natcap#1509

Removing .value in UNA conditional requirement.

658748d

The latest update to using truthiness in conditional requirement expressions handles all of these cases. RE:natcap#1509

Removing the variable name rewriting.

0ce3303

This ended up not being needed if we just work directly with truthiness. RE:natcap#1509

Slight refactor for efficiency. RE:natcap#1509

041f411

First stab at rewriting nested cond. requirement.

4a1f548

RE:natcap#1509

Removing unnecessary conditional requirement - table already required…

416240d

…. RE:natcap#1509

Clarifying varnames, copying spec to avoid modifying by reference, no…

21ae820

…ting tests to write. RE:natcap#1509

Merge branch 'bugfix/1509-una-validation-missing-uniform-search-radiu…

b5a98a4

…s' of github.com:phargogh/invest into bugfix/1509-una-validation-missing-uniform-search-radius

Adding a test for vector field validity.

4135eec

RE:natcap#1509

Adding a test for csv column conditional validity.

0b9111f

RE:natcap#1509

Correcting WKT syntax. RE:natcap#1509

5e6e35b

Adding a test for CSV row conditional validation. RE:natcap#1509

b02b49c

Fixing expression in UCM. RE:natcap#1509

fec802a

Fixing operands in wind energy expression. RE:natcap#1509

f8497f6

phargogh added 2 commits February 14, 2024 10:36

Removing a comment that is no longer relevant. RE:natcap#1509

cb70a12

Switching bitwise operations to python bools. RE:natcap#1509

ecdc498

phargogh requested a review from emlys February 14, 2024 18:45

Correcting boolean operator and->or. RE:natcap#1509

bf55a26

phargogh mentioned this pull request Feb 15, 2024

Implement conditional requirement for required keys anywhere in the args spec #1165

Closed

emlys requested changes Feb 20, 2024

phargogh added 5 commits February 21, 2024 14:24

Light linting, mostly around line length. RE:natcap#1509

482b327

Removing a deprecated line. RE:natcap#1509

b32fcbf

Simplifying validate(). RE:natcap#1509

1deee3e

Adding directory axis. RE:natcap#1509

880e31f

Casting to bool outside of the expression. RE:natcap#1509

d70b15d

phargogh requested a review from emlys February 22, 2024 00:06

emlys reviewed Feb 22, 2024

emlys self-requested a review February 22, 2024 19:14

emlys reviewed Feb 22, 2024

phargogh added 3 commits February 22, 2024 15:37

Small test fix from slight return value change.

831fd99

RE:natcap#1509

HQ validation now partially covered by expressions.

4b6f932

RE:natcap#1509

Clarifying set names, restoring falsey exclusion.

617e76d

Set names are now clearer, and type-specific validation will no longer take place if the value is falsey. RE:natcap#1509

phargogh requested a review from emlys February 23, 2024 00:17

phargogh added 2 commits February 27, 2024 15:37

Noting changes in history.

03c7fb0

RE:natcap#1509

Merge branch 'main' into bugfix/1509-una-validation-missing-uniform-s…

c49c398

…earch-radius

emlys approved these changes Mar 11, 2024

emlys merged commit 724e9b4 into natcap:main Mar 11, 2024
25 checks passed
