Add variables setup #388

dalonsoa · 2024-02-15T16:11:46Z

Description

Supersedes #371

This PR adds a more fully featured version of the variables infrastructure. It adds two registries: one containing all the known variables, based on the modules being used in the simulation, and another for the runtime variables actually used by the requested models.

Each variable can be initialised by only one model, but can be updated (with a warning) and used by more than one model. The models that initialise, update and use each variable are recorded in the relevant attributes of that variable.
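A minimal sketch of the bookkeeping described above, not the code in this PR. The field names `initialised_by`, `updated_by` and `used_by`, the two helper functions and the `soil_moisture` entry are assumptions made for illustration; only the idea of a known-variables registry plus a run registry comes from the description.

```python
import warnings
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Variable:
    """Hypothetical record of which models touch a variable."""

    name: str
    initialised_by: Optional[str] = None
    updated_by: list[str] = field(default_factory=list)
    used_by: list[str] = field(default_factory=list)


# All variables defined by the modules in use (assumed pre-populated here).
KNOWN_VARIABLES: dict[str, Variable] = {
    "soil_moisture": Variable("soil_moisture"),
}
# Variables actually requested by the models in this run.
RUN_VARIABLES_REGISTRY: dict[str, Variable] = {}


def initialise(var_name: str, model: str) -> None:
    """Move a known variable into the run registry; one initialiser only."""
    if var_name not in KNOWN_VARIABLES:
        raise ValueError(f"Unknown variable '{var_name}' requested by {model}")
    if var_name in RUN_VARIABLES_REGISTRY:
        owner = RUN_VARIABLES_REGISTRY[var_name].initialised_by
        raise ValueError(f"'{var_name}' is already initialised by {owner}")
    var = KNOWN_VARIABLES[var_name]
    var.initialised_by = model
    RUN_VARIABLES_REGISTRY[var_name] = var


def update(var_name: str, model: str) -> None:
    """Record an updater, warning when another model already updates it."""
    var = RUN_VARIABLES_REGISTRY[var_name]
    if var.updated_by:
        warnings.warn(f"'{var_name}' is already updated by {var.updated_by}")
    var.updated_by.append(model)
```

With this shape, a second model trying to initialise `soil_moisture` fails immediately, while a second updater only produces a warning.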

The availability of the axes each variable requires is also validated.
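As a rough illustration of the axis check, assuming each variable lists the axis names it needs and the run knows which axis validators are registered. Both data structures and the function name `check_axes` are hypothetical; `spatial` is an axis name mentioned later in this thread, while `time` is only a placeholder name.

```python
def check_axes(
    required: dict[str, tuple[str, ...]], available: set[str]
) -> None:
    """Raise if any variable needs an axis that has no registered validator."""
    problems = {
        name: sorted(set(axes) - available)
        for name, axes in required.items()
        if set(axes) - available
    }
    if problems:
        raise ValueError(f"Variables with unavailable axes: {problems}")


# Assumed set of registered axis validators for this example.
AVAILABLE_AXES = {"spatial"}
check_axes({"soil_moisture": ("spatial",)}, AVAILABLE_AXES)  # passes silently
```

Collecting every problem before raising means a single run reports all the variables with missing axes, rather than one at a time.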

What is known to be wrong or missing

At the moment, we use each model's required_init_vars to decide which variables that model initialises. That is not what required_init_vars represents: it lists the variables that need to be in the data object for the model to be initialised. So, in reality, they are more like vars_used, a new attribute that needs to be populated for each model. Likewise, vars_updated might actually be more closely related to what the model is initialising, although maybe not in all cases.

Setting this confusion aside, all models (including the core ones) will need to define their own variable.py module with the relevant variables. An example for one variable used by the plants model is included to show how that needs to be done. To implement that, it might be easiest if each of you creates such a variable module for each of your models and opens a PR against this one.

Needless to say, tests and docs need to be implemented. That will be done once we are happy with the implementation. Until all the above is sorted, tests will fail, naturally.

Fixes # (issue) - partly addresses #371

Type of change

New feature (non-breaking change which adds functionality)
Optimization (back-end change that speeds up the code)
Bug fix (non-breaking change which fixes an issue)
Breaking change

Key checklist

Make sure you've run the pre-commit checks: $ pre-commit run -a
All tests pass: $ poetry run pytest

Further checks

Code is commented, particularly in hard-to-understand areas
Tests added that prove fix is effective or that feature works

dalonsoa · 2024-02-27T13:32:11Z

I'm not in a hurry, but do you have any comments on this, especially on what I mention in the "What is known to be wrong or missing" section?

davidorme

Definitely looks like what we're after. Some comments below. On this part of your PR message:

At the moment, we use each model's required_init_vars to decide which variables that model initialises. That is not what required_init_vars represents: it lists the variables that need to be in the data object for the model to be initialised. So, in reality, they are more like vars_used, a new attribute that needs to be populated for each model. Likewise, vars_updated might actually be more closely related to what the model is initialising, although maybe not in all cases.

Assuming we're only dealing with the init and update methods:

required_init_vars are variables that must be either provided through the run configuration or created by a previously initialised model. (I am vaguely wondering here whether there's mileage in having a Variable.initialise(self, data: Data) method that allows a variable initialisation to be importable across modules. That would save code duplication.)
We probably need required_update_vars - variables that are required for the update method.
We have vars_updated but need vars_initialised.

davidorme · 2024-03-01T14:16:59Z

virtual_rainforest/core/config.py

+import virtual_rainforest.core.variables as variables
 from virtual_rainforest.core.exceptions import ConfigurationError


You're using this rather than from virtual_rainforest.core.variables import setup_variables to keep the use of the variables namespace clear in the flow? It's functionally identical (right?) - I'm only asking because the style differs from how we've imported other functionality.

It is not identical. The method I used does not import the contents of the module until setup_variables is used deep in the code. This avoids circular import errors, since by the time it is really necessary to load the contents of variables, all the dependencies are already loaded.

And it keeps the namespace clean 😃
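The difference can be reproduced with two throwaway modules. Strictly speaking, `import package.module as name` still runs the module the first time it is imported; what it defers is the lookup of `setup_variables`, which only happens when the function is called. `from module import setup_variables` needs that name to exist while the cycle is still being resolved, which is where it fails. The module names and sources below are hypothetical stand-ins (top-level modules rather than the real package), written to a temporary directory purely for the demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-ins: "vars" plays core.variables, "cfg" plays core.config.
VARS_SRC = "import {cfg}\n\ndef setup_variables():\n    return 'registered'\n"
CFG_FROM = (
    "from {vars} import setup_variables\n\n"
    "def run():\n    return setup_variables()\n"
)
CFG_MODULE = (
    "import {vars} as variables\n\n"
    "def run():\n    return variables.setup_variables()\n"
)


def try_cycle(style: str) -> str:
    """Write a two-module import cycle to a temp dir and try to import it."""
    vars_name, cfg_name = f"demo_vars_{style}", f"demo_cfg_{style}"
    cfg_src = CFG_FROM if style == "from" else CFG_MODULE
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, f"{vars_name}.py").write_text(VARS_SRC.format(cfg=cfg_name))
        Path(tmp, f"{cfg_name}.py").write_text(cfg_src.format(vars=vars_name))
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            importlib.import_module(vars_name)
            return sys.modules[cfg_name].run()
        except ImportError:
            return "ImportError"
        finally:
            # Leave the interpreter state as we found it.
            sys.path.remove(tmp)
            sys.modules.pop(vars_name, None)
            sys.modules.pop(cfg_name, None)
```

Calling `try_cycle("module")` completes the cycle and runs the function, whereas `try_cycle("from")` hits an `ImportError` because `setup_variables` is not yet defined in the partially initialised module.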

davidorme · 2024-03-01T14:19:43Z

virtual_rainforest/core/variables.py

+    list(getmembers(variables_submodule))
+


This looks like a return value but it isn't. Is the process of listing the members triggering the __post_init__ for each Variable class?

Yeah, I could not figure out a more elegant way of loading the contents of a module so the variables are registered. Any suggestion is most welcome.

This has now been ditched.
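For context, the register-on-creation pattern under discussion looks roughly like the sketch below. It assumes a frozen dataclass whose `__post_init__` adds the instance to a module-level registry; the `unit` field and the `"mm"` value are illustrative assumptions, not the fields used in this PR.

```python
from dataclasses import dataclass

KNOWN_VARIABLES: dict[str, "Variable"] = {}


@dataclass(frozen=True)
class Variable:
    """Hypothetical variable definition that registers itself on creation."""

    name: str
    unit: str

    def __post_init__(self) -> None:
        if self.name in KNOWN_VARIABLES:
            raise ValueError(f"Variable '{self.name}' is already registered")
        KNOWN_VARIABLES[self.name] = self


# Creating an instance at module level is enough to register it, which is why
# merely importing a module full of such definitions populates the registry.
Variable(name="surface_runoff", unit="mm")
```

Because registration is a side effect of instantiation, nothing needs to enumerate the module's members: importing the module is sufficient.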

davidorme · 2024-03-01T14:33:11Z

virtual_rainforest/core/variables.py

+RUN_VARIABLES_REGISTRY: dict[str, Variable] = {}
+"""The global registry of variables used in a run."""
+
+KNOWN_VARIABLES: dict[str, Variable] = {}
+"""The global known variable registry."""


I'm not sure I fully get the difference between these two registry objects. KNOWN_VARIABLES is only populated when a module is registered - and that only happens when the configuration of a run requests that module, so the keys of these should be identical within a run?

On the flip side of that, we need a mechanism that allows the docs and data_variables.toml to be populated, and those should cover all known models, not just the models in a specific run. We could just include model.variables.py in the autodoc for each model's documentation, but what we really want is a single variables page. So we could have a function in variables that explicitly populates data_variables.toml from everything in the models submodule. We can then use that in sphinx.

Ok, I get it. It should have contained all possible variables, but clearly I put the registration in the wrong place.

I will look into this.
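One possible shape for such an export function, as a sketch only. The `[[variable]]` array-of-tables layout and the `name`/`unit` keys are assumptions, since the actual structure of data_variables.toml is not shown in this thread, and the function assumes every value is a plain string.

```python
import json


def variables_to_toml(variables: list[dict[str, str]]) -> str:
    """Render variable definitions as a TOML array of tables."""
    blocks = []
    for var in sorted(variables, key=lambda v: v["name"]):
        lines = ["[[variable]]"]
        # json.dumps gives a double-quoted, escaped string, which is also a
        # valid TOML basic string for ordinary text values.
        lines += [
            f"{key} = {json.dumps(value, ensure_ascii=False)}"
            for key, value in var.items()
        ]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"
```

Sorting by name keeps the generated file stable between runs, which helps when the output is committed or diffed in the docs build.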

davidorme · 2024-07-11T21:58:00Z

It's really late in the day to be moaning about this, but I find I constantly stumble over the meanings of the model variable attributes. Does anyone else find this - if so could we switch up the names?

    required_init_vars -> vars_required_for_init
    vars_updated -> vars_updated
    required_update_vars -> vars_required_for_update
    populated_by_init_vars -> vars_populated_by_init
    populated_by_update_vars -> vars_populated_by_first_update

davidorme · 2024-07-11T22:33:21Z

OK - this has basically been me smacking it repeatedly with a stick and swearing at it. I think we should get a logo for the VE that is just a particularly grouchy looking donkey.

I don't think I've done anything mad but @vgro and @TaranRallings please have a look at the changes to hydrology and animal model files.

Removed surface_runoff from initial input data.
Added a bunch of variables that are populated by the first update but were missing from populated_by_update_vars.
Corrected space to spatial in variable axis definitions, added a placeholder time axis validator, moved some more outlandish hypothetical axes into comments in data_variables.toml.
Updated the axis testing to expect the new time axis name.

I haven't got my head around why test_ve_run passed, yet basically running the same thing in a notebook failed.

vgro · 2024-07-12T07:26:37Z

Good morning, you were busy last night!

it is ok to remove the surface_runoff from the input data now, however, it will probably be needed once we introduce some sort of spinup. The idea is that the surface runoff needs to accumulate over a spinup period to reach a steady state. I can add it back when we get there.
the aerodynamic_resistance_surface is indeed initialized by the first update
the abiotic variables that are populated in the hydrology here are somewhat optional; it depends on the order of models. If abiotic runs first that is fine, but I thought we wanted to run the hydrology before abiotic?

vgro · 2024-07-12T07:29:02Z

This conversation definitely highlights the value of your contribution @dalonsoa , thank you :-)

dalonsoa · 2024-07-12T07:31:13Z

It's really late in the day to be moaning about this, but I find I constantly stumble over the meanings of the model variable attributes. Does anyone else find this - if so could we switch up the names?
    required_init_vars -> vars_required_for_init
    vars_updated -> vars_updated
    required_update_vars -> vars_required_for_update
    populated_by_init_vars -> vars_populated_by_init
    populated_by_update_vars -> vars_populated_by_first_update

No problem changing the names - just tell me what you want, and I'll change it :)

The first issue is the three variables in the hydrology model populated_by_init_vars that are also in the abiotic model. This is a known issue - the overlap is noted in comments - but the variables system can't run until this is resolved, right @dalonsoa ?

That's right. That's the point of the validation process done in the variables system: to avoid having two models initialising the same variables. It was mentioned in the comments that the abiotic models were also initialising those, but there was no suggestion of a solution (at least not one I can implement without knowing the science). The abiotic and hydrology models are not parallel models (like abiotic and abiotic_simple, which are mutually exclusive), so if they are to coexist, they need to have compatible variable initialisation processes.

vgro · 2024-07-12T07:34:52Z

It's really late in the day to be moaning about this, but I find I constantly stumble over the meanings of the model variable attributes. Does anyone else find this - if so could we switch up the names?
    required_init_vars -> vars_required_for_init
    vars_updated -> vars_updated
    required_update_vars -> vars_required_for_update
    populated_by_init_vars -> vars_populated_by_init
    populated_by_update_vars -> vars_populated_by_first_update
No problem changing the names - just tell me what you want, and I'll change it :)

The first issue is the three variables in the hydrology model populated_by_init_vars that are also in the abiotic model. This is a known issue - the overlap is noted in comments - but the variables system can't run until this is resolved, right @dalonsoa ?

That's right. That's the point of the validation process done in the variables system: to avoid having two models initialising the same variables. It was mentioned in the comments that the abiotic models were also initialising those, but there was no suggestion of a solution (at least not one I can implement without knowing the science). The abiotic and hydrology models are not parallel models (like abiotic and abiotic_simple, which are mutually exclusive), so if they are to coexist, they need to have compatible variable initialisation processes.

A solution would be to make sure that the abiotic model (whichever is selected) is initialized BEFORE the hydrology, then we can just delete these variables from the initialization of the hydrology model.

It is in the update step where it makes more sense to have the hydrology first to get the current soil moisture

dalonsoa · 2024-07-12T07:42:23Z

Could we indicate in the config file that hydrology depends on abiotic? At the moment, it is the other way around in here, for example:

virtual_ecosystem/virtual_ecosystem/example_data/config/ve_run.toml

Lines 2 to 4 in 6812907

    
[hydrology.depends]
init = ['plants']
update = ['plants', 'abiotic_simple']

vgro · 2024-07-12T07:59:49Z

Could we indicate in the config file that hydrology depends on abiotic? At the moment, it is the other way around in here, for example:

virtual_ecosystem/virtual_ecosystem/example_data/config/ve_run.toml

Lines 2 to 4 in 6812907

[hydrology.depends]
init = ['plants']
update = ['plants', 'abiotic_simple']

ah, this is what I was looking for. I think we want

init in this order: plants, abiotic_simple, hydrology, others
update in this order: hydrology, abiotic_simple, plants, others

which should look like this:

[hydrology.depends] 
init = ['plants', 'abiotic_simple'] 

[abiotic_simple.depends]
init=['plants']
update=['hydrology']

[plants.depends]
update=['abiotic_simple', 'hydrology']

Also, I just noticed that the infamous aerodynamic_resistance_surface is a required_update_var for the abiotic model. So it would be best to update the hydrology first.

can you confirm this @davidorme ?
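The proposed `depends` tables can be checked against the intended orders with a topological sort. This is only a sanity check using the standard library's `graphlib` (Python 3.9+); it makes no claim about how the configuration loader actually resolves `depends`, and the `execution_order` helper is a name made up for this example.

```python
from graphlib import TopologicalSorter

# The `depends` tables proposed above, as plain dictionaries.
DEPENDS = {
    "hydrology": {"init": ["plants", "abiotic_simple"]},
    "abiotic_simple": {"init": ["plants"], "update": ["hydrology"]},
    "plants": {"update": ["abiotic_simple", "hydrology"]},
}


def execution_order(
    depends: dict[str, dict[str, list[str]]], stage: str
) -> list[str]:
    """Return one valid model order for a stage ("init" or "update")."""
    graph = {model: set(deps.get(stage, [])) for model, deps in depends.items()}
    # static_order raises graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(graph).static_order())


print(execution_order(DEPENDS, "init"))    # ['plants', 'abiotic_simple', 'hydrology']
print(execution_order(DEPENDS, "update"))  # ['hydrology', 'abiotic_simple', 'plants']
```

For these three models the dependencies form a single chain in each stage, so the sorted order is unique and matches the two orders listed above.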

dalonsoa · 2024-07-12T08:13:19Z

Actually, we don't need to worry about the order. If we indicate correctly what variables are initialised by what model and when, then we can use the information stored in the variables registry to come up with the right sequence for the init and the update. We don't need to indicate that in the config file - and we can completely ditch that part. At least when it comes to the initialisation.

davidorme · 2024-07-12T08:33:20Z

Actually, we don't need to worry about the order. If we indicate correctly what variables are initialised by what model and when, then we can use the information stored in the variables registry to come up with the right sequence for the init and the update. We don't need to indicate that in the config file - and we can completely ditch that part. At least when it comes to the initialisation.

Yeah - this is one of the main advantages of the variables system, I think. It can automatically establish if there is a feasible variable setup sequence for any given suite of models and give meaningful errors about why no sequence can be established.

davidorme · 2024-07-12T08:37:49Z

At least when it comes to the initialisation.

I think the same should be true for the update? Going into update, we know which variables have been loaded from data and populated by the model __init__ methods. So for each model, we can know which variables are missing and then which models populate that variable when they update. That ought to give us the update order.
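The resolution described here could be sketched as follows: start from the variables that already exist, repeatedly schedule every model whose required variables are all present, and add what each scheduled model populates. The `requires`/`populates` keys and the `update_order` function are hypothetical names; `aerodynamic_resistance_surface` being populated by hydrology's first update and required by the abiotic update is taken from this thread, while the `soil_moisture` requirement is invented for the example.

```python
def update_order(
    models: dict[str, dict[str, set[str]]], available: set[str]
) -> list[str]:
    """Order models so each one's required update variables exist when it runs."""
    known = set(available)
    pending = dict(models)
    order: list[str] = []
    while pending:
        ready = [
            name for name, spec in pending.items() if spec["requires"] <= known
        ]
        if not ready:
            # Nothing can run: report what each remaining model is waiting for.
            missing = {
                name: sorted(spec["requires"] - known)
                for name, spec in pending.items()
            }
            raise ValueError(f"No feasible update order; missing: {missing}")
        for name in ready:
            order.append(name)
            known |= pending.pop(name)["populates"]
    return order


MODELS = {
    "abiotic": {
        "requires": {"aerodynamic_resistance_surface"},
        "populates": set(),
    },
    "hydrology": {
        "requires": {"soil_moisture"},
        "populates": {"aerodynamic_resistance_surface"},
    },
}
print(update_order(MODELS, available={"soil_moisture"}))  # ['hydrology', 'abiotic']
```

When no order exists, the error names the variables each stuck model is still waiting for, which is the kind of meaningful message mentioned earlier in the thread.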

davidorme · 2024-07-12T08:43:06Z

A solution would be to make sure that the abiotic model (whichever is selected) is initialized BEFORE the hydrology, then we can just delete these variables from the initialization of the hydrology model.

I'm not sure there is a right and wrong way here - you could calculate these variables within either model? We can only initialise or populate them in a single model, so we just have to pick one and do it, and that comes down to whichever gives the cleanest or most logical code, or where the calculations fit better within the broader scientific theory. There are always going to be things on the boundaries between models 😄

davidorme · 2024-07-12T08:45:21Z

It's really late in the day to be moaning about this, but I find I constantly stumble over the meanings of the model variable attributes. Does anyone else find this - if so could we switch up the names?
    required_init_vars -> vars_required_for_init
    vars_updated -> vars_updated
    required_update_vars -> vars_required_for_update
    populated_by_init_vars -> vars_populated_by_init
    populated_by_update_vars -> vars_populated_by_first_update
No problem changing the names - just tell me what you want, and I'll change it :)

@jacobcook1995 gave this the thumbs up, so unless anyone else is wildly bothered, I say we go for those switches. It probably makes sense to update the _check_... functions in core.variables to match.

davidorme · 2024-07-12T08:49:29Z

it is ok to remove the surface_runoff from the input data now, however, it will probably be needed once we introduce some sort of spinup. The idea is that the surface runoff needs to accumulate over a spinup period to reach a steady state. I can add it back when we get there.

I've a nasty feeling that we'll need to add to the variables system then, to track spinup variables and the order of execution of spinup methods... Although if spinup is limited to a very few variables and models, we could just provide external code to estimate the initial values to put in, rather than having to embed it all in the VE code.

dalonsoa · 2024-07-12T09:49:01Z

Actually, we don't need to worry about the order. If we indicate correctly what variables are initialised by what model and when, then we can use the information stored in the variables registry to come up with the right sequence for the init and the update. We don't need to indicate that in the config file - and we can completely ditch that part. At least when it comes to the initialisation.

Yeah - this is one of the main advantages of the variables system, I think. It can automatically establish if there is a feasible variable setup sequence for any given suite of models and give meaningful errors about why no sequence can be established.

Do we want to do this in this issue? It's becoming pretty large and I feel it would be best to consolidate what we have before starting to use the new functionality to improve the workflow.

dalonsoa · 2024-07-12T09:51:34Z

It's really late in the day to be moaning about this, but I find I constantly stumble over the meanings of the model variable attributes. Does anyone else find this - if so could we switch up the names?
    required_init_vars -> vars_required_for_init
    vars_updated -> vars_updated
    required_update_vars -> vars_required_for_update
    populated_by_init_vars -> vars_populated_by_init
    populated_by_update_vars -> vars_populated_by_first_update
No problem changing the names - just tell me what you want, and I'll change it :)
@jacobcook1995 gave this the thumbs up, so unless anyone else is wildly bothered, I say we go for those switches. It probably makes sense to update the _check_... functions in core.variables to match.

I'm going to make this change in this PR, right now. Let's see if I don't break anything else...

davidorme · 2024-07-12T09:59:13Z

Do we want to do this in this issue?

No. Nope. Nix. Hard pass.

It's becoming pretty large and I feel it would be best to consolidate what we have before starting to use the new functionality to improve the workflow.

☝️ 👆 This.

davidorme

Looks good to me! Time to bank?

dalonsoa · 2024-07-12T10:54:43Z

Yes, please!!! 🙏

vgro · 2024-07-12T11:00:03Z

yes, go for it 👍

vgro · 2024-07-12T11:00:30Z

oh you already did, perfect :-) Happy weekend!

dalonsoa and others added 19 commits January 23, 2024 10:54

Add initial version of Variables

8a5e49a

Add another debug check

d309849

Remove fields from init

a0337d5

Fix typo

f8f24a8

Make variables inmutbale

c147b5d

Alternative sketch of variable registry

4860a31

Merge branch 'develop' into variables

43282b0

Start from simple variable model

d997774

Expand attributes of Variable

edcfafb

Make the variables callables

bf099a8

Add register_variables

9d9fb65

Add initialise variables

7a7df0b

Add verify_updated_by

dcf73e7

Add verify_used_by

fee4a83

Reorganise some functionality

4a7070c

Add and run setup variables

14bcfec

Add axis verification

6febcf4

Regorganise imports

9eb0fb7

Add example of adding a variable

d68bd7c

dalonsoa requested review from davidorme, jacobcook1995, TaranRallings and vgro February 15, 2024 16:11

davidorme mentioned this pull request Feb 26, 2024

Fixing broken tests #393

Merged

7 tasks

Merge branch 'develop' into variables_1

84c7233

davidorme reviewed Mar 1, 2024


davidorme mentioned this pull request Mar 1, 2024

Clarify role of required init vars #349

Closed

dalonsoa added 2 commits March 20, 2024 06:25

Adapt to new name

24d94c3

Add vars used to base model

3d7c308

davidorme added 3 commits July 11, 2024 23:05

Removing surface_runoff from init data, fixing missing variables populated by first update

5afbe21

Fixing up axis checking - added placeholder time axis validator, commented out more 'experimental' axis definitions.

e8317d0

Updating axis testing for new time axis name

98335f9

Rename variables-related... variables?

65fdcc2

davidorme approved these changes Jul 12, 2024


davidorme merged commit fb253a5 into develop Jul 12, 2024
12 checks passed

davidorme deleted the variables_1 branch July 12, 2024 10:59

vgro mentioned this pull request Jul 18, 2024

List of variables in data object (toml) #202

Closed

davidorme mentioned this pull request Sep 24, 2024

Variables registry #370

Closed

davidorme mentioned this pull request Oct 10, 2024

New model documentation is outdated #588

Open
