Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add variables setup #388

Merged
merged 140 commits into from
Jul 12, 2024
Merged

Add variables setup #388

merged 140 commits into from
Jul 12, 2024

Conversation

dalonsoa
Copy link
Collaborator

@dalonsoa dalonsoa commented Feb 15, 2024

Description

Supersedes #371

This PR adds a more full featured version of the variables infrastructure. It adds two registries, one containing all the known variables based on the modules that are being used in the simulation and another one for runtime variables that are actually being used by the requested models.

Each variable can be initialised by one single model and can be updated (with a warning) and used by more than one model. Who initialises, updates and use each variable is added to the relevant attributes of each variable.

The availability of the axis each variable requires is also validated.

What is known to be wrong or missing

At the moment, we use the models required_init_vars to decide which variable is initialised by each model, but that is not what required_init_vars represents, but rather what variable needs to be in the data object to be able to initialise the model. So, in reality, they are more like vars_used, a new attribute that needs to be populated for each model. Likewise, the vars_updated might actually be more closely related to what the model is initialising, although maybe not in all cases.

Keeping this confusion aside, all models (including the core ones) will need to define their own variable.py module with the relevant variables. An example for one variable used by the plants model is included to show how that needs to be done. To implement that, it might be easier if each of you create such a variable module for each of your models and open a PR against this one.

Needless to say, tests and docs need to be implemented. That will be done once we are happy with the implementation. Until all the above is sorted, test will fail, naturally.

Fixes # (issue) - partly addresses #371

Type of change

  • New feature (non-breaking change which adds functionality)
  • Optimization (back-end change that speeds up the code)
  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change

Key checklist

  • Make sure you've run the pre-commit checks: $ pre-commit run -a
  • All tests pass: $ poetry run pytest

Further checks

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works

@dalonsoa
Copy link
Collaborator Author

I'm not in a hurry, but do you have any comments on this, specially in what I mention in the "What is known to be wrong or missing" section?

Copy link
Collaborator

@davidorme davidorme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely looks like what we're after. Some comments below. On this part of your PR message:

At the moment, we use the models required_init_vars to decide which variable is initialised by each model, but that is not what required_init_vars represents, but rather what variable needs to be in the data object to be able to initialise the model. So, in reality, they are more like vars_used, a new attribute that needs to be populated for each model. Likewise, the vars_updated might actually be more closely related to what the model is initialising, although maybe not in all cases.

Assuming we're only dealing with the init and update methods:

  • required_init_vars are variables that must be either provided through the run configuration or created by a previously initialised model. (I am vaguely wondering here whether there's mileage in having a Variable.initialise(self, data: Data) method that allows a variable initialisation to be importable across modules. That would save code duplication).
  • We probably need required_update_vars - variables that are required for the update method.
  • We have vars_updated but need vars_initialised.

Comment on lines 20 to 21
import virtual_rainforest.core.variables as variables
from virtual_rainforest.core.exceptions import ConfigurationError
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're using this rather than from virtual_rainforest.core.variables import setup_variables to keep the use of the variables namespace clear in the flow? It's functionally identical (right?) - I'm only asking because the style differs from how we've imported other functionality.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not identical. The method I used does not import the contents of the module until setup_variables is used deep in the code. This avoids circular import errors since by the time it is necessary to really load the contents of variable all the dependencies are already loaded.

And it keep the namespace clean 😃

Comment on lines 78 to 79
list(getmembers(variables_submodule))

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a return value but it isn't. Is the process of listing the members triggering the __post_init__ for each Variable class?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I could not figure out a more elegant way of loading the contents of a module so the variables are registered. Any suggestion is most welcomed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has now been ditched.

Comment on lines 53 to 57
RUN_VARIABLES_REGISTRY: dict[str, Variable] = {}
"""The global registry of variables used in a run."""

KNOWN_VARIABLES: dict[str, Variable] = {}
"""The global known variable registry."""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I fully get the difference between these two registry objects. KNOWN_VARIABLES is only populated when a module is registered - and that only happens when the configuration of a run requests that module, so the keys of these should be identical within a run?

On the flip-side of that, we need a mechanism that allows the docs and data_variables.toml to be populated and those should contain all known models, not just the models in a specific run. We could just include model.variables.py in the autodoc for each model documentation, but what we really want is a single variables page, so having a function in variables that explicitly populates data_variables.toml from everything in the models submodule. We can then use that in sphinx.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I get it. It should have contained all possible variables, but clearly I put the registration in the wrong place.

I will look into this.

@davidorme
Copy link
Collaborator

It's really late in the day to be moaning about this, but I find I constantly stumble over the meanings of the model variable attributes. Does anyone else find this - if so could we switch up the names?

    required_init_vars -> vars_required_for_init
    vars_updated -> vars_updated
    required_update_vars -> vars_required_for_update
    populated_by_init_vars -> vars_populated_by_init
    populated_by_update_vars -> vars_populated_by_first_update

@davidorme
Copy link
Collaborator

davidorme commented Jul 11, 2024

OK - this has basically been me smacking it repeatedly with a stick and swearing at it. I think we should get a logo for the VE that is just a particularly grouchy looking donkey.

I don't think I've done anything mad but @vgro and @TaranRallings please have a look at the changes to hydrology and animal model files.

  1. Removed surface_runoff from initial input data.
  2. Added a bunch of variables that are populated by the first update but were missing from populated_by_update_vars.
  3. Correct space to spatial in variable axis definitions, added a placeholder time axis validator, moved some more outlandish hypothetical axes into comments in data_variables.toml.
  4. Update the axis testing to expect the new time axis name.

I haven't got my head around why test_ve_run passed, yet basically running the same thing in a notebook failed.

@vgro
Copy link
Collaborator

vgro commented Jul 12, 2024

Good morning, you were busy last night!

  • it is ok to remove the surface_runoff from the input data now, however, it will probably be needed once we introduce some sort of spinup. The idea is that the surface runoff needs to accumulate over a spinup period to reach a steady state. I can add it back when we get there.
  • the aerodynamic_resistance_surface is indeed initialized by the first update
  • the abiotic variables that are populated in the hydrology here are somewhat optional; it depends on the order of models. If abiotic runs first that is fine, but I thought we wanted to run the hydrology before abiotic?

@vgro
Copy link
Collaborator

vgro commented Jul 12, 2024

This conversation definitely highlights the value of your contribution @dalonsoa , thank you :-)

@dalonsoa
Copy link
Collaborator Author

It's really late in the day to be moaning about this, but I find I constantly stumble over the meanings of the model variable attributes. Does anyone else find this - if so could we switch up the names?

    required_init_vars -> vars_required_for_init
    vars_updated -> vars_updated
    required_update_vars -> vars_required_for_update
    populated_by_init_vars -> vars_populated_by_init
    populated_by_update_vars -> vars_populated_by_first_update

No problem changing the names - just tell me what you want, and I'll change it :)

The first issue is the three variables in the hydrology model populated_by_init_vars that are also in the abiotic model. This is a known issue - the overlap is noted in comments - but the variables system can't run until this is resolved, right @dalonsoa ?

That's right. That's the point of the validation process done in the variables system, to avoid having two models initialising the same variables. It was mentioned in the comments that the adiabatic models were also initialising those, but there was no suggestion of a solution (at lease not one I can implement without knowing the science). The adiabatic models and the hydrology models are not parallel models (like adiabatic and adiabatic_simple that are mutually exclusive), so if they are to coexist, they need to have compatible variable initialisation processes.

@vgro
Copy link
Collaborator

vgro commented Jul 12, 2024

It's really late in the day to be moaning about this, but I find I constantly stumble over the meanings of the model variable attributes. Does anyone else find this - if so could we switch up the names?

    required_init_vars -> vars_required_for_init
    vars_updated -> vars_updated
    required_update_vars -> vars_required_for_update
    populated_by_init_vars -> vars_populated_by_init
    populated_by_update_vars -> vars_populated_by_first_update

No problem changing the names - just tell me what you want, and I'll change it :)

The first issue is the three variables in the hydrology model populated_by_init_vars that are also in the abiotic model. This is a known issue - the overlap is noted in comments - but the variables system can't run until this is resolved, right @dalonsoa ?

That's right. That's the point of the validation process done in the variables system, to avoid having two models initialising the same variables. It was mentioned in the comments that the adiabatic models were also initialising those, but there was no suggestion of a solution (at lease not one I can implement without knowing the science). The adiabatic models and the hydrology models are not parallel models (like adiabatic and adiabatic_simple that are mutually exclusive), so if they are to coexist, they need to have compatible variable initialisation processes.

A solution would be to make sure that the abiotic model (whichever is selected) is initialized BEFORE the hydrology, then we can just delete these variables from the initialization of the hydrology model.

It is in the update step where it makes more sense to have the hydrology first to get the current soil moisture

@dalonsoa
Copy link
Collaborator Author

Could we indicate in the config file that hydrology depends on adiabatic? At the moment, it is the other way around in here, for example:

[hydrology.depends]
init = ['plants']
update = ['plants', 'abiotic_simple']

@vgro
Copy link
Collaborator

vgro commented Jul 12, 2024

Could we indicate in the config file that hydrology depends on adiabatic? At the moment, it is the other way around in here, for example:

[hydrology.depends]
init = ['plants']
update = ['plants', 'abiotic_simple']

ah, this is what I was looking for. I think we want

init in this order: plants, abiotic_simple, hydrology, others
update in this order: hydrology, abiotic_simple, plants, others

which should look like this:

[hydrology.depends] 
init = ['plants', 'abiotic_simple'] 

[abiotic_simple.depends]
init=['plants']
update=['hydrology']

[plants.depends]
update=['abiotic_simple', 'hydrology']

Also, I just noticed that the infamous aerodynamic_resistance_surface is a required_update_var for the abiotic model. So it would be best to update the hydrology first.

can you confirm this @davidorme ?

@dalonsoa
Copy link
Collaborator Author

Actually, we don't need to worry about the order. If we indicate correctly what variables are initialised by what model and when, then we can use the information stored in the variables registry to come up with the right sequence for the init and the update. We don't need to indicate that in the config file - and we can completely ditch that part. At least when it comes to the initialisation.

@davidorme
Copy link
Collaborator

Actually, we don't need to worry about the order. If we indicate correctly what variables are initialised by what model and when, then we can use the information stored in the variables registry to come up with the right sequence for the init and the update. We don't need to indicate that in the config file - and we can completely ditch that part. At least when it comes to the initialisation.

Yeah - this is one of the main advantages of the variables system, I think. It can automatically establish if there is a feasible variable setup sequence for any given suite of models and give meaningful errors about why no sequence can be established.

@davidorme
Copy link
Collaborator

At least when it comes to the initialisation.

I think the same should be true for the update? Going into update, we know which variables have been loaded from data and populated by the model __init__ methods. So for each model, we can know which variables are missing and then which models populate that variable when they update. That ought to give us the update order.

@davidorme
Copy link
Collaborator

A solution would be to make sure that the abiotic model (whichever is selected) is initialized BEFORE the hydrology, then we can just delete these variables from the initialization of the hydrology model.

I'm not sure there is a right and wrong way here - you could calculate these variable within either model? We can only initialise or populate them in a single model, so we just have to pick one and do it and that comes down to whichever is the cleanest or most logical code or where the calculations fit better within the broader scientific theory. There are are always going to be things on the boundaries between models 😄

@davidorme
Copy link
Collaborator

It's really late in the day to be moaning about this, but I find I constantly stumble over the meanings of the model variable attributes. Does anyone else find this - if so could we switch up the names?

    required_init_vars -> vars_required_for_init
    vars_updated -> vars_updated
    required_update_vars -> vars_required_for_update
    populated_by_init_vars -> vars_populated_by_init
    populated_by_update_vars -> vars_populated_by_first_update

No problem changing the names - just tell me what you want, and I'll change it :)

@jacobcook1995 gave this the thumbs up, so unless anyone else is wildly bothered, I say we go for those switches. It probably makes sense to update the _check_... functions in core.variables to match.

@davidorme
Copy link
Collaborator

  • it is ok to remove the surface_runoff from the input data now, however, it will probably be needed once we introduce some sort of spinup. The idea is that the surface runoff needs to accumulate over a spinup period to reach a steady state. I can add it back when we get there.

I've a nasty feeling that we'll need to add to variables system then to track spinup variables and the order of execution of spinup methods... Although if spinup is limited to a very few variables and models, we could just provide external code to estimate the initial values to put in, rather than having to embed it all in the VE code.

@dalonsoa
Copy link
Collaborator Author

Actually, we don't need to worry about the order. If we indicate correctly what variables are initialised by what model and when, then we can use the information stored in the variables registry to come up with the right sequence for the init and the update. We don't need to indicate that in the config file - and we can completely ditch that part. At least when it comes to the initialisation.

Yeah - this is one of the main advantages of the variables system, I think. It can automatically establish if there is a feasible variable setup sequence for any given suite of models and give meaningful errors about why no sequence can be established.

Do we want to do this in this issue? It's becoming pretty large and I feel it would be best to consolidate what we have before starting to use the new functionality to improve the workflow.

@dalonsoa
Copy link
Collaborator Author

It's really late in the day to be moaning about this, but I find I constantly stumble over the meanings of the model variable attributes. Does anyone else find this - if so could we switch up the names?

    required_init_vars -> vars_required_for_init
    vars_updated -> vars_updated
    required_update_vars -> vars_required_for_update
    populated_by_init_vars -> vars_populated_by_init
    populated_by_update_vars -> vars_populated_by_first_update

No problem changing the names - just tell me what you want, and I'll change it :)

@jacobcook1995 gave this the thumbs up, so unless anyone else is wildly bothered, I say we go for those switches. It probably makes sense to update the _check_... functions in core.variables to match.

I'm going to make this change in this PR, right now. Let's see if I don't break anything else...

@davidorme
Copy link
Collaborator

Do we want to do this in this issue?

No. Nope. Nix. Hard pass.

It's becoming pretty large and I feel it would be best to consolidate what we have before starting to use the new functionality to improve the workflow.

☝️ 👆 This.

Copy link
Collaborator

@davidorme davidorme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Time to bank?

@dalonsoa
Copy link
Collaborator Author

Yes, please!!! 🙏

@davidorme davidorme merged commit fb253a5 into develop Jul 12, 2024
12 checks passed
@davidorme davidorme deleted the variables_1 branch July 12, 2024 10:59
@vgro
Copy link
Collaborator

vgro commented Jul 12, 2024

yes, go for it 👍

@vgro
Copy link
Collaborator

vgro commented Jul 12, 2024

oh you already did, perfect :-) Happy weekend!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants