Splitting the Validation Files #9617

Open
par456 opened this issue Jan 28, 2025 · 8 comments
Labels: Build System Project, Improvement

Comments
@par456 (Collaborator) commented Jan 28, 2025

Describe the new feature

As part of the ongoing update to the build system, we are redesigning how APSIM validation files are handled so that we can avoid the following problems:

  • Merge conflicts when multiple people add new data to a crop model, which currently requires editing the same apsimx file
  • Merge conflicts when multiple people add new data to an Excel file for validation
  • The current inability to run the experiments within a crop validation concurrently on a cloud service

To do this, we need to break up the current validation files, which at present are a single big apsimx file for each model, alongside weather files and Excel files for the observed data. A text input file will be used to guide the split so that experiments can still be grouped together when required.

After splitting these up, each experiment or group of experiments will be placed into a subdirectory within the validation/model folder, along with an Excel sheet that only contains records for the simulations that are connected to that new apsimx file. Weather files will not be split or put into these subfolders, as they are often shared between experiments.
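To make this concrete, here is a purely illustrative example of what a grouping file and the resulting folder layout could look like (the grouping file format is still to be designed, and every name below is made up):

    # Wheat.groups.txt (hypothetical grouping file)
    # group name : experiments to keep together in one apsimx file
    LincolnIrrigation : LincolnIrrigated2014, LincolnIrrigated2015
    GattonSowingDates : GattonTOS1, GattonTOS2

    Tests/Validation/Wheat/
        LincolnIrrigation/
            LincolnIrrigation.apsimx   (only those experiments)
            LincolnIrrigation.xlsx     (only the observed rows for their simulations)
        GattonSowingDates/
            GattonSowingDates.apsimx
            GattonSowingDates.xlsx
        WeatherFiles/                  (shared, not split)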

Some issues we need to consider:

  • When working on changing a model, which file does a user work in now? Do they need to edit the base resource instead? What are the impacts of this?
  • How does a user get a 'big picture' view of the model and graphing variables like they do in the current large files?
  • When we split a validation file, we have to be sure nobody is currently editing that file on a branch; otherwise they won't be able to merge their work back in.

@HamishBrownPFR @hol353 @hut104

To do list:

  • Create new project for this splitting tool outside the main apsim repo
  • Investigate what needs to be split between files, what top level models and replacements are currently used in validation files
  • Matching simulation names to Excel rows to split the data into new sheets (see the sketch after this list)
  • Splitting the apsimx file
  • Text input file format for grouping experiments together
  • Command line interface for the program
  • Unit testing to make sure that no simulations or experiments are lost during the transfer
  • Unit tests to make sure no data is lost from the Excel sheets during the transfer
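For the simulation-name matching and Excel-splitting items above, here is a minimal sketch of the idea, assuming pandas, observed sheets that carry a SimulationName column, and a made-up group_simulations mapping:

    # Sketch only: write one observed workbook per experiment group, keeping only
    # the rows whose SimulationName belongs to that group.
    import pandas as pd

    def split_observed(observed_xlsx, group_simulations, out_dir):
        # group_simulations: {"GroupName": ["Sim1", "Sim2", ...]}; the group
        # folders under out_dir are assumed to already exist.
        sheets = pd.read_excel(observed_xlsx, sheet_name=None)  # dict of DataFrames
        for group, sim_names in group_simulations.items():
            with pd.ExcelWriter(f"{out_dir}/{group}/{group}.xlsx") as writer:
                for sheet_name, df in sheets.items():
                    if "SimulationName" not in df.columns:
                        continue
                    subset = df[df["SimulationName"].isin(sim_names)]
                    if not subset.empty:
                        subset.to_excel(writer, sheet_name=sheet_name, index=False)

The 'no data lost' unit tests could then simply check that the row counts of the split sheets sum back to the row count of each original sheet.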
@par456 added the Improvement label Jan 28, 2025
@par456 self-assigned this Jan 28, 2025
@HamishBrownPFR (Contributor):

@par456. Power to your arm! This needs doing to tidy up the test set and allow us to extract maximum value from the testing system. It is logical to have .apsimx model config files paired with observed files at an experimental level, so that experiments can be used to validate multiple models and custodians of experiments can easily find their experiment setups in the testing system.
A few immediate thoughts to consider:

  1. Do we want to be using Excel files for observed data? Observed data in CSV files would allow easier diffing, which would be nice. CSV files are not as nice to work with when preparing or cleaning observed data, but there are some good tools (e.g. https://www.ronsplace.ca/products/ronsdataedit) that look and feel like Excel when you are working with gridded files but store a .csv file in the background.
  2. How do we re-aggregate test sets when we want to run a test for a specific model locally and visualize the results?
  3. If we do a variable rename in the code (which is not uncommon when refactoring models), can we have a tool that finds and replaces the corresponding variable names in observed files (see the sketch after this list)? Or do we need some method for dealing with variable-name synonyms in the model back end to keep things matched up?
  4. In addition to the PO stats that we use for checking model predictions, it would be nice to be able to access the test set to conduct some more sophisticated analyses to try to identify issues with the model. With this in mind, it would be good to think through how we might add indexing to the test set to achieve that. A lot of indexing will be pulled from the configuration (what models are in the set, what the location is, etc.), but it would also be useful to be able to append arbitrary index tags, for example via something like a manager script in simulations that applies index tags to the simulation. They could be simple user-entered tags (an example of this is in the wheat simulations, where we have specified growing-region tags), or tags determined by a post-simulation analysis related to levels of inputs or stress, for example.
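On point 3, a rename tool for observed files does not need to be complicated. A rough sketch (pandas again; the file names and the example rename are hypothetical, and only column headers are touched):

    # Sketch only: rename observed column headers after a model variable rename,
    # including derived columns such as 'Wheat.Leaf.LAI.se'.
    import pandas as pd

    def rename_observed_columns(observed_xlsx, out_xlsx, rename_map):
        sheets = pd.read_excel(observed_xlsx, sheet_name=None)
        with pd.ExcelWriter(out_xlsx) as writer:
            for sheet_name, df in sheets.items():
                new_cols = []
                for col in df.columns:
                    if isinstance(col, str):
                        for old, new in rename_map.items():
                            if col == old or col.startswith(old + "."):
                                col = new + col[len(old):]
                                break
                    new_cols.append(col)
                df.columns = new_cols
                df.to_excel(writer, sheet_name=sheet_name, index=False)

    # e.g. rename_observed_columns("Observed.xlsx", "ObservedRenamed.xlsx",
    #                              {"Wheat.Leaf.LAI": "Wheat.Leaf.LAITotal"})

The synonym approach mentioned above would avoid rewriting files at all, at the cost of carrying the mapping in the model back end.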

@HamishBrownPFR (Contributor):

  1. If we are testing a model by configuring JSON in replacements in the GUI (the current approach, but it will need to change), how do we impose that change onto the appropriate simulations in the test set? In the very early days of Plant2, Dean wrote an IDE which would read and write the crop.json file (it was XML in those days). It would require a build to get the changes to the model into the scope of the test simulations, but then all tests were aware of the change. This was also a bit safer than replacements, as it reduced the risk of having different models in the replacement nodes of different simulations.
  2. We need to think about the process of getting prototypes into release. Currently we have lots of test simulations in prototypes. When and how should these be considered by the test system? This could just be procedural, e.g. ensuring that when we add a new model, the test sets that hold variable names matching that model are added to the official tests in the same pull request as the model config going into release.
  3. Are our brains big enough to cope with an expanded testing system and all the collateral damage it will show up? For example, if we are developing a first-cut version of a crop model with limited data (worth doing, as it is better than no model and gives a starting point), we will not be anticipating great performance for that model. However, if it contains soil water data, for example, that will affect the results for the soil water model test. I am certain there are plenty of cases in the test set where certain models are not performing well in certain simulations because that model was not the primary focus when setting up that test set. It will be good to show these cases up, but then we need to do something about it and hope we don't get bogged down trying to comprehend trade-offs in collateral damage.

@HamishBrownPFR (Contributor):

  1. A likely result (or probable need) of a better testing system is the possibility of more sophisticated optimization approaches.
  • We will probably need to make re-optimizing a model a trivial activity, so that when we fix an error in one part of a model we can refit the rest of the model and (hopefully) reduce the prospect of fixing models but making the stats worse.
  • Specific models will be able to pick up more data for their test sets, which will allow for more power in optimization and motivate better fitting of parameters for existing models.
  • Expanding test sets will likely bring in data to test one model that was in the test set for another model, whose performance was not the concern of that test set. This will make stats look worse, and we will need some optimization methods to try to deal with these issues. One specific issue we are likely to hit here is uncertainty in test simulation configuration. For example, if we have an experiment that was focused on understanding drainage from wheat under irrigated situations, we would focus on making sure the soil was configured well for testing the drainage model and may take an educated guess at parameters for the wheat model. If that test set happens to include wheat data that is not predicted well, that would have been less of a concern for the activity the test was developed for, but it would look bad for the wheat model when the test picks up those results. We would need a way to test whether allowing a range of wheat parameters within reasonable bounds would improve the fits for wheat without degrading the fits for drainage. There are lots of parameters in simulations that we guess or have substantial uncertainty in their measurement, and it would be valuable to have some tools to assess the sensitivity of model performance stats to reasonable ranges in these configuration values.
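That last kind of sensitivity check can be framed very simply. A generic sketch, where run_model is a hypothetical wrapper around however the test set is actually run and RMSE stands in for whichever PO statistic is preferred:

    # Sketch only: sweep one uncertain configuration value and record a fit
    # statistic for each model of interest at each value.
    import numpy as np

    def rmse(observed, predicted):
        observed, predicted = np.asarray(observed), np.asarray(predicted)
        return float(np.sqrt(np.mean((predicted - observed) ** 2)))

    def sweep(run_model, values):
        # run_model(value) is assumed to run the simulations with the given
        # configuration value and return {"Wheat": (obs, pred), "Drainage": (obs, pred)}.
        return {value: {name: rmse(obs, pred)
                        for name, (obs, pred) in run_model(value).items()}
                for value in values}

    # e.g. sweep(run_model, np.linspace(0.8, 1.2, 9)) over a +/-20% range around a
    # guessed wheat parameter shows whether the wheat fit can improve without
    # degrading the drainage fit.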

@HamishBrownPFR (Contributor):

Where does the refactoring of the scoping methodology fit in this project? It would be useful if we didn't have two ways of addressing variables (with and without square brackets around the first class) depending on whether the address needs to be scoped.

@par456 (Collaborator, Author) commented Feb 5, 2025

Not within this project; that's part of the locator system, which is far too big a system to start modifying as part of this work.

@HamishBrownPFR (Contributor):

I am trying to build some different approaches to visualizing and testing the wheat validation, and a recurring frustration is the way we currently handle errors in observed files.
Specifically, there is no consistent approach: a variable may have a column with .se or Error suffixed to its name, or (most commonly) no error value at all. We have no idea of the method or veracity of the calculation of the error value. If we need to convert the units of the associated variable, the error value must also be found and rescaled. When dealing with data outside of the APSIM GUI, I use pandas DataFrames and commonly want to loop through all the column headers to perform an action on the data in each column; it is messy having error columns mixed in with means for such operations.

A better approach (at least in my opinion) would be to limit columns in observed files to observed values that match an APSIM variable and to enter replicate values down the column (where replicates exist). The software doing the stats (the APSIM GUI or a custom system) could then apply whatever method is deemed most appropriate to aggregate the replicate values (mean, median, etc.) and apply a consistent method to quantify errors. This would also leave the list of column headers as a tidy record of which observations are present, making the curation of validation sets and the scripting of analyses easier.
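A sketch of the aggregation side of that suggestion, assuming pandas and that SimulationName and Clock.Today are the grouping keys (both column names are assumptions here):

    # Sketch only: observed files store one row per replicate; means and a single,
    # consistent error measure are derived at analysis time instead of being stored.
    import pandas as pd

    def aggregate_replicates(obs: pd.DataFrame, keys=("SimulationName", "Clock.Today")):
        grouped = obs.groupby(list(keys))
        means = grouped.mean(numeric_only=True)
        errors = grouped.sem(numeric_only=True).add_suffix(".se")  # or median/MAD; one rule for all
        return means.join(errors).reset_index()

Unit conversions would then only ever touch the replicate values, with error columns regenerated rather than rescaled.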

@hol353 (Contributor) commented Feb 7, 2025

Hmm, not a bad idea @HamishBrownPFR. Perhaps create this as a separate GitHub Issue.

@HamishBrownPFR (Contributor):

Something else that I encounter, which has not caused major problems but does not feel quite right, is the use of Phenology.CurrentStageName == 'HarvestRipe' to identify final harvest data in observed files. We arbitrarily put 'HarvestRipe' in the Phenology.CurrentStageName column alongside final harvest data regardless of whether the crop was at the harvest-ripe stage or not. In reality, harvest may be taken earlier than harvest ripe (and samples dried) or later than harvest ripe if the crop got there before samples were taken. The potential problem with this is the assumption that everything that matches an APSIM output variable in an observed file is a legitimate observation. There is a chance someone could wrongly assume the 'HarvestRipe' values were observations of when harvest ripe occurred, associate this with the date of harvest, and make a comparison of observed and predicted dates. It may be safer and provide a cleaner convention if we found another way to identify which observed data represent a final harvest. Perhaps we should have an event column in the observed file where we put [Wheat].Harvesting (or whatever event is appropriate) to tag specific data in the observed file.
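For example (purely illustrative; the 'Event' column name and the file name are hypothetical), final-harvest rows could then be selected on the tag rather than on a phenology stage that may not have been observed:

    # Sketch only: filter observed data by an explicit event tag.
    import pandas as pd

    obs = pd.read_excel("Observed.xlsx")
    final_harvest = obs[obs["Event"] == "[Wheat].Harvesting"]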
