-
Notifications
You must be signed in to change notification settings - Fork 321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
History fields: Fix inconsistent gridcell averages and provide metadata for correct regional / global averages #1297
Comments
From discussion at last week's CLM-science meeting (see https://github.com/ESCOMP/CTSM/wiki/Meeting-Notes-2021-Science#april-8 and the presentation linked from there):
|
I have opened a project to track individual tasks within this larger project: https://github.com/ESCOMP/CTSM/projects/35 . General discussion about this project can stay in this issue. |
It might make sense to make l2g_scale_type (intended to be renamed to landunit_mask) a required argument when defining a history field. For fields that have valid values for all lanudnits, we could have an 'all_landunits' value. Even though it will be a bit tedious to add this argument to the 1000+ calls, this might help with consistency moving forward, since it will force anyone adding a new field to consider what the averaging should be. (However, this may be unnecessary if we find that most fields do have valid values over all landunits.) |
I have added individual issues for the first few steps of this project. Those should get us to the point where we can start playing with how we want to write scripts for taking global and regional averages. |
I like making landunit_mask a required argument, since you should think about it each time you add a hist_addfld call. The default could be all, but unless you make it a required argument it doesn't force the developer to think about it. I think we likely will need to go through a few iterations of this to get to the final state that we want it in. So the first step might be to make "all" the default, but in a later step we require it of all calls, which then requires changing the thousand calls. |
I have now added issues for all of the steps I can think of associated with this project. |
Late in the game comment here ... Have you considered using the CF convention attribute In the hypothetical example of
could you average only over the vegetated area, and provide a There is a admittedly a chicken and egg problem of using this attribute in output and updating analysis workflows to take this into account. |
Thanks a lot for your thoughts @klindsay28 . I'm going to copy your comment to #1345 because I think the best thing to do is for the people writing post-processing scripts to decide what will work best for them, and then we can implement that in the model. |
Brief summary of bug
I am folding two closely related issues into this issue: (1) We are inconsistent in how we do gridcell averaging of history fields, and this is likely done incorrectly for at least some history fields; (2) We don't provide metadata on history files that would be needed for correct averaging up to regional or global averages.
@danicalombardozzi and I plan to lead a discussion on this at an upcoming CLM-science meeting (so no need to comment in this issue, unless you especially want to: we can discuss it by zoom, which may be more efficient).
General bug information
CTSM version you are using: N/A
Does this bug cause significantly incorrect results in the model's science? Probably, for some diagnostic fields
Configurations affected: Not yet investigated, but probably all, to some extent
Details of bug
Inconsistent (and probably in some cases incorrect) gridcell averages
Consider a grid cell that is 1/2 natural veg and 1/2 lake, and a diagnostic field that is only calculated over vegetated columns. If the value of this variable is 10 in the vegetated column, what should its grid cell average be: 10? 5?
The answer to this depends on the variable. The first consideration is often: Is this (a) a variable expressed on a per-m^2 basis, or (b) a variable not expressed per-m^2 (e.g., temperature, a stress coefficient, etc.). For (b), we generally would want to exclude lakes from the averages, so the grid cell average in this example would be 10. For (a), it is sometimes / often appropriate to assume a 0 value over some land units – so in this example, we may end up with a grid cell average value of 5 (because now this value applies per m^2 grid cell area) – but this may not always be the case.
This is implemented in different ways for different variables. I can think of four methods in the code:
l2g_scale_type
: this is an optional argument specified for some history fields in the hist_addfld call. For example, settingl2g_scale_type='veg'
means to only take averages over vegetated landunits (natural veg & crop).set_*=spval
andset_*=0
in the hist_addfld call: this method predates the implementation ofl2g_scale_type
.set_*=spval
accomplishes something likel2g_scale_type
, whereasset_*=0
accomplishes something like initializing the value to 0 for certain landunits in initCold. (See also History fields incorrect when set_xxx=0 but the xxx landunit is also set in initCold #431 )This inconsistency is itself a problem, making it hard to understand the treatment of any given history variable as well as whether two variables are treated the same way or different ways. In addition, it is almost certainly the case that some history variables are treated differently from how they should be.
Lack of metadata needed for correct regional and global averages
There is another, related problem with our diagnostics: We do not have the metadata on the history files that would be needed to calculate correct regional and global averages. This issue is especially pronounced for fields that only apply over special landunits, which typically have a small and very variable area per grid cell. Consider two grid cells, one with 10% lake and one with 50% lake, and a history field that only applies over lakes. When you take the average of those two grid cells, you would like the second grid cell to contribute 5x as much to the weighted average as the first grid cell, but the history files do not provide the metadata that would be needed for you to know how to do this.
Part of the problem with adding the correct metadata is the inconsistency described above for how we set missing values for particular landunits. If we can migrate to a consistent solution for that, then we can add an extra bit of metadata for each history field pointing to the weighting map that should be used for that field.
My preferred solution
My preference would be to migrate all history fields to using the
l2g_scale_type
mechanism. This would mean dropping support for theset_*
arguments tohist_addfld
, removing the initial setting of all history fields tospval
, and removing any ad hoc settings of history fields to spval, if any currently exist. Removing the initial setting of history fields tospval
would be nice for a few other reasons: It has caused some subtle issues in the past, and it stops us from being able to check for NaNs in the code very effectively when running in debug mode.If we can completely remove this use of
spval
in the code, we may be able to remove the checks for whether the field is spval in the subgridAve code.Then we can add fields to the history file that give weighting maps corresponding to each possible
l2g_scale_type
variable. Note that these are potentially dynamic in time, due to dynamic landunits. Alternatively, we could have some standard way of denoting the applicable landunits in a metadata field and let the user construct their own weighting maps using the existingPCT_LANDUNIT
metadata. (The latter would be more work for the user, but would avoid adding bloat to the history files.)Digging back through some notes, I see that I discussed this pretty extensively with @dlawrenncar @olyson @ekluzek and @slevisconsulting in Fall, 2011. Apparently I even started working on this at that time, with a very work-in-progress branch in svn: https://svn-ccsm-models.cgd.ucar.edu/clm2/branches/percent_na_metadata . But then that work got abandoned due to higher priority issues.
The text was updated successfully, but these errors were encountered: