Zonal and meridional statistics preprocessor #24

ledm · 2019-06-06T07:59:57Z

This is an extension of Issue #1117 and PR #1123.

The following jobs need to be completed for the documentation paper (Table 1).

- Change the zonal_means preprocessor function into two functions: zonal_statistics and meridional_statistics.
- Rename the mean_type argument to operator, as in the new area_statistics.
- Create associated tests.
- Add the new preprocessors to the ocean example recipe.
  - Correct the ocean example recipe with the correct fx_files.
- Move the get_iris_analysis_operation function into a new file, _shared.py, in the preprocessor folder.
  - Is it sensible to put an artificial list to limit this function?
- Use this new function in the volume_statistics function, and add documentation to the header of that function.
- Add support for fx_files, and use cell volume to weight the means.
- Write tests which include fx_files for area_statistics, volume_statistics, zonal and meridional statistics.
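
On the question of limiting get_iris_analysis_operation with an explicit list, a minimal sketch of what such a guard could look like follows. The allow-list contents, the function name and the namespace argument are assumptions for illustration; in the real function the namespace would presumably be iris.analysis, which exposes its aggregators as upper-case attributes such as MEAN and MEDIAN.

```python
from types import SimpleNamespace

# Hypothetical allow-list; which operators to permit is a design decision.
ALLOWED_OPERATORS = ('mean', 'median', 'std_dev', 'variance', 'min', 'max')


def get_analysis_operation(operator, namespace):
    """Return the aggregator named by ``operator`` from ``namespace``.

    ``namespace`` stands in for ``iris.analysis`` so that the sketch runs
    without iris installed.
    """
    name = operator.lower()
    if name not in ALLOWED_OPERATORS:
        raise ValueError(
            "operator {!r} not recognised, choose from: {}".format(
                operator, ', '.join(ALLOWED_OPERATORS)))
    return getattr(namespace, name.upper())


# Stand-in for iris.analysis, used only to exercise the function.
fake_analysis = SimpleNamespace(MEAN='<MEAN aggregator>',
                                MEDIAN='<MEDIAN aggregator>')
print(get_analysis_operation('Mean', fake_analysis))
```

An explicit list keeps the error message readable and avoids exposing aggregators that need extra arguments, at the cost of having to update the list by hand.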

ledm · 2019-06-06T09:41:34Z

I think that the absence of fx_files is a significant mathematical problem that we will need to address in the meridional_statistics function when using the mean operator.

In the zonal direction, the weighting is less of a problem.
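
To make the weighting argument concrete, here is a small worked example. It assumes a regular latitude-longitude grid, where cell area is proportional to the cosine of latitude; the grid spacing and field values are made up for illustration.

```python
import math

# Cell-centre latitudes (degrees) on a hypothetical regular 30-degree grid.
lats = [-75.0, -45.0, -15.0, 15.0, 45.0, 75.0]
# On a regular lat-lon grid, cell area is proportional to cos(latitude).
weights = [math.cos(math.radians(lat)) for lat in lats]

# A made-up field that varies with latitude only.
field = [0.0, 10.0, 25.0, 25.0, 10.0, 0.0]

# Meridional mean (collapsing latitude): cell areas differ, so weights matter.
unweighted = sum(field) / len(field)
weighted = sum(w * v for w, v in zip(weights, field)) / sum(weights)
print(round(unweighted, 2), round(weighted, 2))

# Zonal mean (collapsing longitude at one latitude): every cell in the row
# has the same area, so weighted and unweighted means coincide.
row = [3.0, 5.0, 7.0, 9.0]
row_weights = [weights[2]] * len(row)
zonal_unweighted = sum(row) / len(row)
zonal_weighted = (sum(w * v for w, v in zip(row_weights, row))
                  / sum(row_weights))
print(zonal_unweighted, zonal_weighted)
```

On an irregular grid the cells along a row need not share an area, so this shortcut for the zonal direction only holds under the regular-grid assumption.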

ledm · 2019-06-18T13:40:18Z

One thing I'm working on here is adding fx_files to the recipe_ocean_example.yml recipe - basically making it accurate!

When ESMValTool is unable to locate fx_files, everything seems to continue as normal. @valeriupredoi, I would expect a fatal error to be given if an FX file was requested but not found. Is that not the case?

valeriupredoi · 2019-06-18T13:42:14Z

If the masking is done in the preprocessor and the fx files are not found, the code defaults to applying Natural Earth masks instead.

ledm · 2019-06-18T13:51:35Z

It doesn't work like this in the area_statistics & volume_statistics functions. They too try to calculate the area or volume fields, but I feel that this is very inaccurate. We'd be better off if it fails when there's no fx_files. After all, area and volume descriptions are a CMIP5/6 requirement, right?

Should the error message be added in the get_input_fx_filelist function of _data_finder.py or in the area_statistics & volume_statistics functions?
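
One possible shape for such a check, placed at the file-finding stage, is sketched below. The helper find_fx_files, its arguments and the '<fx_var>.nc' naming are hypothetical and do not reproduce the signature or behaviour of get_input_fx_filelist.

```python
import os
import tempfile


def find_fx_files(fx_vars, directory, strict=True):
    """Map each requested fx variable to a file path in ``directory``.

    Hypothetical helper: files are assumed to be named '<fx_var>.nc'.
    With ``strict=True`` a missing file is a fatal error rather than a
    silent fall-back.
    """
    found = {}
    missing = []
    for fx_var in fx_vars:
        path = os.path.join(directory, fx_var + '.nc')
        if os.path.isfile(path):
            found[fx_var] = path
        else:
            missing.append(fx_var)
    if missing and strict:
        raise FileNotFoundError(
            "fx files requested but not found: " + ', '.join(missing))
    return found


# Exercise the helper in a temporary directory containing only areacello.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, 'areacello.nc'), 'w').close()
    ok = find_fx_files(['areacello'], tmp)
    try:
        find_fx_files(['areacello', 'volcello'], tmp)
        failed = False
    except FileNotFoundError:
        failed = True
```

Raising in the finder gives one error path for every preprocessor; raising inside each statistics function would instead allow per-function fall-backs.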

ledm · 2019-06-18T14:33:25Z

So I guess the problem I'm seeing right now is: why can't ESMValTool find the FX files anymore?

Another point is that none of the preprocessor tests include fx files. How can we write tests that include fx files?

ledm · 2019-06-18T15:23:07Z

Help @valeriupredoi! None of my recipes are able to find the fx_files anymore on my local desktop!

Can anyone else test a recipe that requires an fx_file to be loaded? @schlunma - have you had this problem?

valeriupredoi · 2019-06-18T15:31:46Z

@ledm I literally just ran a recipe with fx files grabbed from a local repo, and all was fine. Are you using CMIP6? If so, there are missing input structures for fx files in config-developer.

ledm · 2019-06-18T15:39:08Z

Which recipe?

Are you 100% sure that it did actually load and use the fx files? Several of the preprocessors have a fall-back option if they can't find the fx_files.

Also, in my case, I can clearly see that ESMValTool finds the fx files, but the preprocessor cannot find them!

valeriupredoi · 2019-06-18T15:48:45Z

Well, they are needed in the diagnostic, so that would fail if there was no fx file in the metadata file. I ran the recipe_autoassess_landsurface_surfrad recipe; which one are you running?

ledm · 2019-06-18T15:52:11Z

I'm running this one:

https://github.com/ESMValGroup/ESMValTool/blob/version2_development_Zonal_Meridional_statistics_24/esmvaltool/recipes/recipe_ocean_example.yml

valeriupredoi · 2019-06-18T15:53:29Z

cheers, will test now with that

valeriupredoi · 2019-06-18T16:25:36Z

On the branch now: it finds the fx files and it says it's using them, but I am running into this before anything else is done diagnostic-wise:

ValueError: Unknown preprocessor function 'zonal_statistics', choose from: download, fix_file,...

ledm · 2019-06-19T08:26:45Z

Yes, I'm working on that preprocessor in branch development_Zonal_Meridional_statistics_24. However, you should be able to test the recipe by commenting out that diagnostic (and the other zonal and meridional diagnostics). Thanks!

ledm · 2019-06-19T08:46:16Z

If you want to look at the same Core branch, it's in development_Zonal_Meridional_statistics_24.

ledm · 2019-06-19T09:17:06Z

So, my conclusion to this is that there's been a change in the way that recipes work. Edit: Either that or fx_files have never worked as I thought they did!

In order to get the recipe to work, I needed to move the fx_files: [areacello,] line from the diagnostics section to the preprocessor section! This runs into the problem that the fx_files grid is not passed to the other preprocessors. I.e., if the preprocessor includes an operation like extract_region, the fx file is not subjected to this function. This means that the fx data no longer represents the preprocessed data.

bouweandela · 2019-06-19T12:43:38Z

> It doesn't work like this in the area_statistics & volume_statistics functions. They too try to calculate the area or volume fields, but I feel that this is very inaccurate. We'd be better off if it fails when there's no fx_files. After all, area and volume descriptions are a CMIP5/6 requirement, right?

What about observational datasets? The function should work for those too I think.

bouweandela · 2019-06-19T12:46:13Z

> Another point is that none of the preprocessor tests include fx files. How can we write tests that include fx files?

Load a variable of interest + fx variable from file using iris, slice both cubes so they are small (i.e no more than a couple of points) and look at the data and coordinates. You can use this to create the cubes needed for the tests. Does that help?

bouweandela · 2019-06-19T12:55:31Z

Note that there are two ways in which the recipe uses fx_files:

- Specified in the variable sections: this will just find fx files and pass their paths on to your diagnostic script. This is in the process of being changed (see "Development fx restructured" #21): fx variables should be treated as any other variables. I see you are using this in your recipe.
- Used by preprocessor functions: there is no need to mention fx files in the recipe at all, this is automatically filled in. The code that does this is here:
  https://github.com/ESMValGroup/ESMValCore/blob/development/esmvalcore/_recipe.py#L364-L422

ledm · 2019-06-19T13:23:51Z

>> It doesn't work like this in the area_statistics & volume_statistics functions. They too try to calculate the area or volume fields, but I feel that this is very inaccurate. We'd be better off if it fails when there's no fx_files. After all, area and volume descriptions are a CMIP5/6 requirement, right?

> What about observational datasets? The function should work for those too I think.

How should we differentiate in the preprocessor whether a preprocessor requires an fx file or not? I think it should fail for model data when no FX file is available. Especially when the model uses irregular grids. There's an argument that regular grids are easier for Iris to calculate - however Bill Little mentioned that the iris.analysis.cartography.area_weights calculator might be removed in the near future anyway!

>> Another point is that none of the preprocessor tests include fx files. How can we write tests that include fx files?

> Load a variable of interest + fx variable from file using iris, slice both cubes so they are small (i.e no more than a couple of points) and look at the data and coordinates. You can use this to create the cubes needed for the tests. Does that help?

Sort of. Will the new sliced data and fx file need to be added to the git repository in testing data or something?

> Note that there are two ways in which the recipe uses fx_files:
>
> - Specified in the variable sections: this will just find fx files and pass their paths on to your diagnostic script. This is in the process of being changed (see "Development fx restructured" #21): fx variables should be treated as any other variables. I see you are using this in your recipe.
> - Used by preprocessor functions: there is no need to mention fx files in the recipe at all, this is automatically filled in. The code that does this is here:
>   https://github.com/ESMValGroup/ESMValCore/blob/development/esmvalcore/_recipe.py#L364-L422

I just spoke with V and I think it would be worth shelving this part of the work until #21 has completed.

bouweandela · 2019-06-19T13:33:11Z

> How should we differentiate in the preprocessor whether a preprocessor requires an fx file or not? I think it should fail for model data when no FX file is available.

I think this is something we could implement in _recipe.py, because we know what dataset we are building a preprocessor for there. Can you make a separate issue for it?

> Will the new sliced data and fx file need to be added to the git repository in testing data or something?

I would just create the iris cube in code and save it to a temporary file if you need to read it from file for your test.
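
A sketch of how the decision about missing fx files could be made at recipe-building time, where the dataset's project is known. The function name, the project list and the fall-back behaviour are assumptions for illustration, not the implementation in _recipe.py.

```python
# Hypothetical classification: projects for which fx files are expected.
MODEL_PROJECTS = ('CMIP5', 'CMIP6')


def check_fx_available(dataset, fx_files):
    """Decide what to do when a statistics step may lack fx files.

    Returns True if fx files are available and False if the caller should
    fall back to computing cell areas itself; raises for model data.
    """
    if fx_files:
        return True
    if dataset.get('project') in MODEL_PROJECTS:
        raise ValueError(
            "No fx files found for model dataset {}".format(
                dataset.get('dataset')))
    return False


model = {'project': 'CMIP5', 'dataset': 'HadGEM2-ES'}
obs = {'project': 'OBS', 'dataset': 'WOA'}

print(check_fx_available(model, {'areacello': '/data/areacello.nc'}))
print(check_fx_available(obs, {}))
```

With this split, observational datasets keep working through the fall-back, while model data without fx files stops early with a clear message.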

ledm · 2019-06-19T14:21:16Z

Created issue #103.

valeriupredoi · 2019-06-19T15:37:21Z

@ledm your mysteriously vanishing fx files are behaving like this because you need to specify the fx_files key argument under each of those _statistics preprocessor calls in the recipe, and not in the variable. The code in _recipe.py reads:

    for step in ('area_statistics', 'volume_statistics', 'zonal_statistics',
                 'meridional_statistics'):
        if settings.get(step, {}).get('fx_files'):
            settings[step]['fx_files'] = get_input_fx_filelist(
                variable=variable,
                rootpath=config_user['rootpath'],
                drs=config_user['drs'],
            )

The condition settings.get(step, {}).get('fx_files') evaluates to None if fx_files is not in the step's settings. Also note that whoever coded this up here (@bouweandela ?) forgot to add the needed

variable['fx_files'] = settings.get(step, {}).get('fx_files')

so that the block reads:

    for step in ('area_statistics', 'volume_statistics', 'zonal_statistics',
                 'meridional_statistics'):
        if settings.get(step, {}).get('fx_files'):
            variable['fx_files'] = settings.get(step, {}).get('fx_files')
            settings[step]['fx_files'] = get_input_fx_filelist(
                variable=variable,
                rootpath=config_user['rootpath'],
                drs=config_user['drs'],
            )

With that in, all your fx ducks are in line (if the files actually exist on ESGF or your local DB).
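
The effect of the extra assignment can be seen in a self-contained sketch. The stub below stands in for the real get_input_fx_filelist and simply fabricates one path per requested name; its return format, the string rootpath and the wrapper function fill_fx_settings are assumptions made so that the loop can run on its own.

```python
def get_input_fx_filelist(variable, rootpath, drs):
    """Stub standing in for the real finder: fabricates one path per name."""
    return {fx: '{}/{}.nc'.format(rootpath, fx)
            for fx in variable.get('fx_files', [])}


def fill_fx_settings(settings, variable, config_user):
    """Resolve fx file names to paths for each statistics step."""
    for step in ('area_statistics', 'volume_statistics',
                 'zonal_statistics', 'meridional_statistics'):
        requested = settings.get(step, {}).get('fx_files')
        if requested:
            # Copy the request onto the variable so the finder can see it.
            variable['fx_files'] = requested
            settings[step]['fx_files'] = get_input_fx_filelist(
                variable=variable,
                rootpath=config_user['rootpath'],
                drs=config_user['drs'],
            )
    return settings


settings = {
    'area_statistics': {'operator': 'mean', 'fx_files': ['areacello']},
    'zonal_statistics': {'operator': 'mean'},  # no fx_files key: skipped
}
config_user = {'rootpath': '/data', 'drs': 'default'}
result = fill_fx_settings(settings, {'short_name': 'thetao'}, config_user)
print(result['area_statistics']['fx_files'])
```

Steps that do not list fx_files are left untouched, which matches the behaviour described above: without the key in the step's settings, nothing is looked up.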

There is, however, some bad coding in the stats functions themselves, because I get this (after some running):

ValueError: Fx area (216, 360) and dataset (48, 196, 111) shapes do not match.

which suggests to me that you are applying a 2d mask on 3d data 🍺

ledm · 2019-06-19T15:55:56Z

Thanks V!

Did we always have to specify the fx_file in the preprocessor - or is that a new thing? So far, I've always been specifying it in the diagnostics. Perhaps my recipes have been wrong all along (There was no documentation or examples at the time so I refuse to feel too guilty about that!)

I can't tell for sure what's happening with your shapes not matching, as I can't tell which preprocessor/diagnostic you're running. However, this may be a problem if the fx grid does not receive the same preprocessing as the dataset. For instance, if the extract_region preprocessor reduces the grid from (216, 360) to (196, 111), then we can no longer use the same areacello to match it!

Perhaps the solution is for extract_region to apply a mask instead of changing the shape of the dataset?
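
A minimal sketch of the masking idea, using nested lists in place of cubes. The helper names and the grid values are made up; the point is only that a mask leaves the shape unchanged, so a full-grid area field (playing the role of areacello) still lines up with the data.

```python
def mask_region(data, lats, lons, lat_range, lon_range):
    """Return a copy of ``data`` with points outside the region set to None.

    The shape is unchanged, so a full-grid area field still lines up.
    """
    return [[value if (lat_range[0] <= lat <= lat_range[1]
                       and lon_range[0] <= lon <= lon_range[1]) else None
             for lon, value in zip(lons, row)]
            for lat, row in zip(lats, data)]


def area_weighted_mean(data, areas):
    """Area-weighted mean over unmasked (non-None) points."""
    pairs = [(v, a) for drow, arow in zip(data, areas)
             for v, a in zip(drow, arow) if v is not None]
    return sum(v * a for v, a in pairs) / sum(a for _, a in pairs)


lats = [-30.0, 0.0, 30.0]
lons = [0.0, 90.0, 180.0, 270.0]
data = [[1.0, 2.0, 3.0, 4.0],
        [5.0, 6.0, 7.0, 8.0],
        [9.0, 10.0, 11.0, 12.0]]
# Made-up cell areas on the full grid.
areas = [[2.0] * 4, [3.0] * 4, [2.0] * 4]

masked = mask_region(data, lats, lons, (-10.0, 40.0), (0.0, 100.0))
mean = area_weighted_mean(masked, areas)
print(mean)
```

The trade-off is memory: a masked field keeps the full grid even when the region of interest is small.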

valeriupredoi · 2019-06-19T15:59:08Z

I dunno, may be a mistake in my recipe, I added fx_files: [areacello, ] everywhere under each of the _statistic functions for testing, maybe I should have been more careful 😁

No, there was no documentation, and in fact this is the first time I have seen fx_files: added in the preprocessor myself. Quite a mystery on our hands 🗺️

valeriupredoi · 2019-06-19T16:26:53Z

well actually here's the bugger:

    custom_order: true
    extract_levels:
      levels: [0., 10., 100., 1000.]
      scheme: linear_horizontal_extrapolate_vertical
    extract_region:
      start_longitude: -80.
      end_longitude: 30.
      start_latitude: -80.
      end_latitude: 80.
    area_statistics:
      operator: mean
      fx_files: [areacello, ]

you are applying a (216, 360) mask on a (48, 10, 96, 21) cube and the catcher in area_statistics is

    if grid_areas.shape != cube.shape[-2:]:
        raise ValueError('Fx area {} and dataset {} shapes do not match.'
                         ''.format(grid_areas.shape, cube.shape))

so yeah, you need to extract the area and levels the same way for the fx mask as for the data

ledm · 2019-06-20T08:03:20Z

Exactly, that's what I've been saying!

The preprocessor needs to be applied to the fx files as well as the actual dataset. This doesn't happen if the fx_files are in the preprocessor section of the recipe. If they're given in the diagnostic section of the recipe, the area_statistics preprocessor doesn't receive them!
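
The point can be illustrated with a small sketch in which the same region extraction is applied to both the data and the fx field. Nested lists stand in for cubes, and crop, shape and check_shapes are hypothetical helpers; check_shapes mirrors the idea of the shape check quoted above for the two-dimensional case.

```python
def crop(grid, row_slice, col_slice):
    """Crop the (lat, lon) dimensions of a 2D nested list."""
    return [row[col_slice] for row in grid[row_slice]]


def shape(grid):
    """Return the (rows, columns) shape of a 2D nested list."""
    return (len(grid), len(grid[0]))


def check_shapes(grid_areas, data):
    """Raise if the area field and the data do not share a shape."""
    if shape(grid_areas) != shape(data):
        raise ValueError('Fx area {} and dataset {} shapes do not match.'
                         ''.format(shape(grid_areas), shape(data)))


# Made-up 4 x 6 data field and matching cell areas.
data = [[float(r * 6 + c) for c in range(6)] for r in range(4)]
areas = [[1.0 + r for _ in range(6)] for r in range(4)]

region = (slice(1, 3), slice(2, 5))
data_region = crop(data, *region)

# Cropping only the data reproduces the mismatch.
try:
    check_shapes(areas, data_region)
    mismatch = False
except ValueError:
    mismatch = True

# Cropping the fx field with the same slices restores consistency.
areas_region = crop(areas, *region)
check_shapes(areas_region, data_region)
print(mismatch, shape(areas_region))
```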

bouweandela · 2019-06-26T12:59:29Z

> Also note that whoever coded this up here (@bouweandela ?) forgot to add the needed

Use git blame if you want to find out who wrote something, e.g.

$ git blame esmvalcore/_recipe.py | grep -A6 area_statistics
18f5d3882 esmvaltool/_recipe.py   (Manuel Schlund       2019-06-05 15:00:01 +0200  416)     for step in ('area_statistics', 'volume_statistics'):
baa4fb2f4 esmvaltool/_recipe.py   (Manuel Schlund       2019-02-08 12:23:00 +0100  417)         if settings.get(step, {}).get('fx_files'):
8267a24c3 esmvaltool/_recipe.py   (Lee de Mora          2019-01-29 11:07:32 +0000  418)             settings[step]['fx_files'] = get_input_fx_filelist(
52f0a8d74 esmvaltool/_recipe.py   (Manuel Schlund       2019-02-08 12:20:16 +0100  419)                 variable=variable,
61e642c36 esmvaltool/_recipe.py   (Lee de Mora          2019-01-23 14:31:21 +0000  420)                 rootpath=config_user['rootpath'],
8267a24c3 esmvaltool/_recipe.py   (Lee de Mora          2019-01-29 11:07:32 +0000  421)                 drs=config_user['drs'],
8267a24c3 esmvaltool/_recipe.py   (Lee de Mora          2019-01-29 11:07:32 +0000  422)             )

valeriupredoi · 2019-06-26T14:02:12Z

ah I shed a tear every time I see git blame 😁

ledm · 2019-07-24T14:58:46Z

I'm back to work on this issue, starting from @valeriupredoi's new PR #170.

I'm slowly coming to the conclusion that we cannot sensibly produce zonal or meridional statistics for irregular grids.

This is because iris can't apply iris.analysis functions along one dimension of an irregular grid. It simply does not work at the moment. The iris.analysis functions can only be applied along the x-y axis of an array, not along an arbitrary latitude-like axis of an irregular grid. (I believe that this is a fairly permanent problem and this functionality won't be ready until Iris 3 is ready in 2023 or similar!)

This means that if we want to calculate zonal or meridional statistics for irregular grids, we need to regrid the data to a regular grid, then apply the zonal or meridional statistics to that regular grid. This requires ESMValTool to regrid the fx_files (areacello or volcello), or recalculate them from the new grid. Neither option is particularly precise. Furthermore, the regrid function is not currently applied to the fx_files (see this comment).

So, I propose that we do not support zonal or meridional statistics for irregular grids. Any thoughts?
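
If irregular grids are to be rejected, the guard could be as simple as the sketch below. It rests on the assumption that a regular grid carries one-dimensional latitude and longitude coordinates while an irregular grid carries two-dimensional ones; the function name and the use of plain shape tuples are hypothetical.

```python
def check_regular_grid(lat_shape, lon_shape):
    """Raise if the horizontal coordinates are not one-dimensional.

    Assumption: a regular grid has 1D latitude and longitude coordinates,
    while irregular (e.g. curvilinear ocean) grids carry 2D ones.
    """
    if len(lat_shape) != 1 or len(lon_shape) != 1:
        raise ValueError(
            "Zonal and meridional statistics are not supported on "
            "irregular grids; regrid to a regular grid first.")


# Regular grid: 1D coordinates, passes silently.
check_regular_grid((180,), (360,))

# Irregular grid: 2D coordinates, rejected.
try:
    check_regular_grid((216, 360), (216, 360))
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```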

valeriupredoi · 2019-09-02T14:47:04Z

fix in #214 now 🍺

valeriupredoi · 2019-11-07T14:17:28Z

Nope! these changes are obsolete, all is done up in development already! Yay 🎉

ledm · 2020-01-16T16:18:02Z

I'm not sure that this issue was ever resolved. As @mattiarighi just pointed out to me in an email, the zonal_statistics and meridional_statistics preprocessors are included in the ESMValTool v2 paper, so we need to get them in.

mattiarighi · 2020-01-16T16:19:04Z

Apparently it's just about renaming zonal_means to zonal_statistics.
I'm doing this. Will submit in a second.

ledm · 2020-01-16T16:26:57Z

To summarise where this issue got to, we encountered 3 main problems with zonal and meridional statistics:

1. Other parts of the preprocessor chain are not applied to the fx files. This means that zonal and meridional statistics can only be applied on the global scale, not regionally. Also discussed in "Fx files in Omon or Ofx" #405.
2. We were unable to convince iris to apply a mean along the zonal or meridional direction on irregular grids. It only worked with regular grids.
3. We were unable to satisfactorily determine what to do when an FX file was needed but not found. (I think @schlunma's recent work might be able to address this: "Allowed arbitrary CMIP6 fx variables in special preprocessors and added a catch on project not found in config-developer (invalid project)" #432.)

Looks like there's been progress with issues 1 and 3 here, but what about issue 2?

mattiarighi · 2020-01-16T16:34:48Z

zonal_means seems to be used successfully in two recipes: recipe_collins13ipcc.yml and recipe_flato13ipcc. So, the functionality is there, it is just a matter of naming.

For consistency with the other _statistics, we should rename it to zonal_statistics.
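
One way to rename without breaking existing recipes is to keep the old name as a deprecated alias. The sketch below is an illustration only: it uses nested lists rather than cubes, the signatures are hypothetical, and nothing in this thread says the actual rename kept such an alias.

```python
import warnings


def zonal_statistics(rows, operator='mean'):
    """Collapse the longitude (inner) dimension of a 2D nested list."""
    operations = {'mean': lambda r: sum(r) / len(r), 'min': min, 'max': max}
    if operator not in operations:
        raise ValueError("Unknown operator {!r}".format(operator))
    return [operations[operator](row) for row in rows]


def zonal_means(rows, mean_type='mean'):
    """Deprecated alias kept so that existing recipes keep working."""
    warnings.warn("zonal_means is deprecated, use zonal_statistics",
                  DeprecationWarning, stacklevel=2)
    return zonal_statistics(rows, operator=mean_type)


print(zonal_statistics([[1.0, 3.0], [2.0, 6.0]], operator='mean'))
```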

ledm self-assigned this Jun 6, 2019

mattiarighi transferred this issue from ESMValGroup/ESMValTool Jun 11, 2019

mattiarighi added the paper label Jun 11, 2019

mattiarighi mentioned this issue Jun 11, 2019

Preprocessor naming nomenclature ESMValGroup/ESMValTool#847

Closed

ledm mentioned this issue Jun 14, 2019

Refactor time operations #87

Merged

ledm added a commit that referenced this issue Jun 18, 2019

Brought over developpements from esmvaltool branch version2_development_zonal_statistics_1137, relating to issue #24 (core) or #1137 (tool).

b5ba937

valeriupredoi mentioned this issue Jun 19, 2019

What happens when fx_files are needed but not found? #103

Closed

valeriupredoi closed this as completed Nov 7, 2019

ledm reopened this Jan 16, 2020

mattiarighi mentioned this issue Jan 16, 2020

[URGENT] Rename zonal_means to zonal_statistics #433

Merged

mattiarighi closed this as completed in #433 Jan 17, 2020
