Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zonal and meridional statistics preprocessor #24

Closed
9 tasks done
ledm opened this issue Jun 6, 2019 · 35 comments · Fixed by #433
Closed
9 tasks done

Zonal and meridional statistics preprocessor #24

ledm opened this issue Jun 6, 2019 · 35 comments · Fixed by #433
Assignees

Comments

@ledm
Copy link
Contributor

ledm commented Jun 6, 2019

This is an extension of Issue #1117 and PR #1123.

The following jobs need to be completed for the documentation paper (Table 1).

  • Change zonal_means preprocesor function into two functions: zonal_statistics and meridional_statistics.
  • Rename the mean_type to operator, like in the new area_statistics
  • Create associated tests
  • Add the new preprocessors to the ocean example recipe
    • Correct ocean example recipe with correct fx_files.
  • Move the get_iris_analysis_operation function in a new file, _shared.py in the preprocesor folder.
    • Is it sensible to put an artificial list to limit this function?
  • Use this new function in the volume_statistics function, and add documentation to the header of that function.
  • Add support for fx_files, and use cell volume to weight the means.
  • Write tests which include fx_files for area_statistics, volume_statistics, zonal and meridional statistics.
@ledm ledm self-assigned this Jun 6, 2019
@ledm
Copy link
Contributor Author

ledm commented Jun 6, 2019

I think that the absence of fx_files is a significant mathematical problem that we will need to address in the meridional_statistics function when using the mean operator.

In the zonal direction, the weighting is less of a problem.

@mattiarighi mattiarighi transferred this issue from ESMValGroup/ESMValTool Jun 11, 2019
ledm added a commit that referenced this issue Jun 18, 2019
…nt_zonal_statistics_1137, relating to issue #24 (core) or #1137 (tool).
@ledm
Copy link
Contributor Author

ledm commented Jun 18, 2019

One thing I'm working on here is adding fx_files to the recipe_ocean_example.yml recipe - basically making it accurate!

When ESMValTool is unable to locate fx_files, everything seems to continue as normal. @valeriupredoi, I would expect a fatal error to be given if an FX file was requested but not found. Is that not the case?

@valeriupredoi
Copy link
Contributor

if the masking is done in the preprocessor, if fx files are not found, the code defaults to applying Natural Earth masks instead

@ledm
Copy link
Contributor Author

ledm commented Jun 18, 2019

It doesn't work like this in the area_statistics & volume_statistics functions. They too try to calculate the area or volume fields, but I feel that this is very inaccurate. We'd be better off if it fails when there's no fx_files. After all, area and volume descriptions are a CMIP5/6 requirement, right?

Should the error message be added in the get_input_fx_filelist function of _data_finder.py or in the area_statistics & volume_statistics functions?

@ledm
Copy link
Contributor Author

ledm commented Jun 18, 2019

So i guess the problem I'm seeing right now is: Why can't ESMValTool find the FX files anymore?

Another point is that none of the preprocessor tests include fx files. How can we write tests that include fx files?

@ledm
Copy link
Contributor Author

ledm commented Jun 18, 2019

Help @valeriupredoi! None of my recipes are able to find the fx_files anymore on my local desktop!

Can anyone else test a recipe that requires an fx_file to be loaded? @schlunma - have you had this problem?

@valeriupredoi
Copy link
Contributor

@ledm I literally just ran a recipe with fx files grabbed from a local repo, all fine, are you using CMIP6? If so, there are missing input structures for fx files in config-developer

@ledm
Copy link
Contributor Author

ledm commented Jun 18, 2019

Which recipe?

Are you 100% sure that it did actually load and use the fx files? Several of the preprocessors have a fall-back option if they can't find the fx_files.

Also, in my case, I'm clearly seeing that ESMValTool can see the fx files, but the preprocessor can not find them!

@valeriupredoi
Copy link
Contributor

well they are needed in the diagnostic so that'd fail if there was no fx file in the metadata file. I ran the recipe_autoassess_landurface_surfrad recipe, which one are you running?

@ledm
Copy link
Contributor Author

ledm commented Jun 18, 2019

I'm running this one:

https://github.com/ESMValGroup/ESMValTool/blob/version2_development_Zonal_Meridional_statistics_24/esmvaltool/recipes/recipe_ocean_example.yml

@valeriupredoi
Copy link
Contributor

cheers, will test now with that

@valeriupredoi
Copy link
Contributor

on the branch now, it finds the fx files and it says it's using them but I am running in this before anything else is done diagnostic-wise:

ValueError: Unknown preprocessor function 'zonal_statistics', choose from: download, fix_file,...

@ledm
Copy link
Contributor Author

ledm commented Jun 19, 2019

Yes, I'm working on that preprocessor in branch development_Zonal_Meridional_statistics_24. However, you should be able to test the recipe by commenting out that diagnostic (and the other zonal & Meridional diagnostics.) Thanks!

@ledm
Copy link
Contributor Author

ledm commented Jun 19, 2019

If you want to look at the same Core branch, it's in development_Zonal_Meridional_statistics_24.

@ledm
Copy link
Contributor Author

ledm commented Jun 19, 2019

So, my conclusion to this is that there's been a change in the way that recipes work. Edit: Either that or fx_files have never worked as I thought they did!

In order to get the recipe to work, I needed to move the fx_files: [areacello,] line from the diagnostics section to the preprocessor section! This encounters the problem that the fx_files grid are not passed to the other preprocessors. Ie, if the preprocessor includes some operator like a extract_region, the fx_file is not subjected to this function. This means that the fx data does not represent the preprocessed data anymore.

@bouweandela
Copy link
Member

It doesn't work like this in the area_statistics & volume_statistics functions. They too try to calculate the area or volume fields, but I feel that this is very inaccurate. We'd be better off if it fails when there's no fx_files. After all, area and volume descriptions are a CMIP5/6 requirement, right?

What about observational datasets? The function should work for those too I think.

@bouweandela
Copy link
Member

Another point is that none of the preprocessor tests include fx files. How can we write tests that include fx files?

Load a variable of interest + fx variable from file using iris, slice both cubes so they are small (i.e no more than a couple of points) and look at the data and coordinates. You can use this to create the cubes needed for the tests. Does that help?

@bouweandela
Copy link
Member

Note that there are two ways in which the recipe uses fx_files:

  1. Specified in the variable sections, this will just find fx files and pass their paths on to your diagnostic script. This is in the process of being changed Development fx restructured #21, fx variables should be treated as any other variables. I see you are using this in your recipe.
  2. Use by preprocessor functions: there is no need to mention fx files in the recipe at all, this is automatically filled in. The code that does this is here:
    https://github.com/ESMValGroup/ESMValCore/blob/development/esmvalcore/_recipe.py#L364-L422

@ledm
Copy link
Contributor Author

ledm commented Jun 19, 2019

It doesn't work like this in the area_statistics & volume_statistics functions. They too try to calculate the area or volume fields, but I feel that this is very inaccurate. We'd be better off if it fails when there's no fx_files. After all, area and volume descriptions are a CMIP5/6 requirement, right?

What about observational datasets? The function should work for those too I think.

How should we differentiate in the preprocessor whether a preprocessor requires an fx file or not? I think it should fail for model data when no FX file is available. Especially when the model uses irregular grids. There's an argument that regular grids are easier for Iris to calculate - however Bill Little mentioned that the iris.analysis.cartography.area_weights calculator might be removed in the near future anyway!

Another point is that none of the preprocessor tests include fx files. How can we write tests that include fx files?

Load a variable of interest + fx variable from file using iris, slice both cubes so they are small (i.e no more than a couple of points) and look at the data and coordinates. You can use this to create the cubes needed for the tests. Does that help?

Sort of. Will the new sliced data and fx file need to be added to the git repository in testing data or something?

Note that there are two ways in which the recipe uses fx_files:

  1. Specified in the variable sections, this will just find fx files and pass their paths on to your diagnostic script. This is in the process of being changed Development fx restructured #21, fx variables should be treated as any other variables. I see you are using this in your recipe.
  2. Use by preprocessor functions: there is no need to mention fx files in the recipe at all, this is automatically filled in. The code that does this is here:
    https://github.com/ESMValGroup/ESMValCore/blob/development/esmvalcore/_recipe.py#L364-L422

I just spoke with V and I think it would be worth shelving this part of the work until #21 has completed.

@bouweandela
Copy link
Member

How should we differentiate in the preprocessor whether a preprocessor requires an fx file or not? I think it should fail for model data when no FX file is available.

I think this is something we could implement in _recipe.py, because we know what dataset we are building a preprocessor for there. Can you make a separate issue for it?

Will the new sliced data and fx file need to be added to the git repository in testing data or something?

I would just create the iris cube in code and save it to a temporary file if you need to read it from file for your test.

@ledm
Copy link
Contributor Author

ledm commented Jun 19, 2019

Created issue #103.

@valeriupredoi
Copy link
Contributor

@ledm your mysteriously vanishing fx files are behaving like this because you need to specify the fx_files key argument under each of those _statistics preprocesor calls in the recipe and not in the variable: the code in _recipe.py reads:

    for step in ('area_statistics', 'volume_statistics', 'zonal_statistics',
                 'meridional_statistics'):
        if settings.get(step, {}).get('fx_files'):
            settings[step]['fx_files'] = get_input_fx_filelist(
                variable=variable,
                rootpath=config_user['rootpath'],
                drs=config_user['drs'],
            )

the if settings.get(step, {}).get('fx_files') is null if fx files are not in the step's setting. Also note that whoever coded this up here (@bouweandela ?) forgot to add the needed

variable['fx_files'] = settings.get(step, {}).get('fx_files')

so that the block reads:

   for step in ('area_statistics', 'volume_statistics', 'zonal_statistics',
                 'meridional_statistics'):
        if settings.get(step, {}).get('fx_files'):
            variable['fx_files'] = settings.get(step, {}).get('fx_files')
            settings[step]['fx_files'] = get_input_fx_filelist(
                variable=variable,
                rootpath=config_user['rootpath'],
                drs=config_user['drs'],
            )

with that in all you fx ducks are in line (if the files actually exist on ESGF or your local DB).

There is however, some bad coding in the stats functions themselves because I get this (after some running):

ValueError: Fx area (216, 360) and dataset (48, 196, 111) shapes do not match.

which suggests to me that you are applying a 2d mask on 3d data 🍺

@ledm
Copy link
Contributor Author

ledm commented Jun 19, 2019

Thanks V!

Did we always have to specify the fx_file in the preprocessor - or is that a new thing? So far, I've always been specifying it in the diagnostics. Perhaps my recipes have been wrong all along (There was no documentation or examples at the time so I refuse to feel too guilty about that!)

I can't tell for sure whats happening with your shapes not matching - as I can't tell which preprocessor/diagnostic you're running. However, this may be a problem if the grid fx does not receive the same preprocessing as the dataset. For instance, if the extract_region preprocessor reduces the grid from (216, 360) to (196, 111), then we can no longer use the same areacello to match it!

Perhaps the solution is for extract_region to apply a mask instead of change the shape of the dataset?

@valeriupredoi
Copy link
Contributor

I dunno, may be a mistake in my recipe, I added fx_files: [areacello, ] everywhere under each of the _statistic functions for testing, maybe I should have been more careful 😁

No, there was no documentation and in fact it is the first time I see myself the addition of fx_files: in the preprocessor, quite a mystery on our hands 🗺️

@valeriupredoi
Copy link
Contributor

well actually here's the bugger:

    custom_order: true
    extract_levels:
      levels: [0., 10., 100., 1000.]
      scheme: linear_horizontal_extrapolate_vertical
    extract_region:
      start_longitude: -80.
      end_longitude: 30.
      start_latitude: -80.
      end_latitude: 80.
    area_statistics:
      operator: mean
      fx_files: [areacello,

you are applying a (216, 360) mask on a (48, 10, 96, 21) cube and the catcher in the area_statistic is

    if grid_areas.shape != cube.shape[-2:]:
        raise ValueError('Fx area {} and dataset {} shapes do not match.'
                         ''.format(grid_areas.shape, cube.shape))

so yeah, you need to extract the area and levels the same way for the fx mask as for the data

@ledm
Copy link
Contributor Author

ledm commented Jun 20, 2019

Exactly, that's what I've been saying!

The preprocessor needs to be applied to the fx files as well as the actual dataset. This doesn't happen if the fx_files in the preprocessor section of the recipe. If they're given at the diagnostic section of the recipe, the area_statistics preprocessor doesn't receive them!

@bouweandela
Copy link
Member

Also note that whoever coded this up here (@bouweandela ?) forgot to add the needed

Use git blame if you want to find out who wrote something, e.g.

$ git blame esmvalcore/_recipe.py | grep -A6 area_statistics
18f5d3882 esmvaltool/_recipe.py   (Manuel Schlund       2019-06-05 15:00:01 +0200  416)     for step in ('area_statistics', 'volume_statistics'):
baa4fb2f4 esmvaltool/_recipe.py   (Manuel Schlund       2019-02-08 12:23:00 +0100  417)         if settings.get(step, {}).get('fx_files'):
8267a24c3 esmvaltool/_recipe.py   (Lee de Mora          2019-01-29 11:07:32 +0000  418)             settings[step]['fx_files'] = get_input_fx_filelist(
52f0a8d74 esmvaltool/_recipe.py   (Manuel Schlund       2019-02-08 12:20:16 +0100  419)                 variable=variable,
61e642c36 esmvaltool/_recipe.py   (Lee de Mora          2019-01-23 14:31:21 +0000  420)                 rootpath=config_user['rootpath'],
8267a24c3 esmvaltool/_recipe.py   (Lee de Mora          2019-01-29 11:07:32 +0000  421)                 drs=config_user['drs'],
8267a24c3 esmvaltool/_recipe.py   (Lee de Mora          2019-01-29 11:07:32 +0000  422)             )

@valeriupredoi
Copy link
Contributor

ah I shed a tear everytime I see git blame 😁

@ledm
Copy link
Contributor Author

ledm commented Jul 24, 2019

I'm back to work on this issue, starting from @valeriupredoi's new PR #170.

I'm slowing coming to the conclusion that we can not sensibly produce zonal or meridional statistics for irregular grids.

This is because iris can't apply iris.analysis functions along one dimension of an irregular grid. It simply does not work at the moment. The iris.analysis functions can only be applied along the x-y axis of an array, not along an arbitrary latitude-like axis of an irregular grid. (I believe that this is a fairly permanent problem and this functionality won't be ready until Iris 3 is ready in 2023 or similar!)

This means that if we want to calculate zonal or meridional statistics for irregular grids, we need to regrid the data to a regular grid, then apply the zonal or meridional statistics to that regular grid. This requires ESMValTool to regrid the fx_files (areacello or volcello), or recalculate them from the new grid. Neither option is particularly precise. Furthermore, the regrid function does not currently apply to the fx_files at the moment (see this comment. )

So, I propose that we do not support zonal or meridional statistics for irregular grids. Any thoughts?

@valeriupredoi
Copy link
Contributor

fix in #214 now 🍺

@valeriupredoi
Copy link
Contributor

Nope! these changes are obsolete, all is done up in development already! Yay 🎉

@ledm
Copy link
Contributor Author

ledm commented Jan 16, 2020

I'm not sure that this issue was ever resolved. As @mattiarighi just pointed out to me in an email, the zonal_statisctics and meridional_statistics preprocessors are included in the ESMValTool v2 paper, so we need to get them in.

@ledm ledm reopened this Jan 16, 2020
@mattiarighi
Copy link
Contributor

Apparently it's just about renaming zonal_means to zonal_statistics.
I'm doing this. Will submit in a second.

@ledm
Copy link
Contributor Author

ledm commented Jan 16, 2020

To summarise where this issue got to, we encountered 3 main problems with zonal and meridional statistics:

  1. Other parts of the preprocessor chain are not applied to the fx files. This means that zonal and meridional statistics can only be applied on the global scale - not regional. Also discsussed in Fx files in Omon or Ofx #405.
  2. We were unable to convince iris to apply a mean along the zonal or meridional direction on irregular grids. It only worked with regular grids.
  3. We were unable to satisfactorily determine what to do when an FX file was needed but not found. (I think @schlunma recent work might be able to address this Allowed arbitrary CMIP6 fx variables in special preprocessors and added a catch on project not found in config-developer (invalid project) #432.)

Looks like there's been progres with issues 1 and 3 here, but what about issue 2?

@mattiarighi
Copy link
Contributor

zonal_means seems to be used successfully in two recipes: recipe_collins13ipcc.yml and recipe_flato13ipcc. So, the functionality is there, it is just a matter of naming.

For consistency with the other _statistics, we should rename it to zonal_statistics.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants