Move multimodel statistics preprocessor to after spatial/temporal statistics in default preprocessor order #1221

Closed
bouweandela opened this issue Jul 9, 2021 · 3 comments · Fixed by #1299

bouweandela commented Jul 9, 2021

At the moment, the multi_model_statistics preprocessor function is by default executed before any temporal/spatial statistics preprocessor functions. This is computationally inefficient and possibly less accurate, because it introduces the need for extra regridding/vertical interpolation before the multimodel statistics can be computed. Therefore I propose to move multi_model_statistics all the way to the end of the default order, after convert_units. Does anyone have an opinion on this @ESMValGroup/esmvaltool-developmentteam?

To explain the above a bit more clearly, here are some examples of preprocessing chains that users could build:

Current implementation:

2D input -> regrid -> multimodel stats -> area stats
3D input -> extract levels -> regrid -> multimodel stats -> volume stats
daily input -> multimodel stats (fails because of 360 day + 365 day calendars) -> monthly statistics

If we moved multimodel stats to the end, the above examples would simplify to:

2D input -> area stats -> multimodel stats
3D input -> volume stats -> multimodel stats
daily input -> monthly statistics -> multimodel stats (works because all calendars have 12 months)

so the regrid and extract_levels steps, which introduce inaccuracies, would no longer be needed before the multimodel statistics.
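
To illustrate the argument, here is a minimal pure-Python sketch with made-up numbers. It does not use the ESMValCore API; the helpers `area_mean`, `grid_shape` and `monthly_means` are hypothetical, and the area mean is unweighted for brevity. It shows that once each model has been reduced by a spatial or monthly statistic, the results have the same shape and can be combined without a common grid or calendar.

```python
from statistics import fmean

# Hypothetical 2D fields (lat x lon) from two models on different grids.
model_a = [[280.0, 282.0], [284.0, 286.0]]                # 2 x 2 grid
model_b = [[281.0, 282.0, 283.0], [284.0, 285.0, 286.0]]  # 2 x 3 grid


def grid_shape(field):
    """Return (n_lat, n_lon) of a nested-list field."""
    return (len(field), len(field[0]))


def area_mean(field):
    """Unweighted mean over all cells (real area statistics use cell-area weights)."""
    return fmean(value for row in field for value in row)


# A cell-by-cell multimodel mean is only defined on matching grids,
# which is why the current default order needs a regrid step first.
same_grid = grid_shape(model_a) == grid_shape(model_b)

# Area statistics first: each model collapses to a single number,
# so the multimodel mean needs no common grid.
area_means = [area_mean(model_a), area_mean(model_b)]
multi_model_mean = fmean(area_means)


def monthly_means(daily, days_per_month):
    """Average one year of daily values into 12 monthly values."""
    means, start = [], 0
    for n_days in days_per_month:
        means.append(fmean(daily[start:start + n_days]))
        start += n_days
    return means


# One year of daily data on a 360-day and on a 365-day calendar.
daily_360 = [1.0] * 360
daily_365 = [3.0] * 365

# Monthly statistics first: both calendars yield exactly 12 values.
monthly_360 = monthly_means(daily_360, [30] * 12)
monthly_365 = monthly_means(
    daily_365, [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
)
monthly_multi_model_mean = [
    fmean(pair) for pair in zip(monthly_360, monthly_365)
]
```

The daily series have different lengths (360 vs 365), so they cannot be averaged day by day, whereas the monthly series align one to one.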

In all recipes currently in ESMValTool, the custom_order option is used to change the order to the new default order proposed above. These recipes are:

  • recipe_kcs.yml
  • recipe_ocean_Landschuetzer2016.yml
  • recipe_ocean_example.yml
  • recipe_ocean_bgc.yml

The only exception to this is recipe_carvalhais14nat.yml, with

preproc_meanRegrid:
  custom_order: true
  regrid:
    target_grid: 0.5x0.5
    scheme: nearest
  mask_landsea:
    mask_out: sea
  multi_model_statistics:
    span: overlap
    statistics: [median]
  climate_statistics:
    operator: mean
    period: full

but that already uses custom_order to move regrid before mask_landsea, so it would not be affected by this change.
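
As a rough check of that claim, the following sketch models how the step order could be resolved. It assumes that with custom_order set, steps run in the order they are written in the recipe; `resolve_order` is a hypothetical helper rather than an ESMValCore function, and the two default orders are abbreviated to the steps relevant here.

```python
# Abbreviated default orders (assumption: only the relevant steps are listed).
CURRENT_ORDER = [
    "mask_landsea", "regrid", "multi_model_statistics", "climate_statistics",
]
PROPOSED_ORDER = [
    "mask_landsea", "regrid", "climate_statistics", "multi_model_statistics",
]

# The preprocessor from recipe_carvalhais14nat.yml, in recipe order.
preproc_mean_regrid = {
    "custom_order": True,
    "regrid": {"target_grid": "0.5x0.5", "scheme": "nearest"},
    "mask_landsea": {"mask_out": "sea"},
    "multi_model_statistics": {"span": "overlap", "statistics": ["median"]},
    "climate_statistics": {"operator": "mean", "period": "full"},
}


def resolve_order(preprocessor, default_order):
    """Hypothetical helper: list the steps in the order they would run."""
    steps = [name for name in preprocessor if name != "custom_order"]
    if preprocessor.get("custom_order", False):
        # Custom order: keep the order in which the steps were written.
        return steps
    # Default order: sort the steps by their position in the default list.
    return sorted(steps, key=default_order.index)


order_now = resolve_order(preproc_mean_regrid, CURRENT_ORDER)
order_proposed = resolve_order(preproc_mean_regrid, PROPOSED_ORDER)

# Without custom_order, the same steps would run in a different order
# depending on which default applies.
without_custom = {k: v for k, v in preproc_mean_regrid.items() if k != "custom_order"}
default_now = resolve_order(without_custom, CURRENT_ORDER)
default_proposed = resolve_order(without_custom, PROPOSED_ORDER)
```

Under these assumptions `order_now` and `order_proposed` are identical, while `default_now` and `default_proposed` differ only in where multi_model_statistics runs.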

For reference, the current default preprocessor order is:

# Time extraction (as defined in the preprocessor section)
'extract_time',
'extract_season',
'extract_month',
'resample_hours',
'resample_time',
# Level extraction
'extract_levels',
# Weighting
'weighting_landsea_fraction',
# Mask landsea (fx or Natural Earth)
'mask_landsea',
# Natural Earth only
'mask_glaciated',
# Mask landseaice, sftgif only
'mask_landseaice',
# Regridding
'regrid',
# Point interpolation
'extract_point',
# Masking missing values
'mask_multimodel',
'mask_fillvalues',
'mask_above_threshold',
'mask_below_threshold',
'mask_inside_range',
'mask_outside_range',
# Other
'clip',
# Region selection
'extract_region',
'extract_shape',
'extract_volume',
'extract_trajectory',
'extract_transect',
# 'average_zone': average_zone,
# 'cross_section': cross_section,
'detrend',
'multi_model_statistics',
# Grid-point operations
'extract_named_regions',
'depth_integration',
'area_statistics',
'volume_statistics',
# Time operations
# 'annual_cycle': annual_cycle,
# 'diurnal_cycle': diurnal_cycle,
'amplitude',
'zonal_statistics',
'meridional_statistics',
'hourly_statistics',
'daily_statistics',
'monthly_statistics',
'seasonal_statistics',
'annual_statistics',
'decadal_statistics',
'climate_statistics',
'anomalies',
'regrid_time',
'timeseries_filter',
'linear_trend',
'linear_trend_stderr',
'convert_units',
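
The proposed change amounts to moving one entry of this list to the end. A minimal sketch, using an abbreviated excerpt of the list above and a hypothetical helper `move_to_end` (not part of ESMValCore):

```python
def move_to_end(order, step):
    """Return a copy of `order` with `step` moved to the last position."""
    return [name for name in order if name != step] + [step]


# Abbreviated excerpt of the current default order listed above.
current_order = [
    "regrid",
    "detrend",
    "multi_model_statistics",
    "area_statistics",
    "climate_statistics",
    "convert_units",
]

proposed_order = move_to_end(current_order, "multi_model_statistics")
```

With this, multi_model_statistics runs after convert_units, and the relative order of all other steps is unchanged.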

@bouweandela

Previous discussion on this topic took place in #1213

jvegreg commented Jul 9, 2021

A general consensus at BSC is that you should do as much as you can in the native model grid, so this new default order is great for us.

zklaus commented Jul 12, 2021

@bouweandela, would you be ok with targeting 2.4.0 for this?
