Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fx files in Omon or Ofx #405

Closed
ledm opened this issue Nov 18, 2019 · 16 comments
Closed

Fx files in Omon or Ofx #405

ledm opened this issue Nov 18, 2019 · 16 comments
Assignees
Labels
cmor Related to the CMOR standard

Comments

@ledm
Copy link
Contributor

ledm commented Nov 18, 2019

I'm finding some strange behaviour with the ocean grid cell volume volcello files. There are two issues.

  • Firstly, I don't think that iris is able to load any of these files, (using iris.load_cube(fn) ). I have no idea why though. They all open fine with ncdump ncview, & netCDF4.
    This is sorted now with iris=2.3.0

  • Secondly, most the models have put the volcello files in the Ofx MIP, but some have put them in the Omon MIP. At first, I assumed that the Omon ones were incorrect. However, some of the models may produce grid data which changes with time resolution, so that makes sense for these to exist. Is there any way to change my recipe such that the finder looks in both Ofx and the correct time resolution mip (Omon in my case)?

@valeriupredoi says:
There are two sides of this problem:

  • the data finding side
  • running a set of preprocessing steps on a coupled set of variables e.g. (tas, volcello) when a volume_statistic using volcello needs to be applied on tas
@ledm
Copy link
Contributor Author

ledm commented Nov 18, 2019

The first issue is linked with @valeriupredoi's iris issue: SciTools/iris#3544

@ledm
Copy link
Contributor Author

ledm commented Nov 20, 2019

From @bjlittle in the iris github 3544 issue SciTools/iris#3544:

Just checked and this issue has been already been fixed in upstream/master of iris by #3386. This will be part of iris for the forthcoming releases, in both versions 2.3.0 (Python2 and Python3) and 3.0.0 (Python3 only).

If you have an environment with all the installed dependencies of iris, then you can just use the master of iris instead and it'll all work wink

@valeriupredoi
Copy link
Contributor

OK so this is quite a peculiar but alas very useful issue @ledm

  • the first part of the issue is solved by moving to iris=3
  • the second part of the issue is more touchy in that I am not sure if they will actually place the Ofx data in ESGF; eeven so, as we spoke on the phone, you pointed to me that the actual time-dependent data is needed and not the time-invariant (time-averaged). The way you used with the optional argument to the derive script is nifty but it is, at the end of the day, a bug in ESMValTool - we haven't actually designed that optional extra arg to work like that (bump @schlunma ) - but we can make it work by adding a preferred mip (and that way the variable data is checked against the correct mip table) - let me see if I can implement that in a test mode

@ledm
Copy link
Contributor Author

ledm commented Nov 25, 2019

Thanks @valeriupredoi, for the help with the iris3 installation. That did help!

Just to summarise the problem with volcello (and a solution) as I see it:

In CMIP5, the ocean cell volume, volcello, can only be provided in the fx MIP. It always had the same dimensions: (depth, Latitude, longitude).

In CMIP6, the ocean cell volume, volcello, can be provided in multiple different MIPs, but they have different dimensions:

  • Ofx: (depth, Latitude, longitude)
  • Omon/Oyr/Odec*: (time, depth, Latitude, longitude)

When we're using volcello files, we want whichever one is available, either the Ofx or in the same time resolution as the temperature dataset (typically Omon or Oyr). However, ESMValTool now treats all volcello files as if they have all 4 dimensions.

When the volcello dataset provided by the modelling group is Ofx instead of Omon ESMValTool tried to apply the time operations to a dataset with no time dimension, and we get a fatal error.

The naive solution to the problem is to check for the presence of time in the extract_time preprocessor. If the cube has no time coordinate, then do nothing. Other time preprocessors may also need this check.

I suspect that this problem also exists for 2D FX files like areacello, but areacello never has a time component in any MIP table, so this does not fail.

Odec*: I have not seen any Odec data ever, but I'm including it here to completeness.

@mattiarighi
Copy link
Contributor

Moving this to the core since it is data_finder. issue

@mattiarighi mattiarighi transferred this issue from ESMValGroup/ESMValTool Dec 19, 2019
@mattiarighi mattiarighi added the cmor Related to the CMOR standard label Dec 19, 2019
@ledm
Copy link
Contributor Author

ledm commented Jan 14, 2020

Think I've found a solution to this problem. It's a bit of a hack though.

I've added a clause to the _get_correct_fx_file function in _recipe.py:

elif fx_varname == 'volcello':
if var['dataset'] in ['UKESM1-0-LL', 'GFDL-CM4', 'HadGEM3-GC31-LL', ]:
fx_var = _add_fxvar_keys({'short_name': fx_varname, 'mip': var['mip']},
var)
else:
fx_var = _add_fxvar_keys({'short_name': fx_varname, 'mip': 'Ofx'},
var)

The only thing that this does is set the volcello's fx MIP to the same time resolution as the the main data if the model is one of the three models with time-dimensional volcello.

@valeriupredoi
Copy link
Contributor

@ledm as we discussed over private email, I'd suggest a generalized clause that assigns the variable's mip to the fx variable only for special cases (identified by fx variable name rather than by hardcoding the special dataset names). Since the changed mip is needed only at the file finding stage and not at the CMOR stage (in fact, the mip needs to stay correct from CMOR point of view, eg Ofx for sftof) I'd suggest switching the mip only at the data finding call:

def _get_correct_fx_file(variable, fx_varname, config_user):
    """Wrapper to standard file getter to recover the correct fx file."""
    var = dict(variable)
    if var['project'] in ['CMIP5', 'OBS', 'OBS6', 'obs4mips']:
        fx_var = _add_fxvar_keys({'short_name': fx_varname, 'mip': 'fx'}, var)
    elif var['project'] == 'CMIP6':
        if fx_varname == 'sftlf':
            fx_var = _add_fxvar_keys({'short_name': fx_varname, 'mip': 'fx'},
                                     var)
        elif fx_varname == 'sftof':
            fx_var = _add_fxvar_keys({'short_name': fx_varname, 'mip': 'Ofx'},
                                     var)
        # TODO allow availability for multiple mip's for sftgif
        elif fx_varname == 'sftgif':
            fx_var = _add_fxvar_keys({'short_name': fx_varname, 'mip': 'fx'},
                                     var)
    else:
        raise RecipeError(
            f"Project {var['project']} not supported with fx variables")

    fx_files = _get_input_files(fx_var, config_user)

    # sftof with original variable mip
    if not fx_files and fx_var['mip'] == 'Ofx':
        fx_var['mip'] = var['mip']
        fx_files = _get_input_files(fx_var, config_user)

    # allow for empty lists corrected for by NE masks
    if fx_files:
        fx_files = fx_files[0]

    return fx_files

@valeriupredoi
Copy link
Contributor

can actually make a list in if not fx_files and fx_var['mip'] in ['Ofx', ]: where you can add all the crazy cases 🍺

@ledm
Copy link
Contributor Author

ledm commented Jan 14, 2020

I've had to make a minor change, but V's solution seems to work.

@valeriupredoi
Copy link
Contributor

get the PR out, brohem 🍺

@ledm
Copy link
Contributor Author

ledm commented Jan 14, 2020

The solutions solve the problem of being unable to find the fx files (yay!), but raise another issue.

In the recipe I'm working on, the preporcessor is:

    prep_3_20:
        custom_order: true
        extract_volume:
            z_min: 0
            z_max: 700
        annual_statistics:
            operator: mean
        volume_statistics:
            operator: mean
            fx_files: [volcello, ]
        regrid_time:
            frequency: year

Basically, it extracts the top 700m of the surface ocean temperature, takes the annual average, then take the volume-weighted average.

The volcello cube arrives at volume_statistics without having passed the extract_volume or the annual_statistics stages, so the thetao and volcello cube shapes are incompatible (but also huge - this is a sloooooow recipe!):

  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/preprocessor/_volume.py", line 230, in volume_statistics
    '({})'.format(cube.data.shape, grid_volume.shape))
ValueError: Cube shape ((55, 15, 1080, 1440)) doesn`t match grid volume shape ((240, 35, 1080, 1440))

I'm happy to push the current fix as a PR, but we should be clear that it only solves half the problem. Thoughts @valeriupredoi?

@valeriupredoi
Copy link
Contributor

that is a problem of handling the fx variable as an auxiliary fx variable and not as a true diagnostic variable - if you list it as a diagnostic variable then all the needed preproc steps will be applied to it. But then again that is a Catch 22 since that way you will not be able to couple it to the other variable (tas?). So my hacky suggestion would be to treat these sort of cases in the diagnostic where you will have both tas and volcello preprocc-ed data. Can that be done?

@ledm
Copy link
Contributor Author

ledm commented Jan 14, 2020

So the only way to use fx_files in something like volume_statistics is when there are no operations that change the size of the volume cube before the volume_statistics?

We can only look at global means, no local or regional means? That seems very restrictive.

@valeriupredoi
Copy link
Contributor

@ledm I have changed the original message of the issue to reflect the issues at hand. I think the best incremental approach is to first open a PR that solves the data finding of the volcello data with other mips than Ofx;

Then we need to think about how to have the coupled case (tas, volcello) work optimally by running the preproc steps on both the vars and combining data from both to obtain the volume_statistics. This is not at all straightforward in the context of single-variable preprocessing if not using derive.

@ledm
Copy link
Contributor Author

ledm commented Jan 14, 2020

Thanks V, I'll get a PR sent off for this fix.

As an aside: derive isn't an option for this calculation. Derive makes a full copy of all data needed before it runs the derivation calculation. In this case, that requires making a copy of several tens of GB for both thetao and volcello. It's incredibly slow - can you imagine how long it takes sci3 to save a cube size (240, 35, 1080, 1440)? several hours...

@ledm
Copy link
Contributor Author

ledm commented Jan 14, 2020

This is a pretty nasty flaw for the task I'm trying to do at the minute. I can think of a workaround, but it would be a few days of extra work for me. If I were a normal user (instead of an ESMValTool core developper with a couple years ESMValTool experience), this would be an insurmountable obstacle.

If there's a straightforward way to apply the preprocessing chain to the fx files, I'd rather do that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cmor Related to the CMOR standard
Projects
None yet
Development

No branches or pull requests

3 participants