Earth System Data Cube (ESDC) cmorizer #2799

Merged: 59 commits, Feb 2, 2023
c018503
Added zarr dependency. Could be optional instead.
bsolino Sep 8, 2022
d16362d
Added ESDC dataset, and started cmorizer
bsolino Sep 8, 2022
8ec5023
Added tas to variables and saved
bsolino Sep 9, 2022
e04c933
More detailed cmor configuration
bsolino Sep 16, 2022
b85bbab
Finished cmorizer, added option to connect to cloud
bsolino Sep 16, 2022
96a9d3d
Merge branch 'main' into esdc-cmorizer-pilot
bsolino Sep 16, 2022
35757dd
Added Earth System Data Cube dataset
bsolino Sep 16, 2022
f419956
Linting and formatting changes
bsolino Sep 16, 2022
9c47c0b
Changed to use open_dataset, to prevent deprecation of open_zarr
bsolino Sep 27, 2022
74e5b7c
Corrected codacy issues
bsolino Sep 27, 2022
c44e283
Added reference
bsolino Sep 27, 2022
73dc68b
Solved small codacy issues
bsolino Sep 27, 2022
c2e4d95
Improved documentation and log messages, deleted comments, removed co…
bsolino Sep 27, 2022
9b5b564
Corrected linting errors
bsolino Sep 27, 2022
bc5d315
Linting correction
bsolino Sep 27, 2022
c30f34b
Added dataset to list of supported datasets
bsolino Sep 27, 2022
2d663fd
Merge branch 'main' into esdc-cmorizer-pilot
bsolino Oct 4, 2022
8ea8f2e
Update doc/sphinx/source/input.rst
bsolino Oct 4, 2022
821cf18
Corrected table formatting
bsolino Oct 4, 2022
5067713
Update esmvaltool/cmorizers/data/formatters/datasets/esdc.py
bsolino Oct 4, 2022
53a21e2
Merge branch 'esdc-cmorizer-pilot' of github.com:ESMValGroup/ESMValTo…
bsolino Oct 4, 2022
b77ca62
Formatting: Removed empty line
bsolino Oct 4, 2022
4835123
Updated reference file to ESMValTool standards
bsolino Jan 24, 2023
9007e2b
Merge branch 'main' into esdc-cmorizer-pilot
bsolino Jan 25, 2023
7acd618
Update to version 3.0.1
bsolino Jan 25, 2023
62a2b7b
Refactored to make closer to other esmvaltool cmorizers
bsolino Jan 25, 2023
95749ec
Corrected formatting issues
bsolino Jan 25, 2023
7968b95
Corrected commented out chunking option
bsolino Jan 25, 2023
5d759f5
Merge branch 'main' into esdc-cmorizer-pilot
bsolino Jan 26, 2023
c2f89fb
Merge branch 'esdc-cmorizer-pilot' of github.com:ESMValGroup/ESMValTo…
bsolino Jan 26, 2023
617794e
Removed support for wildcards on pattern
bsolino Jan 26, 2023
76b748e
Updated to 3.0.1
bsolino Jan 26, 2023
e2bbde8
Updated datasets.yml
bsolino Jan 26, 2023
cac8945
Added empty line at end
bsolino Jan 27, 2023
9c65e3e
Added download instructions
bsolino Jan 27, 2023
cd7522c
Removed version folder from the local file path
bsolino Jan 27, 2023
18e72b1
Solved small error
bsolino Jan 27, 2023
173db65
Added empty line at end
bsolino Jan 27, 2023
31b7d4d
Merge branch 'main' into esdc-cmorizer-pilot
bsolino Jan 27, 2023
b1b3da8
Added missing zarr dependencies
bsolino Feb 1, 2023
f30bb6d
Removed TODO
bsolino Feb 1, 2023
d32b74b
Update access data
bsolino Feb 1, 2023
6349bd8
Merge branch 'esdc-cmorizer-pilot' of github.com:ESMValGroup/ESMValTo…
bsolino Feb 1, 2023
a298909
Update esmvaltool/recipes/examples/recipe_check_obs.yml
bsolino Feb 1, 2023
6e38855
Better download example
bsolino Feb 1, 2023
f983939
Merge branch 'esdc-cmorizer-pilot' of github.com:ESMValGroup/ESMValTo…
bsolino Feb 1, 2023
b423cc1
Update esmvaltool/cmorizers/data/formatters/datasets/esdc.py
bsolino Feb 1, 2023
97673ba
Update esmvaltool/cmorizers/data/formatters/datasets/esdc.py
bsolino Feb 1, 2023
438aec2
Merge branch 'esdc-cmorizer-pilot' of github.com:ESMValGroup/ESMValTo…
bsolino Feb 1, 2023
d7118fb
Improved downloading instructions
bsolino Feb 1, 2023
f7c7859
Made cmorizer functions private
bsolino Feb 1, 2023
94a8bcd
No quality control for long docstring line
bsolino Feb 1, 2023
f0b9a3b
At least two spaces
bsolino Feb 1, 2023
c2562cc
Merge branch 'main' into esdc-cmorizer-pilot
bsolino Feb 1, 2023
4877463
Reformatting
bsolino Feb 2, 2023
1c0e2b2
fix line too long
valeriupredoi Feb 2, 2023
c084e7b
rearrange deps
valeriupredoi Feb 2, 2023
4d74344
rearrange deps
valeriupredoi Feb 2, 2023
1b72144
rearrange deps
valeriupredoi Feb 2, 2023
2 changes: 2 additions & 0 deletions doc/sphinx/source/input.rst
@@ -301,6 +301,8 @@ A list of the datasets for which a CMORizers is available is provided in the fol
+------------------------------+------------------------------------------------------------------------------------------------------+------+-----------------+
| ESACCI-WATERVAPOUR           | prw (Amon)                                                                                           |   3  | Python          |
+------------------------------+------------------------------------------------------------------------------------------------------+------+-----------------+
| ESDC                         | tas, tasmax, tasmin (Amon)                                                                           |   2  | Python          |
+------------------------------+------------------------------------------------------------------------------------------------------+------+-----------------+
| ESRL                         | co2s (Amon)                                                                                          |   2  | NCL             |
+------------------------------+------------------------------------------------------------------------------------------------------+------+-----------------+
| FLUXCOM                      | gpp (Lmon)                                                                                           |   3  | Python          |
2 changes: 2 additions & 0 deletions environment.yml
@@ -11,6 +11,7 @@ channels:
dependencies:
  - pip!=21.3
  - python>=3.8
  - aiohttp
  - cartopy
  - cdo>=1.9.7
  - cdsapi
@@ -62,6 +63,7 @@ dependencies:
  - xesmf==0.3.0
  - xgboost>1.6.1  # github.com/ESMValGroup/ESMValTool/issues/2779
  - xlsxwriter
  - zarr
  # Python packages needed for testing
  - flake8
  - pytest >=3.9,!=6.0.0rc1,!=6.0.0
2 changes: 2 additions & 0 deletions environment_osx.yml
@@ -11,6 +11,7 @@ channels:
dependencies:
  - pip!=21.3
  - python>=3.8
  - aiohttp
  - cartopy
  - cdo>=1.9.7
  - cdsapi
@@ -62,6 +63,7 @@ dependencies:
  - xesmf==0.3.0
  - xgboost>1.6.1  # github.com/ESMValGroup/ESMValTool/issues/2779
  - xlsxwriter
  - zarr
  # Python packages needed for testing
  - flake8
  - pytest >=3.9,!=6.0.0rc1,!=6.0.0
26 changes: 26 additions & 0 deletions esmvaltool/cmorizers/data/cmor_config/ESDC.yml
@@ -0,0 +1,26 @@
---
filename: 'esdc-8d-{grid}-{chunking}-{version}.zarr'

attributes:
  project_id: OBS6
  dataset_id: ESDC
  version: 3.0.1
  tier: 2
  grid: 0.25deg
  chunking: 1x720x1440
  # chunking: 256x128x128
  modeling_realm: reanaly
  source: http://data.rsc4earth.de/EarthSystemDataCube/
  reference: 'esdc'
  comment: ''

variables:
  tas:
    mip: Amon
    raw: air_temperature_2m
  tasmax:
    mip: Amon
    raw: max_air_temperature_2m
  tasmin:
    mip: Amon
    raw: min_air_temperature_2m
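
As a rough illustration of how the `filename` template in this config is filled from the `attributes` block, the standalone sketch below repeats the `str.format` call that `esdc.py` makes. The dictionary is copied by hand from the config for the example (an assumption of this sketch; the cmorizer itself receives the parsed YAML as `cfg`).

```python
# Values copied by hand from the `attributes` block of ESDC.yml (illustrative).
attributes = {
    "version": "3.0.1",
    "grid": "0.25deg",
    "chunking": "1x720x1440",
}
filename_template = "esdc-8d-{grid}-{chunking}-{version}.zarr"

# Same kind of substitution the cmorizer performs on cfg['filename'].
filename = filename_template.format(
    grid=attributes["grid"],
    chunking=attributes["chunking"],
    version=attributes["version"],
)
print(filename)  # esdc-8d-0.25deg-1x720x1440-3.0.1.zarr
```

Switching to the commented-out `chunking: 256x128x128` value would select a differently chunked store with the same naming scheme.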
12 changes: 12 additions & 0 deletions esmvaltool/cmorizers/data/datasets.yml
@@ -524,6 +524,18 @@ datasets:
      data/tcwv/dataset3_1/CDR-*/...
      All files need to be in one directory, not in yearly subdirectories.

  ESDC:
    tier: 2
    source: http://data.rsc4earth.de/EarthSystemDataCube/
    last_access: 2023-01-26
    info: |
      It is not necessary to download the data, as the cmorizer script can access
      it directly from the cloud if it is not available locally.

      To download a dataset, the dataset folder can be explored on the source
      website, and downloaded using wget:
      ```wget -m -nH -np -R "index.html*" http://data.rsc4earth.de/EarthSystemDataCube/v3.0.1/```

  ESRL:
    tier: 2
    source: http://www.esrl.noaa.gov/gmd/dv/data/index.php
149 changes: 149 additions & 0 deletions esmvaltool/cmorizers/data/formatters/datasets/esdc.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,149 @@
"""ESMValTool CMORizer for Earth System Data Cube data.

Tier
    Tier 2: other freely-available dataset.

Source
    http://data.rsc4earth.de/EarthSystemDataCube/

Last access
    20230126

Download and processing instructions
    It is not necessary to download the data, as the cmorizer script can access
    it directly from the cloud if it is not available locally.

    To download a dataset, the dataset folder can be explored on the source
    website, and downloaded using wget:
        ```wget -m -nH -np -R "index.html*" http://data.rsc4earth.de/EarthSystemDataCube/v3.0.1/```
"""  # noqa: E501
import logging
from copy import deepcopy
from pathlib import Path

import cf_units
import iris.std_names
import xarray as xr
from esmvalcore.preprocessor import monthly_statistics

from esmvaltool.cmorizers.data import utilities as utils

logger = logging.getLogger(__name__)


def _fix_cube(var, cube, cfg):
    """General fixes for all cubes."""
    cmor_info = cfg['cmor_table'].get_variable(var['mip'], var['short_name'])

    # Set correct names
    cube.var_name = cmor_info.short_name
    if cmor_info.standard_name:
        cube.standard_name = cmor_info.standard_name
    cube.long_name = cmor_info.long_name

    # Set calendar to gregorian instead of proleptic gregorian
    old_unit = cube.coord('time').units
    if old_unit.calendar == 'proleptic_gregorian':
        logger.info("Converting time units to gregorian")
        cube.coord('time').units = cf_units.Unit(old_unit.origin,
                                                 calendar='gregorian')
    utils.fix_coords(cube)
    cube.convert_units(cmor_info.units)
    if 'height2m' in cmor_info.dimensions:
        utils.add_height2m(cube)
    # Conversion from 8-d to monthly frequency
    cube = monthly_statistics(cube, operator="mean")

    # Fix metadata
    attrs = cfg['attributes']
    attrs['mip'] = var['mip']
    utils.fix_var_metadata(cube, cmor_info)
    utils.set_global_atts(cube, attrs)

    return cube


def _open_zarr(path):
    """Open zarr dataset."""
    logger.info('Opening zarr in "%s"', path)
    try:
        zarr_dataset = xr.open_dataset(path, engine='zarr')
        return zarr_dataset
    except KeyError as exception:
        # Happens when the zarr folder is missing metadata, e.g. when
        # it is a zarr array instead of a zarr dataset.
        logger.error('Could not open zarr dataset "%s": "KeyError: %s"', path,
                     exception)
        raise exception


def _extract_variable(zarr_path, var, cfg, out_dir):
    """Open and cmorize cube."""
    attributes = deepcopy(cfg['attributes'])
    all_attributes = {
        **attributes,
        **var
    }  # add the mip to the other attributes
    raw_name = var['raw']
    zarr_dataset = _open_zarr(zarr_path)
    cube_xr = zarr_dataset[raw_name]

    # Invalid standard names must be removed before converting to iris
    standard_name = cube_xr.attrs.get('standard_name', None)
    if (standard_name is not None
            and standard_name not in iris.std_names.STD_NAMES):
        del cube_xr.attrs['standard_name']
        logger.info('Removed invalid standard name "%s".', standard_name)

    cube_iris = cube_xr.to_iris()
    cube = _fix_cube(var, cube_iris, cfg)

    utils.save_variable(cube=cube,
                        var=var['short_name'],
                        outdir=out_dir,
                        attrs=all_attributes,
                        unlimited_dimensions=['time'])


def cmorization(in_dir, out_dir, cfg, cfg_user, start_date, end_date):
    """Cmorize the dataset."""
    if start_date:
        logger.warning('start_date set to "%s", but will be ignored',
                       start_date)
    if end_date:
        logger.warning('end_date set to "%s", but will be ignored', end_date)

    attributes = cfg['attributes']
    variables = cfg['variables']
    version = attributes['version']
    filename_pattern = cfg['filename'].format(grid=attributes['grid'],
                                              chunking=attributes['chunking'],
                                              version=version)

    local_path = Path(in_dir)
    in_files = list(local_path.glob(filename_pattern))
    logger.debug('Pattern %s matched: %s', Path(local_path, filename_pattern),
                 in_files)

    if in_files:
        if len(in_files) > 1:
            logger.warning(
                'Pattern has matched "%i" files, '
                'but only the first one will be used.', len(in_files))
            logger.warning('The following files will be ignored: "%s"',
                           in_files[1:])
        # Use the first local match; this also covers exactly one match,
        # which would otherwise leave zarr_path unassigned.
        zarr_path = in_files[0]
    else:
        logger.info(
            'No local matches for pattern "%s", '
            'attempting connection to the cloud.',
            Path(local_path, filename_pattern))
        if '*' in filename_pattern:
            logger.warning(
                'Detected a wildcard character in path (*), '
                'online connection to "%s" may not work', filename_pattern)
        zarr_path = f'{attributes["source"]}/v{version}/{filename_pattern}'

    for short_name, var in variables.items():
        if 'short_name' not in var:
            var['short_name'] = short_name
        _extract_variable(zarr_path, var, cfg, out_dir)
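
The local-first, cloud-fallback behaviour described in the module docstring can be sketched in isolation as follows. This is a minimal illustration, not code from the PR: `resolve_zarr_path` is a hypothetical helper, and the trailing-slash handling on the source URL is an assumption made here to keep the resulting URL tidy.

```python
import tempfile
from pathlib import Path


def resolve_zarr_path(in_dir, filename, source, version):
    """Return a local zarr path if one exists, else a remote URL.

    Hypothetical helper for illustration; it is not part of esdc.py.
    """
    matches = sorted(Path(in_dir).glob(filename))
    if matches:
        # Any local match wins; additional matches are ignored.
        return matches[0]
    # Fall back to the remote store, avoiding a doubled slash.
    return f"{source.rstrip('/')}/v{version}/{filename}"


with tempfile.TemporaryDirectory() as tmp:
    name = "esdc-8d-0.25deg-1x720x1440-3.0.1.zarr"
    source = "http://data.rsc4earth.de/EarthSystemDataCube/"

    # Nothing local yet, so the remote URL is returned.
    remote = resolve_zarr_path(tmp, name, source, "3.0.1")

    # A zarr store is a directory; create one and resolve again.
    (Path(tmp) / name).mkdir()
    local = resolve_zarr_path(tmp, name, source, "3.0.1")

print(remote)
print(local.name)
```

Either return value can be handed to `xr.open_dataset(..., engine='zarr')`; opening the remote URL is presumably why `aiohttp` was added alongside `zarr` in this PR.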
12 changes: 12 additions & 0 deletions esmvaltool/recipes/examples/recipe_check_obs.yml
@@ -288,6 +288,18 @@ diagnostics:
    scripts: null


  ESDC:
    description: ESDC check
    variables:
      tas:
      tasmin:
      tasmax:
    additional_datasets:
      - {dataset: ESDC, project: OBS6, mip: Amon, tier: 2,
         type: reanaly, version: 3.0.1,
         start_year: 1979, end_year: 2021}
    scripts: null

  ESRL:
    description: ESRL check
    variables:
11 changes: 11 additions & 0 deletions esmvaltool/references/esdc.bibtex
@@ -0,0 +1,11 @@
@article{esdc,
doi = {10.5194/esd-11-201-2020},
url = {https://esd.copernicus.org/articles/11/201/2020/},
year = {2020},
volume = {11},
number = {1},
pages = {201--234},
author = {Mahecha, M. D. and Gans, F. and Brandt, G. and Christiansen, R. and Cornell, S. E. and Fomferra, N. and Kraemer, G. and Peters, J. and Bodesheim, P. and Camps-Valls, G. and Donges, J. F. and Dorigo, W. and Estupinan-Suarez, L. M. and Gutierrez-Velez, V. H. and Gutwin, M. and Jung, M. and Londo\~no, M. C. and Miralles, D. G. and Papastefanou, P. and Reichstein, M.},
title = {Earth system data cubes unravel global multivariate dynamics},
journal = {Earth System Dynamics}
}
2 changes: 2 additions & 0 deletions setup.py
@@ -20,6 +20,7 @@
    # Installation dependencies
    # Use with pip install . to install from source
    'install': [
        'aiohttp',
        'cartopy',
        'cdo',
        'cdsapi',
@@ -65,6 +66,7 @@
        'xesmf==0.3.0',
        'xgboost>1.6.1',  # github.com/ESMValGroup/ESMValTool/issues/2779
        'xlsxwriter',
        'zarr',
    ],
    # Test dependencies
    # Execute `pip install .[test]` once and then use `pytest` to run tests