Update xcdat.open_mfdataset time decoding logic (#161)
- Fixes incorrect time values decoded by `decode_time_units()` for non-CF compliant time units, by treating the time values as offsets from the reference date in the "units" attribute
- Fixes `open_mfdataset(decode_times=False)` dropping time values when datasets share the same numerically encoded time values but have differing non-CF compliant time units (e.g., "months since 2000-01-01" vs. "months since 2001-01-01")
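
The offset-based decoding described above can be sketched as follows. This is a minimal illustration of the idea, not xcdat's actual implementation; `decode_months_since` is a hypothetical helper name:

```python
# Sketch (assumed names, not xcdat internals): decode non-CF
# "months since <date>" values by applying each value as an offset
# to the reference date parsed from the "units" attribute, instead
# of generating a fixed pd.date_range() from the reference date.
import pandas as pd

def decode_months_since(offsets, units_attr):
    # e.g. units_attr = "months since 2000-01-01"
    units, ref_date = units_attr.split(" since ")
    ref = pd.Timestamp(ref_date)
    # Each value is an integer offset applied to the reference date,
    # so gaps in the data and mid-month reference days are preserved.
    return [ref + pd.DateOffset(months=int(o)) for o in offsets]

times = decode_months_since([0, 1, 11], "months since 2000-01-01")
# times[0] is 2000-01-01, times[1] is 2000-02-01, times[2] is 2000-12-01
```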

Summary of Changes
- Add optional boolean kwarg `decode_times` to `open_dataset()` and `open_mfdataset()`
  - Add conditionals to handle time decoding when this kwarg is True or False
- Add optional callable kwarg `preprocess` to `open_mfdataset()`
  - Add `_preprocess_non_cf_dataset()` function to decode datasets' non-CF compliant time values before concatenation (handles cases where datasets share the same time values but have different time units, which would otherwise result in time values being dropped)
- Update `decode_non_cf_time()`
  - Rename from `decode_time_units()` to `decode_non_cf_time()`
  - Remove logic for checking CF compliance, which is now handled by `_has_cf_compliant_time()`
  - Fix incorrect start dates for decoded time coordinates by forming them from offsets applied to the reference date, instead of using the reference date as the start of a fixed `pd.date_range()`
    - Using `pd.date_range()` incorrectly assumed no gaps/missing data and that coordinate points start at the beginning of the month. It also did not handle calendar types correctly (e.g., leap years) and would snap offsets to the start of the month or year if they were not already aligned.
  - Add decoding of time bounds
- Add utility function `_split_time_units_attr()` for splitting "units" attribute into units and reference date strings
- Update docstrings of methods
- Update test fixtures for correctness and readable syntax
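
The preprocessing step above can be sketched with plain xarray. This is an illustrative example, not xcdat's implementation; `_decode_months` and `make_ds` are assumed names, and `xr.concat` stands in for the concatenation `open_mfdataset` performs:

```python
# Sketch: two datasets share the same encoded time values [0, 1, 2]
# but have different non-CF reference dates, so decoding must happen
# *before* concatenation or the coordinates collide.
import pandas as pd
import xarray as xr

def _decode_months(ds):
    # Hypothetical preprocessing step: decode integer month offsets
    # using each dataset's own "units" attribute.
    _, ref_date = ds.time.attrs["units"].split(" since ")
    ref = pd.Timestamp(ref_date)
    decoded = [ref + pd.DateOffset(months=int(m)) for m in ds.time.values]
    return ds.assign_coords(time=decoded)

def make_ds(units):
    time = xr.DataArray([0, 1, 2], dims="time", attrs={"units": units})
    return xr.Dataset({"ts": ("time", [1.0, 2.0, 3.0])}, coords={"time": time})

a = make_ds("months since 2000-01-01")
b = make_ds("months since 2000-04-01")

# Decoding first keeps all six time points distinct; concatenating the
# raw [0, 1, 2] coordinates would instead produce duplicate labels.
merged = xr.concat([_decode_months(a), _decode_months(b)], dim="time")
```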

Co-authored-by: Tom Vo <[email protected]>
  • Loading branch information
pochedls and tomvothecoder authored Jan 4, 2022
1 parent 85786c5 commit 5d2dda1
Showing 5 changed files with 796 additions and 297 deletions.
3 changes: 2 additions & 1 deletion docs/api.rst
@@ -11,8 +11,9 @@ Top-level API

 dataset.open_dataset
 dataset.open_mfdataset
+dataset.has_cf_compliant_time
+dataset.decode_non_cf_time
 dataset.infer_or_keep_var
-dataset.decode_time_units
 dataset.get_inferred_var

.. currentmodule:: xarray
50 changes: 25 additions & 25 deletions tests/fixtures.py
@@ -32,18 +32,20 @@
     ],
     dims=["time"],
     attrs={
+        "axis": "T",
         "long_name": "time",
         "standard_name": "time",
-        "axis": "T",
     },
 )
 time_non_cf = xr.DataArray(
     data=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
     dims=["time"],
     attrs={
+        "units": "months since 2000-01-01",
+        "calendar": "standard",
+        "axis": "T",
         "long_name": "time",
         "standard_name": "time",
-        "axis": "T",
     },
 )

@@ -72,18 +74,18 @@
 time_bnds_non_cf = xr.DataArray(
     name="time_bnds",
     data=[
-        [datetime(1999, 12, 16, 12), datetime(2000, 1, 16, 12)],
-        [datetime(2000, 1, 16, 12), datetime(2000, 2, 15, 12)],
-        [datetime(2000, 2, 15, 12), datetime(2000, 3, 16, 12)],
-        [datetime(2000, 3, 16, 12), datetime(2000, 4, 16)],
-        [datetime(2000, 4, 16), datetime(2000, 5, 16, 12)],
-        [datetime(2000, 5, 16, 12), datetime(2000, 6, 16)],
-        [datetime(2000, 6, 16), datetime(2000, 7, 16, 12)],
-        [datetime(2000, 7, 16, 12), datetime(2000, 8, 16, 12)],
-        [datetime(2000, 8, 16, 12), datetime(2000, 9, 16)],
-        [datetime(2000, 9, 16), datetime(2000, 10, 16, 12)],
-        [datetime(2000, 10, 16, 12), datetime(2000, 11, 16)],
-        [datetime(2000, 11, 16), datetime(2000, 12, 16)],
+        [-1, 0],
+        [0, 1],
+        [1, 2],
+        [2, 3],
+        [3, 4],
+        [4, 5],
+        [5, 6],
+        [6, 7],
+        [7, 8],
+        [8, 9],
+        [9, 10],
+        [10, 11],
     ],
     coords={"time": time_non_cf},
     dims=["time", "bnds"],
@@ -172,19 +174,18 @@ def generate_dataset(cf_compliant: bool, has_bounds: bool) -> xr.Dataset:
         )
 
         if cf_compliant:
-            ds = ds.assign({"time_bnds": time_bnds.copy()})
-            ds = ds.assign_coords({"time": time_cf.copy()})
+            ds.coords["time"] = time_cf.copy()
+            ds["time_bnds"] = time_bnds.copy()
         elif not cf_compliant:
-            ds = ds.assign({"time_bnds": time_bnds_non_cf.copy()})
-            ds = ds.assign_coords({"time": time_non_cf.copy()})
-            ds["time"] = ds.time.assign_attrs(units="months since 2000-01-01")
+            ds.coords["time"] = time_non_cf.copy()
+            ds["time_bnds"] = time_bnds_non_cf.copy()
 
         # If the "bounds" attribute is included in an existing DataArray and
         # added to a new Dataset, it will get dropped. Therefore, it needs to be
         # assigned to the DataArrays after they are added to Dataset.
-        ds["lat"] = ds.lat.assign_attrs(bounds="lat_bnds")
-        ds["lon"] = ds.lon.assign_attrs(bounds="lon_bnds")
-        ds["time"] = ds.time.assign_attrs(bounds="time_bnds")
+        ds["lat"].attrs["bounds"] = "lat_bnds"
+        ds["lon"].attrs["bounds"] = "lon_bnds"
+        ds["time"].attrs["bounds"] = "time_bnds"

elif not has_bounds:
ds = xr.Dataset(
@@ -193,9 +194,8 @@ def generate_dataset(cf_compliant: bool, has_bounds: bool) -> xr.Dataset:
)

         if cf_compliant:
-            ds = ds.assign_coords({"time": time_cf.copy()})
+            ds.coords["time"] = time_cf.copy()
         elif not cf_compliant:
-            ds = ds.assign_coords({"time": time_non_cf.copy()})
-            ds["time"] = ds.time.assign_attrs(units="months since 2000-01-01")
+            ds.coords["time"] = time_non_cf.copy()

return ds
