Additional fixes for PR #158 #173
ds = dataset[[data_var] + bounds_vars]
ds.attrs["xcdat_infer"] = data_var
# FIXME: This doesn't handle pathlib paths or a list of lists
Need to handle this still
- Rename to `decode_non_cf_time()` and remove logic for CF unit decoding
- Fix incorrect start date for pandas date range by adding offset from first coordinate point

Refactor function and variable names
- Update `xcdat_infer` attr from None to "None" since `to_netcdf` does not support None type
- Update fixtures for non-CF time bounds and `generate_dataset()`

Update `decode_non_cf_time()` to use offsets correctly
- Using `pd.date_range()` is incorrect because it assumes no gaps or missing data. It also resets offsets to the beginning of the month and year and does not consider leap years
- `pd.DateOffset` performs relative arithmetic operations based on the calendar

Update `decode_non_cf_time()` to handle time bounds
- Update tests for `decode_non_cf_time()`

Add functions to `__init__.py`
- Update time fixtures with correct attributes

Add `_preprocess_non_cf_dataset()`
- Add test for callable in `_preprocess_non_cf_dataset`
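The difference between the two approaches named in these commit notes can be seen in a minimal sketch. It assumes monthly units and integer-valued offsets; the reference date, the encoded values, and the helper name `decode_with_offsets` are made up for illustration and are not taken from the PR's code.

```python
import pandas as pd

# Hypothetical inputs standing in for "months since 2000-01-16",
# with a gap after the third coordinate point.
ref_date = pd.Timestamp("2000-01-16")
encoded = [0, 1, 2, 5]


def decode_with_offsets(values, ref):
    """Treat each encoded value as a calendar-aware offset from the reference date."""
    return [ref + pd.DateOffset(months=int(v)) for v in values]


decoded = decode_with_offsets(encoded, ref_date)

# A fixed month-start range never looks at the encoded values: every point
# lands on the first of a month and the gap cannot be represented.
ranged = pd.date_range(start=ref_date, periods=len(encoded), freq="MS")

# Relative arithmetic clips to the end of the month, so leap years are respected.
leap = pd.Timestamp("2000-01-31") + pd.DateOffset(months=1)

print(decoded)
print(list(ranged))
print(leap)
```

With offsets, the day of the month from the reference date is preserved and the missing months stay missing, which is the behavior the commit notes describe as the motivation for dropping `pd.date_range()`.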
I think your catch on this issue and fix is really valuable. I read through the code and did not see problems, but I think merging this and testing it with a wide swath of CMIP (and non-CMIP) data will help to further evaluate / validate these methods.
Thanks for the review! I'm glad to hear that the changes made sense to you. I also agree that we should stress test these changes using many different datasets. I will merge this PR into your PR.
Description

This PR addresses the following issues (for PR #161):

- Fixes issue with incorrect time values being decoded in `decode_time_units()` for non-CF compliant time units
  - The fix is to use the time values as offsets to the reference date in the "units" attribute
- Fixes calling `open_mfdataset(decode_times=False)` when datasets have the same numerically encoded time values but differing non-CF compliant time units (e.g., "months since 2000-01-01", "months since 2001-01-01"), which resulted in time values being dropped

Summary of Changes

- Add optional boolean kwarg `decode_times` to `open_dataset()` and `open_mfdataset()`
  - Add conditionals to handle this kwarg when True or False
- Add optional callable kwarg `preprocess` to `open_mfdataset()`
- Add `_preprocess_non_cf_dataset()` function to decode datasets' time values with non-CF compliant units before concatenating (handles cases where the datasets have the same time values and different time units, which would otherwise result in time values being dropped)
- Update `decode_non_cf_time()`
  - Rename from `decode_time_units()` to `decode_non_cf_time()`
  - Remove logic for checking CF compliance, which is now handled by `_has_cf_compliant_time()`
  - Fix incorrect start date for decoded time coordinates by forming them from offsets and reference dates, instead of using the reference date as the start point of a fixed `pd.date_range()`
    - Using `pd.date_range()` incorrectly assumes no gaps/missing data and that coordinate points start at the beginning of the month. It also did not handle calendar types correctly (e.g., leap years), and would reset offsets to the beginning of the month or year if they weren't already there.
  - Add decoding of time bounds
- Add utility function `_split_time_units_attr()` for splitting the "units" attribute into units and reference date strings
- Update docstrings of methods
- Update test fixtures for correctness and readable syntax
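To illustrate why decoding has to happen per dataset before concatenation, here is a rough standard-library sketch of the idea. It is not the PR's implementation: plain `datetime` arithmetic stands in for pandas and xarray, only "months" units are handled, and the names `split_units`, `add_months`, and `decode_months` as well as the two sample "files" are hypothetical.

```python
import calendar
from datetime import datetime


def split_units(units_attr):
    """Split e.g. 'months since 2000-01-01' into ('months', '2000-01-01')."""
    units, ref_date = units_attr.split(" since ")
    return units.strip(), ref_date.strip()


def add_months(ref, n):
    """Add n calendar months to ref, clipping the day to the month length."""
    year, month0 = divmod(ref.year * 12 + (ref.month - 1) + n, 12)
    day = min(ref.day, calendar.monthrange(year, month0 + 1)[1])
    return ref.replace(year=year, month=month0 + 1, day=day)


def decode_months(values, units_attr):
    """Decode encoded values as month offsets from the reference date."""
    units, ref_str = split_units(units_attr)
    if units != "months":
        raise ValueError(f"this sketch only handles 'months', got {units!r}")
    ref = datetime.strptime(ref_str, "%Y-%m-%d")
    return [add_months(ref, int(v)) for v in values]


# Two hypothetical files with identical encoded values but different units.
files = [
    ([0, 1, 2], "months since 2000-01-01"),
    ([0, 1, 2], "months since 2001-01-01"),
]

# Combining on the raw values: the second file's points collide with the first's.
raw_axis = sorted({v for values, _ in files for v in values})

# Decoding each file first (the role of a preprocess step) keeps every point.
decoded_axis = sorted(
    t for values, units in files for t in decode_months(values, units)
)

print(len(raw_axis), len(decoded_axis))
```

The raw axis ends up with three points while the decoded axis keeps all six, which mirrors the dropped-time-values problem described above.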