Update xcdat.open_mfdataset time decoding logic #161
Codecov Report
@@ Coverage Diff @@
## main #161 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 8 8
Lines 381 423 +42
=========================================
+ Hits 381 423 +42
Continue to review full report at Codecov.
The check for CF compliance in Comments from #158
I expected
I believe your explanation might be right, since each file has its own units ("days since ..."). This might be preventing xarray from joining the timestamps across files. Another important note is that the updates in this PR only handle multi-file datasets with CF compliant time. Example:
I'm running into a weird xarray issue in PR #107 involving datasets with CF compliant time units that are decoded using I opened up the issue on the xarray repo (pydata/xarray#6015). The updates in this branch should allow us to avoid using |
Okay - I did not realize this - I see that this is not working for non-CF compliant multi-file datasets:
The decoded time only goes to 2050 (it should go to 2100). It seems like this is an issue with |
I will investigate this issue using your example files once |
Yes, the issue is related to For your example, which includes non-CF time units, the reference date of January 1800 is being set as the start for the time series. Instead, it should be January 1800 + the first coordinate value (600 months in this case when
Related code: I'll include a fix in this PR, thanks! |
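To make the offset arithmetic above concrete, here is a minimal sketch using only the Python standard library. It is not the xcdat code; `add_months` is a hypothetical helper introduced for illustration, and it assumes the reference date falls on a day that exists in every month (such as the 1st).

```python
from datetime import datetime


def add_months(ref, months):
    """Shift `ref` forward by a whole number of months.

    Hypothetical helper for illustration only. Assumes `ref.day` exists
    in the target month (e.g., day 1), otherwise `replace` raises.
    """
    total = ref.year * 12 + (ref.month - 1) + months
    return ref.replace(year=total // 12, month=total % 12 + 1)


ref_date = datetime(1800, 1, 1)  # from "months since 1800-01-01"
first_offset = 600               # first encoded time coordinate value

# Behavior described as the bug: the series starts at the reference date.
wrong_start = ref_date

# Intended behavior: the series starts at reference date + first offset.
right_start = add_months(ref_date, first_offset)

print(wrong_start)  # 1800-01-01 00:00:00
print(right_start)  # 1850-01-01 00:00:00
```

Each later coordinate would be decoded the same way, by applying its own offset to the reference date rather than stepping a fixed range from it.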
Continuing work in PR #173, which will merge into this PR. |
9483cf9 to ec6e20c
d003a9f to 0e904c4
This PR looks good to me now. I added an additional fix, which I've pointed out in a comment.
Feel free to merge after you do a final review.
@tomvothecoder: I am assuming these types of messages for
@pochedls Yeah, those are debug level messages that appear in the test suite. I might end up removing it because it will throw an error if |
@tomvothecoder - This looks good to me. Could you merge it (or I can merge it with some guidance from you after the holidays)? Thanks for all your contributions to this bug/PR! |
@pochedls Thanks for the final review! I'll handle rebasing and merging the PR today. |
- Fixes issue with incorrect time values being decoded in `decode_time_units()` for non-CF compliant time units
  - The fix is to use the time values as offsets to the reference date in the "units" attribute
- Fixes calling `open_mfdataset(decode_times=False)` when datasets have the same numerically encoded time values, but differing non-CF compliant time units (e.g., "months since 2000-01-01", "months since 2001-01-01"), resulting in time values being dropped.

Summary of Changes
- Add optional boolean kwarg `decode_times` to `open_dataset()` and `open_mfdataset()`
  - Add conditionals to handle this kwarg when True or False
- Add optional callable kwarg `preprocess` to `open_mfdataset()`
- Add `_preprocess_non_cf_dataset()` function to decode datasets' time values with non-CF compliant units before concatenating (handles cases where the datasets have the same time values and different time units, which would otherwise result in dropping of time values)
- Update `decode_non_cf_time()`
  - Rename from `decode_time_units()` to `decode_non_cf_time()`
  - Remove logic for checking CF compliance, which is now handled by `_has_cf_compliant_time()`
  - Fix incorrect start date for decoded time coordinates by forming them using offsets and reference dates, instead of using the reference date as the start point of a fixed `pd.date_range()`
    - Using `pd.date_range()` incorrectly assumes no gaps/missing data and that coordinate points start at the beginning of the month. It also did not handle calendar types correctly (e.g., leap years), and would reset offsets to the beginning of the month or year if they weren't already there.
  - Add decoding of time bounds
- Add utility function `_split_time_units_attr()` for splitting "units" attribute into units and reference date strings
- Update docstrings of methods
- Update test fixtures for correctness and readable syntax
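As a rough illustration of why decoding each file before concatenating matters, the sketch below uses plain Python dicts in place of datasets. It is not the xcdat implementation: `split_time_units_attr` and `decode_months` are hypothetical stand-ins, only "months since YYYY-MM-DD" units are handled, and the reference day is assumed to exist in every month.

```python
from datetime import datetime


def split_time_units_attr(units):
    """Split e.g. 'months since 2000-01-01' into ('months', '2000-01-01').

    Hypothetical stand-in, not the xcdat function of a similar name.
    """
    unit, ref_date = units.split(" since ")
    return unit, ref_date


def decode_months(values, units):
    """Decode integer month offsets relative to the units' reference date."""
    unit, ref_date = split_time_units_attr(units)
    if unit != "months":
        raise ValueError(f"this sketch only handles 'months', got {unit!r}")
    ref = datetime.strptime(ref_date, "%Y-%m-%d")
    decoded = []
    for value in values:
        total = ref.year * 12 + (ref.month - 1) + value
        decoded.append(ref.replace(year=total // 12, month=total % 12 + 1))
    return decoded


# Two "files" with identical encoded values but different reference dates.
files = [
    {"time": [0, 1, 2], "units": "months since 2000-01-01"},
    {"time": [0, 1, 2], "units": "months since 2001-01-01"},
]

# Concatenating the raw values makes the two files indistinguishable.
raw = [v for f in files for v in f["time"]]

# Decoding per file first keeps every time step distinct.
decoded = [t for f in files for t in decode_months(f["time"], f["units"])]

print(len(set(raw)))      # 3
print(len(set(decoded)))  # 6
```

In xarray terms, the per-file decoding step corresponds to a function passed through the `preprocess` argument of `open_mfdataset()`, which is applied to each dataset before they are combined.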
- Fix type annotation in docstrings
fa374e1 to bd80636
Description
- Update `xcdat.open_mfdataset()` to use `xarray`'s built-in time decoding if the dataset has CF compliant time units
Checklist
If applicable: