-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dealing with the time dimension #64
Comments
Great proposal. I actually have a bunch of time functionality to migrate in here from my current project, some of this will be addressed. Would love to hear your thoughts on an upcoming PR? Some of the functionality discussed here should probably be optional (e.g. replacing the monthly time) to not break existing functionality? |
Definitely optional. I envision a keyword like correct_time=False in the top level function, then a user can switch this on or just import the underlying function for later use. Probably monthly data is the most pressing issue but hopefully this works also for other freqs |
|
Ill get one going in the next days. |
I will prioritize getting #62 to work, so we have the ability to test with actual data in the cloud. I think this will help a lot. |
Likely we can use https://cf-xarray.readthedocs.io/en/latest/whats-new.html like Ryan suggested #59 |
Great! I am still learning about cf-xarray, but I think I am getting close to implementing the cloud tests! Ill ping you once that is ready. |
I wanted to revive this over here and maybe get some more feedback. I have used this function in my workflow: def maybe_unpack_date(date):
"""Optionally Extracts the data value of a 1 element xarray datarray."""
# I should probably not do this here but instead in the higher level functions...
if isinstance(date, xr.DataArray):
date = date.data.tolist()
if isinstance(date, list):
if len(date) != 1:
raise RuntimeError(
"The passed date has the wrong format. Got [{date}] after conversion to list."
)
else:
date = date[0]
return date
def replace_time(ds, ref_date, start_idx=0, freq="1MS", calendar=None):
"""This function replaces the time encoding of a dataset acoording to `ref_date`.
The ref date can be any index of ds.time (default is 0; meaning the first timestep of ds will be replaced with `ref_date`).
"""
ref_date = maybe_unpack_date(ref_date)
# determine the start date
# propagate the date back (this assumes stricly monthly data)
year = ref_date.year - (start_idx // 12)
month = ref_date.month - (start_idx % 12)
if month <= 0:
# move the year one more back
year -= 1
month = 12 + month
attrs = ds.time.attrs
start = f"{int(year):04d}-{int(month):02d}"
calendar = ds.time.encoding.get("calendar", "standard")
ds = ds.assign_coords(
time=xr.cftime_range(start, periods=len(ds.time), freq=freq, calendar=calendar)
)
ds.time.attrs = attrs
return ds This serves two purposes:
I need to write some tests and make this a bit more elegant, but this could technically be used in Also, should we even attempt to expand this to other time frequencies? It currently is very much hardcoded for monthly data (we can raise an error if other frequencies are given). But I have not really worked with any other output in CMIP6 and would also love to get some feedback there. Finally, where should this live? I am thinking of having a |
What if files are missing? Maybe add a check on the last time step also? That the last from the input equals the newly created time? |
Could add freq=infer using xr.infer_freq? |
See pydata/xarray#5233 for a utility to convert calendars. You can help test and/or review it :) |
Oh sweet. Yeah Ill try to refactor with that in mind. |
So I have naively implemented some of this functionality in #126, but will be happy to refactor it with new xarray features. If you folks could have a quick look over there, I would very much appreciate it! |
Basically, over there you can now do from cmip6_preprocessing.drift_removal import replace time
ds = replace_time(ds)
# or specify the calendar
ds = replace_time(ds, calendar='your_fav_calendar') |
for a seamless use of multi-model CMIP6 data, the time dimension is also annoying because the calendar definitions are all over the place.
cmip6_preprocessing
already deals with spatial dimensions and units, maybe we can also get time to a common standard.Especially when working with monthly data, there is a variety of calendars (
360_day
,gregorian
,...) and frequencies (MS
,middle of the month
,M
) used, but essentially they all mean the same thing: a monthly mean. (For daily output this seems less of a problem because daily calendars are better defined.)How to proceed (proposal):
intake-esm
orxr
withuse_cftime=True
.time.size
, thentime=xr.cftime_range()
with common freq='MS or M' could be created. This function could even be used viapreprocessing
, as then thetime
coord/dim is concatinated. Alternatively, this could also just be used as a function applied to each model dataset afterwards.time
dimension starts as expected, e.g.historical
from1850-01
and raise a warning if not. (this wouldnt work nicely inpreprocessing
but rather as a function call afterintake-esm
)I think essentially most users do something like this when they attempt to get different models into one
xr.Dataset
.Thoughts?
The text was updated successfully, but these errors were encountered: