Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dealing with the time dimension #64

Closed
aaronspring opened this issue Dec 2, 2020 · 14 comments
Closed

dealing with the time dimension #64

aaronspring opened this issue Dec 2, 2020 · 14 comments
Assignees

Comments

@aaronspring
Copy link
Contributor

for a seamless use of multi-model CMIP6 data, the time dimension is also annoying because the calendar definitions are all over the place. cmip6_preprocessing already deals with spatial dimensions and units, maybe we can also get time to a common standard.

Especially when working with monthly data, there is a variety of calendars (360_day,gregorian,...) and frequencies (MS,middle of the month,M) used, but essentially they all mean the same thing: a monthly mean. (For daily output this seems less of a problem because daily calendars are better defined.)

How to proceed (proposal):

  • One first step would be to use intake-esm or xr with use_cftime=True.
  • monthly output: check year and month of first and last timestep. if this matches time.size, then time=xr.cftime_range() with common freq='MS or M' could be created. This function could even be used via preprocessing, as then the time coord/dim is concatinated. Alternatively, this could also just be used as a function applied to each model dataset afterwards.
  • daily: makes sense to align?
  • annual: makes sense to align? not too much annual CMIP6 output anyways
  • additional checks could get the experiment_id and see whether the time dimension starts as expected, e.g. historical from 1850-01 and raise a warning if not. (this wouldnt work nicely in preprocessing but rather as a function call after intake-esm)

I think essentially most users do something like this when they attempt to get different models into one xr.Dataset.

Thoughts?

@jbusecke
Copy link
Owner

jbusecke commented Dec 2, 2020

Great proposal. I actually have a bunch of time functionality to migrate in here from my current project, some of this will be addressed.

Would love to hear your thoughts on an upcoming PR?

Some of the functionality discussed here should probably be optional (e.g. replacing the monthly time) to not break existing functionality?

@aaronspring
Copy link
Contributor Author

aaronspring commented Dec 2, 2020

Definitely optional. I envision a keyword like correct_time=False in the top level function, then a user can switch this on or just import the underlying function for later use.

Probably monthly data is the most pressing issue but hopefully this works also for other freqs

@aaronspring
Copy link
Contributor Author

Would love to hear your thoughts on an upcoming PR?
Will you start a PR or shall I?
Sounds like you already have something...

@jbusecke
Copy link
Owner

jbusecke commented Dec 2, 2020

Ill get one going in the next days.

@jbusecke
Copy link
Owner

jbusecke commented Dec 2, 2020

I will prioritize getting #62 to work, so we have the ability to test with actual data in the cloud. I think this will help a lot.

@aaronspring
Copy link
Contributor Author

Likely we can use https://cf-xarray.readthedocs.io/en/latest/whats-new.html like Ryan suggested #59

@jbusecke
Copy link
Owner

jbusecke commented Dec 2, 2020

Great! I am still learning about cf-xarray, but I think I am getting close to implementing the cloud tests! Ill ping you once that is ready.

@jbusecke
Copy link
Owner

jbusecke commented May 6, 2021

I wanted to revive this over here and maybe get some more feedback. I have used this function in my workflow:

def maybe_unpack_date(date):
    """Optionally Extracts the data value of a 1 element xarray datarray."""
    # I should probably not do this here but instead in the higher level functions...
    if isinstance(date, xr.DataArray):
        date = date.data.tolist()
        if isinstance(date, list):
            if len(date) != 1:
                raise RuntimeError(
                    "The passed date has the wrong format. Got [{date}] after conversion to list."
                )
            else:
                date = date[0]
    return date

def replace_time(ds, ref_date, start_idx=0, freq="1MS", calendar=None):
    """This function replaces the time encoding of a dataset acoording to `ref_date`.
    The ref date can be any index of ds.time (default is 0; meaning the first timestep of ds will be replaced with `ref_date`).
    """
    ref_date = maybe_unpack_date(ref_date)

    # determine the start date
    # propagate the date back (this assumes stricly monthly data)

    year = ref_date.year - (start_idx // 12)
    month = ref_date.month - (start_idx % 12)

    if month <= 0:
        # move the year one more back
        year -= 1
        month = 12 + month

    attrs = ds.time.attrs

    start = f"{int(year):04d}-{int(month):02d}"
    calendar = ds.time.encoding.get("calendar", "standard")

    ds = ds.assign_coords(
        time=xr.cftime_range(start, periods=len(ds.time), freq=freq, calendar=calendar)
    )
    ds.time.attrs = attrs
    return ds

This serves two purposes:

  1. You can replace the time with any calendar you want
  2. You can shift the time by a given number of intervals (with start_idx). I need this for some detrending stuff that will come soon.

I need to write some tests and make this a bit more elegant, but this could technically be used in combined_preprocessing or as a standalone function to harmonize calendars of monthly data (we might want to let ref_date default to ds.time[0] to make this work out of the box).
I am almost inclined to do the latter for monthly output, but would love to hear opinions on whether this should be optional or not (and how to implement such a thing. It is currently not possible to pass kwargs to the preprocess function in inteake-esm as far as I know cc @andersy005).

Also, should we even attempt to expand this to other time frequencies? It currently is very much hardcoded for monthly data (we can raise an error if other frequencies are given). But I have not really worked with any other output in CMIP6 and would also love to get some feedback there.

Finally, where should this live? I am thinking of having a time_processing.py module, but I would like to eventually integrate detrending etc there and I am not sure if that is the most intuitive naming. Suggestions welcome.

@jbusecke jbusecke self-assigned this May 6, 2021
@aaronspring
Copy link
Contributor Author

What if files are missing? Maybe add a check on the last time step also? That the last from the input equals the newly created time?

@aaronspring
Copy link
Contributor Author

Could add freq=infer using xr.infer_freq?

@dcherian
Copy link

dcherian commented May 6, 2021

See pydata/xarray#5233 for a utility to convert calendars. You can help test and/or review it :)

@jbusecke
Copy link
Owner

jbusecke commented May 6, 2021

See pydata/xarray#5233 for a utility to convert calendars. You can help test and/or review it :)

Oh sweet. Yeah Ill try to refactor with that in mind.

@jbusecke
Copy link
Owner

jbusecke commented Jun 4, 2021

So I have naively implemented some of this functionality in #126, but will be happy to refactor it with new xarray features. If you folks could have a quick look over there, I would very much appreciate it!

@jbusecke
Copy link
Owner

Basically, over there you can now do

from cmip6_preprocessing.drift_removal import replace time
ds = replace_time(ds)
# or specify the calendar
ds = replace_time(ds, calendar='your_fav_calendar')

@jbusecke jbusecke closed this as completed Jul 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants