CF conventions: time_bnds and time units #2565

Closed
fmaussion opened this issue Nov 23, 2018 · 4 comments

Comments

@fmaussion
Member

Problem

Here is the dump of a NetCDF file (download):

netcdf cesm.TREFHT.160001-200512.selection {
dimensions:
        time = UNLIMITED ; // (4872 currently)
        lat = 3 ;
        lon = 3 ;
        nbnd = 2 ;
variables:
        float TREFHT(time, lat, lon) ;
                TREFHT:units = "K" ;
                TREFHT:long_name = "Reference height temperature" ;
                TREFHT:cell_methods = "time: mean" ;
        double lat(lat) ;
                lat:long_name = "latitude" ;
                lat:units = "degrees_north" ;
        double lon(lon) ;
                lon:long_name = "longitude" ;
                lon:units = "degrees_east" ;
        double time(time) ;
                time:long_name = "time" ;
                time:units = "days since 0850-01-01 00:00:00" ;
                time:calendar = "noleap" ;
                time:bounds = "time_bnds" ;
        double time_bnds(time, nbnd) ;
                time_bnds:long_name = "time interval endpoints" ;

// global attributes:
                :Conventions = "CF-1.0" ;
                :source = "CAM" ;
...
}

When xarray decodes the time coordinate, it also removes the time:units attribute (this kind of makes sense, because the units have no meaning once the time is converted to cftime objects):

import xarray as xr
f = 'cesm.TREFHT.160001-200512.selection.nc'  # path to the downloaded file
ds = xr.open_dataset(f)
ds.time
<xarray.DataArray 'time' (time: 4872)>
array([cftime.DatetimeNoLeap(1600, 2, 1, 0, 0, 0, 0, 0, 32),
       cftime.DatetimeNoLeap(1600, 3, 1, 0, 0, 0, 0, 0, 60),
       cftime.DatetimeNoLeap(1600, 4, 1, 0, 0, 0, 0, 3, 91), ...,
       cftime.DatetimeNoLeap(2005, 11, 1, 0, 0, 0, 0, 6, 305),
       cftime.DatetimeNoLeap(2005, 12, 1, 0, 0, 0, 0, 1, 335),
       cftime.DatetimeNoLeap(2006, 1, 1, 0, 0, 0, 0, 4, 1)], dtype=object)
Coordinates:
  * time     (time) object 1600-02-01 00:00:00 ... 2006-01-01 00:00:00
Attributes:
    long_name:  time
    bounds:     time_bnds

The problem is that I now have no way to decode the time_bnds variable from xarray alone, because time_bnds doesn't store the time units. At first I thought that my file was not CF compliant, but I've looked into the CF conventions and they do not appear to require that time_bnds also have a units attribute.

Solution

I actually don't know what we should do here. I see a couple of ways:

  1. we don't care and leave it to the user (here: me) to open the file with netCDF4 to decode the time bounds
  2. we don't delete the time:units attribute after decoding
  3. we start to also decode the time_bnds when available, like we do with time

Thoughts? cc @spencerkclark @jhamman

@fmaussion
Member Author

I am actually in favor of 3: also decode time_bnds

@spencerkclark
Member

@fmaussion I have run into this issue before and it is a bit cumbersome. My workaround in the past has been to open the dataset without decoding the times initially, copy the units and calendar attributes from the time variable to the time_bnds variable, and then decode the times (xarray will decode any variable with a time-like units attribute into dates, regardless of its name).

In [1]: import xarray

In [2]: ds = xarray.open_dataset('cesm.TREFHT.160001-200512.selection.nc', decode_times=False)

In [3]: ds.time_bnds.attrs['units'] = ds.time.attrs['units']

In [4]: ds.time_bnds.attrs['calendar'] = ds.time.attrs['calendar']

In [5]: ds_decoded = xarray.decode_cf(ds)

> I am actually in favor of 3: also decode time_bnds

It would be nice if we could handle this automatically, though one issue that concerns me is how we would automatically determine which variable represents the time bounds in an arbitrary netCDF file (as mentioned above we currently do not have any variable-name-specific logic in xarray, which I think is a good thing). I do notice the time variable in your file has a bounds attribute, which points to the name time_bnds. Is that something required by CF conventions? We might be able to rely on that.

@fmaussion
Member Author

fmaussion commented Nov 23, 2018

Good points!

> I do notice the time variable in your file has a bounds attribute, which points to the name time_bnds. Is that something required by CF conventions? We might be able to rely on that.

Yes, bounds is a CF name:
http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/cf-conventions.html#cell-boundaries

It seems reasonable to assume that the bounds should share the same units as time, therefore I think that the workaround you are using could actually be implemented in xarray, but the actual implementation might be a bit messy...

(it would be much simpler if CF prescribed that the units also be available at the time_bnds level...)
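
For illustration, the idea of following the bounds attribute instead of hard-coding the name time_bnds could look roughly like the sketch below. This is not xarray's implementation: the helper copy_time_attrs_to_bounds is a hypothetical name, and it only assumes a mapping from variable names to objects that carry a dict-like attrs, so plain stand-in objects are used here in place of a real dataset.

```python
from types import SimpleNamespace


def copy_time_attrs_to_bounds(variables):
    """Copy ``units`` and ``calendar`` from each variable to its bounds variable.

    ``variables`` maps names to objects with a dict-like ``attrs`` attribute.
    Returns the names of the bounds variables that were found.
    (Hypothetical helper, not part of xarray.)
    """
    found = []
    for name, var in variables.items():
        bounds_name = var.attrs.get("bounds")
        if bounds_name is None or bounds_name not in variables:
            continue  # no bounds attribute, or it points to a missing variable
        bounds_attrs = variables[bounds_name].attrs
        for key in ("units", "calendar"):
            # Never overwrite an attribute the bounds variable already has.
            if key in var.attrs and key not in bounds_attrs:
                bounds_attrs[key] = var.attrs[key]
        found.append(bounds_name)
    return found


# Stand-ins for the undecoded variables of the file in this issue.
variables = {
    "time": SimpleNamespace(attrs={
        "long_name": "time",
        "units": "days since 0850-01-01 00:00:00",
        "calendar": "noleap",
        "bounds": "time_bnds",
    }),
    "time_bnds": SimpleNamespace(attrs={"long_name": "time interval endpoints"}),
    "lat": SimpleNamespace(attrs={"long_name": "latitude", "units": "degrees_north"}),
}

print(copy_time_attrs_to_bounds(variables))   # ['time_bnds']
print(variables["time_bnds"].attrs["units"])  # days since 0850-01-01 00:00:00
```

With a real file, the same function should work on ds.variables of a dataset opened with decode_times=False, followed by xarray.decode_cf(ds) as in the workaround above. That usage is an assumption and has not been run against the file from this issue.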

@shoyer
Member

shoyer commented Nov 24, 2018

You can find the decoded time units in encoding:

In [8]: ds.time.encoding['units']
Out[8]: 'days since 0850-01-01 00:00:00'

But I would also be in favor of decoding time units in bounds variables.
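
To make concrete what decoding the bounds with those units means, here is a small hand-rolled sketch of the arithmetic for the noleap calendar. It is only an illustration: decode_noleap_days is a hypothetical helper that handles whole-day offsets and this one units format, and the bounds pair at the end is made up (the actual values in the file are not shown in this thread). In practice a library routine such as cftime.num2date, or xarray's own decoding, would be the better choice.

```python
import re

# Cumulative days before each month in a 365-day ("noleap") year.
DAYS_BEFORE_MONTH = [0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334]


def decode_noleap_days(value, units):
    """Convert a whole-day offset to (year, month, day) in the noleap calendar.

    Hypothetical helper: only supports "days since YYYY-MM-DD[ 00:00:00]".
    """
    match = re.fullmatch(r"days since (\d{4})-(\d{2})-(\d{2})(?: 00:00:00)?", units)
    if match is None:
        raise ValueError(f"unsupported units string: {units!r}")
    ref_year, ref_month, ref_day = (int(g) for g in match.groups())
    # Days from the start of year 0 to the reference date, plus the offset.
    total = ref_year * 365 + DAYS_BEFORE_MONTH[ref_month - 1] + (ref_day - 1) + int(value)
    year, day_of_year = divmod(total, 365)
    # Last month whose starting day is not after day_of_year (zero-based).
    month = max(m for m in range(1, 13) if DAYS_BEFORE_MONTH[m - 1] <= day_of_year)
    day = day_of_year - DAYS_BEFORE_MONTH[month - 1] + 1
    return year, month, day


units = "days since 0850-01-01 00:00:00"  # as returned by ds.time.encoding['units']

# 273781 days after 0850-01-01 is the first timestamp shown above.
print(decode_noleap_days(273781, units))  # (1600, 2, 1)

# A made-up bounds pair for the first monthly mean.
print([decode_noleap_days(v, units) for v in (273750, 273781)])
# [(1600, 1, 1), (1600, 2, 1)]
```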
