
Issue with GFS time reference #827

Closed
caiostringari opened this issue Apr 16, 2016 · 7 comments

@caiostringari

I am currently translating some old Ferret code into Python; however, I ran into an issue when downloading GFS operational data.

When downloaded from ferret, the GFS file has the following time reference (using ncdump -h):

double TIME(TIME) ;
        TIME:units = "days since 0001-01-01 00:00:00" ;
        TIME:long_name = "time" ;
        TIME:time_origin = "01-JAN-0001 00:00:00" ;
        TIME:axis = "T" ;
        TIME:standard_name = "time" ;

When using xarray to access the OPeNDAP server and writing to disk with ds.to_netcdf(), the file has this time reference:

double time(time) ;
        string time:grads_dim = "t" ;
        string time:grads_mapping = "linear" ;
        string time:grads_size = "81" ;
        string time:grads_min = "00z15apr2016" ;
        string time:grads_step = "3hr" ;
        string time:long_name = "time" ;
        string time:minimum = "00z15apr2016" ;
        string time:maximum = "00z25apr2016" ;
        time:resolution = 0.125f ;
        string time:units = "days since 2001-01-01" ;
        time:calendar = "proleptic_gregorian" ;

This is not really an issue while using the data inside Python, because the dates are translated correctly. However, in my workflow I need this file to be read by other models such as WW3. For instance, trying to read it from WW3 results in:

Processing data
 --------------------------------------------------
           Time : 0015/03/15 00:00:00 UTC
                  reading ....
                  interpolating ....
                  writing ....
           Time : 0015/03/15 03:00:00 UTC

Looking at the reference time, ferret gives TIME:time_origin = "01-JAN-0001 00:00:00" while xarray gives string time:units = "days since 2001-01-01". Well, there are 2000 years missing...

I tried to fix it using something like:

ds.coords['reference_time'] = pd.Timestamp('1-1-1')

But the reference time wasn't actually updated. Is there an easy way to fix the reference time to match what is on NOAA's OPeNDAP server?

@shoyer
Member

shoyer commented Apr 17, 2016

When you're writing the data back to disk with to_netcdf, try setting the units explicitly, e.g.: ds.to_netcdf('somefile.nc', encoding={'time': {'units': 'days since 0001-01-01 00:00:00'}}).

But I'm a little surprised this doesn't work by default. Xarray does use '2001-01-01' as a default reference time, but if you pulled the data from an existing dataset (rather than creating the time variable yourself, e.g. with numpy or pandas), it should save the original units in the encoding attribute of the time variable, and those units should then be used when writing the file. If ds is the dataset you open from OPeNDAP or save to netCDF, what does the value of ds.time.encoding look like?

@caiostringari
Author

ds.time.encoding results in:

{'dtype': dtype('float64'),
 'original_shape': (81,),
 'source': 'http://nomads.ncep.noaa.gov:9090/dods/gfs_0p25/gfs20160417/gfs_0p25_00z',
 'units': u'days since 1-1-1 00:00:0.0'}

I tried to use ds.to_netcdf('filename.nc', encoding={'time': {'units': u'days since 1-1-1 00:00:0.0'}}), but ncdump still shows "days since 2001-01-01" as the reference time for "time":

double time(time) ;
        string time:grads_dim = "t" ;
        string time:grads_mapping = "linear" ;
        string time:grads_size = "81" ;
        string time:grads_min = "00z17apr2016" ;
        string time:grads_step = "3hr" ;
        string time:long_name = "time" ;
        string time:minimum = "00z17apr2016" ;
        string time:maximum = "00z27apr2016" ;
        time:resolution = 0.125f ;
        string time:units = "days since 2001-01-01" ;
        time:calendar = "proleptic_gregorian" ;

@shoyer
Member

shoyer commented Apr 18, 2016

Ah, I finally figured out what's going on.

We use pandas to clean up time units in an attempt to always write ISO-8601-compatible reference times. Unfortunately, pandas interprets dates like '1-1-1' or '01-JAN-0001' as January 1, 2001:

In [21]: pd.Timestamp('1-1-1 00:00:0.0')
Out[21]: Timestamp('2001-01-01 00:00:00')

In [25]: pd.Timestamp('01-JAN-0001 00:00:00')
Out[25]: Timestamp('2001-01-01 00:00:00')

One might argue this is a bug in pandas, but nonetheless that's what it does.
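For contrast, a quick standard-library check (not part of the original thread, just an illustration) shows that a strict ISO-style parse keeps a year-1 date as written, which is exactly what the flexible pandas parser fails to do here:

```python
from datetime import datetime

# A strict format-string parse preserves the year exactly as written,
# even for year 1 of the common era:
ref = datetime.strptime("0001-01-01 00:00:00", "%Y-%m-%d %H:%M:%S")
print(ref.year)  # 1

# Flexible date parsers instead treat the bare "1" as a two-digit year
# and pivot it into the 2000s, which is how '1-1-1' becomes 2001-01-01.
```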

xarray can currently handle datetimes outside the range handled by pandas (roughly 1677–2262), but only if pandas raises an OutOfBoundsDatetime error.

Two fixes that we need for this:

  • use netCDF4's reference time decoding (if available) before trying to use pandas in decode_cf_datetime. Note that it is important to decode only the reference time with netCDF4, if possible, because it's much faster to parse the date arrays with vectorized pandas/numpy operations.
  • stop using _cleanup_netcdf_time_units, since apparently it can go wrong.
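The idea behind the first fix can be sketched with the standard library alone (a hypothetical helper, not xarray's actual implementation, which relies on netCDF4/netcdftime): parse the reference date once with a strict parser, then apply the numeric offsets, which is the cheap part that pandas/numpy can vectorize.

```python
from datetime import datetime, timedelta

def decode_times(values, units):
    """Hypothetical sketch: strictly parse the single reference date in a
    CF units string like "days since 0001-01-01 00:00:00", then offset.
    """
    interval, _, ref_str = units.partition(" since ")
    # Strict parsing keeps year 1 as year 1 (no two-digit-year pivoting).
    ref = datetime.strptime(ref_str.strip(), "%Y-%m-%d %H:%M:%S")
    seconds_per_unit = {"days": 86400, "hours": 3600, "seconds": 1}[interval]
    return [ref + timedelta(seconds=v * seconds_per_unit) for v in values]

print(decode_times([0.0, 1.0], "days since 0001-01-01 00:00:00"))
# [datetime(1, 1, 1, 0, 0), datetime(1, 1, 2, 0, 0)]
```

In the real code path the offset step would run on the whole numeric array at once; only the one reference-date parse needs the slower, calendar-aware decoder.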

cc @jhamman who has some experience with these issues

@caiostringari
Author

I would rather stick with xarray alone, but Iris solved the issue for now...

Another alternative is to use NCO and do something like: ncatted -O -a units,time,o,c,'days since 0001-01-01 00:00:00'

@jhamman
Member

jhamman commented Apr 19, 2016

Unfortunately, pandas interprets dates like '1-1-1' or '01-JAN-0001' as January 1, 2001

this is too bad.

use netCDF4's reference time decoding (if available) before trying to use pandas in decode_cf_datetime. Note that it is important to decode only the reference time with netCDF4, if possible, because it's much faster to parse the date arrays with vectorized pandas/numpy operations.

This seems easy enough. It would be nice if we always had netcdftime available.

I can try to take a hack at this later this week (unless someone gets there first).

@stale

stale bot commented Jan 28, 2019

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.
If this issue remains relevant, please comment here; otherwise it will be marked as closed automatically.

@albertotb

Also mentioned in #4422 and fixed in #4506
