
Coordinate attributes as DataArray type doesn't export to netcdf #1906

Closed
mraspaud opened this issue Feb 13, 2018 · 5 comments

Comments

@mraspaud
Contributor

Code Sample, a copy-pastable example if possible

import numpy as np
import xarray as xr

arr = xr.DataArray([[1, 2, 3]], dims=['time', 'x'])
arr['time'] = np.array([1])
time_bnds = xr.DataArray([0, 1], dims='time_bounds')
arr['time'].attrs['bounds'] = time_bnds

dataset = xr.Dataset({'arr': arr,
                      'time_bnds': time_bnds})

dataset.to_netcdf('time_bnd.nc')

Problem description

This code raises a TypeError:

Traceback (most recent call last):
  File "test_time_bounds.py", line 12, in <module>
    dataset.to_netcdf('time_bnd.nc')
  File "/home/a001673/.local/lib/python2.7/site-packages/xarray/core/dataset.py", line 1132, in to_netcdf
    unlimited_dims=unlimited_dims)
  File "/home/a001673/.local/lib/python2.7/site-packages/xarray/backends/api.py", line 598, in to_netcdf
    _validate_attrs(dataset)
  File "/home/a001673/.local/lib/python2.7/site-packages/xarray/backends/api.py", line 121, in _validate_attrs
    check_attr(k, v)
  File "/home/a001673/.local/lib/python2.7/site-packages/xarray/backends/api.py", line 112, in check_attr
    'files'.format(value))
TypeError: Invalid value for attr: <xarray.DataArray (time_bounds: 2)>
array([0, 1])
Dimensions without coordinates: time_bounds must be a number string, ndarray or a list/tuple of numbers/strings for serialization to netCDF files

This is a problem for me because we need to provide attributes on the coordinate variables and save them to netCDF in order to be CF compliant. There are workarounds (like saving time_bnds as a regular variable and putting its name in an attribute of the time variable), but the code above seems to be the most intuitive way to do it.
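The TypeError message above lists the value types the netCDF writer accepts for attributes: a number, a string, an ndarray, or a list/tuple of numbers/strings. A minimal sketch of a pre-flight check that reports offending attributes before calling to_netcdf is shown below. It assumes Python 3 and a recent xarray; the helper names find_invalid_attrs and is_serializable_attr are hypothetical, and the type rule mirrors the error message rather than xarray's exact internal check.

```python
import numpy as np
import xarray as xr

# Value types accepted for attributes, as listed in the TypeError message.
# Assumption: this mirrors the message, not necessarily xarray's internal check.
SCALAR_TYPES = (str, bytes, int, float, np.number, np.bool_)


def is_serializable_attr(value):
    """True if `value` matches one of the types named in the error message."""
    if isinstance(value, (np.ndarray,) + SCALAR_TYPES):
        return True
    if isinstance(value, (list, tuple)):
        return all(isinstance(item, SCALAR_TYPES) for item in value)
    return False


def find_invalid_attrs(ds):
    """Hypothetical helper: list (variable name or None for global attrs,
    attribute key, value type name) for every attribute that would be rejected."""
    problems = []
    for key, value in ds.attrs.items():
        if not is_serializable_attr(value):
            problems.append((None, key, type(value).__name__))
    for name, var in ds.variables.items():
        for key, value in var.attrs.items():
            if not is_serializable_attr(value):
                problems.append((name, key, type(value).__name__))
    return problems


# Same structure as the failing example: a DataArray stored as an attribute.
time_bnds = xr.DataArray([0, 1], dims='time_bounds')
ds = xr.Dataset(
    {'arr': (('time', 'x'), [[1, 2, 3]]), 'time_bnds': time_bnds},
    coords={'time': ('time', [1], {'bounds': time_bnds})},
)
print(find_invalid_attrs(ds))  # [('time', 'bounds', 'DataArray')]
```

The check only inspects attributes, so it is cheap to run on any Dataset before writing.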

Expected output

I would expect an output like this (ncdump -h):

netcdf time_bnd {
dimensions:
	time = 1 ;
	time_bounds = 2 ;
	x = 3 ;
variables:
	int64 time(time) ;
		time:bounds = "time_bnds" ;
	int64 time_bnds(time_bounds) ;
	int64 arr(time, x) ;

Output of xr.show_versions()

In [2]: xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.5.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-693.11.6.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: None.None

xarray: 0.10.0
pandas: 0.21.0
numpy: 1.13.3
scipy: 0.18.1
netCDF4: 1.1.8
h5netcdf: 0.4.2
Nio: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.16.1
matplotlib: 2.1.0
cartopy: None
seaborn: None
setuptools: 38.4.0
pip: 9.0.1
conda: None
pytest: 3.1.3
IPython: 5.5.0
sphinx: 1.6.6

@fmaussion
Member

There are workarounds (like saving the time_bnds as a regular variable and putting its name as an attribute of the time variable)

This is not a workaround; it is what the CF conventions say to do: cell bounds need to be defined as variables, and the bounds attribute links to the name of that variable.
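To make the CF rule concrete, here is a minimal sketch of a checker for bounds links. It is a simplified reading of CF section 7.1 (Cell Boundaries), not a full CF validator: it assumes the bounds attribute must be a string naming a variable in the same Dataset, and that this variable has the coordinate's dimensions plus one trailing vertex dimension. The helper name check_bounds_links is hypothetical.

```python
import xarray as xr


def check_bounds_links(ds):
    """Hypothetical helper: return human-readable problems with `bounds` attrs.

    Simplified reading of CF section 7.1: the attribute must be a string
    naming a variable in the dataset, and that variable must have the
    coordinate's dimensions plus one trailing vertex dimension.
    """
    problems = []
    for name, var in ds.variables.items():
        if 'bounds' not in var.attrs:
            continue
        target = var.attrs['bounds']
        if not isinstance(target, str):
            problems.append(f'{name}: bounds is a {type(target).__name__}, not a variable name')
        elif target not in ds.variables:
            problems.append(f'{name}: bounds variable {target!r} not found')
        else:
            bdims = ds.variables[target].dims
            # Expect e.g. time(time) -> time_bnds(time, nv)
            if len(bdims) != len(var.dims) + 1 or bdims[:-1] != var.dims:
                problems.append(f'{name}: {target} has dims {bdims}, expected {var.dims} + (n_vertices,)')
    return problems


# Bounds stored as a separate variable and linked by name: no problems.
good = xr.Dataset(
    {'arr': (('time', 'x'), [[1, 2, 3]]), 'time_bnds': (('time', 'nv'), [[0, 1]])},
    coords={'time': ('time', [1], {'bounds': 'time_bnds'})},
)
print(check_bounds_links(good))  # []

# Bounds stored as a DataArray inside the attribute, as in the original report.
bad = xr.Dataset(
    {'arr': (('time', 'x'), [[1, 2, 3]])},
    coords={'time': ('time', [1], {'bounds': xr.DataArray([0, 1], dims='nv')})},
)
print(check_bounds_links(bad))  # ['time: bounds is a DataArray, not a variable name']
```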

@fmaussion
Member

Also DataArrays can have attributes, so storing them as attributes could lead to quite intricate situations ;)

@mraspaud
Contributor Author

Also DataArrays can have attributes, so storing them as attributes could lead to quite intricate situations ;)

I agree totally, but when writing to netCDF, I was expecting xarray to take the DataArrays out of the attributes, replace them with their names, and save them as separate arrays in the netCDF file. But maybe that's not xarray's job to deal with?
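The behaviour described here can be approximated in user code. Below is a minimal sketch of that idea; it is not an xarray feature, and the helper name hoist_dataarray_attrs is hypothetical. It assumes Python 3 and a recent xarray, and it picks a name for each hoisted variable from the DataArray's own name, falling back to "<variable>_<key>" (an arbitrary naming assumption). It raises if that name already exists with different values.

```python
import xarray as xr


def hoist_dataarray_attrs(ds):
    """Hypothetical helper (not part of xarray): move DataArray-valued
    attributes into the Dataset as ordinary variables and replace each
    attribute with the name of the new variable."""
    out = ds.copy()  # shallow copy; each variable gets a fresh attrs dict below
    hoisted = {}
    for name in list(out.variables):
        var = out.variables[name]
        new_attrs = dict(var.attrs)
        for key, value in var.attrs.items():
            if isinstance(value, xr.DataArray):
                # Naming assumption: the DataArray's name, else "<var>_<key>".
                target = value.name if value.name is not None else f'{name}_{key}'
                if target in ds.variables and not ds.variables[target].equals(value.variable):
                    raise ValueError(f'{target!r} already exists with different values')
                hoisted[target] = value.variable
                new_attrs[key] = target
        var.attrs = new_attrs  # replaces the dict, so `ds` itself is untouched
    return out.assign(**hoisted)


# Usage: give the bounds DataArray an explicit name so the result is predictable.
time_bnds = xr.DataArray([0, 1], dims='time_bounds', name='time_bnds')
ds = xr.Dataset(
    {'arr': (('time', 'x'), [[1, 2, 3]])},
    coords={'time': ('time', [1], {'bounds': time_bnds})},
)
fixed = hoist_dataarray_attrs(ds)
print(fixed['time'].attrs['bounds'])  # time_bnds
print(list(fixed.data_vars))          # ['arr', 'time_bnds']
```

Because the naming rule is a guess, an unnamed DataArray can end up under a name that collides with an existing dimension; naming the DataArray explicitly, as in the usage above, avoids that.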

@jhamman
Member

jhamman commented Feb 26, 2018

@mraspaud -

The way to do this with xarray is:

import numpy as np
import xarray as xr

arr = xr.DataArray([[1, 2, 3]], dims=['time', 'x'])
arr['time'] = np.array([1])
time_bnds = xr.DataArray([[0, 1]], dims=('time', 'nv'))
arr['time'].attrs['bounds'] = 'time_bnds'

dataset = xr.Dataset({'arr': arr,
                      'time_bnds': time_bnds})

dataset.info()
xarray.Dataset {
dimensions:
	nv = 2 ;
	time = 1 ;
	x = 3 ;

variables:
	int64 time(time) ;
		time:bounds = time_bnds ;
	int64 arr(time, x) ;
	int64 time_bnds(time, nv) ;

// global attributes:
}

Note this is the same number of lines of code and is CF compliant. I personally don't see us going down the path of nesting xarray objects inside of attrs.

Is there more to discuss here or should we close this?

@mraspaud
Contributor Author

I'm satisfied with this answer, thanks for taking the time!
