What happened:
Conversion to a pandas DataFrame of a zero dimensional DataArray or Dataset fails.
What you expected to happen:
I would expect it to return a trivial DataFrame with one row and the respective coordinate / data set columns.
However, I am not sure whether that conflicts with other potential round trips between xarray and pandas, e.g. for one-dimensional, size-1 data arrays.
Minimal Complete Verifiable Example:
from xarray import DataArray, Dataset

da = DataArray([1, 2, 3], dims=("x",), coords=dict(x=[1, 2, 3]))
# I don't know of a way to construct such a data array without the isel.
# Essentially, the line below also works for higher-dimensional data arrays and
# results in a zero-dimensional data array with all the coordinates of
# the found minimum.
da = da.isel(**da.argmin(dim=("x",)))
ds = Dataset({'a': da})

# fails with ValueError: cannot convert a scalar to a DataFrame
# from xarray/core/dataarray.py", line 2664, in to_dataframe
da.to_dataframe(name="foo")
# Expected: a DataFrame with two columns (x and foo) and one row

# fails with ValueError: no valid index for a 0-dimensional object
# from xarray/core/coordinates.py", line 106, in to_index
ds.to_dataframe()
# Expected: a DataFrame with two columns (x and a) and one row
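Until `to_dataframe()` handles 0-d objects, one possible workaround (a sketch, not something proposed in this issue) is to promote the scalar coordinate `x` back to a length-1 dimension before converting. The xarray docs for `expand_dims` state that a dimension name that is already a scalar coordinate is promoted to a 1-D coordinate with a single value; `reset_index()` then moves `x` from the index into an ordinary column, which gives the one-row, two-column frames expected above. Here `da.isel(x=0)` stands in for the `argmin`-based selection, which picks the same element for this data.

```python
import xarray as xr

da = xr.DataArray([1, 2, 3], dims=("x",), coords=dict(x=[1, 2, 3]))
# Same result as da.isel(**da.argmin(dim=("x",))) for this data:
# a 0-d DataArray carrying the scalar coordinate x == 1.
scalar = da.isel(x=0)
ds = xr.Dataset({"a": scalar})

# Promote the scalar coordinate "x" to a length-1 dimension, convert,
# then turn the "x" index into an ordinary column.
df_da = scalar.expand_dims("x").to_dataframe(name="foo").reset_index()
df_ds = ds.expand_dims("x").to_dataframe().reset_index()

print(df_da)
print(df_ds)
```

This keeps the coordinate value in the result, but it goes through a temporary 1-D object, so it does not settle the round-tripping question discussed below.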
Anything else we need to know?:
I tested a little and got what I wanted by simply removing the
def to_dataframe(...):
    ...
    if self.ndim == 0:
        raise ValueError("cannot convert a scalar to a DataFrame")
block from dataarray.py and changing
def to_index(self, ordered_dims: Sequence[Hashable] = None) -> pd.Index:
    ...
    if len(ordered_dims) == 0:
        return pd.Index([0])
        # raise ValueError("no valid index for a 0-dimensional object")
to not raise and instead return a trivial index in coordinates.py.
If that would be considered reasonable behavior, I am happy to contribute the respective unit test and changes!
to_dataframe() always returns a DataFrame with an index based on coordinate values. I guess we could return a trivial integer index, but this feels a little weird/non-consistent to me particularly because it breaks round-tripping. On the other hand, it is basically exactly what pandas does.
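To make the round-tripping concern concrete, the sketch below builds by hand the kind of one-row frame a trivial integer index would produce for `ds` from the example (the frame is hand-made, not the output of any current xarray call) and converts it back with `xr.Dataset.from_dataframe`. Assuming current `from_dataframe` behavior, the unnamed index comes back as a new length-1 dimension (xarray falls back to the name `index` for an unnamed index), and `x` becomes a data variable rather than a coordinate, so the original 0-d `Dataset` is not recovered.

```python
import pandas as pd
import xarray as xr

# Hand-built stand-in for a "trivial index" conversion of the 0-d ds:
# one row, with columns for the coordinate x and the data variable a.
df = pd.DataFrame({"x": [1], "a": [1]}, index=pd.Index([0]))

back = xr.Dataset.from_dataframe(df)
print(dict(back.sizes))      # a new length-1 dimension appears
print(list(back.data_vars))  # x is now a data variable, not a coordinate
```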
One compromise might be adding an index=False option to to_dataframe() (e.g., ds.to_dataframe(index=False)), which would always create a trivial index (rather than a MultiIndex) and thus obviously break round-tripping.
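`index=False` is only a suggestion in this thread, not an existing `to_dataframe()` parameter. A minimal sketch of what it could do, written as a standalone hypothetical helper `to_flat_dataframe` (the name is made up for illustration): for 0-d datasets it builds a single row from every coordinate and data variable under a default `RangeIndex`; otherwise it delegates to the regular `to_dataframe()` and flattens the (Multi)Index into ordinary columns with `reset_index()`.

```python
import pandas as pd
import xarray as xr


def to_flat_dataframe(ds: xr.Dataset) -> pd.DataFrame:
    """Hypothetical stand-in for a proposed ``ds.to_dataframe(index=False)``."""
    if len(ds.sizes) == 0:
        # 0-d case: every coordinate and data variable holds exactly one value,
        # so build a single row; pandas assigns a default RangeIndex.
        names = list(ds.coords) + list(ds.data_vars)
        return pd.DataFrame({name: [ds[name].values.item()] for name in names})
    # n-d case: use the regular conversion, then move the index levels
    # (one per dimension) into ordinary columns.
    return ds.to_dataframe().reset_index()


scalar_ds = xr.Dataset({"a": ((), 1)}, coords={"x": 1})
vector_ds = xr.Dataset({"a": ("x", [1, 2, 3])}, coords={"x": [1, 2, 3]})

print(to_flat_dataframe(scalar_ds))
print(to_flat_dataframe(vector_ds))
```

Because both branches discard the dimension index, the result is equally unsuitable for `from_dataframe` round trips in the 0-d and n-d cases, which is the trade-off described above.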
I'm not a pandas expert, but maybe one could create a dummy index that enforces the size=1 constraint, e.g. an index that only supports a single value (such as None or zero). That could potentially be used to fix the round trip.
Also potentially related: #5202 (also contains discussions about the multiindex/dataset handling)
Environment:
Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.5 (default, May 19 2021, 11:32:47) [GCC 10.2.0]
python-bits: 64
OS: Linux
OS-release: 5.8.0-59-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.12.0
libnetcdf: 4.7.4
xarray: 0.17.0
pandas: 1.2.4
numpy: 1.20.2
scipy: 1.6.3
netCDF4: 1.5.6
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.4.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.4.1
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 56.0.0
pip: 21.0.1
conda: None
pytest: 6.2.3
IPython: 7.24.0
sphinx: None