Conversion to pandas for zero-dimensional Data(Set|Array) #5598

martinitus · 2021-07-13T10:07:38Z

What happened:
Converting a zero-dimensional DataArray or Dataset to a pandas DataFrame fails with a ValueError.

What you expected to happen:
I would expect it to return a trivial DataFrame with one row and the respective coordinate / data set columns.
However, I am not sure if that conflicts with potential other round trips between xarray and pandas - e.g. for one-dimensional 1-sized data arrays.

Minimal Complete Verifiable Example:

from xarray import DataArray, Dataset

da = DataArray([1, 2, 3], dims=("x",), coords=dict(x=[1, 2, 3]))
# I don't know of a way to construct such data array without the isel.
# Essentially, below also works for higher dimensional data arrays and
# results in a zero dimensional data array with all the coordinates of
# the found minimum.
da = da.isel(**da.argmin(dim=("x",)))
ds = Dataset({'a': da})
# fails with ValueError: cannot convert a scalar to a DataFrame
# from xarray/core/dataarray.py", line 2664, in to_dataframe
da.to_dataframe(name="foo")
# Expected: a DataFrame with two columns (x and foo) and one row

# fails with ValueError: no valid index for a 0-dimensional object
# from xarray/core/coordinates.py", line 106, in to_index
ds.to_dataframe()
# Expected: a DataFrame with two columns (x and a) and one row
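Until the behavior changes, the expected one-row result can be approximated without patching xarray. The following is a minimal sketch, not a proposed fix: it promotes the scalar to a size-one dimension before converting. The dimension name "row" is an arbitrary placeholder chosen for illustration.

```python
import xarray as xr

da = xr.DataArray([1, 2, 3], dims=("x",), coords={"x": [1, 2, 3]})
scalar = da.isel(x=0)  # zero-dimensional; x survives as a scalar coordinate

# "row" is a hypothetical placeholder dimension, not part of the original data.
df = (
    scalar.expand_dims("row")   # 0-d -> 1-d with a single element
    .to_dataframe(name="foo")   # index is a default integer index named "row"
    .reset_index(drop=True)     # discard the placeholder index
)
print(df)
```

The result has one row and the two columns x and foo, which matches the expectation described above.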

Anything else we need to know?:
I tested a little and got what I want by simply removing the

def to_dataframe(...):
    ...
    if self.ndim == 0:
        raise ValueError("cannot convert a scalar to a DataFrame")

block from dataarray.py and changing

def to_index(self, ordered_dims: Sequence[Hashable] = None) -> pd.Index:
    ...
    if len(ordered_dims) == 0:
        # raise ValueError("no valid index for a 0-dimensional object")
        return pd.Index([0])

to not raise and instead return a trivial index in coordinates.py.

If that would be considered reasonable behavior, I am happy to contribute the respective unit tests and changes!

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.5 (default, May 19 2021, 11:32:47)
[GCC 10.2.0]
python-bits: 64
OS: Linux
OS-release: 5.8.0-59-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.12.0
libnetcdf: 4.7.4

xarray: 0.17.0
pandas: 1.2.4
numpy: 1.20.2
scipy: 1.6.3
netCDF4: 1.5.6
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.4.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.4.1
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 56.0.0
pip: 21.0.1
conda: None
pytest: 6.2.3
IPython: 7.24.0
sphinx: None


shoyer · 2021-07-13T17:40:13Z

to_dataframe() always returns a DataFrame with an index based on coordinate values. I guess we could return a trivial integer index, but this feels a little weird/inconsistent to me, particularly because it breaks round-tripping. On the other hand, it is basically exactly what pandas does.
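To make the round-tripping concern concrete, here is a small sketch. The one-row frame is built by hand to stand in for what a trivial integer index would produce; the column names x and foo are taken from the example in the report.

```python
import pandas as pd
import xarray as xr

# Hand-built stand-in for the proposed output: one row, default integer index.
df = pd.DataFrame({"x": [1], "foo": [1]})

# Converting back does not recover a zero-dimensional object: the unnamed
# integer index becomes a dimension called "index" of length one.
roundtrip = xr.Dataset.from_dataframe(df)
print(dict(roundtrip.sizes))  # {'index': 1}
```

The returned Dataset is one-dimensional, and x comes back as a data variable rather than a coordinate, so the original scalar structure is lost.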

One compromise might be adding an index=False option to to_dataframe() (e.g., ds.to_dataframe(index=False)), which would always create a trivial index (rather than a MultiIndex) and thus obviously break round-tripping.
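As a rough illustration of what such an option could do, the sketch below uses a hypothetical helper, `to_dataframe_without_index`. It is not an existing xarray function and the index=False keyword does not exist at the time of this discussion; the helper only imitates the suggested behavior with calls that do exist.

```python
import numpy as np
import pandas as pd
import xarray as xr


def to_dataframe_without_index(ds: xr.Dataset) -> pd.DataFrame:
    """Hypothetical stand-in for the suggested ds.to_dataframe(index=False)."""
    if len(ds.sizes) == 0:
        # Zero-dimensional: every variable holds one value, so build one row.
        return pd.DataFrame(
            {
                name: np.asarray(var.values).reshape(1)
                for name, var in ds.variables.items()
            }
        )
    # Otherwise move the coordinate-based (Multi)Index into ordinary columns.
    return ds.to_dataframe().reset_index()


ds_1d = xr.Dataset({"a": ("x", [10, 20, 30])}, coords={"x": [1, 2, 3]})
df_1d = to_dataframe_without_index(ds_1d)             # three rows
df_0d = to_dataframe_without_index(ds_1d.isel(x=0))   # one row
```

In both cases the coordinates end up as plain columns and the index is a trivial integer range, so the output cannot be round-tripped, as noted above.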

martinitus · 2021-07-14T08:05:37Z

I'm not a pandas expert, but maybe one can create a dummy index that enforces the size=1 constraint. E.g. an index which only supports one value (e.g. None or zero). That could potentially be used to fix the round-trip.
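One loose reading of this idea, sketched under assumptions: treat the length-one integer index as the marker and squeeze it out on the way back. This does not create a special pandas index type. It relies on `squeeze` raising an error if the dimension has more than one element, which acts as the size=1 check. That x should become a coordinate again is assumed knowledge here, since the DataFrame does not record it.

```python
import pandas as pd
import xarray as xr

# Hand-built one-row frame standing in for the converted scalar Dataset.
df = pd.DataFrame({"x": [1], "a": [1]})

restored = (
    xr.Dataset.from_dataframe(df)   # one dimension named "index", length 1
    .squeeze("index", drop=True)    # raises ValueError if length is not 1
    .set_coords("x")                # assumption: x was originally a coordinate
)
print(restored)
```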

Also potentially related: #5202 (also contains discussions about the multiindex/dataset handling)

dcherian added the enhancement label Jul 13, 2021
