Cannot plot multiindexed (stacked) coordinate as hue variable #4562

Open
aspitarl opened this issue Nov 3, 2020 · 7 comments

@aspitarl

aspitarl commented Nov 3, 2020

import numpy as np
import pandas as pd
import xarray as xr

data = np.random.rand(50,5)

x_idx = np.linspace(0, 50)

mi_idx1 = ['a','b','c','d','e']
mi_idx2 = [1,2,3,4,5]

mi = pd.MultiIndex.from_arrays([mi_idx1,mi_idx2], names=['mi_idx1', 'mi_idx2'])

coords = {
    'x': x_idx,
    'mi': mi
}

da = xr.DataArray(data, coords=coords, dims = ['x', 'mi'])

da.plot(hue='mi')

It appears that since version 0.16.0, plotting with a MultiIndex coordinate as the hue dimension no longer works. In version 0.15.1 I get a plot like this:

[Image: xarray_issue]

However, when upgrading to 0.16.0 or 0.16.1 I get the following traceback:


ValueError                                Traceback (most recent call last)
<ipython-input-1-2de8705a6ec5> in <module>
     19 da = xr.DataArray(data, coords=coords, dims = ['x', 'mi'])
     20 
---> 21 da.plot(hue='mi')

~\anaconda3\envs\datanalysis\lib\site-packages\xarray\plot\plot.py in __call__(self, **kwargs)
    444 
    445     def __call__(self, **kwargs):
--> 446         return plot(self._da, **kwargs)
    447 
    448     # we can't use functools.wraps here since that also modifies the name / qualname

~\anaconda3\envs\datanalysis\lib\site-packages\xarray\plot\plot.py in plot(darray, row, col, col_wrap, ax, hue, rtol, subplot_kws, **kwargs)
    198     kwargs["ax"] = ax
    199 
--> 200     return plotfunc(darray, **kwargs)
    201 
    202 

~\anaconda3\envs\datanalysis\lib\site-packages\xarray\plot\plot.py in line(darray, row, col, figsize, aspect, size, ax, hue, x, y, xincrease, yincrease, xscale, yscale, xticks, yticks, xlim, ylim, add_legend, _labels, *args, **kwargs)
    293 
    294     ax = get_axis(figsize, size, aspect, ax)
--> 295     xplt, yplt, hueplt, xlabel, ylabel, hue_label = _infer_line_data(darray, x, y, hue)
    296 
    297     # Remove pd.Intervals if contained in xplt.values and/or yplt.values.

~\anaconda3\envs\datanalysis\lib\site-packages\xarray\plot\plot.py in _infer_line_data(darray, x, y, hue)
     66 
     67         if y is None:
---> 68             xname, huename = _infer_xy_labels(darray=darray, x=x, y=hue)
     69             xplt = darray[xname]
     70             if xplt.ndim > 1:

~\anaconda3\envs\datanalysis\lib\site-packages\xarray\plot\utils.py in _infer_xy_labels(darray, x, y, imshow, rgb)
    378         y, x = darray.dims
    379     elif x is None:
--> 380         _assert_valid_xy(darray, y, "y")
    381         x = darray.dims[0] if y == darray.dims[1] else darray.dims[1]
    382     elif y is None:

~\anaconda3\envs\datanalysis\lib\site-packages\xarray\plot\utils.py in _assert_valid_xy(darray, xy, name)
    410     if xy not in valid_xy:
    411         valid_xy_str = "', '".join(sorted(valid_xy))
--> 412         raise ValueError(f"{name} must be one of None, '{valid_xy_str}'")
    413 
    414 

ValueError: y must be one of None, 'mi_idx1', 'mi_idx2', 'x'

Setting x='x' does not fix the problem

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.8.3 (default, Jul 2 2020, 17:30:36) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: English_United States.1252
libhdf5: 1.10.4
libnetcdf: 4.7.3

xarray: 0.16.0
pandas: 1.0.5
numpy: 1.18.5
scipy: 1.5.0
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: None
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.20.0
distributed: 2.20.0
matplotlib: 3.2.2
cartopy: None
seaborn: 0.10.1
numbagg: None
pint: 0.16.1
setuptools: 49.2.0.post20200714
pip: 20.1.1
conda: None
pytest: 5.4.3
IPython: 7.16.1
sphinx: 3.1.2

@mathause
Collaborator

mathause commented Nov 3, 2020

That might be my doing here: #3938. It makes sense to allow a MultiIndex as a label but probably not as a coordinate. So this may need some thinking.

@aspitarl
Author

aspitarl commented Nov 3, 2020

> It makes sense to allow a MultiIndex as a label but probably not as a coordinate

To elaborate a little more on my particular use case, as it might give insight or an alternative solution: I often have time data taken under different experimental parameters, which are my coordinates. However, the coordinate matrix is often very sparse, meaning that it might be 5x5x5 while I only have 10 or so data points sampling this space somewhat randomly. So being able to see all my 'test cases' with respect to hue/col etc. is very useful for quickly examining the data and coordinate combinations, which helps once I want to unstack the array and deal with all of the empty parameter space.
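The sparsity described here can be illustrated with a minimal sketch. The parameter names (`temp`, `pres`, `flow`), the `time` and `case` dimension names, and the random sampling are assumptions made up for illustration and are not taken from the actual data:

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)

# Hypothetical experimental parameters: a 5 x 5 x 5 grid has 125 combinations.
grid = [(t, p, f) for t in range(5) for p in range(5) for f in range(5)]

# Only 10 combinations were actually measured, scattered across the grid.
picked = rng.choice(len(grid), size=10, replace=False)
temp, pres, flow = zip(*[grid[i] for i in picked])

# Stacked layout: one column per measured test case, no gaps.
da = xr.DataArray(
    rng.random((20, 10)),
    dims=["time", "case"],
    coords={
        "time": np.arange(20),
        "temp": ("case", list(temp)),
        "pres": ("case", list(pres)),
        "flow": ("case", list(flow)),
    },
).set_index(case=["temp", "pres", "flow"])

# Unstacking spreads the cases over a dense grid; unmeasured cells become NaN.
dense = da.unstack("case")

n_cells = dense.isel(time=0).size
n_filled = int(dense.isel(time=0).notnull().sum())
print(f"stacked: {da.sizes['case']} cases; unstacked: {n_filled} of {n_cells} cells filled")
```

In the stacked form every column is a real test case, which is why plotting against the stacked dimension is convenient before unstacking.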

@mathause
Collaborator

mathause commented Nov 4, 2020

Yes, that seems like a sensible approach. Maybe you can use hue="mi_idx1" and set the label manually as a workaround. (Can't test right now)

@aspitarl
Author

aspitarl commented Nov 16, 2020

I tested this (hue="mi_idx1") and it does not work.

I get the following error on 0.15.1:

ValueError: y must be a dimension name if x is not supplied

and on 0.16.1:

ValueError: ('mi', 'mi') must be a permuted list of ('x', 'mi'), unless ... is included

(also, sorry, I accidentally clicked close and comment)

@aspitarl aspitarl reopened this Nov 16, 2020
@mathause
Collaborator

You have to specify x as well: da.plot(hue='mi_idx1', x="x"). But yes, this is not ideal - would be nice to get this working with hue="mi".
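If the xarray plotting wrapper keeps getting in the way, one fallback is to draw the lines with matplotlib directly and build the legend labels from the MultiIndex entries by hand. This is not the approach proposed in this thread, only a rough sketch that sidesteps `da.plot` entirely. The `", "` label format and the `set_index` construction are choices made here for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

# Same data layout as the original example, built via set_index.
da = xr.DataArray(
    np.random.rand(50, 5),
    dims=["x", "mi"],
    coords={
        "x": np.linspace(0, 50, 50),
        "mi_idx1": ("mi", ["a", "b", "c", "d", "e"]),
        "mi_idx2": ("mi", [1, 2, 3, 4, 5]),
    },
).set_index(mi=["mi_idx1", "mi_idx2"])

# Make sure the array is laid out as (x, mi) before slicing columns.
y = da.transpose("x", "mi").values

fig, ax = plt.subplots()
# One line per MultiIndex entry, labelled with its joined level values.
for i, key in enumerate(da.indexes["mi"]):
    label = ", ".join(str(v) for v in key)
    ax.plot(da["x"].values, y[:, i], label=label)
ax.set_xlabel("x")
ax.legend(title="mi")
```

This loses the automatic axis labelling that xarray provides, so it is probably only worth it when the `hue` route is blocked.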

@benbovy
Member

benbovy commented Sep 28, 2021

Note that we're planning to deprecate multi-index dimension (tuple) coordinates and keep only the multi-index levels as coordinates. The main reason is that this will better fit within Xarray's (forthcoming) updated data model with explicit indexes (so far we have kept those tuple coordinates to make multi-index work with the concept of a "dimension" index coordinate, but this will no longer be necessary).

I see how convenient it is here to use those tuple coordinates as hue (legend) labels to show the value combinations, but there must be other ways to do that. For example, you could simply assign a new coordinate to the DataArray with tuple values (i.e., mi.values) without attaching any index to it.

@aspitarl
Author

I tested the method of adding a new coordinate with mi.values, which worked, though x='x' must still be specified:

import numpy as np
import pandas as pd
import xarray as xr

data = np.random.rand(50,5)

x_idx = np.linspace(0, 50)

mi_idx1 = ['a','b','c','d','e']
mi_idx2 = [1,2,3,4,5]

mi = pd.MultiIndex.from_arrays([mi_idx1,mi_idx2], names=['mi_idx1', 'mi_idx2'])

coords = {
    'x': x_idx,
    'mi': mi
}

da = xr.DataArray(data, coords=coords, dims = ['x', 'mi'])

da = da.assign_coords(mi_plot=('mi', da.indexes['mi'].values))  # add a non-index coordinate of tuples for plot display

da.plot(hue='mi_plot', x='x')

It seems like this could potentially happen behind the scenes, but I'm not sure.
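Until something like that exists, the step could be wrapped in a small helper. The function name `add_label_coord`, its parameters, and the choice of joined strings instead of raw tuples are all hypothetical and are not part of xarray's API. This is only a sketch, assuming the dimension carries a pandas MultiIndex:

```python
import numpy as np
import xarray as xr


def add_label_coord(da, dim, name=None, sep=", "):
    """Hypothetical helper: attach string labels for a MultiIndex dimension.

    Each label joins the level values of one MultiIndex entry. The new
    coordinate lives on `dim` but has no index attached to it.
    """
    name = name or f"{dim}_plot"
    labels = [sep.join(str(v) for v in key) for key in da.indexes[dim]]
    return da.assign_coords({name: (dim, labels)})


# Same data layout as the example above, built via set_index.
da = xr.DataArray(
    np.random.rand(50, 5),
    dims=["x", "mi"],
    coords={
        "x": np.linspace(0, 50, 50),
        "mi_idx1": ("mi", ["a", "b", "c", "d", "e"]),
        "mi_idx2": ("mi", [1, 2, 3, 4, 5]),
    },
).set_index(mi=["mi_idx1", "mi_idx2"])

da = add_label_coord(da, "mi")
print(da["mi_plot"].values)
```

The resulting `mi_plot` coordinate can then be passed as `hue` together with `x='x'`, as in the snippet above.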
