Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

combine_by_coords fails with DataArrays #3248

Closed
dcherian opened this issue Aug 23, 2019 · 11 comments · Fixed by #4696
Closed

combine_by_coords fails with DataArrays #3248

dcherian opened this issue Aug 23, 2019 · 11 comments · Fixed by #4696

Comments

@dcherian
Copy link
Contributor

MCVE Code Sample

da1 = xr.DataArray([1, 2, 3], dims='x', coords={'x': [0, 1, 2]})
da2 = xr.DataArray([3, 4, 5], dims='x', coords={'x': [2, 3, 4]})
xr.combine_by_coords([da1, da2])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-133-d27216ba5688> in <module>
      1 da1 = xr.DataArray([1, 2, 3], dims='x', coords={'x': [0, 1, 2]})
      2 da2 = xr.DataArray([3, 4, 5], dims='x', coords={'x': [2, 3, 4]})
----> 3 xr.combine_by_coords([da1, da2])

~/work/python/xarray/xarray/core/combine.py in combine_by_coords(datasets, compat, data_vars, coords, fill_value, join)
    619         compat=compat,
    620         fill_value=fill_value,
--> 621         join=join,
    622     )
    623 

~/work/python/xarray/xarray/core/merge.py in merge(objects, compat, join, fill_value)
    588             )
    589 
--> 590         obj = obj.to_dataset() if isinstance(obj, DataArray) else obj
    591         dict_like_objects.append(obj)
    592 

~/work/python/xarray/xarray/core/dataarray.py in to_dataset(self, dim, name)
    478             return self._to_dataset_split(dim)
    479         else:
--> 480             return self._to_dataset_whole(name)
    481 
    482     @property

~/work/python/xarray/xarray/core/dataarray.py in _to_dataset_whole(self, name, shallow_copy)
    426         if name is None:
    427             raise ValueError(
--> 428                 "unable to convert unnamed DataArray to a "
    429                 "Dataset without providing an explicit name"
    430             )

ValueError: unable to convert unnamed DataArray to a Dataset without providing an explicit name

To get what I want, I need
xr.combine_by_coords([da1.to_dataset(name='a'), da2.to_dataset(name='a')]).a

I think the issue is that the code uses to_dataset instead of _to_temp_dataset

@shoyer
Copy link
Member

shoyer commented Aug 24, 2019

+1 we should make this work!

@TomNicholas
Copy link
Member

@dcherian is this a merge problem, or a combine_by_coords problem?

@dcherian
Copy link
Contributor Author

What do you think of:

  1. If combine receives a list of unnamed DataArrays, we convert them to temp datasets and then pass datasets down to merge.

  2. If combine receives a list of named DataArrays, then merge will do the right thing.

  3. If combine receives a list containing both Datasets and unnamed DataArrays then merge will raise the right error.

So really it's just (1) that needs a special case in the combine functions?

@friedrichknuth
Copy link

Some additional information on the topic:

Combining named 1D data arrays works.

da1 = xr.DataArray(name='foo', data=np.random.randn(3), coords=[('x', [1, 2, 3])])
da2 = xr.DataArray(name='foo', data=np.random.randn(3), coords=[('x', [5, 6, 7])])
xr.combine_by_coords([da1, da2])

<xarray.Dataset>
Dimensions:  (x: 6)
Coordinates:
  * x        (x) int64 1 2 3 5 6 7
Data variables:
    foo      (x) float64 1.443 0.4889 0.9233 0.1946 -1.639 -1.455

However, when combining 2D gridded data...

da1 = xr.DataArray(name='foo', 
                  data=np.random.rand(3,3), 
                  coords=[('x', [1, 2, 3]),
                          ('y', [1, 2, 3])])

da2 = xr.DataArray(name='foo', 
                  data=np.random.rand(3,3), 
                  coords=[('x', [5, 6, 7]),
                          ('y', [5, 6, 7])])

xr.combine_by_coords([da1, da2])

...the method fails, despite passing a data variable name.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-145-77ae89136c1f> in <module>
      9                           ('y', [5, 6, 7])])
     10 
---> 11 xr.combine_by_coords([da1, da2])

~/xarray/xarray/core/combine.py in combine_by_coords(datasets, compat, data_vars, coords, fill_value, join)
    580 
    581     # Group by data vars
--> 582     sorted_datasets = sorted(datasets, key=vars_as_keys)
    583     grouped_by_vars = itertools.groupby(sorted_datasets, key=vars_as_keys)
    584 

~/xarray/xarray/core/combine.py in vars_as_keys(ds)
    465 
    466 def vars_as_keys(ds):
--> 467     return tuple(sorted(ds))
    468 
    469 

~/xarray/xarray/core/common.py in __bool__(self)
    119 
    120     def __bool__(self: Any) -> bool:
--> 121         return bool(self.values)
    122 
    123     def __float__(self: Any) -> float:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Again, converting to a dataset bypasses the issue.

ds1 = da1.to_dataset()
ds2 = da2.to_dataset()
xr.combine_by_coords([ds1, ds2])

<xarray.Dataset>
Dimensions:  (x: 6, y: 6)
Coordinates:
  * x        (x) int64 1 2 3 5 6 7
  * y        (y) int64 1 2 3 5 6 7
Data variables:
    foo      (x, y) float64 0.5078 0.8981 0.8707 nan ... 0.4172 0.7259 0.8431

@dcherian
Copy link
Contributor Author

dcherian commented Sep 16, 2019

Thanks @friedrichknuth .

It looks like all the combine functions expect lists of Datasets, not DataArrays (see docstrings).

It should be relatively easily to make this work like merge which can take a list of Datasets or DataArrays. See this code in merge:

xarray/xarray/core/merge.py

Lines 593 to 602 in 756c941

dict_like_objects = list()
for obj in objects:
if not (isinstance(obj, (DataArray, Dataset, dict))):
raise TypeError(
"objects must be an iterable containing only "
"Dataset(s), DataArray(s), and dictionaries."
)
obj = obj.to_dataset() if isinstance(obj, DataArray) else obj
dict_like_objects.append(obj)

to_dataset will fail for unnamed DataArrays, but _to_temp_dataset() will succeed. It seems to me like we want lists of unnamed DataArrays to work (like in the first example), so call _to_temp_dataset when needed. Maybe @shoyer has an idea on how to implement that cleanly.

Can you send in a PR?

@aijams
Copy link
Contributor

aijams commented Nov 30, 2020

Hello,
I'm new to open source development and xarray. I would like to help with this issue, however it appears that nothing has been done for over a year. Is this issue stale or does it still need to be addressed?

@max-sixty
Copy link
Collaborator

Hi @aijams — this issue is still current! There are some notes from @TomNicholas on this PR: #3312. Though it looks like there may be an outstanding question on @shoyer there.

@TomNicholas any thoughts on whether this is ready for a contribution?

@TomNicholas
Copy link
Member

Hi @max-sixty good question, and it would be nice to fix this!

I think that if we go with the scheme suggested by @dcherian for handling this issue then it answers the question I posed in #3312 - the answer being that we care more about combine_by_coords happily accepting two unnamed DataArrays than we do about it being consistent with merge by throwing an error. If we're happy with that then I think it shouldn't be too difficult to finish off the PR in #3312?

(I think that this bug mentioned in #3312 is separate, and possibly deserves its own small issue.)

@aijams
Copy link
Contributor

aijams commented Dec 4, 2020

My comprehension of the combine_by_coords function is as follows:
Given a set of DataArrays (possibly obtained by splitting apart data sets), group them by name and paste the arrays from each group onto an array of nans with the same dimension set as each member of that group.
The result for each group will be a DataArray with an entry for each element of the Cartesian product of that group's coordinates.

I'm assuming the main issue here is to have combine_by_coords put a set of unnamed DataArrays into a single group and paste them together.

@TomNicholas, does this seem accurate to you?

@TomNicholas
Copy link
Member

@aijams while that isn't quite how I think about it, I think that is an accurate description of what combine_by_coords does 😊

I'm assuming the main issue here is to have combine_by_coords put a set of unnamed DataArrays into a single group and paste them together.

Yes - which involves making the assumption that these different unnamed DataArrays are actually the same physical variable, and so can be meaningfully combined into a group.

@aijams
Copy link
Contributor

aijams commented Feb 10, 2021

It looks like this issue has been neglected for a while.
@TomNicholas, can you take a look at this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants