combine_by_coordinates to handle unnamed data arrays. #4696

aijams · 2020-12-15T18:02:53Z

Closes combine_by_coords fails with DataArrays #3248
Tests added
Passes isort . && black . && mypy . && flake8
User visible changes (including notable bug fixes) are documented in whats-new.rst
New functions/methods are listed in api.rst


dcherian

Thanks for your patience @aijams

Sorry for the delay. I've added a few requests.

dcherian · 2021-02-10T21:37:52Z

xarray/tests/test_combine.py

+
+    def test_combine_coords_empty_list(self):
+        expected = Dataset()
+        actual = combine_by_coords([])


I think this should raise an error

I saw a function (test_empty_input) above test_combine_coords_empty_list that does exactly the same thing. It looks like this function isn't called anywhere, so I think it should be removed, especially if combining an empty list of datasets should raise an error as you suggest.

dcherian · 2021-02-10T21:38:29Z

xarray/tests/test_combine.py

+            DataArray([0, 1], dims=("x"), coords=({"x": [0, 1]})),
+            DataArray([2, 3], dims=("x"), coords=({"x": [2, 3]})),
+        ]
+        expected = Dataset({"_": ("x", [0, 1, 2, 3])}, coords={"x": [0, 1, 2, 3]})


We should return a DataArray in this case.

So the idea is to start with a nested list of DataArrays and end up with a DataArray. This will change the return type from Dataset to (DataArray or Dataset).

I wonder whether combine_by_coords should accept a list containing both DataArrays and Datasets. In that case, I believe it should treat the DataArrays as a variable separate from those in the Datasets. That dummy variable would then be merged with the other variables in the resulting Dataset. Otherwise, this function could take either a list of DataArrays or a list of Datasets.

As for combine_nested, it doesn't make sense in my mind to have it take both DataArrays and Datasets at the same time, since it aligns its input into a hypercube according to the nested list structure.

I will have combine_by_coords take the "either or" approach for now. Let me know if you think it should take both types of input simultaneously.

dcherian · 2021-02-10T21:40:59Z

xarray/core/combine.py

+            return False
+    return True
+
+
 def combine_by_coords(


Can we also modify combine_nested so the two are consistent?

I wrote a test test_combine_nested_unnamed_data_arrays that passes a list of unnamed DataArrays into combine_nested and it produces the expected output. Can you clarify what about combine_nested you want to be consistent?

I think I know what @dcherian meant - at first glance it looks like the _combine_single_variable_hypercube refactoring is a change that would also simplify the code in combine_nested. But looking more closely I don't think it actually makes sense to do that, does it? It seems about as neat as it can be as is.

TomNicholas · 2021-03-31T14:04:16Z

xarray/core/combine.py

+    # If a set of unnamed data arrays is provided, these arrays are assumed to belong
+    # to the same variable and should be combined.
+    if _all_unnamed_data_arrays(data_objects):
+        datasets = [Dataset({"_": data_array}) for data_array in data_objects]


Should we be using ._to_temp_dataset here? See @dcherian's comment here.

TomNicholas · 2021-03-31T14:14:00Z

xarray/tests/test_combine.py

+            DataArray([0, 1], dims=("x"), coords=({"x": [0, 1]})),
+            DataArray([2, 3], dims=("x"), coords=({"x": [2, 3]})),
+        ]
+        expected = Dataset({"_": ("x", [0, 1, 2, 3])}, coords={"x": [0, 1, 2, 3]})


@aijams the case of a list of both DataArrays and Datasets is the same as @dcherian's case (3) in this comment. We have to pass the objects down to merge so that it can raise the correct error.

I think it would be nice if we covered the case {DataArrays} -> DataArray, but I'm not sure what the easiest way to do that is. If the DataArrays are unnamed, then the combined result will have some sort of default name, which we could check for and then demote to a DataArray before returning. But if the DataArrays are named, I don't think we will know at the return step whether the input was originally a DA or a DS before it got promoted.
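The default-name idea can be sketched without xarray using toy stand-ins. Everything here is hypothetical: `ToyArray` stands in for a 1-D `DataArray`, a plain dict stands in for a `Dataset`, `_toy_combine` stands in for the real coordinate-based combine (it only orders pieces by their first `x` value and concatenates), and `_TEMP_NAME` plays the role of the `"_"` placeholder from the diff. The real code might use the `._to_temp_dataset` route mentioned in review instead.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Union

# Hypothetical sentinel name, playing the role of the "_" placeholder in the diff.
_TEMP_NAME = "__temp_name__"


@dataclass
class ToyArray:
    """Toy stand-in for a 1-D xarray.DataArray along dimension "x"."""
    data: List[int]
    x: List[int]
    name: Optional[str] = None


# A toy "dataset" is a dict mapping variable name -> ToyArray.
ToyDataset = Dict[str, ToyArray]


def _toy_combine(datasets: List[ToyDataset]) -> ToyDataset:
    """Toy stand-in for combine_by_coords: order pieces by first x value, then concatenate."""
    ordered = sorted(datasets, key=lambda ds: min(arr.x[0] for arr in ds.values()))
    combined: ToyDataset = {}
    for name in ordered[0]:
        data = [v for ds in ordered for v in ds[name].data]
        x = [c for ds in ordered for c in ds[name].x]
        combined[name] = ToyArray(data, x, name)
    return combined


def combine_unnamed_arrays(arrays: List[ToyArray]) -> Union[ToyArray, ToyDataset]:
    # Promote: wrap each unnamed array in a one-variable dataset under the sentinel.
    promoted = [{_TEMP_NAME: arr} for arr in arrays]
    combined = _toy_combine(promoted)
    # Demote: if the sentinel is the only variable, return a bare unnamed array.
    if list(combined) == [_TEMP_NAME]:
        return replace(combined[_TEMP_NAME], name=None)
    return combined
```

The demotion works only because the sentinel marks the promoted variable. With named DataArrays, the variable names in the result cannot say whether the inputs were DataArrays or Datasets, which is the limitation raised above. Recording the input type before promotion would be one way around it.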

pep8speaks · 2021-05-04T22:06:59Z

Hello @aijams! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-06-10 16:15:56 UTC

max-sixty

Thanks @aijams !

@dcherian knows this so much better than me, so I'll let him take another look.

FYI, the whats-new entry may need to be moved to the current version's section.

max-sixty · 2021-05-10T15:08:27Z

xarray/core/combine.py

@@ -593,8 +652,9 @@ def combine_by_coords(

    Parameters
    ----------
-    datasets : sequence of xarray.Dataset
-        Dataset objects to combine.
+    data_objects : sequence of xarray.Dataset or sequence of xarray.DataArray


Is renaming a breaking change?

It technically is a breaking change: if someone was previously calling combine_by_coords(datasets=...), this change will break their code. But renaming the argument does make sense with this PR. I'm not sure whether that justifies a deprecation cycle.

I think it probably does, but it's only a few extra lines and easy to copy from elsewhere.

I have no idea what justifies a deprecation cycle in your project, or how one is performed. Can someone give me some guidance on this, seeing as this change will probably need one according to @max-sixty?
I also agree that renaming the argument makes sense here, as DataArrays and Datasets are two distinct things.

I have no idea what justifies a deprecation cycle in your project, or how one is performed.

No worries! (It's very briefly mentioned in our contributing guide, but maybe we should expand that...)

xarray has loads of regular users, and we don't want them to find that downloading a new version of xarray breaks their perfectly good code, even in some minor way. Therefore we normally guide them through any change, either by warning them in the preceding version or by temporarily keeping their old way of calling the function working, so they have time to switch to the new way.

In this case the only people who could be affected are those currently passing datasets as a named argument to combine_by_coords, so I think we want to catch that specific case and tell them to use the named argument data_objects instead. Perhaps we could add the argument back in with a temporary check like

import warnings

def combine_by_coords(data_objects, ..., datasets=None):
    # TODO remove after version 0.19, see PR4696
    if datasets is not None:
        warnings.warn("The datasets argument has been renamed to `data_objects`. In future passing a value for datasets will raise an error.")
        data_objects = datasets

Does that make sense?

That makes sense. I'm somewhat concerned about users who ignore the warning (many software projects I've seen generate a lot of warnings); however, I don't think there's much that can be done, since the method signature will have to change anyway.
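A runnable variant of the shim above, as a sketch only: `combine_by_coords_sketch` is a hypothetical stand-in that does no combining. Two choices differ from the snippet and are assumptions. `data_objects` defaults to `None` so that a call passing only `datasets=` still binds. The warning uses `FutureWarning`, which Python's default warning filters show to end users, whereas `DeprecationWarning` is hidden by default outside `__main__`. That partly addresses the concern about warnings being missed.

```python
import warnings
from typing import Any, List, Optional, Sequence


def combine_by_coords_sketch(
    data_objects: Optional[Sequence[Any]] = None,
    datasets: Optional[Sequence[Any]] = None,
) -> List[Any]:
    """Hypothetical stand-in that shows only the argument-renaming shim."""
    # TODO: remove the `datasets` keyword once the deprecation period ends.
    if datasets is not None:
        if data_objects is not None:
            raise TypeError("pass only one of `data_objects` and `datasets`")
        warnings.warn(
            "The `datasets` argument has been renamed to `data_objects`. "
            "In future, passing a value for `datasets` will raise an error.",
            FutureWarning,
            stacklevel=2,
        )
        data_objects = datasets
    if data_objects is None:
        raise TypeError("missing required argument: 'data_objects'")
    # A real implementation would combine the objects here; the sketch just returns them.
    return list(data_objects)
```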

dcherian · 2021-05-10T18:27:11Z

@dcherian knows this so much better than me, so I'll let him take another look.

I defer to @TomNicholas =)

TomNicholas · 2021-05-20T23:51:40Z

xarray/core/combine.py

 def combine_by_coords(
-    data_objects,
+    data_objects=[],


Suggested change

-    data_objects=[],
+    data_objects,

(It's considered bad practice to have mutable default arguments to functions in python.)

I put this in because if someone calls this method with datasets as a named parameter, the data_objects argument would be unspecified and their code would fail with a missing-argument error. This is part of the deprecation warning below.

You make a good point, but that means the default argument should be None, not an empty list, as None is immutable.
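A generic illustration of why `[]` is a risky default. The function names are hypothetical and unrelated to xarray. The default object is created once, when the `def` statement runs, so every call that omits the argument shares it.

```python
from typing import List, Optional


def append_bad(item: int, bucket: List[int] = []) -> List[int]:
    # The default list is created once, at definition time, and shared by every call.
    bucket.append(item)
    return bucket


def append_good(item: int, bucket: Optional[List[int]] = None) -> List[int]:
    # None is immutable, so a fresh list is created on each call that omits `bucket`.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket
```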

TomNicholas · 2021-05-20T23:52:51Z

xarray/tests/test_combine.py

-                ValueError,
-                match=r"Can't combine datasets with unnamed arrays."
-            ):
+            ValueError, match=r"Can't combine datasets with unnamed arrays."


Suggested change

-            ValueError, match=r"Can't combine datasets with unnamed arrays."
+            ValueError, match=r"Can't combine datasets with unnamed dataarrays."

A tiny clarification: this means datasets combined with other xarray.DataArrays, not anything about the numpy arrays inside the xarray.Dataset objects.

Good clarification.

TomNicholas

I think that (aside from two minuscule suggestions) this is ready to merge!

aijams · 2021-05-21T14:43:34Z

Sorry about the closing/reopening. I thought I was supposed to do something to get the code merged into the official master branch, and I accidentally pressed the wrong button, thinking it did something else.

aijams · 2021-07-02T20:56:57Z

It looks like this PR has been neglected for a while. I haven't heard from a maintainer whether they will be merging in this PR any time soon. Edit: I just noticed that @TomNicholas plans on adding type hints to the two functions combine_nested and combine_by_coords changed in this PR.

TomNicholas · 2021-07-02T21:34:20Z

Sorry @aijams ! I was following our broad rule of "two maintainers approve before merging", but this has sat here for absolutely ages and had feedback from others previously, so maybe I will just merge it now.

We really need a better system for catching PRs that just sit around :/

github-actions · 2021-07-02T21:58:59Z

Unit Test Results

        6 files ±0         6 suites ±0 47m 2s ⏱️ ±0s
16 173 tests ±0 14 458 ✔️ ±0 1 715 💤 ±0 0 ❌ ±0
90 234 runs ±0 82 130 ✔️ ±0 8 104 💤 ±0 0 ❌ ±0

Results for commit 3d1d134. ± Comparison against base commit 3d1d134.

max-sixty · 2021-07-02T22:53:06Z

Sorry @aijams ! Thank you for the reminder and forgive the delay. We appreciate the contribution (more than our responsiveness suggests!)

doc/whats-new.rst

aijams added 12 commits December 10, 2020 10:42

Added test for combine_by_coords changes.

b35de8e

Modified test case to expect a dataset instead of a DataArray. Added …

f966e76

…converter to combine_by_coords to check for all DataArray case and convert to datasets.

Added tests to check combine_by_coords for exception with mixed DataA…

68b7b49

…rrays and dataset input and with empty list.

Formatting changes after running black

540961f

Added underscore to helper function to label as private.

1c9b4c2

Black formatting changes for whats-new doc file.

cb5ed5e

Removed imports in docstring that were automatically added by code st…

77020c0

…yling tools to match the other docstrings.

Merge branch 'master' into aijams/combine-by-coords

6af896b

Merge remote-tracking branch 'upstream/master' into aijams/combine-by…

f06371a

…-coords

Merge branch 'aijams/combine-by-coords' of https://github.com/aijams/…

7cdeabb

…xarray into aijams/combine-by-coords

Removed duplicate new item line in whats-new.

6190839

Merge remote-tracking branch 'upstream/master' into aijams/combine-by…

3055000

…-coords

dcherian reviewed Feb 10, 2021

TomNicholas reviewed Mar 31, 2021

aijams added 9 commits April 15, 2021 10:02

combine methods now accept unnamed DataArrays as input.

cbc002f

Merge remote-tracking branch 'upstream/master' into aijams/combine-by…

11a868b

…-coords-2

combine nested test checks nested lists of unnamed DataArrays.

89ac962

Made combine_by_coords more readable.

5f3afa5

Cosmetic changes to code style.

feb90ce

Merging changes from first PR.

db5b906

Merge remote-tracking branch 'upstream/master' into aijams/combine-by…

e884f52

…-coords

Removed extra test from merge with previous PR.

0044bb9

Merge remote-tracking branch 'upstream/master' into aijams/combine-by…

44548ee

…-coords

aijams added 2 commits May 4, 2021 18:44

Updated test to use pytest.raises instead of raises_regex.

5fe8323

Merged latests changes from upstream.

55f53b9

max-sixty reviewed May 10, 2021

aijams added 2 commits May 11, 2021 09:51

Added breaking-change entry to whats new page.

805145c

Merged new changes from master branch.

3eed47a

aijams added 9 commits May 12, 2021 07:59

Removed duplicate entries from whats new page.

2c43030

Removed TODO message

f6fae25

Added test for combine_nested.

81ec1ff

Merge remote-tracking branch 'upstream/master' into aijams/combine-by…

caaee74

…-coords

Added check to combine methods to clarify parameter requirements.

637d4cc

Reassigned description of changes to bug fixes category.

b5940a1

Merge remote-tracking branch 'upstream/master' into aijams/combine-by…

d02da23

…-coords

Minor style changes.

04cd5f8

Added blank line for style purposes.

e58a9e2

TomNicholas reviewed May 20, 2021

TomNicholas approved these changes May 20, 2021

max-sixty added the plan to merge Final call for comments label May 21, 2021

aijams closed this May 21, 2021

aijams reopened this May 21, 2021

Merge remote-tracking branch 'upstream/master' into aijams/combine-by…

c0fc4f1

…-coords

TomNicholas mentioned this pull request Jun 23, 2021

Type hints for combine functions #5519

Merged

TomNicholas merged commit 3d1d134 into pydata:main Jul 2, 2021

TomNicholas mentioned this pull request Jul 8, 2021

Release v0.19? #5588

Closed

dcherian reviewed Jul 8, 2021

doc/whats-new.rst

TomNicholas added a commit that referenced this pull request Jul 8, 2021

Move summary of #4696 to correct release

bf27e2c

keewis mentioned this pull request Jul 23, 2021

remove deprecations scheduled for 0.19 #5630

Merged

This was referenced Sep 30, 2021

Combine by coords dataarray bugfix #5834

Merged

Combine_by_coords not working on named DataArrays where the data is a Dask Array. #5833

Closed

combine_nested dataarrays #5835

Open
