Use getitem_with_mask in reindex_variables #1847
@@ -63,7 +63,7 @@
     "netcdf4": [""],
     "scipy": [""],
     "bottleneck": ["", null],
-    "dask": ["", null],
+    "dask": [""],
I don't think we need to run the benchmarks separately with and without dask. Doing so makes the whole suite take twice as long as necessary, and we already need to write separate benchmark cases for dask because the syntax for using it is different.
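For a rough sense of why dropping null from the dask entry halves the run time, here is a small sketch of how a build matrix like the one in this diff expands into environments. It assumes ASV combines the listed versions of every package as a Cartesian product, with "" meaning the latest release and null (None in Python) meaning the package is not installed. Only the four entries visible in the hunk are included, and expand_matrix is a hypothetical helper, not part of ASV.

```python
from itertools import product


def expand_matrix(matrix):
    """Hypothetical helper: list every environment implied by a build matrix.

    Assumes each package's list of versions is combined with every other
    package's list (a Cartesian product), where "" means "latest release"
    and None means "not installed".
    """
    names = sorted(matrix)
    return [dict(zip(names, combo))
            for combo in product(*(matrix[name] for name in names))]


# The visible part of the matrix before and after this change (JSON null -> None).
before = {"netcdf4": [""], "scipy": [""],
          "bottleneck": ["", None], "dask": ["", None]}
after = {"netcdf4": [""], "scipy": [""],
         "bottleneck": ["", None], "dask": [""]}

print(len(expand_matrix(before)))  # 4
print(len(expand_matrix(after)))   # 2
```

Under that assumption, every extra value in any single list multiplies the number of environments, so keeping null for dask would time each benchmark twice.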
@shoyer - other than the failing test on Travis, this seems like a nice cleanup/refactor. Nothing popped out at me as something that needs to be changed.
Once the tests are passing, ping me again, as well as @fujiisoup and we'll give it a final review.
def time_2d_fine_some_missing(self):
    self.ds.reindex(x=np.arange(0, 100, 0.5), y=np.arange(0, 100, 0.5),
                    method='nearest', tolerance=0.1).load()
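The benchmark above takes the path where method='nearest' combined with a tolerance leaves many target labels unmatched. Before any masked selection can happen, each target label has to become an integer position, with -1 standing in for "not found". The NumPy-only sketch below shows that lookup for one dimension; nearest_indexer is a hypothetical name, it assumes sorted numeric labels, and the source labels 0..99 are an assumption about the benchmark dataset. (pandas' Index.get_indexer offers the same kind of lookup through its method and tolerance arguments.)

```python
import numpy as np


def nearest_indexer(source, target, tolerance):
    """Hypothetical sketch: for each label in `target`, the position of the
    nearest label in `source`, or -1 if that label is farther than `tolerance`.

    Assumes `source` is sorted in increasing order and non-empty.
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    upper = np.searchsorted(source, target)         # first position with source >= target
    lower = np.clip(upper - 1, 0, len(source) - 1)  # neighbour below (clipped to valid positions)
    upper = np.clip(upper, 0, len(source) - 1)      # neighbour above (clipped to valid positions)
    use_lower = np.abs(target - source[lower]) <= np.abs(source[upper] - target)
    nearest = np.where(use_lower, lower, upper)
    too_far = np.abs(source[nearest] - target) > tolerance
    return np.where(too_far, -1, nearest)


# Target labels from time_2d_fine_some_missing; source labels 0..99 are assumed.
x = np.arange(0, 100)
new_x = np.arange(0, 100, 0.5)
indexer = nearest_indexer(x, new_x, tolerance=0.1)
print(indexer[:4].tolist())        # [0, -1, 1, -1]
print(int((indexer == -1).sum()))  # 100: every half-step label is missing
```

With a step of 0.5 and a tolerance of 0.1, half of the requested labels per dimension miss, which is what makes this case "some missing".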
Do we have unit test coverage for this case with dask? I see it was failing in the ASV benchmark before the changes in alignment.py, so my guess is that we don't.
I'll double check, but I think the test failure was just ASV timing out. When I ran a similar test case before (see my comment on the other PR), it was extremely slow.
It looks like our …

    In [1]: import numpy as np
       ...: import xarray as xr
       ...: a = xr.Variable('x', np.array(['a', 'b', 'c']))

    In [2]: a._getitem_with_mask([1, 2, -1])
    Out[2]:
    <xarray.Variable (x: 3)>
    array(['b', 'c', 'nan'],
          dtype='<U32')

nan is converted to a string. I guess we need to cast arrays manually in …

    In [9]: dtype, fill_value = xr.core.dtypes.maybe_promote(a.dtype)
       ...: a.astype(dtype)._getitem_with_mask([1, 2, -1])
    Out[9]:
    <xarray.Variable (x: 3)>
    array(['b', 'c', nan], dtype=object)
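The session above shows the failure mode: filling a string array with NaN keeps a string dtype, so the missing value turns into the text 'nan'. The fix is to promote the dtype to one that can hold the fill value before inserting it. Below is a minimal NumPy sketch of that order of operations. maybe_promote here is a simplified, hypothetical stand-in (xarray's own xr.core.dtypes.maybe_promote covers more dtypes, such as datetimes), and getitem_with_mask is illustrative rather than Variable._getitem_with_mask itself.

```python
import numpy as np


def maybe_promote(dtype):
    """Simplified, hypothetical stand-in: return a dtype that can hold a
    missing value, together with that missing value."""
    dtype = np.dtype(dtype)
    if np.issubdtype(dtype, np.floating):
        return dtype, np.nan
    if np.issubdtype(dtype, np.integer):
        return np.dtype(np.float64), np.nan
    # Strings, bools and everything else fall back to object, so NaN is
    # stored as a real float instead of being converted to text.
    return np.dtype(object), np.nan


def getitem_with_mask(values, indexer):
    """Hypothetical sketch: values[indexer], with -1 positions set to a fill value."""
    values = np.asarray(values)
    indexer = np.asarray(indexer)
    mask = indexer == -1
    if not mask.any():
        return values[indexer]  # nothing missing: keep the original dtype
    dtype, fill_value = maybe_promote(values.dtype)
    result = values.astype(dtype)[indexer]  # promote *before* inserting the fill value
    result[mask] = fill_value               # overwrite the wrapped-around -1 picks
    return result


print(getitem_with_mask(np.array(['a', 'b', 'c']), [1, 2, -1]).tolist())
# ['b', 'c', nan]
```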
@fujiisoup Indeed, that seems to be the case. I made a fixed version of …
I believe this is ready for review now.
One comment, but it already looks nice :)
xarray/core/dtypes.py
def result_type(*arrays_and_dtypes):
    """Like np.result_type, but number + string -> object (not string).
I think we should also take care of the bool-float pair:
In [1]: import numpy as np
...: import xarray as xr
...:
...: x = xr.DataArray([True, False])
...: x.where(x)
...:
Out[1]:
<xarray.DataArray (dim_0: 2)>
array([ 1., nan])
Dimensions without coordinates: dim_0
I added fixes for (bool, float) -> object and (str, bytes) -> object, and also applied these to …
From a user-facing perspective, this now fixes dtypes …
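The promotion rules discussed here can be sketched as a thin wrapper around np.result_type that checks a few type pairs first. This is a minimal sketch assuming exactly the three pairs mentioned in this thread (number + string, bool + float, str + bytes); the table name PROMOTE_TO_OBJECT and the rule set are illustrative and not necessarily what xarray/core/dtypes.py contains.

```python
import numpy as np

# Hypothetical rule table: pairs of NumPy scalar-type families that should
# combine to object instead of following NumPy's default promotion.
PROMOTE_TO_OBJECT = [
    (np.number, np.character),  # number + string: keep numbers as numbers
    (np.bool_, np.floating),    # bool + float: keep True/False instead of 1.0/0.0
    (np.bytes_, np.str_),       # bytes + str: keep both kinds of value as-is
]


def result_type(*arrays_and_dtypes):
    """Like np.result_type, but the pairs above promote to object."""
    types = {np.result_type(t).type for t in arrays_and_dtypes}
    for left, right in PROMOTE_TO_OBJECT:
        if (any(issubclass(t, left) for t in types)
                and any(issubclass(t, right) for t in types)):
            return np.dtype(object)
    return np.result_type(*arrays_and_dtypes)


# Mirror of x.where(x) from the comment above: fill the False entries with NaN.
x = np.array([True, False])
dtype = result_type(x, np.nan)                 # object rather than float64
filled = np.where(x, x.astype(dtype), np.nan)
print(filled.tolist())                         # [True, nan]
```

Checking the special pairs before delegating keeps every other combination, such as int32 + float64, on NumPy's usual rules.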
LGTM
@smithsp that link doesn't work. Did you mean to refer to something else?
@shoyer Thank you for asking about the link. It points to a private repository (although we have never denied access to anyone who requested it), which may become public some day. We encountered a problem when switching to version 10 or 11 of xarray because of this fix, as we seemed to be combining strings and unicode in a …
@smithsp OK, I'd be happy to take a look if you're able to narrow it down to a self-contained example.
I think the behavior is consistent with the expected upcasting (so no bug on your side), but we haven't figured out where we are using unicode vs. strings.
This is an internal refactor of reindexing/alignment to use Variable.getitem_with_mask. As noted back in #1751 (comment), this gives a nice improvement for alignment with dask (~100x) but is slower in several cases with NumPy (2-3x).
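In outline, the refactor means that reindexing a multi-dimensional variable reduces to one integer indexer per dimension (with -1 for labels that were not found) followed by a masked selection. The NumPy sketch below illustrates that for the outer indexing that reindex performs. outer_getitem_with_mask is a hypothetical name, not xarray's API, and it assumes float data so that NaN can be stored without the dtype promotion step.

```python
import numpy as np


def outer_getitem_with_mask(values, indexers):
    """Hypothetical sketch: outer-index `values` with one integer indexer per
    axis, where -1 marks a label that was not found and becomes NaN.

    Assumes float data, so NaN fits without promoting the dtype.
    """
    result = np.asarray(values, dtype=float)
    for axis, indexer in enumerate(indexers):
        indexer = np.asarray(indexer)
        # -1 wraps around to the last element here; those positions are masked next.
        result = np.take(result, indexer, axis=axis)
        # Reshape the 1-D mask so it broadcasts along `axis` only.
        shape = [1] * result.ndim
        shape[axis] = indexer.size
        result = np.where((indexer == -1).reshape(shape), np.nan, result)
    return result


values = np.arange(6.0).reshape(2, 3)
print(outer_getitem_with_mask(values, [[1, -1], [0, 2, -1]]).tolist())
# [[3.0, 5.0, nan], [nan, nan, nan]]
```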
ASV results (smaller ratio is better):
Note that reindexing.ReindexDask.time_2d_fine_some_missing timed out previously, which I think indicates that it took longer than 60 seconds.

git diff upstream/master **/*py | flake8 --diff (remove if you did not edit any Python files)
whats-new.rst for all changes and api.rst for new API (remove if this change should not be visible to users, e.g., if it is an internal clean-up, or if this is part of a larger project that will be documented later)