Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Typing of GroupBy & Co. #6702

Merged
merged 38 commits into from
Jun 29, 2022
Merged

Typing of GroupBy & Co. #6702

merged 38 commits into from
Jun 29, 2022

Conversation

headtr1ck
Copy link
Collaborator

@headtr1ck headtr1ck commented Jun 17, 2022

This PR adds typing support for groupby, coarsen, rolling, weighted and resample.

There are several open points:

  1. Coarsen is missing type annotations for reductions like max, they get added dynamically.
  2. The Groupby group-key type is quite wide. Does anyone have any idea on how to type it correctly? For now it is still Any.
  3. Several function signatures were inconsistent between the DataArray and Dataset versions (looking at you: map). I took the freedom to align them (required for mypy), hopefully this does not break too much.
  4. I was moving the generation functions from DataWithCoords to DataArray and Dataset, which adds some copy-paste of code (I tried to keep it minimal) but was unavoidable for typing support. (Adds the bonus that the corresponding modules are now only imported when required).

@max-sixty
Copy link
Collaborator

This looks v impressive!

I am traveling for the weekend so won't be able to look at it fully until next week — if someone else could take a glance, then great, otherwise we can merge given @headtr1ck 's track record and I can review next week.

Or if no one gets to this then I will set a reminder for next week.

Copy link
Contributor

@Illviljan Illviljan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, @headtr1ck ! I've done a first pass on this and have some minor thoughts.
I'm hesitant on adding #ignores without a comment though. They are easily accepted as truths and forgotten about and sometimes hides real issues.

@dcherian dcherian requested a review from shoyer June 22, 2022 15:46
Copy link
Member

@shoyer shoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a couple minor clean-up suggestions, but generally this looks great to me! I can already see several latent bugs that adding type checking fixed.

Thank you @headtr1ck !

Comment on lines 113 to 119
if not isinstance(self._dim, str):
raise NotImplementedError(
"Only str dimensions supported by now. Please raise an issue on Github."
)
kwargs[self._dim] = upsampled_index
kwargs["method"] = method
return self._obj.reindex(*args, **kwargs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we remove *args above, this can just be:

Suggested change
if not isinstance(self._dim, str):
raise NotImplementedError(
"Only str dimensions supported by now. Please raise an issue on Github."
)
kwargs[self._dim] = upsampled_index
kwargs["method"] = method
return self._obj.reindex(*args, **kwargs)
kwargs["method"] = method
return self._obj.reindex({self._dim: upsampled_index}, **kwargs)

Which I believe should pass type checking, even for non-string dimension names.

For methods like reindex with a signature like reindex(indexers=None, method=None, tolerance=None, copy=True, fill_value=<NA>, **indexers_kwargs), we generally try to avoid using **indexers_kwargs internally inside Xarray in favor of indexers, for exactly this sort of reason.

Copy link
Collaborator Author

@headtr1ck headtr1ck Jun 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am now getting rid of the _upsample method all together, I found that way cleaner.
If you are ok with this change, feel free to resolve this conversation, otherwise I can revert this commit.

Copy link
Contributor

@Illviljan Illviljan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me now. :)

@Illviljan Illviljan added the plan to merge Final call for comments label Jun 26, 2022
@andersy005
Copy link
Member

Thank you, @headtr1ck!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plan to merge Final call for comments
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants