WIP: progress toward making groupby work with multiple arguments #924
It definitely doesn't work properly yet, totally mixing up coordinates, data variables and multi-indexes.
@@ -131,7 +133,7 @@ class GroupBy(object):
DataArray.groupby
"""
def __init__(self, obj, group, squeeze=False, grouper=None, bins=None,
cut_kwargs={}):
cut_kwargs={}):
😳 these PEP8 violations are from my PR. Sorry! I have since started linting...
This looks like a really useful addition. It would help me to have an example of how this is supposed to work. The tests are a starting point, but they cover fairly trivial cases. If I create the following 2D dataset, ds = xr.Dataset({'foo': (['x','y'], np.random.rand(2,4))}), and then do ds.groupby(['x','y']).sum(), what should the coordinates of the output be? Every unique combination of
@rabernat I updated the top post with examples. So yes, for your example, the coordinates of the output would have every unique combination of Once we figure out squeezing out grouped/stacked dimensions (not quite working yet), this will let us write things like
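For readers following the exchange above, the "every unique combination" semantics can be sketched in plain Python, without xarray. This is only an illustration of the grouping idea, not this PR's implementation: the helper name `groupby_multi_sum` and the record layout are hypothetical, and integer positions stand in for the `x` and `y` coordinate labels.

```python
from collections import defaultdict


def groupby_multi_sum(records, keys, value):
    """Sum `value` over records, grouped by the tuple of `keys`.

    Hypothetical helper for illustration only; it is not part of xarray.
    """
    totals = defaultdict(float)
    for rec in records:
        # One group per unique combination of the requested keys.
        group = tuple(rec[k] for k in keys)
        totals[group] += rec[value]
    return dict(totals)


# A 2x4 grid flattened to records, mirroring an array with dims ('x', 'y').
# Values are chosen arbitrarily as 10*x + y so they are easy to check.
records = [
    {"x": x, "y": y, "foo": float(10 * x + y)}
    for x in range(2)
    for y in range(4)
]

# Grouping by both keys: one group per (x, y) pair, eight in total.
by_xy = groupby_multi_sum(records, ["x", "y"], "foo")

# Grouping by a single key actually reduces over the other one.
by_x = groupby_multi_sum(records, ["x"], "foo")
```

In a dense 2D grid every (x, y) pair is already unique, so grouping by both keys yields one group per element and the sum leaves each value unchanged. Grouping by `x` alone collapses the four `y` entries in each row into a single total.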
Hi, is there any active work on that feature? It would be really cool to have it.
@RafalSkolasinski This pull request was mostly working, but still needs some significant work to clean it up and update it to the current version of the codebase. I don't think anyone is working on it currently (I'm not) but I'm sure someone will get to it eventually.
@shoyer I am considering contributing to this feature. Could you give me more details on what needs to be done?
@RafalSkolasinski Sure, here is the current list:
@RafalSkolasinski and @shoyer, can I please get an update on this PR? This is something we need sometime soon too (cc @milenaveneziani).
I don't have any progress to report since my last comment.
@pwolfram Unfortunately nothing from my side yet.
This functionality is quite important to us as well. How might we help? Should I simply fork
@chunweiyuan - yes, you are welcome to give this a shot.
Just to refresh here: what needs to be done to finish this off?
@shoyer, it looks like your list above is the place to start from your branch, correct?
Thanks @shoyer, I find this feature extremely useful as I keep running into use cases where I can use it. Thanks for the update. Given the changes to xarray, it sounds like the prudent course of action is as you outline. Thanks again for the quick reply!
Closes pydata#924 Closes pydata#1056 Closes pydata#9332 xref pydata#324
Fixes #324
It definitely doesn't work properly yet, totally mixing up coordinates,
data variables and multi-indexes (as shown by the failing tests).
A simple example:
More examples:
https://gist.github.com/shoyer/5cfa4d5751e8a78a14af25f8442ad8d5