ENH: Add Numpy Array Interface support to Pandas objects #8321
oops yeah, didn't mean to imply this was done. I actually threw this up real quick cuz I'm not sure the helper func belongs in common.py. afaik common.py shouldn't know about NDFrame, though maybe it's okay because the import happens within the function. I need a guard against everything that implements the array interface, but that we explicitly handle. Right now, that appears to be |
So I found #5698. I think the confusion lies in that I think we're supposed to implement the |
I know the numpy docs say that these methods are for ndarray subclasses, but my experience writing a handful of ndarray-like objects (non-subclasses) is that numpy checks for these attributes (duck typing) instead of checking for actual ndarray subclasses. I would consider this a documentation bug for numpy.
So, roughly speaking, these are two different ndarray interfaces that serve different needs. Do you need/want to be checking for the latter? Maybe you could describe what exactly you are trying to do with this? :)
@shoyer +1 on the numpy docs being buggy! Yeah, looking through the numpy code it looks like checks for array-likeness in one big function through a series of I put the check above our big |
I think you should simply use This would technically also allow Python Array classes too the others are for 'c' checks IIRC (never clear on how this works)
@jreback yeah, for a long time I've only really known I'll look through and see how we're using |
@jreback so I notice that there was an effort to start abstracting away our dependency on |
My 2 cents is that it's not worth much effort to abstract away our dependency on numpy until there are any even remotely plausible alternatives in sight. Until then it's very hard to say even exactly what we will need. |
TST: test handling Numpy Array Interface, also explicitly test handling rpy2 objects Moved logic to is_array_like, added __array__ check changed pandas check from NDFrame to pandasobject updated with array_like and fixed rpy2 tests
Here's a gist with 3 vbench runs: https://gist.github.com/dalejung/cdf7e3f059f0f80f6486. Nothing really sticks out to me; it kind of jumps around, with different tests showing up as slow.
----
Remember that ndarrays and NDFrames are array-like.
"""
# numpy ndarray subclass api
perhaps this function could just be:
array_like_attrs = ['__array__', '__array_interface__', '__array_struct__']
return any(hasattr(obj, attr) for attr in array_like_attrs)
I don't think you need to check the types of the attributes
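A minimal sketch of the suggestion above, assuming a standalone helper (the name `is_array_like` is taken from the commit message; the body is illustrative and not necessarily what the PR ended up with). `OnlyInterface` is a hypothetical class used only to exercise the check.

```python
# Attribute names that make up the numpy array protocols discussed in this thread.
ARRAY_LIKE_ATTRS = ('__array__', '__array_interface__', '__array_struct__')


def is_array_like(obj):
    """Return True if obj exposes any of the array protocol attributes.

    Only the presence of the attribute is checked, not its type.
    """
    return any(hasattr(obj, attr) for attr in ARRAY_LIKE_ATTRS)


class OnlyInterface:
    # Hypothetical object that exposes just one of the three protocols.
    __array_interface__ = {'shape': (2,), 'typestr': '<i8', 'version': 3}


print(is_array_like(OnlyInterface()))  # True
print(is_array_like([1, 2, 3]))        # False: plain lists have none of these
```

Note that a plain list is not array-like under this definition, so callers would still need their own handling for ordinary sequences.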
I think it's only appropriate to check for __array__; the others are not necessary and are C-level anyhow
I don't think that is true. PIL exposes its data to numpy via __array_interface__ and does not have a __array__. rpy2 implements __array_struct__ and not __array__. Looking around github there are projects that only implement __array_interface__.
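To illustrate the point, here is a small sketch with two hypothetical stand-in classes: `ImageLike` mimics an object that only offers `__array_interface__` (as described for PIL), and `VectorLike` mimics one that only offers `__array_struct__` (as described for rpy2). These are not the real library classes, and the `__array_struct__` value is a placeholder rather than a real capsule.

```python
PROTOCOLS = ('__array__', '__array_interface__', '__array_struct__')


def exposed_protocols(obj):
    """List which of the array protocol attributes obj provides."""
    return [name for name in PROTOCOLS if hasattr(obj, name)]


class ImageLike:
    # Stand-in for an object that exposes only __array_interface__.
    @property
    def __array_interface__(self):
        return {'shape': (1, 1), 'typestr': '|u1', 'data': b'\x00', 'version': 3}


class VectorLike:
    # Stand-in for an object that exposes only __array_struct__.
    # Placeholder value; a real implementation would provide a C-level capsule.
    __array_struct__ = object()


for obj in (ImageLike(), VectorLike()):
    found = exposed_protocols(obj)
    # A check for __array__ alone would miss both of these objects.
    print(type(obj).__name__, found, '__array__' in found)
```

Both objects would be rejected by a check that only looks for `__array__`, which is the case being argued against here.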
Nobody writes ndarray-like objects to spec :).
Tell me about it. Problem is, it's hard to untangle "should work" from "does work". At this point, they are probably the same thing since there's enough code out there depending on current behaviors. :/
this is too complicated. isn't this:
return any([hasattr(obj, attr) for attr in ['__array__', '__array_interface__', '__array_struct__']])
equivalent / simpler / faster? (this is going to be called a lot, pls show a perf check as well)
@jreback look at my first comment in this thread!
also, you definitely want to use the generator expression any(...) rather than the list comprehension any([...]) (the latter will always do every check).
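The short-circuiting difference can be seen by counting how many attribute checks actually run. This is a small sketch; `count_checks` and `HasArray` are hypothetical names used only for the demonstration.

```python
ATTRS = ('__array__', '__array_interface__', '__array_struct__')


def count_checks(obj, use_generator):
    """Return (result, number of hasattr calls performed)."""
    checked = []

    def probe(attr):
        checked.append(attr)  # record every check that actually runs
        return hasattr(obj, attr)

    if use_generator:
        # any() stops pulling from the generator at the first True.
        result = any(probe(a) for a in ATTRS)
    else:
        # The list is built in full before any() sees it.
        result = any([probe(a) for a in ATTRS])
    return result, len(checked)


class HasArray:
    # Hypothetical object whose first checked attribute already matches.
    def __array__(self):
        raise NotImplementedError  # never called in this demo


print(count_checks(HasArray(), use_generator=True))   # (True, 1)
print(count_checks(HasArray(), use_generator=False))  # (True, 3)
```

When the object has none of the attributes, both forms perform all three checks, so the saving only applies to objects that match early.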
@dalejung I think to make things consistent, can you also if you happened to document this as well would be gr8!
@dalejung can you revisit. I like some of the things going on here, but this devolved a bit.
closing pls reopen if/when updated
WIP