ExtensionArray Ops discussion #19577
Comments
Migrated from #19520:
I've been planning on moving the Index/Series arithmetic/comparison ops into Array subclasses, so that Most of the comparison ops are ready to make that jump; I can prioritize it if getting that done quickly will help you out. One caveat I'm concerned about is that the existing implementations in the Index subclasses have gotten tangled up with a bunch of unrelated Index machinery. Ideally I'd like these operations to go into self-contained mixin classes that rely on constructor methods, but are independent of slicing/concat/reindex/dropna/... mentioned above.
@TomAugspurger two quick namespace-gameplan questions. Assuming the arith/comparison methods currently in DatetimeIndexOpsMixin/DTI/TDI/PI get moved into analogous array classes, do you envision these getting a) mixed into the appropriate Index/Block subclasses or b) accessed via composition? If the latter, what name ( Because the datetimelike methods wrap some of the base Index methods, some of those will need to move up too. Where do you envision something like BaseArray living?
It will be composition (that is what Block is already doing), and for Index it will be the `_values` attribute IIRC.
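A minimal sketch of the composition approach mentioned above. `ToyArray` and `ToyIndex` are hypothetical names, not pandas classes, and plain lists stand in for ndarrays; the only detail taken from the thread is that the index keeps its array in `_values` and delegates to it.

```python
class ToyArray:
    """Hypothetical self-contained array that owns the op implementations."""

    def __init__(self, data):
        self._data = list(data)

    def __eq__(self, other):
        # Unwrap another ToyArray; treat anything else as a scalar.
        if isinstance(other, ToyArray):
            rights = other._data
        else:
            rights = [other] * len(self._data)
        return [left == right for left, right in zip(self._data, rights)]


class ToyIndex:
    """Hypothetical index that reaches the array via composition."""

    def __init__(self, data):
        self._values = ToyArray(data)

    def __eq__(self, other):
        # Unbox the other index, then delegate to the array's implementation.
        if isinstance(other, ToyIndex):
            other = other._values
        return self._values == other


result = ToyIndex([1, 2, 3]) == ToyIndex([1, 0, 3])
scalar_result = ToyIndex([1, 2, 3]) == 2
```

With this layout the index contains no comparison logic of its own, so the array class stays independent of the index machinery.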
@TomAugspurger So now I've hit upon this issue of how to deal with the binary operators, and the question is what should these operators return (in the case of the arithmetic operators). Consider the
So if the So there are 3 options as I see it:
I think I'd favor (3), but I could live with (2). I'd prefer to not do (1), as that is a lot of effort that it seems people would have to repeat. Thoughts?
I think initially we should go for option 1, and make sure that pandas actually dispatches to it. But I don't fully understand your option 3. Can you clarify a bit more? I don't think option 2 is actually an option (IMO it is also independent of deciding where the actual operation is implemented), because, as you mention, the result of an operation does not necessarily need to be of the same type (additional example: subtraction of datetimes gives timedelta. That is definitely a case we need to handle).
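The datetime case can be illustrated with the standard library alone. This is only a small demonstration that an operation's result type can differ from the operand type; it does not use any pandas API.

```python
from datetime import date, timedelta

later = date(2018, 2, 10)
earlier = date(2018, 2, 1)

# Both operands are dates, but the result is a timedelta.
difference = later - earlier

print(type(later).__name__, type(difference).__name__)
```

Any scheme that assumes a binary op returns the operand's own array type would mislabel this result.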
@jorisvandenbossche Yes, I see now that option 2 isn't viable. My idea on option 3 is something like this. The subclass implements a The default behavior is to just do an element-by-element operation. If the class of the underlying dtype has implemented the operator, then it gets called automatically, and all is well.
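One way to read the element-by-element default is sketched below. Everything here is an assumption for illustration: `ToyScalarArray` and `_elementwise_binop` are hypothetical names, and the methods return a plain list, which deliberately sidesteps the open question of what type the result should be boxed in.

```python
import operator
from decimal import Decimal


def _elementwise_binop(op):
    """Build a method that applies `op` pairwise to the stored scalars."""

    def method(self, other):
        if isinstance(other, ToyScalarArray):
            if len(other) != len(self):
                raise ValueError("lengths must match")
            rights = other._items
        else:
            # Treat anything else as a scalar and broadcast it.
            rights = [other] * len(self)
        # The scalar type's own operator does the real work.
        return [op(left, right) for left, right in zip(self._items, rights)]

    return method


class ToyScalarArray:
    """Hypothetical container of scalars; not part of pandas."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    __add__ = _elementwise_binop(operator.add)
    __eq__ = _elementwise_binop(operator.eq)


arr = ToyScalarArray([Decimal("1.5"), Decimal("2.5")])
total = arr + arr
same = arr == Decimal("1.5")
```

Because `Decimal` already implements `+` and `==`, the array gets both without any type-specific code.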
Split from #19520
How to handle things like `__eq__`, `__add__`, etc.

Neither the Python 2 nor the Python 3 object defaults are appropriate, so I think we should make it abstract or provide a default implementation for some / all of these.
We should be able to do a default implementation for the simple case of binop(extension_array, extension_array) by
@jorisvandenbossche raises the point that for some extension arrays, casting to an object ndarray can be expensive. It'd be unfortunate to do the casting if we're just going to raise a `TypeError` when the `binop` is done. In this case the subclass will need to override all those ops (or we could add a class attribute like `_comparable` and check that before doing `__eq__` or `__lt__`, etc).

How much coercion do we want to attempt? I would vote to stay simple and only attempt a comparison if
Otherwise we return NotImplemented. Subclasses can of course choose more or less aggressive coercion rules.
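A minimal sketch of the `_comparable` idea, under stated assumptions: `ToyExtensionArray` is a hypothetical stand-in rather than the pandas class, `_to_objects` stands in for the expensive cast to an object ndarray, and the "same type" condition is an assumed rule, since the exact conditions are not given above.

```python
class ToyExtensionArray:
    """Hypothetical stand-in for an extension array."""

    # Assumed class-level flag; subclasses set it to False to opt out.
    _comparable = True

    def __init__(self, items):
        self._items = list(items)
        self.casts = 0  # counts how often the expensive cast ran

    def _to_objects(self):
        # Stand-in for the potentially expensive cast to an object ndarray.
        self.casts += 1
        return list(self._items)

    def __eq__(self, other):
        # Check the cheap conditions first so no cast happens needlessly.
        if not self._comparable or not isinstance(other, type(self)):
            return NotImplemented
        left, right = self._to_objects(), other._to_objects()
        return [a == b for a, b in zip(left, right)]


class Opaque(ToyExtensionArray):
    _comparable = False


equal = ToyExtensionArray([1, 2, 3]) == ToyExtensionArray([1, 5, 3])
opaque = Opaque([1, 2])
declined = opaque.__eq__(Opaque([1, 2]))
```

Returning `NotImplemented` lets Python try the reflected operation on the other operand, so a subclass with more aggressive coercion rules can still handle the comparison.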
When boxed in a `Series` or `Index`, I think the goal should be `.values` comparison