Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Support ExtensionArray operators via a mixin #21261

Merged
merged 19 commits into from
Jun 29, 2018
Merged
Show file tree
Hide file tree
Changes from 15 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
54 changes: 50 additions & 4 deletions doc/source/extending.rst
Original file line number Diff line number Diff line change
Expand Up @@ -61,7 +61,7 @@ Extension Types

.. warning::

The :class:`pandas.api.extension.ExtensionDtype` and :class:`pandas.api.extension.ExtensionArray` APIs are new and
The :class:`pandas.api.extensions.ExtensionDtype` and :class:`pandas.api.extensions.ExtensionArray` APIs are new and
experimental. They may change between versions without warning.

Pandas defines an interface for implementing data types and arrays that *extend*
Expand All @@ -79,10 +79,10 @@ on :ref:`ecosystem.extensions`.

The interface consists of two classes.

:class:`~pandas.api.extension.ExtensionDtype`
:class:`~pandas.api.extensions.ExtensionDtype`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A :class:`pandas.api.extension.ExtensionDtype` is similar to a ``numpy.dtype`` object. It describes the
A :class:`pandas.api.extensions.ExtensionDtype` is similar to a ``numpy.dtype`` object. It describes the
data type. Implementors are responsible for a few unique items like the name.

One particularly important item is the ``type`` property. This should be the
Expand All @@ -91,7 +91,7 @@ extension array for IP Address data, this might be ``ipaddress.IPv4Address``.

See the `extension dtype source`_ for interface definition.

:class:`~pandas.api.extension.ExtensionArray`
:class:`~pandas.api.extensions.ExtensionArray`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This class provides all the array-like functionality. ExtensionArrays are
Expand All @@ -113,6 +113,52 @@ by some other storage type, like Python lists.
See the `extension array source`_ for the interface definition. The docstrings
and comments contain guidance for properly implementing the interface.

.. _extending.extension.operator:

:class:`~pandas.api.extensions.ExtensionArray` Operator Support
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs a ref

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By default, there are no operators defined for the class :class:`~pandas.api.extensions.ExtensionArray`.
There are two approaches for providing operator support for your ExtensionArray:

1. Define each of the operators on your ExtensionArray subclass.
2. Use an operator implementation from pandas that depends on operators that are already defined
on the underlying elements (scalars) of the ExtensionArray.

For the first approach, you define selected operators, e.g., ``_add__``, ``__le__``, etc. that
you want your ExtensionArray subclass to support.

The second approach assumes that the underlying elements (i.e., scalar type) of the ExtensionArray
have the individual operators already defined. In other words, if your ExtensionArray
named ``MyExtensionArray`` is implemented so that each element is an instance
of the class ``MyExtensionElement``, then if the operators are defined
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would somewhere use "scalar type" here, as that is the terminology we use above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

for ``MyExtensionElement``, the second approach will automatically
define the operators for ``MyExtensionArray``.

A mixin class, :class:`~pandas.api.extensions.ExtensionScalarOpsMixin` supports this second
approach. If developing an ``ExtensionArray`` subclass, for example ``MyExtensionArray``,
simply include ``ExtensionScalarOpsMixin`` as a parent class of ``MyExtensionArray``
and then call the methods :meth:`~MyExtensionArray._add_arithmetic_ops` and/or
:meth:`~MyExtensionArray._add_comparison_ops` to hook the operators into
your ``MyExtensionArray`` class, as follows:

.. code-block:: python

class MyExtensionArray(ExtensionArray, ExtensionScalarOpsMixin):
pass

MyExtensionArray._add_arithmetic_ops()
MyExtensionArray._add_comparison_ops()

Note that since ``pandas`` automatically calls the underlying operator on each
element one-by-one, this might not be as performant as implementing your own
version of the associated operators directly on the ExtensionArray.

.. _extending.extension.testing:

Testing Extension Arrays
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a ref to this section too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

^^^^^^^^^^^^^^^^^^^^^^^^

We provide a test suite for ensuring that your extension arrays satisfy the expected
behavior. To use the test suite, you must provide several pytest fixtures and inherit
from the base test class. The required fixtures are found in
Expand Down
39 changes: 39 additions & 0 deletions doc/source/whatsnew/v0.24.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,45 @@ New features

.. _whatsnew_0240.enhancements.other:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this label belongs to the title below your added content

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a ref here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

.. _whatsnew_0240.enhancements.extension_array_operators

``ExtensionArray`` operator support
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A ``Series`` based on ``ExtensionArray`` now supports arithmetic and comparison
operators. There are two approaches for providing operator support for an ExtensionArray:

1. Define each of the operators on your ExtensionArray subclass.
2. Use an operator implementation from pandas that depends on operators that are already defined
on the underlying elements (scalars) of the ExtensionArray.

To use the first approach where you define your own implementation of the operators,
you define each operator such as `__add__`, __le__`, etc. on your ExtensionArray
subclass.

For the second approach, which is appropriate if your ExtensionArray contains
elements that already have the operators
defined on a per-element basis, pandas provides a mixin,
:class:`ExtensionScalarOpsMixin` that you can use to
define the operators on your ExtensionArray subclass.
If developing an ``ExtensionArray`` subclass, for example ``MyExtensionArray``,
simply include ``ExtensionScalarOpsMixin`` as a parent class of ``MyExtensionArray``
and then call the methods :meth:`~MyExtensionArray._add_arithmetic_ops` and/or
:meth:`~MyExtensionArray._add_comparison_ops` to hook the operators into
your ``MyExtensionArray`` class, as follows:

.. code-block:: python

class MyExtensionArray(ExtensionArray, ExtensionScalarOpsMixin):
pass

MyExtensionArray._add_arithmetic_ops()
MyExtensionArray._add_comparison_ops()

See the :ref:`ExtensionArray Operator Support
<extending.extension.operator>` documentation section for details on both
ways of adding operator support.

Other Enhancements
^^^^^^^^^^^^^^^^^^
- :func:`to_datetime` now supports the ``%Z`` and ``%z`` directive when passed into ``format`` (:issue:`13486`)
Expand Down
3 changes: 2 additions & 1 deletion pandas/api/extensions/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,5 +3,6 @@
register_index_accessor,
register_series_accessor)
from pandas.core.algorithms import take # noqa
from pandas.core.arrays.base import ExtensionArray # noqa
from pandas.core.arrays.base import (ExtensionArray, # noqa
ExtensionScalarOpsMixin)
from pandas.core.dtypes.dtypes import ExtensionDtype # noqa
12 changes: 11 additions & 1 deletion pandas/conftest.py
Original file line number Diff line number Diff line change
Expand Up @@ -83,7 +83,8 @@ def observed(request):
'__mul__', '__rmul__',
'__floordiv__', '__rfloordiv__',
'__truediv__', '__rtruediv__',
'__pow__', '__rpow__']
'__pow__', '__rpow__',
'__mod__', '__rmod__']
if not PY3:
_all_arithmetic_operators.extend(['__div__', '__rdiv__'])

Expand All @@ -96,6 +97,15 @@ def all_arithmetic_operators(request):
return request.param


@pytest.fixture(params=['__eq__', '__ne__', '__le__',
'__lt__', '__ge__', '__gt__'])
def all_compare_operators(request):
"""
Fixture for dunder names for common compare operations
"""
return request.param


@pytest.fixture(params=[None, 'gzip', 'bz2', 'zip',
pytest.param('xz', marks=td.skip_if_no_lzma)])
def compression(request):
Expand Down
3 changes: 2 additions & 1 deletion pandas/core/arrays/__init__.py
Original file line number Diff line number Diff line change
@@ -1,2 +1,3 @@
from .base import ExtensionArray # noqa
from .base import (ExtensionArray, # noqa
ExtensionScalarOpsMixin)
from .categorical import Categorical # noqa
127 changes: 127 additions & 0 deletions pandas/core/arrays/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,8 +7,13 @@
"""
import numpy as np

import operator

from pandas.errors import AbstractMethodError
from pandas.compat.numpy import function as nv
from pandas.compat import set_function_name, PY3
from pandas.core.dtypes.common import is_list_like
from pandas.core import ops

_not_implemented_message = "{} does not implement {}."

Expand Down Expand Up @@ -610,3 +615,125 @@ def _ndarray_values(self):
used for interacting with our indexers.
"""
return np.array(self)


class ExtensionOpsMixin(object):
"""
A base class for linking the operators to their dunder names
"""
@classmethod
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blank line above this for PEP8?

Copy link
Contributor Author

@Dr-Irv Dr-Irv Jun 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It passed the PEP8 tests....... But I will add it in.

def _add_arithmetic_ops(cls):
cls.__add__ = cls._create_arithmetic_method(operator.add)
cls.__radd__ = cls._create_arithmetic_method(ops.radd)
cls.__sub__ = cls._create_arithmetic_method(operator.sub)
cls.__rsub__ = cls._create_arithmetic_method(ops.rsub)
cls.__mul__ = cls._create_arithmetic_method(operator.mul)
cls.__rmul__ = cls._create_arithmetic_method(ops.rmul)
cls.__pow__ = cls._create_arithmetic_method(operator.pow)
cls.__rpow__ = cls._create_arithmetic_method(ops.rpow)
cls.__mod__ = cls._create_arithmetic_method(operator.mod)
cls.__rmod__ = cls._create_arithmetic_method(ops.rmod)
cls.__floordiv__ = cls._create_arithmetic_method(operator.floordiv)
cls.__rfloordiv__ = cls._create_arithmetic_method(ops.rfloordiv)
cls.__truediv__ = cls._create_arithmetic_method(operator.truediv)
cls.__rtruediv__ = cls._create_arithmetic_method(ops.rtruediv)
if not PY3:
cls.__div__ = cls._create_arithmetic_method(operator.div)
cls.__rdiv__ = cls._create_arithmetic_method(ops.rdiv)

cls.__divmod__ = cls._create_arithmetic_method(divmod)
cls.__rdivmod__ = cls._create_arithmetic_method(ops.rdivmod)

@classmethod
def _add_comparison_ops(cls):
cls.__eq__ = cls._create_comparison_method(operator.eq)
cls.__ne__ = cls._create_comparison_method(operator.ne)
cls.__lt__ = cls._create_comparison_method(operator.lt)
cls.__gt__ = cls._create_comparison_method(operator.gt)
cls.__le__ = cls._create_comparison_method(operator.le)
cls.__ge__ = cls._create_comparison_method(operator.ge)


class ExtensionScalarOpsMixin(ExtensionOpsMixin):
"""A mixin for defining the arithmetic and logical operations on
an ExtensionArray class, where it is assumed that the underlying objects
have the operators already defined.

Usage
------
If you have defined a subclass MyExtensionArray(ExtensionArray), then
use MyExtensionArray(ExtensionArray, ExtensionScalarOpsMixin) to
get the arithmetic operators. After the definition of MyExtensionArray,
insert the lines

MyExtensionArray._add_arithmetic_ops()
MyExtensionArray._add_comparison_ops()

to link the operators to your class.
"""

@classmethod
def _create_method(cls, op, coerce_to_dtype=True):
"""
A class method that returns a method that will correspond to an
operator for an ExtensionArray subclass, by dispatching to the
relevant operator defined on the individual elements of the
ExtensionArray.

Parameters
----------
op: function
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space after "op" and before the colon (numpy docstring standard, long version: http://pandas.pydata.org/pandas-docs/stable/contributing_docstring.html)

and the same below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

An operator that takes arguments op(a, b)
coerce_to_dtype: bool
boolean indicating whether to attempt to convert
the result to the underlying ExtensionArray dtype
(default True)

Returns
-------
A method that can be bound to a method of a class

Example
-------
Given an ExtensionArray subclass called MyExtensionArray, use

>>> __add__ = cls._create_method(operator.add)

in the class definition of MyExtensionArray to create the operator
for addition, that will be based on the operator implementation
of the underlying elements of the ExtensionArray

"""

def _binop(self, other):
def convert_values(param):
if isinstance(param, ExtensionArray) or is_list_like(param):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make this is_list_like(param) more strict as is_array_like ?

For example, if you create ExtensionArray of sets, and do an operation where the right value is a single set, the current code will not work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or similar with dicts (for which we already have a dummy implementation in the tests)

(I am also fine with leaving this for later, as there are other places where we have problems with iterable scalar elements)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche I think I want to leave it for now. Because you'd like to be able to do an operation such as EABackedSeries + list(objects) and using is_array_type means you have to have a dtype.

ovalues = param
else: # Assume its an object
ovalues = [param] * len(self)
return ovalues
lvalues = self
rvalues = convert_values(other)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably do alignment as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't need to, for a few reasons:

  • The _binop method from here is a method on ExtensionArray (i.e., isinstance(self, ExtensionArray) == True
  • This method is called by dispatch_to_extension_ops in ops.py, which is called after alignment has already been done in _arith_method_SERIES and alignment is ensured in _comp_method_SERIES

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, OK, this is also tested?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche Yes, this is tested because I am using the same tests that are used for general operators on Series, which tests things that are misaligned, etc.

See pandas/tests/extension/tests/decimal/test_decimal.py that leverages the TestSeriesOperators.test_operators method from pandas\tests\series\test_operators .

As an aside, doing it this way uncovered the issues with .get and .combine that I submitted.


# If the operator is not defined for the underlying objects,
# a TypeError should be raised
res = [op(a, b) for (a, b) in zip(lvalues, rvalues)]

if coerce_to_dtype:
try:
res = self._from_sequence(res)
except TypeError:
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this fails, I think we should still convert it to an array instead of keeping it as a list?

Or does that happen on another level?
(anyway, also for usability with the ExtensionArray itself, I think coercing it to an array makes more sense)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we convert to an array, then we could have a dtype problem. This allows the result to be of any type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we convert to an array, then we could have a dtype problem

It will be converted to an array anyhow, if not here, then at the level above when the res is passed to the series constructor. So personally I would already do the conversion here (we can use the same logic / code as what is done in the series constructor to coerce a passed list)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche But why repeat that logic? If we leave it as a list, then the Series constructor will do the inference on the dtype.


return res

op_name = ops._get_op_name(op, True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use a parameter name instead of the positional argument ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could do that, but not specifying the parameter is consistent with all the other usages of _get_op_name, so I think I should be consistent with code that is elsewhere.

return set_function_name(_binop, op_name, cls)

@classmethod
def _create_arithmetic_method(cls, op):
return cls._create_method(op)

@classmethod
def _create_comparison_method(cls, op):
return cls._create_method(op, False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, coerce_to_dtype=False

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

31 changes: 31 additions & 0 deletions pandas/core/ops.py
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,7 @@
is_bool_dtype,
is_list_like,
is_scalar,
is_extension_array_dtype,
_ensure_object)
from pandas.core.dtypes.cast import (
maybe_upcast_putmask, find_common_type,
Expand Down Expand Up @@ -993,6 +994,26 @@ def _construct_divmod_result(left, result, index, name, dtype):
)


def dispatch_to_extension_op(op, left, right):
"""
Assume that left or right is a Series backed by an ExtensionArray,
apply the operator defined by op.
"""

# The op calls will raise TypeError if the op is not defined
# on the ExtensionArray
if is_extension_array_dtype(left):
res_values = op(left.values, right)
else:
# We know that left is not ExtensionArray and is Series and right is
# ExtensionArray. Want to force ExtensionArray op to get called
res_values = op(list(left.values), right.values)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was this needed to fix failing tests? (this was not here in a previous version I think? and eg the dispatch to index is only dealing with the left case)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was added a few commits ago. I discovered that the tests were not properly testing the reverse operators. So there were a couple of changes related to that:

  1. Changed pandas/tests/extension/base/ops.py:BaseOpsUtil to properly test the reverse operator.
  2. Changed the tests in pandas/core/ops.py:_arith_method_SERIES.wrapper() pandas/core/ops.py:_comp_method_SERIES.wrapper() to check if the "right" argument is an extension array
  3. The change you quoted above to detect when the reverse operator is called.

Without these changes, an operator such as pd.Series([1,2,3]) + pd.Series(ExtensionArray(data)) would return an object dtype rather than the ExtensionArray dtype.

As best as I can tell, dispatch_to_index_op is only used on datetime, timedelta and Categorical objects. In those cases, if the index is on the right and some other type is on the left, the other type returns NotImplemented, which causes python to then call the reverse operator on the right argument. With extension types, I think we should allow a Series of any object to be on the left, and then have the extension operator figure out if the operation is valid.

Another possible implementation might be to add a test in the wrappers that says that if is_extension_dtype(right) return NotImplemented, which will then make python call the reverse operator.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another possible implementation might be to add a test in the wrappers that says that if is_extension_dtype(right) return NotImplemented

This is much closer to what we've had in mind with the recent refactoring in ops. Would this look something like splitting up the conditions on 1082-1085? We might need to up-cast right to be a Series in order for the deference chain to go through; I'll need to double-check.

Why is the list(left.values) necessary? I would have thought left.values would get the job done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel Wrote:

Another possible implementation might be to add a test in the wrappers that says that if is_extension_dtype(right) return NotImplemented

This is much closer to what we've had in mind with the recent refactoring in ops. Would this look something like splitting up the conditions on 1082-1085? We might need to up-cast right to be a Series in order for the deference chain to go through; I'll need to double-check.

Yes, I think it would be something like:

       elif is_extension_array_dtype(left):
           return dispatch_to_extension_op(op, left, right)
       elif (is_extension_array_dtype(right) and                 
              not is_categorical_dtype(right)):
           return NotImplemented

But I would have to test the concept. I don't think the upcast is necessary. We know that left is a Series. So python would then see that op(left, right) returns NotImplemented, and then try to call reverse_op(right, left), so that the reverse operator is called, but using the implementation of right.

I'm not going to try making that change without feedback from others.

Why is the list(left.values) necessary? I would have thought left.values would get the job done.

Because left.values could be any type (numpy.ndarray, pd.Categorical, etc.), and we don't want the version of op to be called that is associated with the class of left.values . So by making it a list, we are forcing the reverse operator to be called on right.values, which is the EA implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another possible implementation might be to add a test in the wrappers that says that if is_extension_dtype(right) return NotImplemented, which will then make python call the reverse operator.

I think this is the better way to go.

For me it is fine to not yet do this in this PR, but then I would also not include the above change (and live with that the first iteration in this PR is not yet ideal for cases where the ExtensionArray is the right variable)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I make that change (using NotImplemented)?

I would at least try this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche Yes, I will try this in a new PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche (and @jbrockmendel) I tried the NotImplemented idea and it doesn't work. I think the reason is as follows. Let's say that we have one Series called s1 that has a normal (int or double) dtype. And s2 has a dtype of the EA. Let's say you write the expression s1 + s2. Python sees that you are adding two Series objects. So internally, if we return NotImplemented because s2 is an EA, then Python will say "Well, you don't know how to add a Series to a Series, so I'm not going to bother calling the reverse operator." In other words, the reverse operator is only called when the two types passed to the operator are different, and NotImplemented is returned when applying the operator to the first argument.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure I follow. Are you saying that both s1.__add__(s2) and s2.__add__(s1) return NotImplemented?

The dispatch logic should do something like

out = s1.values + s2
return Series(out, name=...)

It is s1.values.__add__ that should return NotImplemented in this case. Is this not an option for some reason?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel I've tried two different alternatives (regarding replacing this code

else:
        # We know that left is not ExtensionArray and is Series and right is
        # ExtensionArray.  Want to force ExtensionArray op to get called
        res_values = op(list(left.values), right.values)

that this discussion refers to). Note the section of code is hit when left.values is not an EA, but right.values is an EA.

  1. Use
else:
    return NotImplemented
  1. Use
else:
    res_values = op(left.values, right.values)

In the first case, python sees that s1.__add__(s2) is NotImplemented, and because s1 and s2 are both of type Series, then s2.__radd__(s1) is never called.

In the second case, left.values is a numpy array, so then the numpy __add__ method is called, and it is perfectly happy to return an array of dtype that is set to object rather than the dtype of the EA. numpy ends up calling the __add__ method for each of the elements.


res_name = get_op_result_name(left, right)
return left._constructor(res_values, index=left.index,
name=res_name)


def _arith_method_SERIES(cls, op, special):
"""
Wrapper function for Series arithmetic operations, to avoid
Expand Down Expand Up @@ -1061,6 +1082,11 @@ def wrapper(left, right):
raise TypeError("{typ} cannot perform the operation "
"{op}".format(typ=type(left).__name__, op=str_rep))

elif (is_extension_array_dtype(left) or
(is_extension_array_dtype(right) and
not is_categorical_dtype(right))):
return dispatch_to_extension_op(op, left, right)

lvalues = left.values
rvalues = right
if isinstance(rvalues, ABCSeries):
Expand Down Expand Up @@ -1238,6 +1264,11 @@ def wrapper(self, other, axis=None):
return self._constructor(res_values, index=self.index,
name=res_name)

elif (is_extension_array_dtype(self) or
(is_extension_array_dtype(other) and
not is_categorical_dtype(other))):
return dispatch_to_extension_op(op, self, other)

elif isinstance(other, ABCSeries):
# By this point we have checked that self._indexed_same(other)
res_values = na_op(self.values, other.values)
Expand Down
1 change: 1 addition & 0 deletions pandas/tests/extension/base/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,7 @@ class TestMyDtype(BaseDtypeTests):
from .groupby import BaseGroupbyTests # noqa
from .interface import BaseInterfaceTests # noqa
from .methods import BaseMethodsTests # noqa
from .ops import BaseArithmeticOpsTests, BaseComparisonOpsTests # noqa
from .missing import BaseMissingTests # noqa
from .reshaping import BaseReshapingTests # noqa
from .setitem import BaseSetitemTests # noqa
Loading