BUG/PERF: Avoid listifying in dispatch_to_extension_op #23155

Merged
merged 16 commits into from
Oct 19, 2018
16 changes: 16 additions & 0 deletions doc/source/extending.rst
@@ -135,6 +135,12 @@ There are two approaches for providing operator support for your ExtensionArray:
2. Use an operator implementation from pandas that depends on operators that are already defined
on the underlying elements (scalars) of the ExtensionArray.

.. note::

Regardless of the approach, you may want to set ``__array_priority__``
if you want your implementation to be called when involved in binary operations
with NumPy arrays.
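As a rough illustration of this note, the sketch below shows how ``__array_priority__`` changes which operand handles ``ndarray + obj``. It assumes NumPy is installed; ``Tagged`` is a hypothetical array-like written for this example, not a pandas class, and it defines no ``__array_ufunc__``.

```python
import numpy as np


class Tagged:
    """Hypothetical array-like used only for illustration."""

    # ndarray's default priority is 0.0; a higher value asks NumPy's
    # binary operators to return NotImplemented so Python falls back
    # to our reflected method.
    __array_priority__ = 1000

    def __init__(self, data):
        self.data = np.asarray(data)

    def __add__(self, other):
        return Tagged(self.data + np.asarray(other))

    def __radd__(self, other):
        return Tagged(np.asarray(other) + self.data)


# ndarray.__add__ defers, so Tagged.__radd__ produces the result.
result = np.array([1, 2, 3]) + Tagged([10, 20, 30])
```

Without the attribute, NumPy would try to treat the right operand as an object scalar and broadcast over it instead of deferring.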

For the first approach, you define selected operators, e.g., ``__add__``, ``__le__``, etc. that
you want your ``ExtensionArray`` subclass to support.

@@ -173,6 +179,16 @@ or not that succeeds depends on whether the operation returns a result
that's valid for the ``ExtensionArray``. If an ``ExtensionArray`` cannot
be reconstructed, an ndarray containing the scalars is returned instead.

For ease of implementation and consistency with operations between pandas
and NumPy ndarrays, we recommend *not* handling Series and Indexes in your binary ops.
Instead, you should detect these cases and return ``NotImplemented``.
When pandas encounters an operation like ``op(Series, ExtensionArray)``, pandas
will

1. unbox the array from the ``Series`` (roughly ``Series.values``)
2. call ``result = op(values, ExtensionArray)``
3. re-box the result in a ``Series``
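The three steps above can be sketched in plain Python. This is a toy model under stated assumptions, not pandas' implementation: ``ToyArray`` and ``ToySeries`` are hypothetical stand-ins for an ``ExtensionArray`` and a ``Series``.

```python
import operator


class ToyArray:
    """Hypothetical stand-in for an ExtensionArray."""

    def __init__(self, values):
        self.values = list(values)

    def __add__(self, other):
        if isinstance(other, ToySeries):
            # Do not handle the container; let it unbox and call us again.
            return NotImplemented
        if isinstance(other, ToyArray):
            other = other.values
        else:
            # Treat anything else as a scalar and broadcast it.
            other = [other] * len(self.values)
        return ToyArray(a + b for a, b in zip(self.values, other))

    __radd__ = __add__  # addition is commutative in this toy


class ToySeries:
    """Hypothetical stand-in for a Series wrapping an array."""

    def __init__(self, array):
        self.array = array

    def _binop(self, other, op):
        left = self.array  # 1. unbox
        right = other.array if isinstance(other, ToySeries) else other
        result = op(left, right)  # 2. run the op on the arrays
        return ToySeries(result)  # 3. re-box

    def __add__(self, other):
        return self._binop(other, operator.add)

    def __radd__(self, other):
        return self._binop(other, lambda a, b: operator.add(b, a))
```

Because ``ToyArray.__add__`` returns ``NotImplemented`` for the container, Python falls back to ``ToySeries.__radd__``, and the result comes back boxed whichever operand is on the left.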

.. _extending.extension.testing:

Testing Extension Arrays
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.24.0.txt
@@ -875,6 +875,7 @@ Numeric
- Bug in :meth:`DataFrame.apply` where, when supplied with a string argument and additional positional or keyword arguments (e.g. ``df.apply('sum', min_count=1)``), a ``TypeError`` was wrongly raised (:issue:`22376`)
- Bug in :meth:`DataFrame.astype` to extension dtype may raise ``AttributeError`` (:issue:`22578`)
- Bug in :class:`DataFrame` with ``timedelta64[ns]`` dtype arithmetic operations with ``ndarray`` with integer dtype incorrectly treating the ndarray as ``timedelta64[ns]`` dtype (:issue:`23114`)
- Bug in :meth:`Series.rpow` with object dtype returning ``NaN`` for ``1 ** NA`` instead of ``1`` (:issue:`22922`).

Strings
^^^^^^^
33 changes: 27 additions & 6 deletions pandas/core/arrays/base.py
@@ -9,6 +9,7 @@

import operator

from pandas.core.dtypes.generic import ABCSeries, ABCIndexClass
from pandas.errors import AbstractMethodError
from pandas.compat.numpy import function as nv
from pandas.compat import set_function_name, PY3
@@ -109,6 +110,7 @@ def _from_sequence(cls, scalars, dtype=None, copy=False):
compatible with the ExtensionArray.
copy : boolean, default False
If True, copy the underlying data.

Returns
-------
ExtensionArray
@@ -724,7 +726,13 @@ def _reduce(self, name, skipna=True, **kwargs):

class ExtensionOpsMixin(object):
"""
A base class for linking the operators to their dunder names
A base class for linking the operators to their dunder names.

.. note::

You may want to set ``__array_priority__`` if you want your
implementation to be called when involved in binary operations
with NumPy arrays.
"""

@classmethod
@@ -761,12 +769,14 @@ def _add_comparison_ops(cls):


class ExtensionScalarOpsMixin(ExtensionOpsMixin):
"""A mixin for defining the arithmetic and logical operations on
an ExtensionArray class, where it is assumed that the underlying objects
have the operators already defined.
"""
A mixin for defining ops on an ExtensionArray.

It is assumed that the underlying scalar objects have the operators
already defined.

Usage
------
Notes
-----
If you have defined a subclass MyExtensionArray(ExtensionArray), then
use MyExtensionArray(ExtensionArray, ExtensionScalarOpsMixin) to
get the arithmetic operators. After the definition of MyExtensionArray,
@@ -776,6 +786,12 @@ class ExtensionScalarOpsMixin(ExtensionOpsMixin):
MyExtensionArray._add_comparison_ops()

to link the operators to your class.

.. note::

You may want to set ``__array_priority__`` if you want your
implementation to be called when involved in binary operations
with NumPy arrays.
"""

@classmethod
@@ -825,6 +841,11 @@ def convert_values(param):
else: # Assume it's an object
ovalues = [param] * len(self)
return ovalues

if isinstance(other, (ABCSeries, ABCIndexClass)):
# rely on pandas to unbox and dispatch to us
return NotImplemented

lvalues = self
rvalues = convert_values(other)

43 changes: 31 additions & 12 deletions pandas/core/arrays/integer.py
@@ -3,7 +3,8 @@
import copy
import numpy as np

from pandas._libs.lib import infer_dtype

from pandas._libs import lib
from pandas.util._decorators import cache_readonly
from pandas.compat import u, range, string_types
from pandas.compat import set_function_name
@@ -171,7 +172,7 @@ def coerce_to_array(values, dtype, mask=None, copy=False):

values = np.array(values, copy=copy)
if is_object_dtype(values):
inferred_type = infer_dtype(values)
inferred_type = lib.infer_dtype(values)
if inferred_type not in ['floating', 'integer',
'mixed-integer', 'mixed-integer-float']:
raise TypeError("{} cannot be converted to an IntegerDtype".format(
@@ -280,6 +281,8 @@ def _coerce_to_ndarray(self):
data[self._mask] = self._na_value
return data

__array_priority__ = 1000 # higher than ndarray so ops dispatch to us
Contributor

should we just put this in the base class? (for the ops mixin)

Contributor Author

That seems a little too invasive for a base class. I’d rather leave that up to the subclasser.

Contributor

so what arithmetic subclass would not want this set?

is there an example?

Contributor Author

To clarify, I'm not sure if there's a way to unset it, if you don't want to set it in a subclass (you don't want to opt into numpy's array stuff at all).

Contributor

I just find this a detail which would likely be forgotten in any subclass. I don't see any harm or much upset in setting it on the base class (you can always unset it if you really, really think you need to).

Contributor Author

Can you unset it?

Contributor Author

I don't really know if setting __array_priority__ = 0 is enough to "unset" it, and I don't know what all setting __array_priority__ in the first place opts you into.

Contributor

Can you document this in the Mixin itself though (if you are not going to set it by default)? It is so non-obvious that you need to do this.


def __array__(self, dtype=None):
"""
the array interface, return my values
@@ -288,12 +291,6 @@ def __array__(self, dtype=None):
return self._coerce_to_ndarray()

def __iter__(self):
"""Iterate over elements of the array.

"""
# This needs to be implemented so that pandas recognizes extension
# arrays as list-like. The default implementation makes successive
# calls to ``__getitem__``, which may be slower than necessary.
for i in range(len(self)):
if self._mask[i]:
yield self.dtype.na_value
@@ -504,13 +501,21 @@ def cmp_method(self, other):

op_name = op.__name__
mask = None

if isinstance(other, (ABCSeries, ABCIndexClass)):
# Rely on pandas to unbox and dispatch to us.
return NotImplemented

if isinstance(other, IntegerArray):
other, mask = other._data, other._mask

elif is_list_like(other):
other = np.asarray(other)
if other.ndim > 0 and len(self) != len(other):
raise ValueError('Lengths must match to compare')

other = lib.item_from_zerodim(other)

# numpy will show a DeprecationWarning on invalid elementwise
# comparisons, this will raise in the future
with warnings.catch_warnings():
@@ -586,14 +591,21 @@ def integer_arithmetic_method(self, other):

op_name = op.__name__
mask = None

if isinstance(other, (ABCSeries, ABCIndexClass)):
other = getattr(other, 'values', other)
# Rely on pandas to unbox and dispatch to us.
return NotImplemented

if isinstance(other, IntegerArray):
other, mask = other._data, other._mask
elif getattr(other, 'ndim', 0) > 1:
if getattr(other, 'ndim', 0) > 1:
raise NotImplementedError(
"can only perform ops with 1-d structures")

if isinstance(other, IntegerArray):
other, mask = other._data, other._mask

elif getattr(other, 'ndim', None) == 0:
Contributor Author

This is moved from dispatch_to_extension_array.

Contributor

same

Contributor Author

Had to keep this one in an elif, so that we avoid the else block raising a TypeError.

other = other.item()

elif is_list_like(other):
other = np.asarray(other)
if not other.ndim:
@@ -612,6 +624,13 @@ def integer_arithmetic_method(self, other):
else:
mask = self._mask | mask

# 1 ** np.nan is 1. So we have to unmask those.
if op_name == 'pow':
mask = np.where(self == 1, False, mask)

elif op_name == 'rpow':
mask = np.where(other == 1, False, mask)

with np.errstate(all='ignore'):
result = op(self._data, other)
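The unmasking in this hunk relies on ``1 ** NaN`` being ``1``. A minimal pure-Python sketch of the same idea follows, assuming plain lists with boolean masks where ``True`` marks a missing value; ``masked_pow`` is a hypothetical helper for illustration, not pandas API.

```python
import math

nan = float("nan")

# A base of 1 yields 1 even for a NaN exponent; other bases propagate NaN.
assert 1 ** nan == 1.0
assert math.isnan(2 ** nan)


def masked_pow(base, base_mask, exp, exp_mask):
    """Elementwise ``base ** exp`` over lists with boolean NA masks."""
    out, out_mask = [], []
    for b, bm, e, em in zip(base, base_mask, exp, exp_mask):
        missing = bm or em
        if not bm and b == 1:
            # 1 ** anything is 1, so a missing exponent no longer matters.
            missing = False
        out.append(None if missing else b ** e)
        out_mask.append(missing)
    return out, out_mask


# The 0 entries in the exponent list are placeholders behind the mask.
vals, mask = masked_pow([1, 2, 1], [False, False, True],
                        [0, 0, 3], [True, True, False])
```

Only the first element is unmasked: its base is a known ``1``. The third element stays missing because the base itself is missing.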

27 changes: 23 additions & 4 deletions pandas/core/arrays/sparse.py
@@ -1471,15 +1471,32 @@ def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
'power': 'pow',
'remainder': 'mod',
'divide': 'div',
'equal': 'eq',
'not_equal': 'ne',
'less': 'lt',
'less_equal': 'le',
'greater': 'gt',
'greater_equal': 'ge',
}

flipped = {
'lt': '__gt__',
'le': '__ge__',
'gt': '__lt__',
'ge': '__le__',
'eq': '__eq__',
'ne': '__ne__',
}

op_name = ufunc.__name__
op_name = aliases.get(op_name, op_name)

if op_name in special and kwargs.get('out') is None:
if isinstance(inputs[0], type(self)):
return getattr(self, '__{}__'.format(op_name))(inputs[1])
else:
return getattr(self, '__r{}__'.format(op_name))(inputs[0])
name = flipped.get(op_name, '__r{}__'.format(op_name))
Contributor

note to @jbrockmendel since we do this a couple of times IIRC, we should have a more generic way of doing this

Contributor Author

And if we implement __array_ufunc__ on more arrays, we'll need to do it in those places too.

I think pandas.core.ops._op_descriptions may have enough info.

Contributor Author

That doesn't quite work since the comparison ops don't define reversed, which may be sensible (haven't really thought it through).

return getattr(self, name)(inputs[0])

if len(inputs) == 1:
# No alignment necessary.
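The ``flipped`` table earlier in this hunk exists because comparison operators have no ``__r*__`` forms: the reflection of ``a < b`` is ``b > a``. A small sketch of that lookup, where ``reflected_dunder`` is a hypothetical helper written for this example:

```python
import operator

# Reflected counterpart of each comparison: ``a < b`` is ``b > a``.
FLIPPED = {
    "lt": "gt", "le": "ge",
    "gt": "lt", "ge": "le",
    "eq": "eq", "ne": "ne",
}


def reflected_dunder(op_name):
    """Name of the method to call on ``right`` for ``left <op> right``."""
    if op_name in FLIPPED:
        return "__{}__".format(FLIPPED[op_name])
    # Arithmetic ops do have reflected variants such as ``__radd__``.
    return "__r{}__".format(op_name)


# 3 < 5, evaluated from the right operand's side.
same_as_lt = getattr(5, reflected_dunder("lt"))(3) == operator.lt(3, 5)
# 3 - 5, evaluated from the right operand's side.
same_as_sub = getattr(5, reflected_dunder("sub"))(3) == operator.sub(3, 5)
```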
@@ -1528,7 +1545,8 @@ def sparse_arithmetic_method(self, other):
op_name = op.__name__

if isinstance(other, (ABCSeries, ABCIndexClass)):
other = getattr(other, 'values', other)
# Rely on pandas to dispatch to us.
return NotImplemented

if isinstance(other, SparseArray):
return _sparse_array_op(self, other, op, op_name)
@@ -1573,10 +1591,11 @@ def cmp_method(self, other):
op_name = op_name[:-1]

if isinstance(other, (ABCSeries, ABCIndexClass)):
other = getattr(other, 'values', other)
# Rely on pandas to unbox and dispatch to us.
return NotImplemented

if not is_scalar(other) and not isinstance(other, type(self)):
# convert list-like to ndarary
# convert list-like to ndarray
other = np.asarray(other)

if isinstance(other, np.ndarray):
34 changes: 14 additions & 20 deletions pandas/core/ops.py
@@ -862,6 +862,13 @@ def masked_arith_op(x, y, op):
# mask is only meaningful for x
result = np.empty(x.size, dtype=x.dtype)
mask = notna(xrav)

# 1 ** np.nan is 1. So we have to unmask those.
if op == pow:
mask = np.where(x == 1, False, mask)
elif op == rpow:
mask = np.where(y == 1, False, mask)

if mask.any():
with np.errstate(all='ignore'):
result[mask] = op(xrav[mask], y)
@@ -1202,29 +1209,16 @@ def dispatch_to_extension_op(op, left, right):

# The op calls will raise TypeError if the op is not defined
# on the ExtensionArray
# TODO(jreback)
# we need to listify to avoid ndarray, or non-same-type extension array
# dispatching

if is_extension_array_dtype(left):

new_left = left.values
if isinstance(right, np.ndarray):

# handle numpy scalars, this is a PITA
# TODO(jreback)
new_right = lib.item_from_zerodim(right)
if is_scalar(new_right):
new_right = [new_right]
new_right = list(new_right)
elif is_extension_array_dtype(right) and type(left) != type(right):
new_right = list(right)
else:
new_right = right

# unbox Series and Index to arrays
if isinstance(left, (ABCSeries, ABCIndexClass)):
new_left = left._values
else:
new_left = left

new_left = list(left.values)
if isinstance(right, (ABCSeries, ABCIndexClass)):
new_right = right._values
else:
new_right = right

res_values = op(new_left, new_right)