Add __array_ufunc__ to Series / Array #23293

Merged
merged 60 commits into from
Jul 1, 2019
2fffac3
Add __array_ufunc__ to Series / Array
jorisvandenbossche Oct 23, 2018
c5a4664
expand IntegerArray.__array_ufunc__
jorisvandenbossche Oct 23, 2018
dd332a4
fix Series.__array_ufunc__ and consolidate dispatch
jorisvandenbossche Oct 23, 2018
71c058e
test Series array_ufunc fallback to numpy array for DecimalArray
jorisvandenbossche Oct 23, 2018
a0d11d9
fix import
jorisvandenbossche Oct 23, 2018
4cfeb9b
first dispatch before getting underlying values (eg for Series[Period…
jorisvandenbossche Oct 23, 2018
607f8a6
fix Categorical: disallow all ufunc apart from ops
jorisvandenbossche Oct 23, 2018
c4fcae7
simplify calling ufunc on underlying values
jorisvandenbossche Oct 23, 2018
65dea1b
fix categorical not existing ops
jorisvandenbossche Oct 23, 2018
134df14
np.positive not available for older numpy versions
jorisvandenbossche Oct 24, 2018
5239b70
fix multiple return values
jorisvandenbossche Oct 24, 2018
3d91885
skip IntegerArray tests for older numpy versions
jorisvandenbossche Oct 24, 2018
429f15c
also deal with no return value
jorisvandenbossche Oct 24, 2018
41f4158
clean-up debugging left-over
jorisvandenbossche Oct 24, 2018
0d6a663
TST: Additional tests for Series ufuncs
TomAugspurger Jun 19, 2019
8f46391
fixup release note
TomAugspurger Jun 19, 2019
44e3c7e
fixups
TomAugspurger Jun 19, 2019
e179913
remove stale comment
TomAugspurger Jun 19, 2019
27208c1
Merge remote-tracking branch 'upstream/master' into jorisvandenbossch…
TomAugspurger Jun 19, 2019
0b1e745
xfail ufunc(series, index)
TomAugspurger Jun 20, 2019
9be1dff
32-bit compat
TomAugspurger Jun 20, 2019
775c2ef
fixup
TomAugspurger Jun 20, 2019
4d7f249
wip
TomAugspurger Jun 20, 2019
0b359d7
fixup release note
TomAugspurger Jun 19, 2019
bbbf269
Merge remote-tracking branch 'upstream/master' into series-array-ufunc
TomAugspurger Jun 21, 2019
64d8908
more
TomAugspurger Jun 21, 2019
d1788b0
lint
TomAugspurger Jun 21, 2019
ef5d508
Merge remote-tracking branch 'upstream/master' into jorisvandenbossch…
TomAugspurger Jun 21, 2019
fe0ee4e
Merge branch 'series-array-ufunc' into jorisvandenbossche-array-ufunc
TomAugspurger Jun 21, 2019
971e347
fixup! more
TomAugspurger Jun 21, 2019
95e8aef
remove dead code
TomAugspurger Jun 21, 2019
7bfd584
todos
TomAugspurger Jun 21, 2019
06e5739
Merge remote-tracking branch 'upstream/master' into jorisvandenbossch…
TomAugspurger Jun 21, 2019
feee015
remove compat
TomAugspurger Jun 21, 2019
3702b9b
object dtype tests
TomAugspurger Jun 21, 2019
a0f84ed
wip
TomAugspurger Jun 21, 2019
d83fe7a
doc, types
TomAugspurger Jun 21, 2019
edad466
compat
TomAugspurger Jun 22, 2019
e4ae8dc
fixups
TomAugspurger Jun 23, 2019
db60f6c
Merge remote-tracking branch 'upstream/master' into jorisvandenbossch…
TomAugspurger Jun 24, 2019
a9bd6ef
added matmul
TomAugspurger Jun 24, 2019
1a8b807
start docs
TomAugspurger Jun 24, 2019
0b0466d
compat
TomAugspurger Jun 24, 2019
1f67866
Merge remote-tracking branch 'upstream/master' into jorisvandenbossch…
TomAugspurger Jun 26, 2019
d3089bd
ignore for numpydev
TomAugspurger Jun 27, 2019
6e770e8
Merge remote-tracking branch 'upstream/master' into jorisvandenbossch…
TomAugspurger Jun 27, 2019
15a3fb1
handle reduce
TomAugspurger Jun 27, 2019
b5e7f45
Merge remote-tracking branch 'upstream/master' into jorisvandenbossch…
TomAugspurger Jun 27, 2019
4f4bd93
update
TomAugspurger Jun 27, 2019
5dbff49
fixups
TomAugspurger Jun 29, 2019
b623be2
Merge remote-tracking branch 'upstream/master' into jorisvandenbossch…
TomAugspurger Jun 29, 2019
2237233
raise for reduce
TomAugspurger Jun 29, 2019
5b5c547
more tests
TomAugspurger Jun 29, 2019
10bc2cc
more tests
TomAugspurger Jun 29, 2019
5380b77
35 compat
TomAugspurger Jun 29, 2019
9f4d110
remove old test
TomAugspurger Jun 30, 2019
6c15ee7
Merge remote-tracking branch 'upstream/master' into jorisvandenbossch…
TomAugspurger Jul 1, 2019
ab48bd8
fixup
TomAugspurger Jul 1, 2019
30fced8
Merge remote-tracking branch 'upstream/master' into jorisvandenbossch…
TomAugspurger Jul 1, 2019
7486d26
Fixups
TomAugspurger Jul 1, 2019
19 changes: 19 additions & 0 deletions doc/source/development/extending.rst
Original file line number Diff line number Diff line change
@@ -208,6 +208,25 @@ will
2. call ``result = op(values, ExtensionArray)``
3. re-box the result in a ``Series``

.. _extending.extension.ufunc:

NumPy Universal Functions
^^^^^^^^^^^^^^^^^^^^^^^^^

:class:`Series` implements ``__array_ufunc__``. As part of the implementation,
pandas unboxes the ``ExtensionArray`` from the :class:`Series`, applies the ufunc,
and re-boxes it if necessary.

If applicable, we highly recommend that you implement ``__array_ufunc__`` in your
extension array to avoid coercion to an ndarray. See
`the numpy documentation <https://docs.scipy.org/doc/numpy/reference/generated/numpy.lib.mixins.NDArrayOperatorsMixin.html>`__
for an example.

As part of your implementation, we require that you defer to pandas when a pandas
container (:class:`Series`, :class:`DataFrame`, :class:`Index`) is detected in ``inputs``.
If any of those is present, you should return ``NotImplemented``. Pandas will take care of
unboxing the array from the container and re-calling the ufunc with the unwrapped input.

.. _extending.extension.testing:

Testing extension arrays
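
The deferral rule described in the extending.rst hunk above can be sketched as follows. This is a minimal illustration, not pandas code: `MyArray` is a hypothetical wrapper (not a full `ExtensionArray`), and the sketch assumes only single-output `__call__` ufuncs without an `out` argument need handling.

```python
import numbers

import numpy as np
import pandas as pd


class MyArray:
    """Hypothetical array-like wrapper used only for illustration."""

    _HANDLED_TYPES = (np.ndarray, numbers.Number)

    def __init__(self, data):
        self._data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Defer to pandas when a pandas container is among the inputs.
        # pandas is expected to unbox the array and call the ufunc again.
        if any(isinstance(x, (pd.Series, pd.DataFrame, pd.Index))
               for x in inputs):
            return NotImplemented
        # Assumption of this sketch: only plain calls with one output.
        if method != "__call__" or ufunc.nout != 1 or "out" in kwargs:
            return NotImplemented
        # Refuse operand types this class does not know how to handle.
        handled = self._HANDLED_TYPES + (MyArray,)
        if not all(isinstance(x, handled) for x in inputs):
            return NotImplemented
        # Unwrap, apply the ufunc to the raw data, and re-wrap the result.
        raw = tuple(x._data if isinstance(x, MyArray) else x for x in inputs)
        return MyArray(ufunc(*raw, **kwargs))


arr = MyArray([1, 2, 3])
plus_one = np.add(arr, 1)
deferred = arr.__array_ufunc__(np.add, "__call__", arr, pd.Series([1, 2, 3]))
```

Here `plus_one` is another `MyArray`, while `deferred` is `NotImplemented` because a `Series` appears in the inputs.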
50 changes: 42 additions & 8 deletions doc/source/getting_started/dsintro.rst
@@ -731,28 +731,62 @@ DataFrame interoperability with NumPy functions
.. _dsintro.numpy_interop:

Elementwise NumPy ufuncs (log, exp, sqrt, ...) and various other NumPy functions
can be used with no issues on DataFrame, assuming the data within are numeric:
can be used with no issues on Series and DataFrame, assuming the data within
are numeric:

.. ipython:: python

np.exp(df)
np.asarray(df)

The dot method on DataFrame implements matrix multiplication:
DataFrame is not intended to be a drop-in replacement for ndarray as its
indexing semantics and data model are quite different in places from an n-dimensional
array.

:class:`Series` implements ``__array_ufunc__``, which allows it to work with NumPy's
`universal functions <https://docs.scipy.org/doc/numpy/reference/ufuncs.html>`_.

The ufunc is applied to the underlying array in a Series.

.. ipython:: python

df.T.dot(df)
ser = pd.Series([1, 2, 3, 4])
np.exp(ser)

Similarly, the dot method on Series implements dot product:
Like other parts of the library, pandas will automatically align labeled inputs
as part of a ufunc with multiple inputs. For example, using :meth:`numpy.remainder`
on two :class:`Series` with differently ordered labels will align before the operation.

.. ipython:: python

s1 = pd.Series(np.arange(5, 10))
s1.dot(s1)
ser1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
ser2 = pd.Series([1, 3, 5], index=['b', 'a', 'c'])
ser1
ser2
np.remainder(ser1, ser2)

DataFrame is not intended to be a drop-in replacement for ndarray as its
indexing semantics are quite different in places from a matrix.
As usual, the union of the two indices is taken, and non-overlapping values are filled
with missing values.

.. ipython:: python

ser3 = pd.Series([2, 4, 6], index=['b', 'c', 'd'])
ser3
np.remainder(ser1, ser3)

When a binary ufunc is applied to a :class:`Series` and :class:`Index`, the Series
implementation takes precedence and a Series is returned.

.. ipython:: python

ser = pd.Series([1, 2, 3])
idx = pd.Index([4, 5, 6])

np.maximum(ser, idx)

NumPy ufuncs are safe to apply to :class:`Series` backed by non-ndarray arrays,
for example :class:`SparseArray` (see :ref:`sparse.calculation`). If possible,
the ufunc is applied without converting the underlying data to an ndarray.

Console display
~~~~~~~~~~~~~~~
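
The dsintro.rst text above says that ufuncs can be applied to a Series backed by a non-ndarray array without converting it to an ndarray. A small sketch of that claim with a sparse-backed Series, assuming a pandas version that exposes `pd.arrays.SparseArray` and `pd.SparseDtype` (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

# A Series whose values are stored in a SparseArray rather than an ndarray.
sparse_ser = pd.Series(pd.arrays.SparseArray([0, -1, 0, 2]))

# The ufunc is dispatched to the underlying SparseArray, so the result
# is expected to stay sparse instead of being densified.
result = np.abs(sparse_ser)
stays_sparse = isinstance(result.dtype, pd.SparseDtype)
```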
1 change: 1 addition & 0 deletions doc/source/user_guide/computation.rst
@@ -5,6 +5,7 @@
Computational tools
===================


Statistical functions
---------------------

2 changes: 2 additions & 0 deletions doc/source/whatsnew/v0.25.0.rst
@@ -886,6 +886,7 @@ Sparse
- Introduce a better error message in :meth:`Series.sparse.from_coo` so it returns a ``TypeError`` for inputs that are not coo matrices (:issue:`26554`)
- Bug in :func:`numpy.modf` on a :class:`SparseArray`. Now a tuple of :class:`SparseArray` is returned (:issue:`26946`).


Build Changes
^^^^^^^^^^^^^

@@ -896,6 +897,7 @@ ExtensionArray

- Bug in :func:`factorize` when passing an ``ExtensionArray`` with a custom ``na_sentinel`` (:issue:`25696`).
- :meth:`Series.count` miscounts NA values in ExtensionArrays (:issue:`26835`)
- Added ``Series.__array_ufunc__`` to better handle NumPy ufuncs applied to Series backed by extension arrays (:issue:`23293`).

Review comment (Member, Author):

Should there be a general item for Series.__array_ufunc__ being added? Although in principle it should not change any behaviour, as we already supported ufuncs.

- Keyword argument ``deep`` has been removed from :meth:`ExtensionArray.copy` (:issue:`27083`)

Other
11 changes: 11 additions & 0 deletions pandas/core/arrays/base.py
@@ -107,6 +107,17 @@ class ExtensionArray:
attributes called ``.values`` or ``._values`` to ensure full compatibility
with pandas internals. But other names as ``.data``, ``._data``,
``._items``, ... can be freely used.

If implementing NumPy's ``__array_ufunc__`` interface, pandas expects
that

1. You defer by returning ``NotImplemented`` when any Series are present
in `inputs`. Pandas will extract the arrays and call the ufunc again.
2. You define a ``_HANDLED_TYPES`` tuple as an attribute on the class.
Pandas inspects this to determine whether the ufunc is valid for the
types present.

See :ref:`extending.extension.ufunc` for more.
"""
# '_typ' is for pandas.core.dtypes.generic.ABCExtensionArray.
# Don't override this.
15 changes: 15 additions & 0 deletions pandas/core/arrays/categorical.py
@@ -26,6 +26,7 @@
from pandas.core.dtypes.inference import is_hashable
from pandas.core.dtypes.missing import isna, notna

from pandas.core import ops
from pandas.core.accessor import PandasDelegate, delegate_names
import pandas.core.algorithms as algorithms
from pandas.core.algorithms import factorize, take, take_1d, unique1d
@@ -1292,6 +1293,20 @@ def __array__(self, dtype=None):
ret = np.asarray(ret)
return ret

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # for binary ops, use our custom dunder methods
        result = ops.maybe_dispatch_ufunc_to_dunder_op(
            self, ufunc, method, *inputs, **kwargs)
        if result is not NotImplemented:
            return result

        # for all other cases, raise for now (similarly as what happens in
        # Series.__array_prepare__)
        raise TypeError("Object with dtype {dtype} cannot perform "
                        "the numpy op {op}".format(
                            dtype=self.dtype,
                            op=ufunc.__name__))

def __setstate__(self, state):
"""Necessary for making this object picklable"""
if not isinstance(state, dict):
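
From the user's side, the `Categorical.__array_ufunc__` added in this hunk should behave roughly as sketched below: comparison ufuncs are routed to the Categorical's own operators, and other ufuncs raise `TypeError`. This is an illustration under the assumption that the installed pandas keeps this behaviour; the variable names are not from the PR.

```python
import numpy as np
import pandas as pd

cat = pd.Categorical(["a", "b", "a"])

# Binary comparison ufuncs are dispatched to Categorical.__eq__ and friends.
is_a = np.equal(cat, "a")

# Other ufuncs are rejected rather than silently coercing to an ndarray.
message = None
try:
    np.sqrt(cat)
except TypeError as err:
    message = str(err)
```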
49 changes: 48 additions & 1 deletion pandas/core/arrays/integer.py
@@ -1,3 +1,4 @@
import numbers
import sys
from typing import Type
import warnings
@@ -17,7 +18,7 @@
from pandas.core.dtypes.generic import ABCIndexClass, ABCSeries
from pandas.core.dtypes.missing import isna, notna

from pandas.core import nanops
from pandas.core import nanops, ops
from pandas.core.arrays import ExtensionArray, ExtensionOpsMixin
from pandas.core.tools.numeric import to_numeric

@@ -344,6 +345,52 @@ def __array__(self, dtype=None):
"""
return self._coerce_to_ndarray()

    _HANDLED_TYPES = (np.ndarray, numbers.Number)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # For IntegerArray inputs, we apply the ufunc to ._data
        # and mask the result.
        if method == 'reduce':
            # Not clear how to handle missing values in reductions. Raise.
            raise NotImplementedError("The 'reduce' method is not supported.")
        out = kwargs.get('out', ())

        for x in inputs + out:
            if not isinstance(x, self._HANDLED_TYPES + (IntegerArray,)):
                return NotImplemented

        # for binary ops, use our custom dunder methods
        result = ops.maybe_dispatch_ufunc_to_dunder_op(
            self, ufunc, method, *inputs, **kwargs)
        if result is not NotImplemented:
            return result

        # combine the masks of all IntegerArray inputs and unwrap their data
        mask = np.zeros(len(self), dtype=bool)
        inputs2 = []
        for x in inputs:
            if isinstance(x, IntegerArray):
                mask |= x._mask
                inputs2.append(x._data)
            else:
                inputs2.append(x)

        def reconstruct(x):
            # we don't worry about scalar `x` here, since we
            # raise for reduce up above.

            if is_integer_dtype(x.dtype):
                m = mask.copy()
                return IntegerArray(x, m)
            else:
                x[mask] = np.nan
                return x

        result = getattr(ufunc, method)(*inputs2, **kwargs)
        if isinstance(result, tuple):
            # multiple outputs (e.g. np.modf): re-box each one and return them
            return tuple(reconstruct(x) for x in result)
        else:
            return reconstruct(result)

def __iter__(self):
for i in range(len(self)):
if self._mask[i]:
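
The masking approach used by `IntegerArray.__array_ufunc__` in this hunk (apply the ufunc to the raw data, then re-apply the combined mask) can be illustrated without any pandas internals. The helper below is a hypothetical sketch: it represents a masked operand as a `(data, mask)` tuple, which is an assumption of this example and not how pandas stores its arrays.

```python
import numpy as np


def apply_ufunc_with_mask(ufunc, *inputs):
    """Apply ``ufunc`` to a mix of (data, mask) pairs and plain operands.

    Hypothetical helper: ``mask[i]`` is True where element ``i`` is missing.
    Returns ``(result, mask)``; ``mask`` is None when missing values were
    written into a float result as NaN instead.
    """
    combined = None
    raw = []
    for x in inputs:
        if isinstance(x, tuple):
            data, mask = x
            # A result element is missing if any masked input is missing there.
            combined = mask.copy() if combined is None else (combined | mask)
            raw.append(data)
        else:
            raw.append(x)

    result = ufunc(*raw)
    if combined is None:
        return result, None
    if np.issubdtype(result.dtype, np.integer):
        # Integer results cannot hold NaN, so keep the mask alongside them.
        return result, combined
    # Non-integer results: store missing values as NaN in a float array.
    result = result.astype(float)
    result[combined] = np.nan
    return result, None


data = np.array([1, 4, 9])
mask = np.array([False, True, False])
total, total_mask = apply_ufunc_with_mask(np.add, (data, mask), 1)
root, root_mask = apply_ufunc_with_mask(np.sqrt, (data, mask))
```

`np.add` keeps an integer result, so the mask is carried along; `np.sqrt` produces floats, so the missing position becomes NaN.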
42 changes: 6 additions & 36 deletions pandas/core/arrays/sparse.py
@@ -38,6 +38,7 @@
from pandas.core.base import PandasObject
import pandas.core.common as com
from pandas.core.missing import interpolate_2d
import pandas.core.ops as ops

import pandas.io.formats.printing as printing

@@ -1665,42 +1666,11 @@ def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
if not isinstance(x, self._HANDLED_TYPES + (SparseArray,)):
return NotImplemented

        special = {'add', 'sub', 'mul', 'pow', 'mod', 'floordiv', 'truediv',
                   'divmod', 'eq', 'ne', 'lt', 'gt', 'le', 'ge', 'remainder'}
        aliases = {
            'subtract': 'sub',
            'multiply': 'mul',
            'floor_divide': 'floordiv',
            'true_divide': 'truediv',
            'power': 'pow',
            'remainder': 'mod',
            'divide': 'div',
            'equal': 'eq',
            'not_equal': 'ne',
            'less': 'lt',
            'less_equal': 'le',
            'greater': 'gt',
            'greater_equal': 'ge',
        }

        flipped = {
            'lt': '__gt__',
            'le': '__ge__',
            'gt': '__lt__',
            'ge': '__le__',
            'eq': '__eq__',
            'ne': '__ne__',
        }

        op_name = ufunc.__name__
        op_name = aliases.get(op_name, op_name)

        if op_name in special and kwargs.get('out') is None:
            if isinstance(inputs[0], type(self)):
                return getattr(self, '__{}__'.format(op_name))(inputs[1])
            else:
                name = flipped.get(op_name, '__r{}__'.format(op_name))
                return getattr(self, name)(inputs[0])
        # for binary ops, use our custom dunder methods
        result = ops.maybe_dispatch_ufunc_to_dunder_op(
            self, ufunc, method, *inputs, **kwargs)
        if result is not NotImplemented:
            return result

if len(inputs) == 1:
# No alignment necessary.
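
The alias and flipped-comparison tables in this hunk are replaced by a shared helper, `ops.maybe_dispatch_ufunc_to_dunder_op`. The idea behind such a helper can be sketched as below. This is a simplified, hypothetical re-implementation for illustration only (`dispatch_ufunc_to_dunder`, `UFUNC_TO_OP`, `MIRRORED` and the `Length` class are invented names), not the pandas function itself.

```python
import numpy as np

# ufunc name -> operator stem (a subset, chosen for illustration).
UFUNC_TO_OP = {
    "add": "add", "subtract": "sub", "multiply": "mul",
    "true_divide": "truediv", "divide": "truediv",
    "floor_divide": "floordiv", "power": "pow", "remainder": "mod",
    "equal": "eq", "not_equal": "ne",
    "less": "lt", "less_equal": "le",
    "greater": "gt", "greater_equal": "ge",
}
# Comparisons have no __r*__ form; the mirrored comparison is used instead.
MIRRORED = {"lt": "gt", "le": "ge", "gt": "lt", "ge": "le",
            "eq": "eq", "ne": "ne"}


def dispatch_ufunc_to_dunder(obj, ufunc, method, *inputs, **kwargs):
    """Route a binary ufunc call to the matching dunder method on ``obj``."""
    op = UFUNC_TO_OP.get(ufunc.__name__)
    if op is None or method != "__call__" or len(inputs) != 2 or kwargs:
        return NotImplemented
    if inputs[0] is obj:
        name, other = "__{}__".format(op), inputs[1]
    elif op in MIRRORED:
        name, other = "__{}__".format(MIRRORED[op]), inputs[0]
    else:
        name, other = "__r{}__".format(op), inputs[0]
    handler = getattr(obj, name, None)
    if handler is None:
        return NotImplemented
    return handler(other)


class Length:
    """Toy wrapper around an ndarray, used only to exercise the dispatcher."""

    def __init__(self, values):
        self.values = np.asarray(values, dtype=float)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        return dispatch_ufunc_to_dunder(self, ufunc, method, *inputs, **kwargs)

    def __sub__(self, other):
        return Length(self.values - other)

    def __rsub__(self, other):
        return Length(other - self.values)

    def __gt__(self, other):
        return self.values > other


lengths = Length([1.0, 2.0, 3.0])
diff = np.subtract(10, lengths)   # reflected: calls lengths.__rsub__(10)
smaller = np.less(2, lengths)     # mirrored: calls lengths.__gt__(2)
```

Keeping this mapping in one place means each array type only has to check for `NotImplemented` before falling back to its own ufunc handling, as the Categorical, IntegerArray and SparseArray hunks do.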