API/BUG: Fix Series ops inconsistencies (#13894)
- series comparison operator to check whether labels are identical (currently: ignores labels)
- series boolean operator to align with labels (currently: only keeps left index)
sinhrks authored and jorisvandenbossche committed Aug 25, 2016
1 parent e23e6f1 commit 5152cdd
Showing 5 changed files with 450 additions and 50 deletions.
138 changes: 138 additions & 0 deletions doc/source/whatsnew/v0.19.0.txt
@@ -488,6 +488,143 @@ New Behavior:

type(s.tolist()[0])

.. _whatsnew_0190.api.series_ops:

``Series`` operators for different indexes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The following ``Series`` operators have been changed to make all operators consistent,
including with ``DataFrame`` (:issue:`1134`, :issue:`4581`, :issue:`13538`)

- ``Series`` comparison operators now raise ``ValueError`` when the ``index`` of the two operands differ.
- ``Series`` logical operators now align the ``index`` of both operands.

.. warning::
Until 0.18.1, comparing two ``Series`` of the same length succeeded even if
their ``index`` differed (the result ignored ``index``). As of 0.19.0, this raises ``ValueError`` to be more strict. This section also describes how to keep the previous behaviour or align different indexes using flexible comparison methods like ``.eq``.


As a result, ``Series`` and ``DataFrame`` operators behave as below:

Arithmetic operators
""""""""""""""""""""

Arithmetic operators align both ``index`` (no changes).

.. ipython:: python

s1 = pd.Series([1, 2, 3], index=list('ABC'))
s2 = pd.Series([2, 2, 2], index=list('ABD'))
s1 + s2

df1 = pd.DataFrame([1, 2, 3], index=list('ABC'))
df2 = pd.DataFrame([2, 2, 2], index=list('ABD'))
df1 + df2
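
For reference, a small sketch of what the ``Series`` addition above should produce, written as a plain script rather than an ``ipython`` directive. The name ``result`` is introduced here only for illustration. Labels present in just one operand come back as ``NaN``.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=list('ABC'))
s2 = pd.Series([2, 2, 2], index=list('ABD'))

# Addition aligns on the union of both indexes: A, B, C, D.
result = s1 + s2

print(result.index.tolist())     # ['A', 'B', 'C', 'D']
print(result['A'], result['B'])  # 3.0 4.0
# 'C' and 'D' exist on one side only, so they are missing in the result.
print(result.isna().tolist())    # [False, False, True, True]
```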

Comparison operators
""""""""""""""""""""

Comparison operators raise ``ValueError`` when the ``index`` of the two operands differ.

Previous Behavior (``Series``):

``Series`` compared values, ignoring ``index``, as long as both lengths were the same.

.. code-block:: ipython

In [1]: s1 == s2
Out[1]:
A False
B True
C False
dtype: bool

New Behavior (``Series``):

.. code-block:: ipython

In [2]: s1 == s2
Out[2]:
ValueError: Can only compare identically-labeled Series objects

.. note::
To achieve the same result as previous versions (compare values based on locations ignoring ``index``), compare both ``.values``.

.. ipython:: python

s1.values == s2.values

To compare ``Series`` while aligning their ``index``, see the flexible comparison methods section below.
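
As a rough illustration of how calling code might cope with the stricter check, the sketch below catches the ``ValueError`` and then compares by position. The names ``raised`` and ``positional`` are hypothetical and exist only for this example.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=list('ABC'))
s2 = pd.Series([2, 2, 2], index=list('ABD'))

# The == operator refuses to compare Series whose labels differ.
try:
    s1 == s2
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# Positional comparison on the underlying arrays ignores the labels.
positional = s1.values == s2.values
print(positional.tolist())  # [False, True, False]
```

The positional result is a plain NumPy array, so it carries no ``index``.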

Current Behavior (``DataFrame``, no change):

.. code-block:: ipython

In [3]: df1 == df2
Out[3]:
ValueError: Can only compare identically-labeled DataFrame objects

Logical operators
"""""""""""""""""

Logical operators align both ``index``.

Previous Behavior (``Series``):

Only the left-hand side ``index`` was kept.

.. code-block:: ipython

In [4]: s1 = pd.Series([True, False, True], index=list('ABC'))
In [5]: s2 = pd.Series([True, True, True], index=list('ABD'))
In [6]: s1 & s2
Out[6]:
A True
B False
C False
dtype: bool

New Behavior (``Series``):

.. ipython:: python

s1 = pd.Series([True, False, True], index=list('ABC'))
s2 = pd.Series([True, True, True], index=list('ABD'))
s1 & s2

.. note::
``Series`` logical operators fill a ``NaN`` result with ``False``.

.. note::
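To operate on values purely by position, ignoring ``index``, apply the operator to both ``.values``. Note that this is not identical to the previous behaviour, which reindexed the right-hand side to the left-hand ``index``; ``s1 & s2.reindex_like(s1)`` reproduces that result.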

.. ipython:: python

s1.values & s2.values
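
The three variants discussed in this section can be put side by side. This is a sketch under the assumption that the boolean ``Series`` behave as described here; the names ``aligned``, ``left_only`` and ``positional`` are illustrative only.

```python
import pandas as pd

s1 = pd.Series([True, False, True], index=list('ABC'))
s2 = pd.Series([True, True, True], index=list('ABD'))

# New behaviour: align on the union of labels, missing values become False.
aligned = s1 & s2
print(aligned.index.tolist())      # ['A', 'B', 'C', 'D']
print([bool(v) for v in aligned])  # [True, False, False, False]

# Keep only the left-hand labels by reindexing the right-hand side first.
left_only = s1 & s2.reindex_like(s1)
print([bool(v) for v in left_only])  # [True, False, False]

# Purely positional: labels are ignored entirely.
positional = s1.values & s2.values
print(positional.tolist())  # [True, False, True]
```

Note that the positional result differs from ``left_only`` at label ``C``, because ``s2`` has no ``C`` entry to pair with.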

Current Behavior (``DataFrame``, no change):

.. ipython:: python

df1 = pd.DataFrame([True, False, True], index=list('ABC'))
df2 = pd.DataFrame([True, True, True], index=list('ABD'))
df1 & df2

Flexible comparison methods
"""""""""""""""""""""""""""

``Series`` flexible comparison methods like ``eq``, ``ne``, ``le``, ``lt``, ``ge`` and ``gt`` now align both ``index``. Use these methods to compare two ``Series``
that have different ``index``.

.. ipython:: python

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([2, 2, 2], index=['b', 'c', 'd'])
s1.eq(s2)
s1.ge(s2)

Previously, these methods behaved the same as the comparison operators (see above).
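
A short sketch of the aligned results for the two ``Series`` above, with the expected values spelled out. The ``fill_value`` call is an extra illustration based on the ``flex_wrapper`` signature in this commit, not something the whatsnew text itself demonstrates.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([2, 2, 2], index=['b', 'c', 'd'])

# Both indexes are aligned to a, b, c, d; a label missing on either
# side compares as False.
eq = s1.eq(s2)
ge = s1.ge(s2)
print(eq.tolist())  # [False, True, False, False]
print(ge.tolist())  # [False, True, True, False]

# fill_value stands in for a label that is missing on one side only.
eq_filled = s1.eq(s2, fill_value=2)
print(eq_filled.tolist())  # [False, True, False, True]
```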

.. _whatsnew_0190.api.promote:

``Series`` type promotion on assignment
@@ -1107,6 +1244,7 @@ Bug Fixes
- Bug in using a NumPy ufunc with ``PeriodIndex`` to add or subtract an integer raising ``IncompatibleFrequency``. Note that using a standard operator like ``+`` or ``-`` is recommended, because standard operators use a more efficient path (:issue:`13980`)

- Bug in operations on ``NaT`` returning ``float`` instead of ``datetime64[ns]`` (:issue:`12941`)
- Bug in ``Series`` flexible arithmetic methods (like ``.add()``) raises ``ValueError`` when ``axis=None`` (:issue:`13894`)

- Bug in ``pd.read_csv`` in Python 2.x with non-UTF8 encoded, multi-character separated data (:issue:`3404`)

85 changes: 66 additions & 19 deletions pandas/core/ops.py
@@ -311,17 +311,6 @@ def get_op(cls, left, right, name, na_op):
is_datetime_lhs = (is_datetime64_dtype(left) or
is_datetime64tz_dtype(left))

if isinstance(left, ABCSeries) and isinstance(right, ABCSeries):
# avoid repated alignment
if not left.index.equals(right.index):
left, right = left.align(right, copy=False)

index, lidx, ridx = left.index.join(right.index, how='outer',
return_indexers=True)
# if DatetimeIndex have different tz, convert to UTC
left.index = index
right.index = index

if not (is_datetime_lhs or is_timedelta_lhs):
return _Op(left, right, name, na_op)
else:
@@ -603,6 +592,33 @@ def _is_offset(self, arr_or_obj):
return False


def _align_method_SERIES(left, right, align_asobject=False):
""" align lhs and rhs Series """

# ToDo: Different from _align_method_FRAME, list, tuple and ndarray
# are not coerced here
# because Series has inconsistencies described in #13637

if isinstance(right, ABCSeries):
# avoid repeated alignment
if not left.index.equals(right.index):

if align_asobject:
# to keep original value's dtype for bool ops
left = left.astype(object)
right = right.astype(object)

left, right = left.align(right, copy=False)

index, lidx, ridx = left.index.join(right.index, how='outer',
return_indexers=True)
# if DatetimeIndex have different tz, convert to UTC
left.index = index
right.index = index

return left, right


def _arith_method_SERIES(op, name, str_rep, fill_zeros=None, default_axis=None,
**eval_kwargs):
"""
@@ -655,6 +671,8 @@ def wrapper(left, right, name=name, na_op=na_op):
if isinstance(right, pd.DataFrame):
return NotImplemented

left, right = _align_method_SERIES(left, right)

converted = _Op.get_op(left, right, name, na_op)

left, right = converted.left, converted.right
@@ -763,8 +781,9 @@ def wrapper(self, other, axis=None):

if isinstance(other, ABCSeries):
name = _maybe_match_name(self, other)
if len(self) != len(other):
raise ValueError('Series lengths must match to compare')
if not self._indexed_same(other):
msg = 'Can only compare identically-labeled Series objects'
raise ValueError(msg)
return self._constructor(na_op(self.values, other.values),
index=self.index, name=name)
elif isinstance(other, pd.DataFrame): # pragma: no cover
@@ -786,6 +805,7 @@ def wrapper(self, other, axis=None):

return self._constructor(na_op(self.values, np.asarray(other)),
index=self.index).__finalize__(self)

elif isinstance(other, pd.Categorical):
if not is_categorical_dtype(self):
msg = ("Cannot compare a Categorical for op {op} with Series "
@@ -860,9 +880,10 @@ def wrapper(self, other):
fill_int = lambda x: x.fillna(0)
fill_bool = lambda x: x.fillna(False).astype(bool)

self, other = _align_method_SERIES(self, other, align_asobject=True)

if isinstance(other, ABCSeries):
name = _maybe_match_name(self, other)
other = other.reindex_like(self)
is_other_int_dtype = is_integer_dtype(other.dtype)
other = fill_int(other) if is_other_int_dtype else fill_bool(other)

@@ -912,7 +933,32 @@ def wrapper(self, other):
'floordiv': {'op': '//',
'desc': 'Integer division',
'reversed': False,
'reverse': 'rfloordiv'}}
'reverse': 'rfloordiv'},

'eq': {'op': '==',
'desc': 'Equal to',
'reversed': False,
'reverse': None},
'ne': {'op': '!=',
'desc': 'Not equal to',
'reversed': False,
'reverse': None},
'lt': {'op': '<',
'desc': 'Less than',
'reversed': False,
'reverse': None},
'le': {'op': '<=',
'desc': 'Less than or equal to',
'reversed': False,
'reverse': None},
'gt': {'op': '>',
'desc': 'Greater than',
'reversed': False,
'reverse': None},
'ge': {'op': '>=',
'desc': 'Greater than or equal to',
'reversed': False,
'reverse': None}}

_op_names = list(_op_descriptions.keys())
for k in _op_names:
@@ -963,10 +1009,11 @@ def _flex_method_SERIES(op, name, str_rep, default_axis=None, fill_zeros=None,
@Appender(doc)
def flex_wrapper(self, other, level=None, fill_value=None, axis=0):
# validate axis
self._get_axis_number(axis)
if axis is not None:
self._get_axis_number(axis)
if isinstance(other, ABCSeries):
return self._binop(other, op, level=level, fill_value=fill_value)
elif isinstance(other, (np.ndarray, ABCSeries, list, tuple)):
elif isinstance(other, (np.ndarray, list, tuple)):
if len(other) != len(self):
raise ValueError('Lengths must be equal')
return self._binop(self._constructor(other, self.index), op,
@@ -975,15 +1022,15 @@ def flex_wrapper(self, other, level=None, fill_value=None, axis=0):
if fill_value is not None:
self = self.fillna(fill_value)

return self._constructor(op(self.values, other),
return self._constructor(op(self, other),
self.index).__finalize__(self)

flex_wrapper.__name__ = name
return flex_wrapper


series_flex_funcs = dict(flex_arith_method=_flex_method_SERIES,
flex_comp_method=_comp_method_SERIES)
flex_comp_method=_flex_method_SERIES)

series_special_funcs = dict(arith_method=_arith_method_SERIES,
comp_method=_comp_method_SERIES,
34 changes: 18 additions & 16 deletions pandas/io/tests/json/test_ujson.py
@@ -1306,43 +1306,45 @@ def testSeries(self):

# column indexed
outp = Series(ujson.decode(ujson.encode(s))).sort_values()
self.assertTrue((s == outp).values.all())
exp = Series([10, 20, 30, 40, 50, 60],
index=['6', '7', '8', '9', '10', '15'])
tm.assert_series_equal(outp, exp)

outp = Series(ujson.decode(ujson.encode(s), numpy=True)).sort_values()
self.assertTrue((s == outp).values.all())
tm.assert_series_equal(outp, exp)

dec = _clean_dict(ujson.decode(ujson.encode(s, orient="split")))
outp = Series(**dec)
self.assertTrue((s == outp).values.all())
self.assertTrue(s.name == outp.name)
tm.assert_series_equal(outp, s)

dec = _clean_dict(ujson.decode(ujson.encode(s, orient="split"),
numpy=True))
outp = Series(**dec)
self.assertTrue((s == outp).values.all())
self.assertTrue(s.name == outp.name)

outp = Series(ujson.decode(ujson.encode(
s, orient="records"), numpy=True))
self.assertTrue((s == outp).values.all())
outp = Series(ujson.decode(ujson.encode(s, orient="records"),
numpy=True))
exp = Series([10, 20, 30, 40, 50, 60])
tm.assert_series_equal(outp, exp)

outp = Series(ujson.decode(ujson.encode(s, orient="records")))
self.assertTrue((s == outp).values.all())
tm.assert_series_equal(outp, exp)

outp = Series(ujson.decode(
ujson.encode(s, orient="values"), numpy=True))
self.assertTrue((s == outp).values.all())
outp = Series(ujson.decode(ujson.encode(s, orient="values"),
numpy=True))
tm.assert_series_equal(outp, exp)

outp = Series(ujson.decode(ujson.encode(s, orient="values")))
self.assertTrue((s == outp).values.all())
tm.assert_series_equal(outp, exp)

outp = Series(ujson.decode(ujson.encode(
s, orient="index"))).sort_values()
self.assertTrue((s == outp).values.all())
exp = Series([10, 20, 30, 40, 50, 60],
index=['6', '7', '8', '9', '10', '15'])
tm.assert_series_equal(outp, exp)

outp = Series(ujson.decode(ujson.encode(
s, orient="index"), numpy=True)).sort_values()
self.assertTrue((s == outp).values.all())
tm.assert_series_equal(outp, exp)

def testSeriesNested(self):
s = Series([10, 20, 30, 40, 50, 60], name="series",
3 changes: 2 additions & 1 deletion pandas/tests/indexes/common.py
@@ -685,7 +685,8 @@ def test_equals_op(self):
index_a == series_d
with tm.assertRaisesRegexp(ValueError, "Lengths must match"):
index_a == array_d
with tm.assertRaisesRegexp(ValueError, "Series lengths must match"):
msg = "Can only compare identically-labeled Series objects"
with tm.assertRaisesRegexp(ValueError, msg):
series_a == series_d
with tm.assertRaisesRegexp(ValueError, "Lengths must match"):
series_a == array_d