API/BUG: Fix Series ops inconsistencies #13894

Merged 3 commits on Aug 25, 2016
138 changes: 138 additions & 0 deletions doc/source/whatsnew/v0.19.0.txt
@@ -475,6 +475,143 @@ New Behavior:

type(s.tolist()[0])

.. _whatsnew_0190.api.series_ops:

``Series`` operators for different indexes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The following ``Series`` operators have been changed to make all operators consistent
with each other and with ``DataFrame`` (:issue:`1134`, :issue:`4581`, :issue:`13538`)

- ``Series`` comparison operators now raise ``ValueError`` when the ``index`` of the two operands differ.
- ``Series`` logical operators now align the ``index`` of both operands.

.. warning::
   Until 0.18.1, comparing ``Series`` of the same length succeeded even if
   their ``index`` differed (the result ignored ``index``). As of 0.19.0, this raises ``ValueError`` to be stricter. This section also describes how to keep the previous behavior or align different indexes, using flexible comparison methods like ``.eq``.


As a result, ``Series`` and ``DataFrame`` operators behave as below:

Arithmetic operators
""""""""""""""""""""

Arithmetic operators align the ``index`` of both operands (no change).

.. ipython:: python

   s1 = pd.Series([1, 2, 3], index=list('ABC'))
   s2 = pd.Series([2, 2, 2], index=list('ABD'))
   s1 + s2

   df1 = pd.DataFrame([1, 2, 3], index=list('ABC'))
   df2 = pd.DataFrame([2, 2, 2], index=list('ABD'))
   df1 + df2

Comparison operators
""""""""""""""""""""

Comparison operators raise ``ValueError`` when the ``index`` of the two operands differ.

Previous Behavior (``Series``):

``Series`` compared values ignoring the ``index`` as long as both lengths were the same.

.. code-block:: ipython

   In [1]: s1 == s2
   Out[1]:
   A    False
   B     True
   C    False
   dtype: bool

New Behavior (``Series``):

.. code-block:: ipython

   In [2]: s1 == s2
   Out[2]:
   ValueError: Can only compare identically-labeled Series objects

.. note::
   To achieve the same result as previous versions (compare values by position, ignoring ``index``), compare the ``.values`` of both ``Series``.

   .. ipython:: python

      s1.values == s2.values

   To compare ``Series`` with their ``index`` aligned, see the flexible comparison methods section below.
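The note above can be sketched as a small helper. This is a minimal illustration, not part of the PR: ``compare_by_position`` is a hypothetical name, and its length check and left-hand index mirror the pre-0.19.0 code path removed in ``pandas/core/ops.py``.

```python
import pandas as pd


def compare_by_position(left, right):
    """Hypothetical helper: reproduce the pre-0.19.0 result of ``left == right``."""
    if len(left) != len(right):
        raise ValueError("Series lengths must match to compare")
    # Compare the underlying arrays and keep the left-hand index.
    return pd.Series(left.values == right.values, index=left.index)


s1 = pd.Series([1, 2, 3], index=list("ABC"))
s2 = pd.Series([2, 2, 2], index=list("ABD"))

try:
    s1 == s2
    raised = False
except ValueError:
    # Raised because the two indexes are not identical.
    raised = True

by_position = compare_by_position(s1, s2)
print(raised)
print(by_position)
```

Keeping the workaround in one named function makes it easier to find and remove later, once callers have moved to label-aligned comparison.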

Current Behavior (``DataFrame``, no change):

.. code-block:: ipython

   In [3]: df1 == df2
   Out[3]:
   ValueError: Can only compare identically-labeled DataFrame objects

Logical operators
"""""""""""""""""

Logical operators align the ``index`` of both operands.

Previous Behavior (``Series``):

Only the left-hand side ``index`` was kept.

.. code-block:: ipython

   In [4]: s1 = pd.Series([True, False, True], index=list('ABC'))
   In [5]: s2 = pd.Series([True, True, True], index=list('ABD'))
   In [6]: s1 & s2
   Out[6]:
   A     True
   B    False
   C    False
   dtype: bool

New Behavior (``Series``):

.. ipython:: python

   s1 = pd.Series([True, False, True], index=list('ABC'))
   s2 = pd.Series([True, True, True], index=list('ABD'))
   s1 & s2

.. note::
   ``Series`` logical operators fill a ``NaN`` result with ``False``.

Review comment (Member): This is inconsistent with how DataFrame behaves?

Reply (Member Author): Yes, see #13896.

.. note::
   To achieve the same result as previous versions (combine values by position, ignoring ``index``), apply the operator to the ``.values`` of both ``Series``.

   .. ipython:: python

      s1.values & s2.values

Current Behavior (``DataFrame``, no change):

.. ipython:: python

   df1 = pd.DataFrame([True, False, True], index=list('ABC'))
   df2 = pd.DataFrame([True, True, True], index=list('ABD'))
   df1 & df2
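The two ``Series`` results described in this section can be compared side by side. This is a minimal sketch that assumes the 0.19.0 behavior described here (alignment on the union of both indexes, with ``NaN`` results filled with ``False``); the variable names are illustrative only.

```python
import pandas as pd

s1 = pd.Series([True, False, True], index=list("ABC"))
s2 = pd.Series([True, True, True], index=list("ABD"))

# Label-aligned: the result index is the union of both indexes, and
# labels present on only one side ("C" and "D") end up as False.
aligned = s1 & s2

# Positional: operates on the raw arrays and ignores the labels entirely.
positional = s1.values & s2.values

print(aligned)
print(positional)
```

Note that the positional result has three elements while the aligned result has four, so the two are not interchangeable once the indexes differ.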

Flexible comparison methods
"""""""""""""""""""""""""""

``Series`` flexible comparison methods like ``eq``, ``ne``, ``le``, ``lt``, ``ge`` and ``gt`` now align the ``index`` of both operands. Use these methods if you want to compare two ``Series``
that have different ``index``.

.. ipython:: python

   s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
   s2 = pd.Series([2, 2, 2], index=['b', 'c', 'd'])
   s1.eq(s2)
   s1.ge(s2)

Previously, these methods behaved the same as the comparison operators (see above).
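As a worked example of the aligned comparison, the sketch below uses the same two ``Series`` and spells out what happens at labels that exist on only one side. It relies on the usual ``NaN`` comparison semantics: ``==`` and ``>=`` against a missing value are ``False``, while ``!=`` is ``True``.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([2, 2, 2], index=["b", "c", "d"])

# Each result is indexed by the union of labels: a, b, c, d.
eq = s1.eq(s2)  # "a" and "d" have a missing operand, so they compare False
ge = s1.ge(s2)
ne = s1.ne(s2)  # a missing operand is "not equal", so "a" and "d" are True

print(pd.DataFrame({"eq": eq, "ge": ge, "ne": ne}))
```

Because ``ne`` reports ``True`` for unmatched labels, it is not simply the set of labels where both values exist and differ; filter on the shared labels first if that is what you need.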

.. _whatsnew_0190.api.promote:

``Series`` type promotion on assignment
@@ -1069,6 +1206,7 @@ Bug Fixes
- Bug where using a NumPy ufunc with ``PeriodIndex`` to add or subtract an integer raised ``IncompatibleFrequency``. Note that using a standard operator like ``+`` or ``-`` is recommended, because standard operators use a more efficient path (:issue:`13980`)

- Bug in operations on ``NaT`` returning ``float`` instead of ``datetime64[ns]`` (:issue:`12941`)
- Bug in ``Series`` flexible arithmetic methods (like ``.add()``) raising ``ValueError`` when ``axis=None`` (:issue:`13894`)

- Bug in ``pd.read_csv`` in Python 2.x with non-UTF8 encoded, multi-character separated data (:issue:`3404`)

85 changes: 66 additions & 19 deletions pandas/core/ops.py
@@ -311,17 +311,6 @@ def get_op(cls, left, right, name, na_op):
         is_datetime_lhs = (is_datetime64_dtype(left) or
                            is_datetime64tz_dtype(left))

-        if isinstance(left, ABCSeries) and isinstance(right, ABCSeries):
-            # avoid repated alignment
-            if not left.index.equals(right.index):
-                left, right = left.align(right, copy=False)
-
-                index, lidx, ridx = left.index.join(right.index, how='outer',
-                                                    return_indexers=True)
-                # if DatetimeIndex have different tz, convert to UTC
-                left.index = index
-                right.index = index
-
         if not (is_datetime_lhs or is_timedelta_lhs):
             return _Op(left, right, name, na_op)
         else:
@@ -603,6 +592,33 @@ def _is_offset(self, arr_or_obj):
         return False


+def _align_method_SERIES(left, right, align_asobject=False):
+    """ align lhs and rhs Series """
+
+    # ToDo: Different from _align_method_FRAME, list, tuple and ndarray
+    # are not coerced here
+    # because Series has inconsistencies described in #13637
+
+    if isinstance(right, ABCSeries):
+        # avoid repeated alignment
+        if not left.index.equals(right.index):
+
+            if align_asobject:
+                # to keep original value's dtype for bool ops
+                left = left.astype(object)
+                right = right.astype(object)
+
+            left, right = left.align(right, copy=False)
+
+            index, lidx, ridx = left.index.join(right.index, how='outer',
+                                                return_indexers=True)
+            # if DatetimeIndex have different tz, convert to UTC
+            left.index = index
+            right.index = index
+
+    return left, right
+
+
 def _arith_method_SERIES(op, name, str_rep, fill_zeros=None, default_axis=None,
                          **eval_kwargs):
     """
@@ -654,6 +670,8 @@ def wrapper(left, right, name=name, na_op=na_op):
         if isinstance(right, pd.DataFrame):
             return NotImplemented

+        left, right = _align_method_SERIES(left, right)
+
         converted = _Op.get_op(left, right, name, na_op)

         left, right = converted.left, converted.right
@@ -761,8 +779,9 @@ def wrapper(self, other, axis=None):

         if isinstance(other, ABCSeries):
             name = _maybe_match_name(self, other)
-            if len(self) != len(other):
-                raise ValueError('Series lengths must match to compare')
+            if not self._indexed_same(other):
+                msg = 'Can only compare identically-labeled Series objects'
+                raise ValueError(msg)
             return self._constructor(na_op(self.values, other.values),
                                      index=self.index, name=name)
         elif isinstance(other, pd.DataFrame):  # pragma: no cover
@@ -784,6 +803,7 @@ def wrapper(self, other, axis=None):

             return self._constructor(na_op(self.values, np.asarray(other)),
                                      index=self.index).__finalize__(self)
+
         elif isinstance(other, pd.Categorical):
             if not is_categorical_dtype(self):
                 msg = ("Cannot compare a Categorical for op {op} with Series "
@@ -856,9 +876,10 @@ def wrapper(self, other):
         fill_int = lambda x: x.fillna(0)
         fill_bool = lambda x: x.fillna(False).astype(bool)

+        self, other = _align_method_SERIES(self, other, align_asobject=True)
+
         if isinstance(other, ABCSeries):
             name = _maybe_match_name(self, other)
-            other = other.reindex_like(self)
             is_other_int_dtype = is_integer_dtype(other.dtype)
             other = fill_int(other) if is_other_int_dtype else fill_bool(other)

@@ -908,7 +929,32 @@ def wrapper(self, other):
                     'floordiv': {'op': '//',
                                  'desc': 'Integer division',
                                  'reversed': False,
-                                 'reverse': 'rfloordiv'}}
+                                 'reverse': 'rfloordiv'},
+
+                    'eq': {'op': '==',
+                           'desc': 'Equal to',
+                           'reversed': False,
+                           'reverse': None},
+                    'ne': {'op': '!=',
+                           'desc': 'Not equal to',
+                           'reversed': False,
+                           'reverse': None},
+                    'lt': {'op': '<',
+                           'desc': 'Less than',
+                           'reversed': False,
+                           'reverse': None},
+                    'le': {'op': '<=',
+                           'desc': 'Less than or equal to',
+                           'reversed': False,
+                           'reverse': None},
+                    'gt': {'op': '>',
+                           'desc': 'Greater than',
+                           'reversed': False,
+                           'reverse': None},
+                    'ge': {'op': '>=',
+                           'desc': 'Greater than or equal to',
+                           'reversed': False,
+                           'reverse': None}}

 _op_names = list(_op_descriptions.keys())
 for k in _op_names:
@@ -959,10 +1005,11 @@ def _flex_method_SERIES(op, name, str_rep, default_axis=None, fill_zeros=None,
     @Appender(doc)
     def flex_wrapper(self, other, level=None, fill_value=None, axis=0):
         # validate axis
-        self._get_axis_number(axis)
+        if axis is not None:
+            self._get_axis_number(axis)
         if isinstance(other, ABCSeries):
             return self._binop(other, op, level=level, fill_value=fill_value)
-        elif isinstance(other, (np.ndarray, ABCSeries, list, tuple)):
+        elif isinstance(other, (np.ndarray, list, tuple)):
             if len(other) != len(self):
                 raise ValueError('Lengths must be equal')
             return self._binop(self._constructor(other, self.index), op,
@@ -971,15 +1018,15 @@ def flex_wrapper(self, other, level=None, fill_value=None, axis=0):
             if fill_value is not None:
                 self = self.fillna(fill_value)

-            return self._constructor(op(self.values, other),
+            return self._constructor(op(self, other),
                                      self.index).__finalize__(self)

     flex_wrapper.__name__ = name
     return flex_wrapper


 series_flex_funcs = dict(flex_arith_method=_flex_method_SERIES,
-                         flex_comp_method=_comp_method_SERIES)
+                         flex_comp_method=_flex_method_SERIES)

 series_special_funcs = dict(arith_method=_arith_method_SERIES,
                             comp_method=_comp_method_SERIES,
34 changes: 18 additions & 16 deletions pandas/io/tests/json/test_ujson.py
@@ -1306,43 +1306,45 @@ def testSeries(self):

         # column indexed
         outp = Series(ujson.decode(ujson.encode(s))).sort_values()
-        self.assertTrue((s == outp).values.all())
+        exp = Series([10, 20, 30, 40, 50, 60],
+                     index=['6', '7', '8', '9', '10', '15'])
+        tm.assert_series_equal(outp, exp)

         outp = Series(ujson.decode(ujson.encode(s), numpy=True)).sort_values()
-        self.assertTrue((s == outp).values.all())
+        tm.assert_series_equal(outp, exp)

         dec = _clean_dict(ujson.decode(ujson.encode(s, orient="split")))
         outp = Series(**dec)
-        self.assertTrue((s == outp).values.all())
-        self.assertTrue(s.name == outp.name)
+        tm.assert_series_equal(outp, s)

         dec = _clean_dict(ujson.decode(ujson.encode(s, orient="split"),
                                        numpy=True))
         outp = Series(**dec)
-        self.assertTrue((s == outp).values.all())
-        self.assertTrue(s.name == outp.name)

-        outp = Series(ujson.decode(ujson.encode(
-            s, orient="records"), numpy=True))
-        self.assertTrue((s == outp).values.all())
+        outp = Series(ujson.decode(ujson.encode(s, orient="records"),
+                                   numpy=True))
+        exp = Series([10, 20, 30, 40, 50, 60])
+        tm.assert_series_equal(outp, exp)

         outp = Series(ujson.decode(ujson.encode(s, orient="records")))
-        self.assertTrue((s == outp).values.all())
+        tm.assert_series_equal(outp, exp)

-        outp = Series(ujson.decode(
-            ujson.encode(s, orient="values"), numpy=True))
-        self.assertTrue((s == outp).values.all())
+        outp = Series(ujson.decode(ujson.encode(s, orient="values"),
+                                   numpy=True))
+        tm.assert_series_equal(outp, exp)

         outp = Series(ujson.decode(ujson.encode(s, orient="values")))
-        self.assertTrue((s == outp).values.all())
+        tm.assert_series_equal(outp, exp)

         outp = Series(ujson.decode(ujson.encode(
             s, orient="index"))).sort_values()
-        self.assertTrue((s == outp).values.all())
+        exp = Series([10, 20, 30, 40, 50, 60],
+                     index=['6', '7', '8', '9', '10', '15'])
+        tm.assert_series_equal(outp, exp)

         outp = Series(ujson.decode(ujson.encode(
             s, orient="index"), numpy=True)).sort_values()
-        self.assertTrue((s == outp).values.all())
+        tm.assert_series_equal(outp, exp)

     def testSeriesNested(self):
         s = Series([10, 20, 30, 40, 50, 60], name="series",
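The switch from value-only checks to ``assert_series_equal`` in this test can be illustrated in isolation. This is a minimal sketch, not taken from the PR: it uses the public ``pandas.testing`` module (the test file itself uses ``tm``), and the two ``Series`` are made-up data that differ only in label type, loosely modelled on the string labels in ``exp`` above.

```python
import pandas as pd
import pandas.testing as pdt

s = pd.Series([10, 20, 30], index=[6, 7, 8])
outp = pd.Series([10, 20, 30], index=["6", "7", "8"])

# Value-only check: passes even though the labels differ.
values_match = bool((s.values == outp.values).all())

# Strict check: the index is compared along with the values.
try:
    pdt.assert_series_equal(outp, s)
    strict_match = True
except AssertionError:
    strict_match = False

print(values_match, strict_match)
```

The stricter assertion is what makes the expected index explicit in each rewritten test case.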
3 changes: 2 additions & 1 deletion pandas/tests/indexes/common.py
@@ -676,7 +676,8 @@ def test_equals_op(self):
             index_a == series_d
         with tm.assertRaisesRegexp(ValueError, "Lengths must match"):
             index_a == array_d
-        with tm.assertRaisesRegexp(ValueError, "Series lengths must match"):
+        msg = "Can only compare identically-labeled Series objects"
+        with tm.assertRaisesRegexp(ValueError, msg):
             series_a == series_d
         with tm.assertRaisesRegexp(ValueError, "Lengths must match"):
             series_a == array_d
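The updated assertion can be reproduced outside the pandas test suite with the standard library alone. The sketch below uses ``unittest.TestCase.assertRaisesRegex`` in place of ``tm.assertRaisesRegexp``. The two ``Series`` are illustrative, not the fixtures used in the test, and are given different lengths to show that this case now reports the labeling message instead of the old length message.

```python
import unittest

import pandas as pd

case = unittest.TestCase()

series_a = pd.Series([1, 2, 3])
series_d = pd.Series([1, 2])  # different length, hence a different index

msg = "Can only compare identically-labeled Series objects"
with case.assertRaisesRegex(ValueError, msg) as ctx:
    series_a == series_d

print(ctx.exception)
```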