API/BUG: Fix Series ops inconsistencies (#13894)

- series comparison operator to check whether labels are identical (currently: ignores labels)
- series boolean operator to align with labels (currently: only keeps left index)
pandas-dev · Aug 25, 2016 · 5152cdd
1 parent e23e6f1
commit 5152cdd
Showing 5 changed files with 450 additions and 50 deletions.
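
For orientation, the behaviour described in the commit message can be sketched as below. This is a minimal illustration, assuming a pandas release that includes this change (0.19.0 or later); the variable names are illustrative and do not come from the commit.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=list("ABC"))
s2 = pd.Series([2, 2, 2], index=list("ABD"))

# Comparison operator: differently labelled Series are rejected.
try:
    s1 == s2
    raised = False
except ValueError:
    raised = True

# Flexible comparison method: aligns on the union of labels first.
eq_result = s1.eq(s2)

# Positional comparison that ignores labels (the pre-0.19.0 result).
positional = s1.values == s2.values

# Logical operator: aligns both indexes instead of keeping only the left one.
b1 = pd.Series([True, False, True], index=list("ABC"))
b2 = pd.Series([True, True, True], index=list("ABD"))
and_result = b1 & b2

print(raised)
print(eq_result)
print(positional)
print(and_result)
```

Labels present on only one side (``C`` and ``D`` here) have no counterpart after alignment, so ``eq`` reports ``False`` for them.
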
diff --git a/doc/source/whatsnew/v0.19.0.txt b/doc/source/whatsnew/v0.19.0.txt
@@ -488,6 +488,143 @@ New Behavior:
 
    type(s.tolist()[0])
 
+.. _whatsnew_0190.api.series_ops:
+
+``Series`` operators for different indexes
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The following ``Series`` operators have been changed to make all operators consistent,
+including ``DataFrame`` (:issue:`1134`, :issue:`4581`, :issue:`13538`)
+
+- ``Series`` comparison operators now raise ``ValueError`` when the operands have different ``index``.
+- ``Series`` logical operators align both ``index``.
+
+.. warning::
+   Until 0.18.1, comparing ``Series`` of the same length succeeded even if
+   their ``index`` differed (the result ignored ``index``). As of 0.19.0, this raises ``ValueError`` to be stricter. This section also describes how to keep the previous behaviour or align different indexes using flexible comparison methods like ``.eq``.
+
+
+As a result, ``Series`` and ``DataFrame`` operators behave as below:
+
+Arithmetic operators
+""""""""""""""""""""
+
+Arithmetic operators align both ``index`` (no changes).
+
+.. ipython:: python
+
+   s1 = pd.Series([1, 2, 3], index=list('ABC'))
+   s2 = pd.Series([2, 2, 2], index=list('ABD'))
+   s1 + s2
+
+   df1 = pd.DataFrame([1, 2, 3], index=list('ABC'))
+   df2 = pd.DataFrame([2, 2, 2], index=list('ABD'))
+   df1 + df2
+
+Comparison operators
+""""""""""""""""""""
+
+Comparison operators raise ``ValueError`` when the operands have different ``index``.
+
+Previous Behavior (``Series``):
+
+``Series`` compared values ignoring ``index`` as long as both lengths were the same.
+
+.. code-block:: ipython
+
+   In [1]: s1 == s2
+   Out[1]:
+   A    False
+   B     True
+   C    False
+   dtype: bool
+
+New Behavior (``Series``):
+
+.. code-block:: ipython
+
+   In [2]: s1 == s2
+   Out[2]:
+   ValueError: Can only compare identically-labeled Series objects
+
+.. note::
+   To achieve the same result as previous versions (compare values based on locations ignoring ``index``), compare both ``.values``.
+
+   .. ipython:: python
+
+      s1.values == s2.values
+
+   If you want to compare ``Series`` while aligning their ``index``, see the flexible comparison methods section below.
+
+Current Behavior (``DataFrame``, no change):
+
+.. code-block:: ipython
+
+   In [3]: df1 == df2
+   Out[3]:
+   ValueError: Can only compare identically-labeled DataFrame objects
+
+Logical operators
+"""""""""""""""""
+
+Logical operators align both ``index``.
+
+Previous Behavior (``Series``):
+
+Only the left-hand side ``index`` is kept.
+
+.. code-block:: ipython
+
+   In [4]: s1 = pd.Series([True, False, True], index=list('ABC'))
+   In [5]: s2 = pd.Series([True, True, True], index=list('ABD'))
+   In [6]: s1 & s2
+   Out[6]:
+   A     True
+   B    False
+   C    False
+   dtype: bool
+
+New Behavior (``Series``):
+
+.. ipython:: python
+
+   s1 = pd.Series([True, False, True], index=list('ABC'))
+   s2 = pd.Series([True, True, True], index=list('ABD'))
+   s1 & s2
+
+.. note::
+   ``Series`` logical operators fill a ``NaN`` result with ``False``.
+
+.. note::
+   To achieve the same result as previous versions (operate on values based on location, ignoring ``index``), apply the logical operator to both ``.values``.
+
+   .. ipython:: python
+
+      s1.values & s2.values
+
+Current Behavior (``DataFrame``, no change):
+
+.. ipython:: python
+
+   df1 = pd.DataFrame([True, False, True], index=list('ABC'))
+   df2 = pd.DataFrame([True, True, True], index=list('ABD'))
+   df1 & df2
+
+Flexible comparison methods
+"""""""""""""""""""""""""""
+
+``Series`` flexible comparison methods like ``eq``, ``ne``, ``le``, ``lt``, ``ge`` and ``gt`` now align both ``index``. Use these methods if you want to compare two ``Series``
+that have different ``index``.
+
+.. ipython:: python
+
+   s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
+   s2 = pd.Series([2, 2, 2], index=['b', 'c', 'd'])
+   s1.eq(s2)
+   s1.ge(s2)
+
+Previously, these methods behaved the same as the comparison operators (see above).
+
 .. _whatsnew_0190.api.promote:
 
 ``Series`` type promotion on assignment
@@ -1107,6 +1244,7 @@ Bug Fixes
 - Bug in using NumPy ufunc with ``PeriodIndex`` to add or subtract integer raise ``IncompatibleFrequency``. Note that using standard operator like ``+`` or ``-`` is recommended, because standard operators use more efficient path (:issue:`13980`)
 
 - Bug in operations on ``NaT`` returning ``float`` instead of ``datetime64[ns]`` (:issue:`12941`)
+- Bug in ``Series`` flexible arithmetic methods (like ``.add()``) that raised ``ValueError`` when ``axis=None`` (:issue:`13894`)
 
 - Bug in ``pd.read_csv`` in Python 2.x with non-UTF8 encoded, multi-character separated data (:issue:`3404`)
 

diff --git a/pandas/core/ops.py b/pandas/core/ops.py
@@ -311,17 +311,6 @@ def get_op(cls, left, right, name, na_op):
         is_datetime_lhs = (is_datetime64_dtype(left) or
                            is_datetime64tz_dtype(left))
 
-        if isinstance(left, ABCSeries) and isinstance(right, ABCSeries):
-            # avoid repated alignment
-            if not left.index.equals(right.index):
-                left, right = left.align(right, copy=False)
-
-                index, lidx, ridx = left.index.join(right.index, how='outer',
-                                                    return_indexers=True)
-                # if DatetimeIndex have different tz, convert to UTC
-                left.index = index
-                right.index = index
-
         if not (is_datetime_lhs or is_timedelta_lhs):
             return _Op(left, right, name, na_op)
         else:
@@ -603,6 +592,33 @@ def _is_offset(self, arr_or_obj):
             return False
 
 
+def _align_method_SERIES(left, right, align_asobject=False):
+    """ align lhs and rhs Series """
+
+    # ToDo: Different from _align_method_FRAME, list, tuple and ndarray
+    # are not coerced here
+    # because Series has inconsistencies described in #13637
+
+    if isinstance(right, ABCSeries):
+        # avoid repeated alignment
+        if not left.index.equals(right.index):
+
+            if align_asobject:
+                # to keep original value's dtype for bool ops
+                left = left.astype(object)
+                right = right.astype(object)
+
+            left, right = left.align(right, copy=False)
+
+            index, lidx, ridx = left.index.join(right.index, how='outer',
+                                                return_indexers=True)
+            # if DatetimeIndex have different tz, convert to UTC
+            left.index = index
+            right.index = index
+
+    return left, right
+
+
 def _arith_method_SERIES(op, name, str_rep, fill_zeros=None, default_axis=None,
                          **eval_kwargs):
     """
@@ -655,6 +671,8 @@ def wrapper(left, right, name=name, na_op=na_op):
         if isinstance(right, pd.DataFrame):
             return NotImplemented
 
+        left, right = _align_method_SERIES(left, right)
+
         converted = _Op.get_op(left, right, name, na_op)
 
         left, right = converted.left, converted.right
@@ -763,8 +781,9 @@ def wrapper(self, other, axis=None):
 
         if isinstance(other, ABCSeries):
             name = _maybe_match_name(self, other)
-            if len(self) != len(other):
-                raise ValueError('Series lengths must match to compare')
+            if not self._indexed_same(other):
+                msg = 'Can only compare identically-labeled Series objects'
+                raise ValueError(msg)
             return self._constructor(na_op(self.values, other.values),
                                      index=self.index, name=name)
         elif isinstance(other, pd.DataFrame):  # pragma: no cover
@@ -786,6 +805,7 @@ def wrapper(self, other, axis=None):
 
             return self._constructor(na_op(self.values, np.asarray(other)),
                                      index=self.index).__finalize__(self)
+
         elif isinstance(other, pd.Categorical):
             if not is_categorical_dtype(self):
                 msg = ("Cannot compare a Categorical for op {op} with Series "
@@ -860,9 +880,10 @@ def wrapper(self, other):
         fill_int = lambda x: x.fillna(0)
         fill_bool = lambda x: x.fillna(False).astype(bool)
 
+        self, other = _align_method_SERIES(self, other, align_asobject=True)
+
         if isinstance(other, ABCSeries):
             name = _maybe_match_name(self, other)
-            other = other.reindex_like(self)
             is_other_int_dtype = is_integer_dtype(other.dtype)
             other = fill_int(other) if is_other_int_dtype else fill_bool(other)
 
@@ -912,7 +933,32 @@ def wrapper(self, other):
                     'floordiv': {'op': '//',
                                  'desc': 'Integer division',
                                  'reversed': False,
-                                 'reverse': 'rfloordiv'}}
+                                 'reverse': 'rfloordiv'},
+
+                    'eq': {'op': '==',
+                                 'desc': 'Equal to',
+                                 'reversed': False,
+                                 'reverse': None},
+                    'ne': {'op': '!=',
+                                 'desc': 'Not equal to',
+                                 'reversed': False,
+                                 'reverse': None},
+                    'lt': {'op': '<',
+                                 'desc': 'Less than',
+                                 'reversed': False,
+                                 'reverse': None},
+                    'le': {'op': '<=',
+                                 'desc': 'Less than or equal to',
+                                 'reversed': False,
+                                 'reverse': None},
+                    'gt': {'op': '>',
+                                 'desc': 'Greater than',
+                                 'reversed': False,
+                                 'reverse': None},
+                    'ge': {'op': '>=',
+                                 'desc': 'Greater than or equal to',
+                                 'reversed': False,
+                                 'reverse': None}}
 
 _op_names = list(_op_descriptions.keys())
 for k in _op_names:
@@ -963,10 +1009,11 @@ def _flex_method_SERIES(op, name, str_rep, default_axis=None, fill_zeros=None,
     @Appender(doc)
     def flex_wrapper(self, other, level=None, fill_value=None, axis=0):
         # validate axis
-        self._get_axis_number(axis)
+        if axis is not None:
+            self._get_axis_number(axis)
         if isinstance(other, ABCSeries):
             return self._binop(other, op, level=level, fill_value=fill_value)
-        elif isinstance(other, (np.ndarray, ABCSeries, list, tuple)):
+        elif isinstance(other, (np.ndarray, list, tuple)):
             if len(other) != len(self):
                 raise ValueError('Lengths must be equal')
             return self._binop(self._constructor(other, self.index), op,
@@ -975,15 +1022,15 @@ def flex_wrapper(self, other, level=None, fill_value=None, axis=0):
             if fill_value is not None:
                 self = self.fillna(fill_value)
 
-            return self._constructor(op(self.values, other),
+            return self._constructor(op(self, other),
                                      self.index).__finalize__(self)
 
     flex_wrapper.__name__ = name
     return flex_wrapper
 
 
 series_flex_funcs = dict(flex_arith_method=_flex_method_SERIES,
-                         flex_comp_method=_comp_method_SERIES)
+                         flex_comp_method=_flex_method_SERIES)
 
 series_special_funcs = dict(arith_method=_arith_method_SERIES,
                             comp_method=_comp_method_SERIES,

diff --git a/pandas/io/tests/json/test_ujson.py b/pandas/io/tests/json/test_ujson.py
@@ -1306,43 +1306,45 @@ def testSeries(self):
 
         # column indexed
         outp = Series(ujson.decode(ujson.encode(s))).sort_values()
-        self.assertTrue((s == outp).values.all())
+        exp = Series([10, 20, 30, 40, 50, 60],
+                     index=['6', '7', '8', '9', '10', '15'])
+        tm.assert_series_equal(outp, exp)
 
         outp = Series(ujson.decode(ujson.encode(s), numpy=True)).sort_values()
-        self.assertTrue((s == outp).values.all())
+        tm.assert_series_equal(outp, exp)
 
         dec = _clean_dict(ujson.decode(ujson.encode(s, orient="split")))
         outp = Series(**dec)
-        self.assertTrue((s == outp).values.all())
-        self.assertTrue(s.name == outp.name)
+        tm.assert_series_equal(outp, s)
 
         dec = _clean_dict(ujson.decode(ujson.encode(s, orient="split"),
                                        numpy=True))
         outp = Series(**dec)
-        self.assertTrue((s == outp).values.all())
-        self.assertTrue(s.name == outp.name)
 
-        outp = Series(ujson.decode(ujson.encode(
-            s, orient="records"), numpy=True))
-        self.assertTrue((s == outp).values.all())
+        outp = Series(ujson.decode(ujson.encode(s, orient="records"),
+                                   numpy=True))
+        exp = Series([10, 20, 30, 40, 50, 60])
+        tm.assert_series_equal(outp, exp)
 
         outp = Series(ujson.decode(ujson.encode(s, orient="records")))
-        self.assertTrue((s == outp).values.all())
+        tm.assert_series_equal(outp, exp)
 
-        outp = Series(ujson.decode(
-            ujson.encode(s, orient="values"), numpy=True))
-        self.assertTrue((s == outp).values.all())
+        outp = Series(ujson.decode(ujson.encode(s, orient="values"),
+                                   numpy=True))
+        tm.assert_series_equal(outp, exp)
 
         outp = Series(ujson.decode(ujson.encode(s, orient="values")))
-        self.assertTrue((s == outp).values.all())
+        tm.assert_series_equal(outp, exp)
 
         outp = Series(ujson.decode(ujson.encode(
             s, orient="index"))).sort_values()
-        self.assertTrue((s == outp).values.all())
+        exp = Series([10, 20, 30, 40, 50, 60],
+                     index=['6', '7', '8', '9', '10', '15'])
+        tm.assert_series_equal(outp, exp)
 
         outp = Series(ujson.decode(ujson.encode(
             s, orient="index"), numpy=True)).sort_values()
-        self.assertTrue((s == outp).values.all())
+        tm.assert_series_equal(outp, exp)
 
     def testSeriesNested(self):
         s = Series([10, 20, 30, 40, 50, 60], name="series",

diff --git a/pandas/tests/indexes/common.py b/pandas/tests/indexes/common.py
@@ -685,7 +685,8 @@ def test_equals_op(self):
             index_a == series_d
         with tm.assertRaisesRegexp(ValueError, "Lengths must match"):
             index_a == array_d
-        with tm.assertRaisesRegexp(ValueError, "Series lengths must match"):
+        msg = "Can only compare identically-labeled Series objects"
+        with tm.assertRaisesRegexp(ValueError, msg):
             series_a == series_d
         with tm.assertRaisesRegexp(ValueError, "Lengths must match"):
             series_a == array_d
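
The test changes in this commit replace value-only checks such as ``(s == outp).values.all()`` with ``tm.assert_series_equal``. A rough sketch of why that matters follows. It assumes the public ``pandas.testing.assert_series_equal`` helper from later pandas releases (the commit itself uses the internal ``tm`` module) and uses made-up sample data.

```python
import pandas as pd
from pandas.testing import assert_series_equal

# Hypothetical data: identical values, different labels.
s = pd.Series([10, 20, 30], index=["a", "b", "c"], name="series")
other = pd.Series([10, 20, 30], index=["x", "y", "z"], name="series")

# A value-only check passes even though the labels differ.
values_match = bool((s.values == other.values).all())

# assert_series_equal also compares the index, so it catches the mismatch.
try:
    assert_series_equal(s, other)
    labels_checked = False
except AssertionError:
    labels_checked = True

print(values_match, labels_checked)
```

With ``==`` now raising on differently labelled ``Series``, asserting against an explicitly constructed expected ``Series`` checks values, index and name in one call.
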