Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dispatch scalar DataFrame ops to Series #22163

Merged
merged 19 commits into from
Aug 14, 2018
Merged
Show file tree
Hide file tree
Changes from 11 commits
Commits
Show all changes
19 commits
Select commit Hold shift + click to select a range
7681092
dispatch scalar DataFrame ops to Series
jbrockmendel Aug 2, 2018
b226abf
flake8 fixup
jbrockmendel Aug 2, 2018
3fd46bc
kludge-fix indexing errors
jbrockmendel Aug 2, 2018
0dff3a1
Merge branch 'master' of https://github.com/pandas-dev/pandas into df…
jbrockmendel Aug 2, 2018
caf2da0
scale back py2/py3 compat goals
jbrockmendel Aug 3, 2018
6ff8705
Merge branch 'master' of https://github.com/pandas-dev/pandas into df…
jbrockmendel Aug 3, 2018
0513e0b
update error message
jbrockmendel Aug 3, 2018
3a7b782
try to fix test_expressions failure going down the wrong path
jbrockmendel Aug 3, 2018
6636565
dummy commit to force CI
jbrockmendel Aug 4, 2018
edc3792
Merge branch 'master' of https://github.com/pandas-dev/pandas into df…
jbrockmendel Aug 7, 2018
3c65f93
post-merge cleanup
jbrockmendel Aug 7, 2018
2cdc58b
Merge branch 'master' of https://github.com/pandas-dev/pandas into df…
jbrockmendel Aug 8, 2018
4703db7
mark tests with GH Issues
jbrockmendel Aug 8, 2018
c090713
whatsnew? everything is new
jbrockmendel Aug 8, 2018
d683cb3
edit test for appveyor compat
jbrockmendel Aug 9, 2018
8c64cf6
Merge branch 'master' of https://github.com/pandas-dev/pandas into df…
jbrockmendel Aug 9, 2018
dbdea1a
API Changes section for DataFrame[timedelta64] - np.nan
jbrockmendel Aug 9, 2018
f1edec4
un-xfail
jbrockmendel Aug 9, 2018
9b62135
Merge branch 'master' of https://github.com/pandas-dev/pandas into df…
jbrockmendel Aug 10, 2018
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions pandas/core/frame.py
Original file line number Diff line number Diff line change
Expand Up @@ -4949,6 +4949,14 @@ def _combine_match_columns(self, other, func, level=None, try_cast=True):
return self._constructor(new_data)

def _combine_const(self, other, func, errors='raise', try_cast=True):
if lib.is_scalar(other) or np.ndim(other) == 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is is pretty annoything that we have to do this, I would make an explict function maybe is_any_scalar I think as we have these types of checks all over. pls make an issue for this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

new_data = {i: func(self.iloc[:, i], other)
for i, col in enumerate(self.columns)}

result = self._constructor(new_data, index=self.index, copy=False)
result.columns = self.columns
return result

new_data = self._data.eval(func=func, other=other,
errors=errors,
try_cast=try_cast)
Expand Down
11 changes: 7 additions & 4 deletions pandas/core/ops.py
Original file line number Diff line number Diff line change
Expand Up @@ -1311,7 +1311,7 @@ def na_op(x, y):
with np.errstate(all='ignore'):
result = method(y)
if result is NotImplemented:
raise TypeError("invalid type comparison")
return invalid_comparison(x, y, op)
else:
result = op(x, y)

Expand All @@ -1327,6 +1327,10 @@ def wrapper(self, other, axis=None):

res_name = get_op_result_name(self, other)

if isinstance(other, list):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is is_list_like (maybe after some other comparisons) enough here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ATM the isinstance(other, list) check is done below the isinstance(other, (np.ndarray, pd.Index)) check. Wrapping lists earlier let us send lists through that same ndarray/Index block. Ideally the catchall else: block can be reduced to only-scalars, but we're not there yet.

# TODO: same for tuples?
other = np.asarray(other)

if isinstance(other, ABCDataFrame): # pragma: no cover
# Defer to DataFrame implementation; fail early
return NotImplemented
Expand Down Expand Up @@ -1426,8 +1430,6 @@ def wrapper(self, other, axis=None):

else:
values = self.get_values()
if isinstance(other, list):
other = np.asarray(other)

with np.errstate(all='ignore'):
res = na_op(values, other)
Expand Down Expand Up @@ -1706,7 +1708,8 @@ def f(self, other, axis=default_axis, level=None, fill_value=None):
if fill_value is not None:
self = self.fillna(fill_value)

return self._combine_const(other, na_op, try_cast=True)
pass_op = op if lib.is_scalar(other) else na_op
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are checking for a scalar here and above?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's kind of annoying. If lib.is_scalar(other) then we will be dispatching to the Series op, in which case we want to pass the "raw" op (e.g. operator.add) and not the wrapped op na_op.

This PR handles only scalars since that is a relatively easy case. A few PRs down the road we'll have all these ops dispatch to series, at which point this won't be necessary.

return self._combine_const(other, pass_op, try_cast=True)

f.__name__ = op_name

Expand Down
6 changes: 2 additions & 4 deletions pandas/tests/frame/test_arithmetic.py
Original file line number Diff line number Diff line change
Expand Up @@ -27,8 +27,8 @@ def test_df_float_none_comparison(self):
df = pd.DataFrame(np.random.randn(8, 3), index=range(8),
columns=['A', 'B', 'C'])

with pytest.raises(TypeError):
df.__eq__(None)
result = df.__eq__(None)
assert not result.any().any()

def test_df_string_comparison(self):
df = pd.DataFrame([{"a": 1, "b": "foo"}, {"a": 2, "b": "bar"}])
Expand Down Expand Up @@ -91,8 +91,6 @@ def test_df_add_flex_filled_mixed_dtypes(self):

class TestFrameArithmetic(object):

@pytest.mark.xfail(reason='GH#7996 datetime64 units not converted to nano',
strict=True)
def test_df_sub_datetime64_not_ns(self):
df = pd.DataFrame(pd.date_range('20130101', periods=3))
dt64 = np.datetime64('2013-01-01')
Expand Down
28 changes: 25 additions & 3 deletions pandas/tests/frame/test_indexing.py
Original file line number Diff line number Diff line change
Expand Up @@ -273,6 +273,8 @@ def test_getitem_boolean(self):
# test df[df > 0]
for df in [self.tsframe, self.mixed_frame,
self.mixed_float, self.mixed_int]:
if compat.PY3 and df is self.mixed_frame:
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's strip out the mixed_frame to another function (even though that duplicates some code), bonus can parametrize this test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bonus can parametrize this test.

I don't think tsframe, mixed_frame, mixed_float, mixed_int are available in the namespace.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these need to be made fixtures. this becomes so much easier.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, am starting to implement this in the test_arithmetic sequence of PRs. Will update this test when that lands.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k thanks


data = df._get_numeric_data()
bif = df[df > 0]
Expand Down Expand Up @@ -2468,8 +2470,11 @@ def test_boolean_indexing_mixed(self):
assert_frame_equal(df2, expected)

df['foo'] = 'test'
with tm.assert_raises_regex(TypeError, 'boolean setting '
'on mixed-type'):
msg = ("boolean setting on mixed-type|"
"not supported between|"
"unorderable types")
with tm.assert_raises_regex(TypeError, msg):
# TODO: This message should be the same in PY2/PY3
df[df > 0.3] = 1

def test_where(self):
Expand Down Expand Up @@ -2502,6 +2507,10 @@ def _check_get(df, cond, check_dtypes=True):
# check getting
for df in [default_frame, self.mixed_frame,
self.mixed_float, self.mixed_int]:
if compat.PY3 and df is self.mixed_frame:
with pytest.raises(TypeError):
df > 0
continue
cond = df > 0
_check_get(df, cond)

Expand Down Expand Up @@ -2549,6 +2558,10 @@ def _check_align(df, cond, other, check_dtypes=True):
assert (rs.dtypes == df.dtypes).all()

for df in [self.mixed_frame, self.mixed_float, self.mixed_int]:
if compat.PY3 and df is self.mixed_frame:
with pytest.raises(TypeError):
df > 0
continue

# other is a frame
cond = (df > 0)[1:]
Expand Down Expand Up @@ -2594,6 +2607,10 @@ def _check_set(df, cond, check_dtypes=True):

for df in [default_frame, self.mixed_frame, self.mixed_float,
self.mixed_int]:
if compat.PY3 and df is self.mixed_frame:
with pytest.raises(TypeError):
df > 0
continue

cond = df > 0
_check_set(df, cond)
Expand Down Expand Up @@ -2759,9 +2776,14 @@ def test_where_datetime(self):
C=np.random.randn(5)))

stamp = datetime(2013, 1, 3)
result = df[df > stamp]
with pytest.raises(TypeError):
df > stamp

result = df[df.iloc[:, :-1] > stamp]

expected = df.copy()
expected.loc[[0, 1], 'A'] = np.nan
expected.loc[:, 'C'] = np.nan
assert_frame_equal(result, expected)

def test_where_none(self):
Expand Down
13 changes: 9 additions & 4 deletions pandas/tests/frame/test_operators.py
Original file line number Diff line number Diff line change
Expand Up @@ -206,10 +206,15 @@ def test_timestamp_compare(self):
right_f = getattr(operator, right)

# no nats
expected = left_f(df, Timestamp('20010109'))
result = right_f(Timestamp('20010109'), df)
assert_frame_equal(result, expected)

if left in ['eq', 'ne']:
expected = left_f(df, Timestamp('20010109'))
result = right_f(Timestamp('20010109'), df)
assert_frame_equal(result, expected)
else:
with pytest.raises(TypeError):
left_f(df, Timestamp('20010109'))
with pytest.raises(TypeError):
right_f(Timestamp('20010109'), df)
# nats
expected = left_f(df, Timestamp('nat'))
result = right_f(Timestamp('nat'), df)
Expand Down
9 changes: 4 additions & 5 deletions pandas/tests/indexes/timedeltas/test_arithmetic.py
Original file line number Diff line number Diff line change
Expand Up @@ -578,7 +578,6 @@ def test_timedelta_ops_with_missing_values(self):
scalar1 = pd.to_timedelta('00:00:01')
scalar2 = pd.to_timedelta('00:00:02')
timedelta_NaT = pd.to_timedelta('NaT')
NA = np.nan

actual = scalar1 + scalar1
assert actual == scalar2
Expand Down Expand Up @@ -646,10 +645,10 @@ def test_timedelta_ops_with_missing_values(self):
actual = df1 - timedelta_NaT
tm.assert_frame_equal(actual, dfn)

actual = df1 + NA
tm.assert_frame_equal(actual, dfn)
actual = df1 - NA
tm.assert_frame_equal(actual, dfn)
with pytest.raises(TypeError):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this raising? this is a big change if you don't allow nan to act as NaT in ops

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the current behavior for Series and Index.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs a subsection in the whatsnew then, marked as an api change.

df1 + np.nan
with pytest.raises(TypeError):
df1 - np.nan

actual = df1 + pd.NaT # NaT is datetime, not timedelta
tm.assert_frame_equal(actual, dfn)
Expand Down
25 changes: 20 additions & 5 deletions pandas/tests/internals/test_internals.py
Original file line number Diff line number Diff line change
Expand Up @@ -1235,16 +1235,31 @@ def test_binop_other(self, op, value, dtype):
(operator.truediv, 'bool'),
(operator.mod, 'i8'),
(operator.mod, 'complex128'),
(operator.mod, '<M8[ns]'),
(operator.mod, '<m8[ns]'),
(operator.pow, 'bool')}
if (op, dtype) in skip:
pytest.skip("Invalid combination {},{}".format(op, dtype))

e = DummyElement(value, dtype)
s = pd.DataFrame({"A": [e.value, e.value]}, dtype=e.dtype)
result = op(s, e).dtypes
expected = op(s, value).dtypes
assert_series_equal(result, expected)

invalid = {(operator.pow, '<M8[ns]'),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pull this out and parametrize

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test already has two layers of parametrization; it isn't clear how to pull this out without making it more verbose+repetitive. Let me give this some thought and circle back.

(operator.mod, '<M8[ns]'),
(operator.truediv, '<M8[ns]'),
(operator.mul, '<M8[ns]'),
(operator.add, '<M8[ns]'),
(operator.pow, '<m8[ns]'),
(operator.mod, '<m8[ns]'),
(operator.mul, '<m8[ns]')}

if (op, dtype) in invalid:
with pytest.raises(TypeError):
result = op(s, e.value)
else:
# FIXME: Since dispatching to Series, this test no longer
# asserts anything meaningful
result = op(s, e.value).dtypes
expected = op(s, value).dtypes
assert_series_equal(result, expected)


@pytest.mark.parametrize('typestr, holder', [
Expand Down
Loading