Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix incorrect exception raised by Series[datetime64] + int #19147

Merged
merged 17 commits into from
Jan 17, 2018

Conversation

jbrockmendel
Copy link
Member

Series[datetime64] +/- int is currently raising a ValueError when it should be raising a TypeError. This was introduced in #19024.

In the process of fixing this bug, this also goes most of the way towards fixing #19123 by raising on add/sub with integer arrays.

Setup:

dti = pd.date_range('2016-01-01', periods=2)
ser = pd.Series(dti)

ATM:

>>> ser + 1
[...]
ValueError: Cannot shift with no freq

After this PR (and also in 0.22.0 and anything before #19024):

>>> ser + 1
[...]
TypeError: incompatible type for a datetime/timedelta operation [__add__]
  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@@ -703,6 +702,23 @@ def wrapper(left, right, name=name, na_op=na_op):
return wrapper


def _dispatch_to_index_op(op, left, right, index_class):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're going to end up using this again for TimedeltaIndex a few bugfixes from now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can de-privatize. rename to 'dispatch_to_index_operation'. needs a doc-string.

@@ -692,6 +693,9 @@ def __add__(self, other):
return self._add_datelike(other)
elif isinstance(other, Index):
return self._add_datelike(other)
elif is_integer_dtype(other) and self.freq is None:
# GH#19123
raise NullFrequencyError("Cannot shift with no freq")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just make this a ValueError, we don't want to have custom error messages normally.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to catch this specifically in core.ops. Want to catch by checking the error message?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't want to have custom error messages normally

The error message here "Cannot shift with no freq" is the same as the error message raised if adding just an integer when self.freq is None.

@@ -703,6 +702,23 @@ def wrapper(left, right, name=name, na_op=na_op):
return wrapper


def _dispatch_to_index_op(op, left, right, index_class):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can de-privatize. rename to 'dispatch_to_index_operation'. needs a doc-string.

checking and timezone handling.
"""
left_idx = index_class(left)
left_idx.freq = None # avoid accidentally allowing integer add/sub
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a mutating operation, don't do it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is necessary. If I change this to an assert left_idx.freq is None that assertion fails in a number of tests:

dti = pd.date_range('1999-09-30', periods=10, tz='US/Pacific')
ser = pd.Series(dti)
>>> pd.DatetimeIndex(ser).freq
<Day>

It looks like this only occurs for tzaware cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this only occurs for tzaware cases.

Possibly related to caching of DatetimeIndexes and the fact that Series[datetime64tz]._data.blocks[0].values is itself a DatetimeIndex?

@@ -1226,7 +1226,8 @@ def _get_roll(self, i, before_day_of_month, after_day_of_month):
return roll

def _apply_index_days(self, i, roll):
i += (roll % 2) * Timedelta(days=self.day_of_month).value
nanos = (roll % 2) * Timedelta(days=self.day_of_month).value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftover from another PR (and I had commented on that), remove from here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You’re rIght that this is duplicated, but it is correct in both places, and this PR breaks without it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see other PR, this needs a doc-string

@jreback jreback added Datetime Datetime data dtype Compat pandas objects compatability with Numpy or Python functions labels Jan 9, 2018
@jbrockmendel jbrockmendel changed the title Fix incorrect exception raises by Series[datetime64] + int Fix incorrect exception raised by Series[datetime64] + int Jan 9, 2018
@codecov
Copy link

codecov bot commented Jan 9, 2018

Codecov Report

Merging #19147 into master will decrease coverage by 0.02%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #19147      +/-   ##
==========================================
- Coverage   91.56%   91.53%   -0.03%     
==========================================
  Files         148      148              
  Lines       48856    48870      +14     
==========================================
+ Hits        44733    44735       +2     
- Misses       4123     4135      +12
Flag Coverage Δ
#multiple 89.91% <100%> (-0.03%) ⬇️
#single 41.69% <55.55%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/ops.py 92.17% <100%> (+0.1%) ⬆️
pandas/core/indexes/datetimelike.py 97.11% <100%> (+0.03%) ⬆️
pandas/errors/__init__.py 100% <100%> (ø) ⬆️
pandas/plotting/_converter.py 65.22% <0%> (-1.74%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c271d4d...2231505. Read the comment docs.

@@ -680,6 +680,25 @@ def test_timedelta_series_ops(self):
assert_series_equal(ts - s, expected2)
assert_series_equal(ts + (-s), expected2)

def test_td64series_add_intlike(self):
# GH#19123
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't conjoin words

test_td64_series_and_integer_like

@@ -1428,6 +1441,25 @@ def test_dt64series_arith_overflow(self):
res = dt - ser
tm.assert_series_equal(res, -expected)

def test_dt64series_add_intlike(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@@ -1226,7 +1226,8 @@ def _get_roll(self, i, before_day_of_month, after_day_of_month):
return roll

def _apply_index_days(self, i, roll):
i += (roll % 2) * Timedelta(days=self.day_of_month).value
nanos = (roll % 2) * Timedelta(days=self.day_of_month).value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see other PR, this needs a doc-string

left_idx.freq = None
try:
result = op(left_idx, right)
except NullFrequencyError:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok this is fine then. Please update tests where this should be caught (e.g. .shift)

# avoid accidentally allowing integer add/sub. For datetime64[tz] dtypes,
# left_idx may inherit a freq from a cached DatetimeIndex.
# See discussion in GH#19147.
left_idx.freq = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually I don't like this mutation here. why is it necessary? (e.g. if one of the end-points is None, e.g. an integer series this will still raise, yes?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because if left is tzaware then pd.DatetimeIndex(left, freq=None) may come back with non-None freq.

dti = pd.date_range('2016-01-01', periods=2, tz='US/Pacific')
ser = pd.Series(dti)

>>> pd.DatetimeIndex(ser, freq=None).freq
<Day>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and why is this a problem?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because in this scenario, without the left_idx.freq = None line, ser + 1 fails to raise.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just parametrized test_dt64_series_add_intlike to make sure the tzaware case is covered.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so you didn't answer my question here. I don't want mutation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really, really did. Without setting the frequency to none, integer addition fails to raise.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then we need a better way to detect this, we cannot mutate things to just see if they raise

@jbrockmendel
Copy link
Member Author

Re-push or let sit?

@jreback
Copy link
Contributor

jreback commented Jan 12, 2018

needs a rebase

@@ -593,6 +594,12 @@ def test_nat_new(self):
exp = np.array([tslib.iNaT] * 5, dtype=np.int64)
tm.assert_numpy_array_equal(result, exp)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are very likely tests that are raising ValueError that should now be NullFrequencyError, pls change them.

You can find them very easily, by temporarly changing the error where NullFrequencyInhertis to something else (e.g. KeyError) and then seeing what raises

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found and changed one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok great

# avoid accidentally allowing integer add/sub. For datetime64[tz] dtypes,
# left_idx may inherit a freq from a cached DatetimeIndex.
# See discussion in GH#19147.
left_idx.freq = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then we need a better way to detect this, we cannot mutate things to just see if they raise



class NullFrequencyError(ValueError):
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs a line in the api changes section

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@@ -593,6 +594,12 @@ def test_nat_new(self):
exp = np.array([tslib.iNaT] * 5, dtype=np.int64)
tm.assert_numpy_array_equal(result, exp)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok great

@jreback jreback added this to the 0.23.0 milestone Jan 16, 2018
@jreback
Copy link
Contributor

jreback commented Jan 16, 2018

needs a rebase, otherwise lgtm. ping on green.

@@ -273,6 +273,7 @@ Other API Changes
- The default ``Timedelta`` constructor now accepts an ``ISO 8601 Duration`` string as an argument (:issue:`19040`)
- ``IntervalDtype`` now returns ``True`` when compared against ``'interval'`` regardless of subtype, and ``IntervalDtype.name`` now returns ``'interval'`` regardless of subtype (:issue:`18980`)
- :func:`Series.to_csv` now accepts a ``compression`` argument that works in the same way as the ``compression`` argument in :func:`DataFrame.to_csv` (:issue:`18958`)
- :func:`DatetimeIndex.shift` and :func:`TimedeltaIndex.shift` will now raise ``NullFrequencyError`` (which subclasses ``ValueError``) when the index object frequency is ``None`` (:issue:`19147`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

say something like: which subclasses ValueError, the currently raised exception

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

@jbrockmendel
Copy link
Member Author

Ping

@jbrockmendel
Copy link
Member Author

One more bugfix is ready to go once this goes in, plus another PR getting rid of _Op and _TimeOp.

@jreback jreback merged commit 3db6f66 into pandas-dev:master Jan 17, 2018
@jreback
Copy link
Contributor

jreback commented Jan 17, 2018

thanks!

@jbrockmendel jbrockmendel deleted the nullfreq branch February 11, 2018 21:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions Datetime Datetime data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants