-
-
Notifications
You must be signed in to change notification settings - Fork 18.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix incorrect exception raised by Series[datetime64] + int #19147
Conversation
pandas/core/ops.py
Outdated
@@ -703,6 +702,23 @@ def wrapper(left, right, name=name, na_op=na_op): | |||
return wrapper | |||
|
|||
|
|||
def _dispatch_to_index_op(op, left, right, index_class): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're going to end up using this again for TimedeltaIndex a few bugfixes from now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can de-privatize. rename to 'dispatch_to_index_operation'. needs a doc-string.
@@ -692,6 +693,9 @@ def __add__(self, other): | |||
return self._add_datelike(other) | |||
elif isinstance(other, Index): | |||
return self._add_datelike(other) | |||
elif is_integer_dtype(other) and self.freq is None: | |||
# GH#19123 | |||
raise NullFrequencyError("Cannot shift with no freq") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just make this a ValueError, we don't want to have custom error messages normally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to catch this specifically in core.ops. Want to catch by checking the error message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't want to have custom error messages normally
The error message here "Cannot shift with no freq" is the same as the error message raised if adding just an integer when self.freq
is None.
pandas/core/ops.py
Outdated
@@ -703,6 +702,23 @@ def wrapper(left, right, name=name, na_op=na_op): | |||
return wrapper | |||
|
|||
|
|||
def _dispatch_to_index_op(op, left, right, index_class): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can de-privatize. rename to 'dispatch_to_index_operation'. needs a doc-string.
pandas/core/ops.py
Outdated
checking and timezone handling. | ||
""" | ||
left_idx = index_class(left) | ||
left_idx.freq = None # avoid accidentally allowing integer add/sub |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a mutating operation, don't do it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is necessary. If I change this to an assert left_idx.freq is None
that assertion fails in a number of tests:
dti = pd.date_range('1999-09-30', periods=10, tz='US/Pacific')
ser = pd.Series(dti)
>>> pd.DatetimeIndex(ser).freq
<Day>
It looks like this only occurs for tzaware cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this only occurs for tzaware cases.
Possibly related to caching of DatetimeIndexes and the fact that Series[datetime64tz]._data.blocks[0].values is itself a DatetimeIndex?
pandas/tseries/offsets.py
Outdated
@@ -1226,7 +1226,8 @@ def _get_roll(self, i, before_day_of_month, after_day_of_month): | |||
return roll | |||
|
|||
def _apply_index_days(self, i, roll): | |||
i += (roll % 2) * Timedelta(days=self.day_of_month).value | |||
nanos = (roll % 2) * Timedelta(days=self.day_of_month).value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leftover from another PR (and I had commented on that), remove from here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You’re rIght that this is duplicated, but it is correct in both places, and this PR breaks without it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see other PR, this needs a doc-string
Codecov Report
@@ Coverage Diff @@
## master #19147 +/- ##
==========================================
- Coverage 91.56% 91.53% -0.03%
==========================================
Files 148 148
Lines 48856 48870 +14
==========================================
+ Hits 44733 44735 +2
- Misses 4123 4135 +12
Continue to review full report at Codecov.
|
@@ -680,6 +680,25 @@ def test_timedelta_series_ops(self): | |||
assert_series_equal(ts - s, expected2) | |||
assert_series_equal(ts + (-s), expected2) | |||
|
|||
def test_td64series_add_intlike(self): | |||
# GH#19123 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't conjoin words
test_td64_series_and_integer_like
@@ -1428,6 +1441,25 @@ def test_dt64series_arith_overflow(self): | |||
res = dt - ser | |||
tm.assert_series_equal(res, -expected) | |||
|
|||
def test_dt64series_add_intlike(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
pandas/tseries/offsets.py
Outdated
@@ -1226,7 +1226,8 @@ def _get_roll(self, i, before_day_of_month, after_day_of_month): | |||
return roll | |||
|
|||
def _apply_index_days(self, i, roll): | |||
i += (roll % 2) * Timedelta(days=self.day_of_month).value | |||
nanos = (roll % 2) * Timedelta(days=self.day_of_month).value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see other PR, this needs a doc-string
pandas/core/ops.py
Outdated
left_idx.freq = None | ||
try: | ||
result = op(left_idx, right) | ||
except NullFrequencyError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok this is fine then. Please update tests where this should be caught (e.g. .shift)
pandas/core/ops.py
Outdated
# avoid accidentally allowing integer add/sub. For datetime64[tz] dtypes, | ||
# left_idx may inherit a freq from a cached DatetimeIndex. | ||
# See discussion in GH#19147. | ||
left_idx.freq = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually I don't like this mutation here. why is it necessary? (e.g. if one of the end-points is None, e.g. an integer series this will still raise, yes?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because if left
is tzaware then pd.DatetimeIndex(left, freq=None)
may come back with non-None freq.
dti = pd.date_range('2016-01-01', periods=2, tz='US/Pacific')
ser = pd.Series(dti)
>>> pd.DatetimeIndex(ser, freq=None).freq
<Day>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and why is this a problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because in this scenario, without the left_idx.freq = None
line, ser + 1
fails to raise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just parametrized test_dt64_series_add_intlike
to make sure the tzaware case is covered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so you didn't answer my question here. I don't want mutation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really, really did. Without setting the frequency to none, integer addition fails to raise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then we need a better way to detect this, we cannot mutate things to just see if they raise
Re-push or let sit? |
needs a rebase |
@@ -593,6 +594,12 @@ def test_nat_new(self): | |||
exp = np.array([tslib.iNaT] * 5, dtype=np.int64) | |||
tm.assert_numpy_array_equal(result, exp) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are very likely tests that are raising ValueError that should now be NullFrequencyError, pls change them.
You can find them very easily, by temporarly changing the error where NullFrequencyInhertis to something else (e.g. KeyError) and then seeing what raises
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found and changed one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok great
pandas/core/ops.py
Outdated
# avoid accidentally allowing integer add/sub. For datetime64[tz] dtypes, | ||
# left_idx may inherit a freq from a cached DatetimeIndex. | ||
# See discussion in GH#19147. | ||
left_idx.freq = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then we need a better way to detect this, we cannot mutate things to just see if they raise
pandas/errors/__init__.py
Outdated
|
||
|
||
class NullFrequencyError(ValueError): | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs a line in the api changes section
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
@@ -593,6 +594,12 @@ def test_nat_new(self): | |||
exp = np.array([tslib.iNaT] * 5, dtype=np.int64) | |||
tm.assert_numpy_array_equal(result, exp) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok great
needs a rebase, otherwise lgtm. ping on green. |
doc/source/whatsnew/v0.23.0.txt
Outdated
@@ -273,6 +273,7 @@ Other API Changes | |||
- The default ``Timedelta`` constructor now accepts an ``ISO 8601 Duration`` string as an argument (:issue:`19040`) | |||
- ``IntervalDtype`` now returns ``True`` when compared against ``'interval'`` regardless of subtype, and ``IntervalDtype.name`` now returns ``'interval'`` regardless of subtype (:issue:`18980`) | |||
- :func:`Series.to_csv` now accepts a ``compression`` argument that works in the same way as the ``compression`` argument in :func:`DataFrame.to_csv` (:issue:`18958`) | |||
- :func:`DatetimeIndex.shift` and :func:`TimedeltaIndex.shift` will now raise ``NullFrequencyError`` (which subclasses ``ValueError``) when the index object frequency is ``None`` (:issue:`19147`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
say something like: which subclasses ValueError, the currently raised exception
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
Ping |
One more bugfix is ready to go once this goes in, plus another PR getting rid of _Op and _TimeOp. |
thanks! |
Series[datetime64] +/- int is currently raising a ValueError when it should be raising a TypeError. This was introduced in #19024.
In the process of fixing this bug, this also goes most of the way towards fixing #19123 by raising on add/sub with integer arrays.
Setup:
ATM:
After this PR (and also in 0.22.0 and anything before #19024):
git diff upstream/master -u -- "*.py" | flake8 --diff