BUG: Resample raises AmbiguousTimeError if index starts or ends on DST #22514

Closed
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.24.0.txt
@@ -615,6 +615,7 @@ Timezones
 - Bug when setting a new value with :meth:`DataFrame.loc` with a :class:`DatetimeIndex` with a DST transition (:issue:`18308`, :issue:`20724`)
 - Bug in :meth:`DatetimeIndex.unique` that did not re-localize tz-aware dates correctly (:issue:`21737`)
 - Bug when indexing a :class:`Series` with a DST transition (:issue:`21846`)
+- Bug in :meth:`DataFrame.resample` when :class:`DatetimeIndex` starts or ends on a DST transition (:issue:`10117`, :issue:`19375`)

 Offsets
 ^^^^^^^
41 changes: 23 additions & 18 deletions pandas/_libs/tslibs/conversion.pyx
@@ -893,34 +893,39 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
     tdata = <int64_t*> cnp.PyArray_DATA(trans)
     ntrans = len(trans)

+    # Determine whether each date lies left of the DST transition (store in
+    # result_a) or right of the DST transition (store in result_b)
     result_a = np.empty(n, dtype=np.int64)
     result_b = np.empty(n, dtype=np.int64)
     result_a.fill(NPY_NAT)
     result_b.fill(NPY_NAT)

-    # left side
-    idx_shifted = (np.maximum(0, trans.searchsorted(
+    idx_shifted_left = (np.maximum(0, trans.searchsorted(
         vals - DAY_NS, side='right') - 1)).astype(np.int64)

-    for i in range(n):
-        v = vals[i] - deltas[idx_shifted[i]]
-        pos = bisect_right_i8(tdata, v, ntrans) - 1
-
-        # timestamp falls to the left side of the DST transition
-        if v + deltas[pos] == vals[i]:
-            result_a[i] = v
-
-    # right side
-    idx_shifted = (np.maximum(0, trans.searchsorted(
+    idx_shifted_right = (np.maximum(0, trans.searchsorted(
         vals + DAY_NS, side='right') - 1)).astype(np.int64)

     for i in range(n):
-        v = vals[i] - deltas[idx_shifted[i]]
-        pos = bisect_right_i8(tdata, v, ntrans) - 1
-
-        # timestamp falls to the right side of the DST transition
-        if v + deltas[pos] == vals[i]:
-            result_b[i] = v
+        v_left = vals[i] - deltas[idx_shifted_left[i]]
+        if v_left in trans:
+            # The vals[i] lies directly on the DST border.
+            result_a[i] = v_left
+        else:
+            pos_left = bisect_right_i8(tdata, v_left, ntrans) - 1
+            # timestamp falls to the left side of the DST transition
+            if v_left + deltas[pos_left] == vals[i]:
+                result_a[i] = v_left
+
+        v_right = vals[i] - deltas[idx_shifted_right[i]]
+        if v_right in trans:
+            # The vals[i] lies directly on the DST border.
+            result_b[i] = v_right
+        else:
+            pos_right = bisect_right_i8(tdata, v_right, ntrans) - 1
+            # timestamp falls to the right side of the DST transition
+            if v_right + deltas[pos_right] == vals[i]:
+                result_b[i] = v_right

     if infer_dst:
         dst_hours = np.empty(n, dtype=np.int64)
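
The left/right candidate search in this hunk can be sketched in plain Python. This is a simplified illustration, not the pandas implementation: it works in seconds rather than nanoseconds, uses a hand-written transition table for Europe/London (the names `epoch`, `trans`, `deltas` and `utc_candidates` are assumptions made for the example), and leaves out the "directly on the DST border" special case that the patch adds.

```python
from bisect import bisect_right
from datetime import datetime, timezone

DAY = 86_400  # seconds; the Cython code uses DAY_NS (nanoseconds)


def epoch(y, m, d, hh=0, mm=0):
    """Seconds since the epoch for a UTC date and time."""
    return int(datetime(y, m, d, hh, mm, tzinfo=timezone.utc).timestamp())


# Hand-written (assumed) table for Europe/London around 2014:
# trans[k] is the UTC instant from which the offset deltas[k] applies.
trans = [epoch(2014, 1, 1), epoch(2014, 3, 30, 1), epoch(2014, 10, 26, 1)]
deltas = [0, 3600, 0]  # GMT, BST, GMT


def utc_candidates(wall):
    """Return (left, right) UTC candidates for a local wall-clock value.

    `wall` is the wall time encoded as if it were UTC and must not lie
    before trans[0]. A candidate is None when the offset taken from one
    day earlier (left) or one day later (right) does not map back to
    `wall`.
    """
    found = []
    for shift in (-DAY, DAY):
        # Offset in force one day before / after the wall time.
        idx = max(0, bisect_right(trans, wall + shift) - 1)
        v = wall - deltas[idx]
        # Keep v only if converting it back to local time gives `wall`.
        pos = bisect_right(trans, v) - 1
        found.append(v if v + deltas[pos] == wall else None)
    return tuple(found)


ambiguous = utc_candidates(epoch(2014, 10, 26, 1, 30))   # repeated hour
plain = utc_candidates(epoch(2014, 10, 26, 0, 30))       # before fall-back
missing = utc_candidates(epoch(2014, 3, 30, 1, 30))      # skipped hour
```

Two distinct candidates mean the wall time is ambiguous, exactly one means it is unambiguous, and none means the wall time does not exist in that zone.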
20 changes: 20 additions & 0 deletions pandas/tests/test_resample.py
@@ -2125,6 +2125,26 @@ def test_downsample_across_dst(self):
                               freq='H'))
         tm.assert_series_equal(result, expected)

+    def test_bin_edges_on_DST_transition(self):
+        # GH 10117
+        # Ends on DST boundary
+        idx = date_range("2014-10-26 00:30:00", "2014-10-26 02:30:00",
+                         freq="30T", tz="Europe/London")
+        expected = Series(range(len(idx)), index=idx)
+        result = expected.resample('30T').mean()
+        tm.assert_series_equal(result, expected)
+
+        # Starts on DST boundary
+        idx = date_range('2014-03-09 03:00', periods=4,
+                         freq='H', tz='America/Chicago')
+        s = Series(range(len(idx)), index=idx)
+        result = s.resample('H', label='right', closed='right').sum()
+        expected = Series([1, 2, 3], index=date_range('2014-03-09 04:00',
+                                                      periods=3,
+                                                      freq='H',
+                                                      tz='America/Chicago'))
+        tm.assert_series_equal(result, expected)
+
     def test_resample_with_nat(self):
         # GH 13020
         index = DatetimeIndex([pd.NaT,
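
The dates in `test_bin_edges_on_DST_transition` sit on real transitions: London clocks went back from 02:00 BST to 01:00 GMT on 2014-10-26, and Chicago clocks jumped from 02:00 CST to 03:00 CDT on 2014-03-09. The sketch below checks this with the standard library's `zoneinfo` instead of pandas, so it assumes Python 3.9+ and an available IANA time zone database. The helper `is_ambiguous` is a name invented for this example.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo


def is_ambiguous(naive, tz):
    """True if the naive wall time occurs twice in tz (clocks set back)."""
    first = naive.replace(tzinfo=tz, fold=0).utcoffset()
    second = naive.replace(tzinfo=tz, fold=1).utcoffset()
    # For a repeated wall time, fold=0 selects the earlier occurrence,
    # which carries the larger (DST) offset.
    return first > second


london = ZoneInfo("Europe/London")
chicago = ZoneInfo("America/Chicago")

# 01:30 falls inside the repeated hour of the first test index.
repeated = is_ambiguous(datetime(2014, 10, 26, 1, 30), london)
# 02:30, the last label of that index, occurs only once.
last_label = is_ambiguous(datetime(2014, 10, 26, 2, 30), london)
# 03:00 is the first wall time after the spring-forward gap in Chicago.
start_offset = datetime(2014, 3, 9, 3, 0, tzinfo=chicago).utcoffset()
```

Under these assumptions `repeated` is true and `last_label` is false, while `start_offset` is already the daylight-time offset of minus five hours.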