Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Using DateOffset with shift on a daylight savings transition produces error #57837

Open
3 tasks done
martheveldhuis opened this issue Mar 14, 2024 · 3 comments
Open
3 tasks done
Labels
Bug Needs Triage Issue that has not been reviewed by a pandas team member

Comments

@martheveldhuis
Copy link

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

dt = pd.date_range("2024-03-31 00:00", "2024-03-31 07:00", freq="1h", tz="utc")
df = pd.DataFrame(index=dt, data={"A":range(0, len(dt))})

df_nl = df.tz_convert(tz="Europe/Amsterdam")
df_nl["B"] = df_nl["A"].shift(freq=pd.DateOffset(hours=1))

Issue Description

This last line gives an error:

pytz.exceptions.NonExistentTimeError: 2024-03-31 02:00:00

With full traceback:

  File "<stdin>", line 1, in <module>
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/pandas/core/generic.py", line 11230, in shift
    return self._shift_with_freq(periods, axis, freq)
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/pandas/core/generic.py", line 11263, in _shift_with_freq
    new_ax = index.shift(periods, freq)
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/pandas/core/indexes/datetimelike.py", line 503, in shift
    return self + offset
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/pandas/core/ops/common.py", line 76, in new_method
    return method(self, other)
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/pandas/core/arraylike.py", line 186, in __add__
    return self._arith_method(other, operator.add)
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 7238, in _arith_method
    return super()._arith_method(other, op)
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/pandas/core/base.py", line 1382, in _arith_method
    result = ops.arithmetic_op(lvalues, rvalues, op)
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/pandas/core/ops/array_ops.py", line 273, in arithmetic_op
    res_values = op(left, right)
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/pandas/core/ops/common.py", line 76, in new_method
    return method(self, other)
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/pandas/core/arrays/datetimelike.py", line 1372, in __add__
    result = self._add_offset(other)
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/pandas/core/arrays/datetimes.py", line 828, in _add_offset
    result = result.tz_localize(self.tz)
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/pandas/core/arrays/_mixins.py", line 81, in method
    return meth(self, *args, **kwargs)
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/pandas/core/arrays/datetimes.py", line 1088, in tz_localize
    new_dates = tzconversion.tz_localize_to_utc(
  File "tzconversion.pyx", line 431, in pandas._libs.tslibs.tzconversion.tz_localize_to_utc

Expected Behavior

This would be the desired ouput:

                           A B
2024-03-31 01:00:00+01:00  0 NaN
2024-03-31 03:00:00+02:00  1 NaN
2024-03-31 04:00:00+02:00  2 1
2024-03-31 05:00:00+02:00  3 2
2024-03-31 06:00:00+02:00  4 3
2024-03-31 07:00:00+02:00  5 4
2024-03-31 08:00:00+02:00  6 5
2024-03-31 09:00:00+02:00  7 6

The point of converting a UTC timeseries to Europe/Amsterdam time is that I want to look up behaviour of people, which stays consistent to their timezone. E.g. if someone goes to work every day at 08:00, that remains at 08:00 in their timezone, even after the daylight savings shift. In UTC, that person appears to leave one hour earlier (at 07:00). By converting to Europe/Amsterdam time, then shifting, this should be handled correctly.

Installed Versions

commit : bdc79c1
python : 3.10.11.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.0-1040-azure
Version : #47~20.04.1-Ubuntu SMP Fri Jun 2 21:38:08 UTC 2023
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.2.1
numpy : 1.25.0
pytz : 2024.1
dateutil : 2.8.2
setuptools : 67.8.0
pip : 23.1.2
Cython : 0.29.35
pytest : 8.1.1
hypothesis : 6.99.5
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.14.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2023.6.0
gcsfs : None
matplotlib : 3.7.1
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 15.0.1
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.10.1
sqlalchemy : 2.0.16
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None

@martheveldhuis martheveldhuis added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Mar 14, 2024
@rok
Copy link
Contributor

rok commented Apr 4, 2024

This works as a workaround:

df_nl["B"] = df_nl["A"].shift(freq=pd.Timedelta("1h"))

Ideally shift would handle pd.DateOffset DST jumps.

@kdebrab
Copy link
Contributor

kdebrab commented Jul 18, 2024

Another example:

pd.Timestamp("2024-04-25", tz="Africa/Cairo") + pd.DateOffset(days=1)

which raises

pytz.exceptions.NonExistentTimeError: 2024-04-26 00:00:00

I think the best solution would be to add the options 'nonexistent' and 'ambiguous' to pd.DateOffset (similar as we have for e.g. the floor method), such that one can do:

pd.Timestamp("2024-04-25", tz="Africa/Cairo") + pd.DateOffset(days=1, nonexistent="shift_forward", ambiguous=False)

and get as result:

Timestamp('2024-04-26 01:00:00+0300', tz='Africa/Cairo')

I think that having this capability will also make it easier to resolve bugs like #58380 and #51211.

@martheveldhuis
Copy link
Author

martheveldhuis commented Jul 22, 2024

This works as a workaround:

df_nl["B"] = df_nl["A"].shift(freq=pd.Timedelta("1h"))

Ideally shift would handle pd.DateOffset DST jumps.

Unfortunately, this doesn't solve the issue. I will provide a better example:

start_date = pd.to_datetime("2024-03-30 07:00:00").tz_localize("Europe/Amsterdam")
end_date = start_date + timedelta(weeks=1)
datetime_index = pd.date_range(start=start_date, end=end_date, freq="h")
df = pd.DataFrame({"A": range(len(datetime_index))}, index=datetime_index)
df["B"] = df["A"].shift(freq=pd.Timedelta(weeks=1))
print(df)

Which outputs:

                             A    B
2024-03-30 07:00:00+01:00    0  NaN
2024-03-30 08:00:00+01:00    1  NaN
2024-03-30 09:00:00+01:00    2  NaN
2024-03-30 10:00:00+01:00    3  NaN
2024-03-30 11:00:00+01:00    4  NaN
...                        ...  ...
2024-04-06 04:00:00+02:00  164  NaN
2024-04-06 05:00:00+02:00  165  NaN
2024-04-06 06:00:00+02:00  166  NaN
2024-04-06 07:00:00+02:00  167  NaN
2024-04-06 08:00:00+02:00  168  0.0

[169 rows x 2 columns]

Even though I would expect:

                             A    B
2024-03-30 07:00:00+01:00    0  NaN
2024-03-30 08:00:00+01:00    1  NaN
2024-03-30 09:00:00+01:00    2  NaN
2024-03-30 10:00:00+01:00    3  NaN
2024-03-30 11:00:00+01:00    4  NaN
...                        ...  ...
2024-04-06 04:00:00+02:00  164  NaN
2024-04-06 05:00:00+02:00  165  NaN
2024-04-06 06:00:00+02:00  166  NaN
2024-04-06 07:00:00+02:00  167  0.0
2024-04-06 08:00:00+02:00  168  1.0

[169 rows x 2 columns]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Needs Triage Issue that has not been reviewed by a pandas team member
Projects
None yet
Development

No branches or pull requests

3 participants