REGR: fillna on datetime64[ns, UTC] column hits RecursionError #38851

Closed
akrherz opened this issue Dec 31, 2020 · 6 comments · Fixed by #39194
Labels
Bug Internals Related to non-user accessible pandas implementation Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Regression Functionality that used to work in a prior pandas version Timezones Timezone data dtype

@akrherz

akrherz commented Dec 31, 2020

Code Sample

from datetime import datetime, timezone
import pandas as pd

df = pd.DataFrame(range(31))
df["dt"] = pd.date_range("2020/12/01", "2020/12/31", tz="UTC")
df["dt"].iloc[5] = pd.NaT
df["dt"] = df["dt"].fillna(datetime(1980, 1, 1, tzinfo=timezone.utc))

Problem description

pandas 1.2.0 is throwing a RecursionError while attempting to fillna on a datetime64[ns, UTC] column. Same code worked in prior releases.

Traceback (most recent call last):
  File "recursion.py", line 7, in <module>
    df["dt"] = df["dt"].fillna(utc(1980, 1, 1, tzinfo=timezone.utc))
  File "/opt/miniconda3/envs/prod/lib/python3.8/site-packages/pandas/core/series.py", line 4433, in fillna
    return super().fillna(
  File "/opt/miniconda3/envs/prod/lib/python3.8/site-packages/pandas/core/generic.py", line 6403, in fillna
    new_data = self._mgr.fillna(
  File "/opt/miniconda3/envs/prod/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 621, in fillna
    return self.apply(
  File "/opt/miniconda3/envs/prod/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 427, in apply
    applied = getattr(b, f)(**kwargs)
  File "/opt/miniconda3/envs/prod/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 461, in fillna
    return self.split_and_operate(None, f, inplace)
  File "/opt/miniconda3/envs/prod/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 516, in split_and_operate
    nv = f(mask, new_values, None)
  File "/opt/miniconda3/envs/prod/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 459, in f
    return block.fillna(value, limit=limit, inplace=inplace, downcast=None)
  File "/opt/miniconda3/envs/prod/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 461, in fillna
    return self.split_and_operate(None, f, inplace)
  File "/opt/miniconda3/envs/prod/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 516, in split_and_operate
    nv = f(mask, new_values, None)
  File "/opt/miniconda3/envs/prod/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 459, in f
    return block.fillna(value, limit=limit, inplace=inplace, downcast=None)
...snipped...
  File "/opt/miniconda3/envs/prod/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 441, in fillna
    if self._can_hold_element(value):
  File "/opt/miniconda3/envs/prod/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 2267, in _can_hold_element
    tipo = maybe_infer_dtype_type(element)
  File "/opt/miniconda3/envs/prod/lib/python3.8/site-packages/pandas/core/dtypes/cast.py", line 924, in maybe_infer_dtype_type
    elif is_list_like(element):
  File "pandas/_libs/lib.pyx", line 1033, in pandas._libs.lib.is_list_like
  File "pandas/_libs/lib.pyx", line 1038, in pandas._libs.lib.c_is_list_like
  File "/opt/miniconda3/envs/prod/lib/python3.8/abc.py", line 98, in __instancecheck__
    return _abc_instancecheck(cls, instance)
RecursionError: maximum recursion depth exceeded in comparison

Expected Output

Output of pd.show_versions()

pd.show_versions()
/opt/miniconda3/envs/prod/lib/python3.8/site-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS

commit : 3e89b4c
python : 3.8.6.final.0
python-bits : 64
OS : Linux
OS-release : 4.18.0-240.8.1.el8_3.x86_64
Version : #1 SMP Fri Dec 4 12:24:03 EST 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.0
numpy : 1.19.4
pytz : 2020.5
dateutil : 2.8.1
pip : 20.3.3
setuptools : 50.0.0.post20200830
Cython : 0.29.21
pytest : 6.2.1
hypothesis : None
sphinx : 3.4.1
blosc : None
feather : None
xlsxwriter : 1.3.7
lxml.etree : 4.6.2
html5lib : 1.1
pymysql : None
psycopg2 : 2.8.6 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : None
fsspec : 0.8.5
fastparquet : None
gcsfs : None
matplotlib : 3.3.3
numexpr : None
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.5.3
sqlalchemy : 1.3.22
tables : None
tabulate : None
xarray : 0.16.2
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.52.0

Thank you.

@akrherz akrherz added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Dec 31, 2020
@akrherz
Author

akrherz commented Dec 31, 2020

Note that dropping the tzinfo on the fillna datetime object does not reproduce this issue.

from datetime import datetime, timezone
import pandas as pd

df = pd.DataFrame(range(31))
df["dt"] = pd.date_range("2020/12/01", "2020/12/31", tz="UTC")
df["dt"].iloc[5] = pd.NaT
df["dt"] = df["dt"].fillna(datetime(1980, 1, 1))
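Since the tzinfo of the fill value seems to be what matters, one possible workaround is to convert the fill value to the column's own tz object before calling fillna. This is a sketch only: the `fill` name and the use of `Timestamp.tz_convert` are my assumptions, and I have not verified that it avoids the error on 1.2.0 specifically.

```python
from datetime import datetime, timezone

import pandas as pd

df = pd.DataFrame({"n": range(31)})
df["dt"] = pd.date_range("2020-12-01", "2020-12-31", tz="UTC")
df.loc[5, "dt"] = pd.NaT

# Assumption: re-expressing the fill value in the column's own tz object
# means the fill value and the column carry the same tz implementation.
fill = pd.Timestamp(datetime(1980, 1, 1, tzinfo=timezone.utc)).tz_convert(
    df["dt"].dt.tz
)
df["dt"] = df["dt"].fillna(fill)
```

The instant in time is unchanged by the conversion; only the tz object attached to the fill value differs.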

simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue Dec 31, 2020
@simonjayhawkins simonjayhawkins changed the title BUG: fillna on datetime64[ns, UTC] column hits RecursionError REGR: fillna on datetime64[ns, UTC] column hits RecursionError Dec 31, 2020
@simonjayhawkins simonjayhawkins added Regression Functionality that used to work in a prior pandas version Timezones Timezone data dtype and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Dec 31, 2020
@simonjayhawkins simonjayhawkins added this to the 1.2.1 milestone Dec 31, 2020
@simonjayhawkins
Member

Thanks @akrherz for the report

> pandas 1.2.0 is throwing a RecursionError while attempting to fillna on a datetime64[ns, UTC] column. Same code worked in prior releases.

first bad commit: [58f7468] CLN: remove unnecessary DatetimeTZBlock.fillna (#37040) cc @jbrockmendel

@simonjayhawkins simonjayhawkins added Internals Related to non-user accessible pandas implementation Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels Dec 31, 2020
@jbrockmendel
Member

Working on a fix for this.

The tricky part is what to do when we have two different UTC dtypes.

df["dt"] = pd.date_range("2020/12/01", "2020/12/31", tz="UTC") uses pytz.utc and then the fill value uses timezone.utc.

from datetime import timezone

import pandas as pd
import pytz
from pandas._libs.tslibs.timezones import tz_compare
from pandas.core.dtypes.common import is_dtype_equal

dti = pd.date_range("2020/12/01", "2020/12/31", tz="UTC")
dti2 = pd.date_range("2020/12/01", "2020/12/31", tz=timezone.utc)

>>> is_dtype_equal(dti.dtype, dti2.dtype)
True
>>> tz_compare(dti.tz, dti2.tz)
False

Because the tzs don't match (#23959) and we don't allow setting mismatched tzs into DatetimeArray (#37605), we cast to a common dtype before trying again. But since the dtypes are considered equal, that dtype is unchanged, so we retry with exactly the same inputs and get a RecursionError.
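To make the loop concrete, here is a toy model of the "cast to a common dtype and retry" pattern. It is not the pandas implementation: `ToyTz`, `dtype_equal`, `tz_strict_equal` and `toy_fillna` are all made-up names, and the `max_depth` guard exists only so the sketch terminates (the real code path has no such guard, hence the RecursionError).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyTz:
    """Hypothetical stand-in for a tz: a zone name plus the library providing it."""
    name: str  # e.g. "UTC"
    impl: str  # e.g. "pytz" or "stdlib"


def dtype_equal(a, b):
    # Lenient check: only the zone name is compared.
    return a.name == b.name


def tz_strict_equal(a, b):
    # Strict check: name and implementation must both match.
    return a == b


def toy_fillna(block_tz, value_tz, depth=0, max_depth=50):
    # block_tz=None models a generic block that can hold any value.
    if block_tz is None or tz_strict_equal(block_tz, value_tz):
        return f"filled after {depth} cast(s)"
    if depth >= max_depth:
        return f"still retrying after {depth} cast(s)"
    # "Cast to a common dtype": a no-op when the lenient check says equal.
    common = block_tz if dtype_equal(block_tz, value_tz) else None
    return toy_fillna(common, value_tz, depth + 1, max_depth)


same = toy_fillna(ToyTz("UTC", "pytz"), ToyTz("UTC", "pytz"))
different_zone = toy_fillna(ToyTz("UTC", "pytz"), ToyTz("US/Eastern", "pytz"))
two_utcs = toy_fillna(ToyTz("UTC", "pytz"), ToyTz("UTC", "stdlib"))
```

Only the third case loops: the strict check rejects the value, while the lenient check says no cast is needed, so each retry sees the same inputs.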

So three options for how to fix this: 1) change tz_compare, 2) change DatetimeArray's setitem behavior, or 3) change DatetimeTZDtype.__eq__.

I lean towards option 1
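A rough sketch of what option 1 could look like in spirit, using only the standard library. This is not the actual `tz_compare` code: `OtherUTC`, `looks_like_utc` and `tz_equivalent` are hypothetical names, and `OtherUTC` merely stands in for a second UTC implementation such as `pytz.utc`.

```python
from datetime import timedelta, timezone, tzinfo


class OtherUTC(tzinfo):
    """Hypothetical second UTC implementation, distinct from timezone.utc."""

    def utcoffset(self, dt):
        return timedelta(0)

    def dst(self, dt):
        return timedelta(0)

    def tzname(self, dt):
        return "UTC"


def looks_like_utc(tz):
    # Assumption: a zero fixed offset plus the name "UTC" is a good enough
    # signal for this sketch; a real check would need to be more careful.
    if tz is timezone.utc:
        return True
    return (
        tz is not None
        and tz.utcoffset(None) == timedelta(0)
        and tz.tzname(None) == "UTC"
    )


def tz_equivalent(a, b):
    # Treat every UTC implementation as the same zone; otherwise fall back
    # to ordinary equality.
    if looks_like_utc(a) and looks_like_utc(b):
        return True
    return a == b
```

With a comparison like this, the two UTC objects would be treated as matching, so the mismatched-tz branch that triggers the retry would presumably not be reached.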

@simonjayhawkins
Member

Thanks @jbrockmendel for investigating this. We will want to backport a fix, so won't want any changes of behaviour.

The commit that broke the old behaviour was a clean-up, so option 4 is we could revert and thereafter, reapply to master with any fix that changes behaviour deferred to 1.3.

@jbrockmendel
Member

> so option 4 is we could revert and thereafter, reapply to master with any fix that changes behaviour deferred to 1.3.

works for me

@jorisvandenbossche
Member

@jbrockmendel could you do the revert as a short-term fix?

For the long term:

> So three options for how to fix this: 1) change tz_compare, 2) change DatetimeArray's setitem behavior, or 3) change DatetimeTZDtype.__eq__.
>
> I lean towards option 1

Looking at the discussion in #37605, it seems we might want to change the setitem behaviour, so that could be a reason to go for 2)?

Now, fixing tz_compare for this case might be a worthwhile fix on its own as well? In which case 1) might be the easier option, as a fix doesn't need to depend on the API change of option 2).

4 participants