REGR: Row series are broken after applying to_datetime() #36785
Comments
I added another manifestation of the underlying issue: df.apply(lambda d: to_datetime(d, utc=True), axis=0).apply(lambda x: x.dropna().apply(lambda y: getattr(y, 'year')), axis=1) no longer works, even though it worked in 1.0.4, where it produced:
while on 1.1.3/master it raises:
A workaround:
def workaround(dates):
    return pd.Series(list(dates), index=dates.index)  # workaround
df.apply(lambda d: to_datetime(d, utc=True), axis=0).apply(lambda x: workaround(x).dropna().apply(lambda y: getattr(y, 'year')), axis=1)
Interestingly:
This is a pure manifestation of the underlying issue and is at odds with the documentation, which says that
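The workaround above can be tried end to end with a small frame. This is only a sketch: the column names and dates are made up for illustration, and `rebuild` is a hypothetical name for the helper called `workaround` in the comment. It rebuilds each row Series from a plain list so that pandas infers the dtype again before the row is used.

```python
import pandas as pd

# Hypothetical data; the column names and values are assumptions.
df = pd.DataFrame({
    "start": ["2019-05-01", "2020-02-01"],
    "end": ["2021-03-01", "2022-04-01"],
})

# Column-wise conversion, as in the report.
converted = df.apply(lambda col: pd.to_datetime(col, utc=True), axis=0)


def rebuild(row):
    # Recreate the row from a plain list so the Series is constructed afresh
    # instead of reusing whatever object apply() handed to the function.
    return pd.Series(list(row), index=row.index)


# Row-wise apply that extracts the year from every non-missing timestamp.
years = converted.apply(
    lambda row: rebuild(row).dropna().apply(lambda ts: ts.year),
    axis=1,
)
print(years)
```

On versions without the regression the helper should be a no-op in effect, so it is safe to leave in place until the fix is released.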
Hi, thanks for your report. This was introduced by 91802a9. The bug occurs in Line 411 in b76031d
The array has dtype object and changes the dtype of the BlockManager. Would using sanitize_array be a good solution here? This was used before to convert the array to the correct dtype. Tests seem to pass with this.
This is extremely strange to do, and unless you can demonstrate that this is a real case, -1. Why are you not just using to_datetime directly? What is the point of apply here at all?
The first apply is not the problem. The following
raises
on 1.0.5
on 1.1.4
so it appears that the dtype of the Series passed to the user-supplied function has changed.
I think the fix should ensure that what is passed to the user function is unchanged from 1.0.5, to avoid other regressions being reported.
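One way to compare what the user function receives across pandas versions is to record the type and dtype of each row inside the function. The following is a rough diagnostic sketch, not part of the proposed fix; the frame contents and the `record` helper are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical frame with two timezone-aware datetime columns.
df = pd.DataFrame({
    "a": pd.to_datetime(["2020-01-01", "2020-01-02"], utc=True),
    "b": pd.to_datetime(["2020-06-01", "2020-06-02"], utc=True),
})

seen = []


def record(row):
    # Capture what apply() actually hands to the user function:
    # the container type, the reported dtype and the backing array type.
    seen.append((type(row).__name__, str(row.dtype), type(row.array).__name__))
    return 0


df.apply(record, axis=1)

for entry in seen:
    print(entry)
```

Running this under 1.0.5 and under an affected 1.1.x release and diffing the printed tuples should show whether the row Series differs between the two.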
Looks like the problem is inside FrameColumnApply.series_generator; we need to check that the dtype matches when setting blk.values
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
Problem description
Assigning the output of pd.to_datetime to a column of a dataframe, although not demonstrated in the documentation, is a popular use of this helper function. In previous versions of pandas (1.0.x) it was possible to convert multiple columns using to_datetime in combination with DataFrame.apply. It is still possible in 1.1.2 and master:
However, the rows in the subsequent apply operations are broken for certain operations. For example, trying to print them out raises:
TypeError: cannot concatenate object of type '<class 'numpy.ndarray'>'; only Series and DataFrame objs are valid
This was not the case in pandas 1.0.4.
Expected Output
Should not raise.
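The original code sample did not survive in this text. A minimal sketch of the pattern the problem description refers to might look like the following; the data, column names and the `row_years` helper are assumptions, not the reporter's code. The row-wise step is wrapped in try/except so the script reports the outcome on any pandas version instead of stopping.

```python
import pandas as pd

# Hypothetical input; the real sample from the report is not available.
df = pd.DataFrame({
    "start": ["2020-01-01", "2020-02-01"],
    "end": ["2020-03-01", "2020-04-01"],
})

# Convert every column with to_datetime through a column-wise apply.
converted = df.apply(lambda col: pd.to_datetime(col, utc=True), axis=0)


def row_years(row):
    # Operate on the row Series handed over by apply(axis=1).
    return row.dropna().apply(lambda ts: ts.year)


try:
    print(converted.apply(row_years, axis=1))
except Exception as exc:  # reported to fail on the affected 1.1.x releases
    print(f"{type(exc).__name__}: {exc}")
```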
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 2a7d332
python : 3.8.1.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-48-generic
Version : #52-Ubuntu SMP Thu Sep 10 10:58:49 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8
pandas : 1.1.2
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.2.3
setuptools : 41.2.0
Cython : None
pytest : 5.3.4
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.2.8
lxml.etree : 4.4.2
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.11.1
pandas_datareader: None
bs4 : 4.8.2
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.1.2
numexpr : None
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
numba : 0.49.0