BUG: output of a transform is cast to dtype of input #10972

Closed
TomAugspurger opened this issue Sep 2, 2015 · 4 comments
Labels: Dtype Conversions (Unexpected or buggy dtype conversions), Groupby

Comments

@TomAugspurger
Contributor

TomAugspurger commented Sep 2, 2015

xref #11444, #13046 for additional tests

In [27]: df = pd.DataFrame({'a': np.random.randint(0, 5, 365), 'b': pd.date_range('2015-01-01', periods=365, freq='D')})

In [28]: df.head()
Out[28]:
   a          b
0  3 2015-01-01
1  3 2015-01-02
2  4 2015-01-03
3  2 2015-01-04
4  4 2015-01-05

In [29]: df.groupby('a').b.transform(lambda x: x.dt.dayofweek - x.dt.dayofweek.mean()).head()
Out[29]:
0   1970-01-01 00:00:00.000000000
1   1970-01-01 00:00:00.000000001
2   1970-01-01 00:00:00.000000001
3   1970-01-01 00:00:00.000000002
4   1969-12-31 23:59:59.999999997
Name: b, dtype: datetime64[ns]

I expected a float64 result. No idea how difficult this will be, so I marked it for 0.18. I won't have time to get to it any earlier, but if someone else wants to...
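For reference, a minimal sketch of the behaviour the report expects. It assumes a pandas release that contains the fix from the commit referenced later in this thread (treat the exact version as an assumption). The seeded `np.random.default_rng` generator replaces `np.random.randint` only to make the data reproducible.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the report; a seeded generator keeps it reproducible.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "a": rng.integers(0, 5, 365),
    "b": pd.date_range("2015-01-01", periods=365, freq="D"),
})

# Deviation of each row's day of week from its group's mean day of week.
out = df.groupby("a")["b"].transform(
    lambda x: x.dt.dayofweek - x.dt.dayofweek.mean()
)

print(out.dtype)  # float64, not datetime64[ns], once the fix is in place
```

Because each group's values are demeaned, the per-group sums of `out` should be (numerically) zero.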

@TomAugspurger TomAugspurger added the Groupby and Dtype Conversions labels Sep 2, 2015
@TomAugspurger TomAugspurger added this to the 0.18.0 milestone Sep 2, 2015
@jreback
Contributor

jreback commented Sep 2, 2015

This is only a problem with transform; apply does this kind of dtype inference:

In [6]: df.groupby('a').b.apply(lambda x: x.dt.dayofweek - x.dt.dayofweek.mean()).head()
Out[6]: 
0    0.214286
1    1.054795
2    1.837209
3    2.837209
4   -3.162791
dtype: float64
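A sketch comparing the two paths on the same data, assuming a pandas version where the transform fix is present. `group_keys=False` is passed to the `apply` call because newer pandas releases prepend the group key to the index for transform-like applies; `demean_dayofweek` is a hypothetical helper name used only for this illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "a": rng.integers(0, 5, 365),
    "b": pd.date_range("2015-01-01", periods=365, freq="D"),
})

def demean_dayofweek(x):
    # Day of week minus the group's mean day of week (a float result).
    return x.dt.dayofweek - x.dt.dayofweek.mean()

# group_keys=False keeps apply from adding the group key as an index level.
applied = df.groupby("a", group_keys=False)["b"].apply(demean_dayofweek)
transformed = df.groupby("a")["b"].transform(demean_dayofweek)

# Compare on a common row order; with the fix, both paths agree on values and dtype.
pd.testing.assert_series_equal(
    applied.sort_index(), transformed.sort_index(), check_names=False
)
```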

@jreback jreback modified the milestones: Next Major Release, 0.18.0 Sep 2, 2015
@TomAugspurger
Contributor Author

Yeah, I've switched to apply for now. My actual case was transforming an integer to categorical (which raised an exception).

@jreback
Contributor

jreback commented Sep 2, 2015

It doesn't make sense to transform int -> categorical; just use .astype instead.
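A minimal illustration of that suggestion: when the conversion is purely elementwise, `astype("category")` does it without any groupby. The values below are illustrative only.

```python
import pandas as pd

# An integer column like the one in the report (values are illustrative).
a = pd.Series([3, 3, 4, 2, 4], name="a")

# Elementwise int -> categorical conversion needs no groupby at all.
a_cat = a.astype("category")

print(a_cat.dtype)                 # category
print(list(a_cat.cat.categories))  # [2, 3, 4]
```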

@TomAugspurger
Contributor Author

Not that simple in my case. Have to groupby a level and do some shift / diff logic to get my result.
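The thread does not show the actual shift/diff logic, so the following is a hypothetical sketch: the index levels `id`/`step`, the data, and the `down`/`flat`/`up` labels are all invented for illustration. The idea is to keep the groupby step numeric (a per-level `diff`) and convert to categorical only afterwards, which avoids asking transform to produce a categorical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: an integer Series indexed by (id, step).
idx = pd.MultiIndex.from_tuples(
    [("x", 0), ("x", 1), ("x", 2), ("y", 0), ("y", 1)], names=["id", "step"]
)
s = pd.Series([1, 3, 2, 5, 5], index=idx)

# Within-group difference: group by a level, then diff.
# The first row of each group has no predecessor, so it becomes NaN.
delta = s.groupby(level="id").diff()

# Convert to categorical after the groupby step, outside of transform.
labels = {-1.0: "down", 0.0: "flat", 1.0: "up"}
direction = np.sign(delta).map(labels).astype(
    pd.CategoricalDtype(["down", "flat", "up"])
)

print(direction)
```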

@jreback jreback modified the milestones: 0.18.1, Next Major Release Mar 12, 2016
@jreback jreback modified the milestones: 0.18.1, 0.18.2 Apr 26, 2016
@jorisvandenbossche jorisvandenbossche modified the milestones: 0.20.0, 0.19.0 Aug 21, 2016
jreback added a commit to jreback/pandas that referenced this issue Feb 27, 2017
The transform() operation needs to return a like-indexed result. To
facilitate this, transform starts with a copy of the original series.
Then, after the computation for each group, it sets the appropriate
elements of the copied series equal to the result. At that point it
does a type comparison, and discovers that the timedelta is not castable
to a datetime.

closes pandas-dev#10972

Author: Jeff Reback
Author: Stephen Rauch

Closes pandas-dev#15430 from stephenrauch/group-by-transform-timedelta-from-datetime and squashes the following commits:

c3b0dd0 [Jeff Reback] PEP fix
2f48549 [Jeff Reback] fixup slow transforms
cc43503 [Stephen Rauch] BUG: GH15429 transform result of timedelta from datetime
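The commit message above pins the root cause on filling a copy of the input, which ties each group's result to the input's dtype. The sketch below illustrates the alternative of concatenating the per-group results and realigning them to the original index, so the output keeps the dtype of the computed values. It is a simplified illustration, not the patch from pandas-dev#15430, and `transform_like_indexed` is a hypothetical name.

```python
import pandas as pd

def transform_like_indexed(series, keys, func):
    """Hypothetical helper: apply func per group, return a result aligned to series.index.

    Unlike writing into a copy of the input (which forces the input's dtype on
    the output), the per-group results are concatenated, so their dtype is kept.
    """
    pieces = [func(group) for _, group in series.groupby(keys)]
    result = pd.concat(pieces)
    # Restore the original row order; the dtype comes from the computed pieces.
    return result.reindex(series.index)

dates = pd.Series(pd.date_range("2015-01-01", periods=6, freq="D"))
keys = pd.Series([0, 1, 0, 1, 0, 1])

out = transform_like_indexed(
    dates, keys, lambda x: x.dt.dayofweek - x.dt.dayofweek.mean()
)
print(out.dtype)  # float64: the datetime input dtype is not imposed on the result
```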
AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this issue Mar 21, 2017
3 participants