Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix 40420: Interpret NaN in clip() as no bound. #40927

Merged
merged 20 commits into from
Apr 23, 2021

Conversation

DriesSchaumont
Copy link
Member

@DriesSchaumont DriesSchaumont commented Apr 13, 2021

@DriesSchaumont
Copy link
Member Author

@dsaxton, I have openend this draft PR to fix #40420. Just to be clear: is the behavior proposed in this PR the expected one?
I still need to do some cleanup on the proposed code, but I wanted to make sure that the behavior we are targetting is correct.

@dsaxton
Copy link
Member

dsaxton commented Apr 14, 2021

@dsaxton, I have openend this draft PR to fix #40420. Just to be clear: is the behavior proposed in this PR the expected one?
I still need to do some cleanup on the proposed code, but I wanted to make sure that the behavior we are targetting is correct.

Yes, I believe according to @jreback and this issue we want to treat any "nullish" value as no bound: #17276

@DriesSchaumont DriesSchaumont marked this pull request as ready for review April 15, 2021 15:11
@DriesSchaumont DriesSchaumont marked this pull request as draft April 15, 2021 17:00
@jreback jreback added Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels Apr 16, 2021
@DriesSchaumont DriesSchaumont marked this pull request as ready for review April 16, 2021 19:14
@DriesSchaumont
Copy link
Member Author

I think non-green is not related here.

@DriesSchaumont DriesSchaumont requested a review from jreback April 18, 2021 20:23
@jreback jreback added this to the 1.3 milestone Apr 23, 2021
@jreback jreback merged commit 714f7d7 into pandas-dev:master Apr 23, 2021
@jreback
Copy link
Contributor

jreback commented Apr 23, 2021

thanks @DriesSchaumont

@DriesSchaumont DriesSchaumont deleted the fix-40420 branch April 23, 2021 15:18
yeshsurya pushed a commit to yeshsurya/pandas that referenced this pull request May 6, 2021
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
# Treat missing thresholds as no bounds, not clipping the values
if is_list_like(threshold):
fill_value = np.inf if method.__name__ == "le" else -np.inf
threshold_inf = threshold.fillna(fill_value)
Copy link
Contributor

@ms7463 ms7463 Dec 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This causes an issue when working with datetime series (see here: #44785). Since downstream it will try to compare a float to a Timestamp object. There's probably a more robust solution, but a potential quick/easy fix would be to check the dtype here and use an appropriate alternative to np.inf (e.g. pd.Timestamp.max/min).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: clip where the bound is a series with NA values returns NA
4 participants