feat: is_nan #1583

MarcoGorelli · 2024-12-13T14:50:57Z

Looks like this would be useful to Tubular: azukds/tubular#349 (comment)

The docstring would need a caveat that pandas doesn't distinguish between nan and null, whereas other libraries typically do.

We need to take care to get these nan vs null details right for each backend:

- for Polars, there is already is_nan
- for PyArrow, there is https://arrow.apache.org/docs/python/generated/pyarrow.compute.is_nan.html
- for pandas, we need to be careful. We don't want is_nan to pick up NaT / None / anything else which is used for missing values; we specifically want only nan. So, we should check:
  - that the dtype is Float
  - that s != s (nan is the only float value that is not equal to itself)
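The two pandas checks above can be sketched roughly as follows. This is only an illustration, not the implementation that ended up in the library: the helper name `is_nan_sketch` is hypothetical, raising `TypeError` for non-float input is an assumption, and the example is limited to the NumPy-backed `float64` dtype.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_float_dtype


def is_nan_sketch(s: pd.Series) -> pd.Series:
    """Hypothetical helper: flag only nan, and only for float dtypes."""
    # Check 1: the dtype must be a float dtype, so that NaT / None in
    # datetime or object columns are never reported as nan.
    if not is_float_dtype(s.dtype):
        raise TypeError(f"is_nan is only defined for float dtypes, got {s.dtype}")
    # Check 2: nan is the only float value that is not equal to itself.
    return s != s


floats = pd.Series([1.0, np.nan, 3.0])
result = is_nan_sketch(floats)
print(result.tolist())

# A datetime column holding NaT is rejected rather than treated as nan.
dates = pd.Series(pd.to_datetime(["2024-01-01", None]))
try:
    is_nan_sketch(dates)
    rejected = False
except TypeError:
    rejected = True
print(rejected)
```

Whether non-float input should raise or simply return all False is a design choice this sketch does not settle.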


camriddell · 2024-12-17T16:00:56Z

For a specific nan, do you want to disambiguate between Float64 and float64 in pandas? The latter would keep the traditional nan, whereas the former would coerce any nan present to NA?

from pandas import NA, Series
from numpy import nan

s = Series([0, NA, nan, 3], dtype='object')

print(
    s,                                     # both (object dtype)
    s.astype('Float64'),                   # pd.NA
    s.astype('Int64'),                     # pd.NA
    s.astype('Float64').astype('float64'), # np.nan
    sep=f'\n{"-" * 40}\n'
)

MarcoGorelli · 2024-12-17T16:46:35Z

Float64 only coerces to NA on IO, but nan can still arise:

In [4]: s = pd.Series([1, 0, None], dtype='Float64')

In [5]: s / s
Out[5]:
0     1.0
1     NaN
2    <NA>
dtype: Float64

So here the expected output would be [False, True, NA]

camriddell · 2024-12-18T17:52:40Z

Last question: is there any reason to prefer value != value over numpy.isnan for this operation?
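One difference between the two approaches that may bear on this question, sketched under the assumption that the values can arrive as an object-dtype array: self-inequality is defined for any Python object, while `numpy.isnan` raises `TypeError` on object arrays. On plain float arrays the two agree. The variable names below are illustrative only.

```python
import numpy as np

values = np.array([1.0, np.nan, None], dtype=object)

# value != value works for any Python object, so it can be applied
# element by element even when None is mixed in.
by_inequality = [v != v for v in values]
print(by_inequality)

# numpy.isnan has no loop for object dtype and raises TypeError.
try:
    np.isnan(values)
    isnan_raised = False
except TypeError:
    isnan_raised = True
print(isnan_raised)

# On a plain float array, both approaches give the same mask.
floats = np.array([1.0, np.nan, 3.0])
same_on_floats = bool(((floats != floats) == np.isnan(floats)).all())
print(same_on_floats)
```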

MarcoGorelli added the enhancement (New feature or request) and good first issue (Good for newcomers, but anyone is welcome to submit a pull request!) labels, and removed the good first issue label, Dec 13, 2024

camriddell mentioned this issue Dec 19, 2024

feat: add is_nan expression & series method #1625

Merged

10 tasks

MarcoGorelli closed this as completed in #1625 Jan 3, 2025
