BUG: inconsistencies/errors in quantile on empty DataFrame #14564

Closed
jorisvandenbossche opened this issue Nov 2, 2016 · 3 comments · Fixed by #41493
Labels
API - Consistency (Internal Consistency of API/Behavior), good first issue, Needs Tests (Unit test(s) needed to prevent regressions), quantile (quantile method)

Comments

@jorisvandenbossche
Member

In PR #14536, I added some tests in comments, as they currently fail or give inconsistent results:

1. Empty frame with float dtype:

import pandas as pd
from pandas import DataFrame

df = DataFrame(columns=['a', 'b'], dtype='float64')
df.quantile(0.5)
df.quantile(0.5, axis=1)

In 0.18.1, this gives NaNs or an empty frame depending on the axis (which I think is correct):

In [10]: df.quantile(0.5)
Out[10]: 
a   NaN
b   NaN
dtype: float64

In [9]: df.quantile(0.5, axis=1)
Out[9]: 
Empty DataFrame
Columns: []
Index: []

But on master, the axis=1 case raises an error (df.quantile(0.5) still gives NaNs):

In [8]: df.quantile(0.5, axis=1)
...
ValueError: need at least one array to concatenate

2. Empty frame with int dtype

df = DataFrame(columns=['a', 'b'], dtype='int64')
df.quantile(0.5)

Unlike the float dtype, which gives a Series of NaNs, the integer dtype gives an empty frame in 0.18.1:

In [11]: df.quantile(0.5)
Out[11]: 
Empty DataFrame
Columns: []
Index: []

and on master it raises the same ValueError as the float case with axis=1:

In [14]: df.quantile(0.5)
...
ValueError: need at least one array to concatenate

3. Empty frame with datetime values

df = DataFrame(columns=['a', 'b'], dtype='datetime64')
df.quantile(0.5, numeric_only=False)

On both 0.18.1 and master, this gives a Series of NaNs, where it should give NaTs:

In [13]: df.quantile(0.5, numeric_only=False)
Out[13]: 
a   NaN
b   NaN
Name: 0.5, dtype: float64

4. Frame with only datetime columns but without numeric_only=False

df = DataFrame({'a': pd.to_datetime(['2010', '2011']), 'b': [0, 5], 'c': pd.to_datetime(['2011', '2012'])})
df[['a', 'c']].quantile(.5)

On 0.18.1, this gives an empty frame:

In [8]: df[['a', 'c']].quantile(.5)
Out[8]: 
Empty DataFrame
Columns: []
Index: []

while on master this raises the same ValueError as above:

In [7]: df[['a', 'c']].quantile(.5)
...
ValueError: need at least one array to concatenate
jorisvandenbossche added the Bug and Numeric Operations (Arithmetic, Comparison, and Logical operations) labels on Nov 2, 2016
jorisvandenbossche added this to the Next Major Release milestone on Nov 2, 2016
@rogeriomgatto

There are also inconsistencies in groupby behaviour (0.19.1):

>>> df = pd.DataFrame({'a': np.zeros(0, dtype='int'), 'b': np.zeros(0, dtype='int')})

>>> df
Empty DataFrame
Columns: [a, b]
Index: []

>>> df.dtypes
a    int64
b    int64
dtype: object

>>> df.groupby(level=0).mean()
Empty DataFrame
Columns: [a, b]
Index: []

>>> df.groupby(level=0).quantile(0.5)
Empty DataFrame
Columns: []
Index: []

@jreback
Contributor

jreback commented Feb 6, 2017

@rogeriomgatto pull-requests to fix are welcome, and get things done!

@mroeschke
Member

mroeschke commented Apr 5, 2020

Looks like all these cases work on master, except the datetime case (df = DataFrame(columns=['a', 'b'], dtype='datetime64'); df.quantile(0.5, numeric_only=False)), which should return NaTs. The other cases could use tests.

We can create an independent issue for the case that is not working once these regression tests are added.
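
As a starting point for those regression tests, here is a minimal sketch covering the float and int cases from the original report. The helper names `quantile_of_empty_frame` and `check_empty_numeric_quantiles` are made up for illustration and are not part of pandas or its test suite. The assertions encode the behaviour described in this thread as desired (a Series of NaNs indexed by the columns, and an empty result for axis=1); actual results may differ on older pandas versions. The datetime cases are left out because their behaviour was still unsettled at this point in the discussion.

```python
import pandas as pd


def quantile_of_empty_frame(dtype, q=0.5):
    """Return df.quantile(q) for an empty two-column frame of the given dtype.

    Hypothetical helper for illustration only.
    """
    df = pd.DataFrame(columns=["a", "b"], dtype=dtype)
    return df.quantile(q)


def check_empty_numeric_quantiles():
    # Cases 1 and 2 from the report: float64 and int64 frames with no rows
    # are expected to give one NaN per column rather than raising.
    for dtype in ("float64", "int64"):
        result = quantile_of_empty_frame(dtype)
        assert list(result.index) == ["a", "b"]
        assert result.isna().all()

    # Case 1 with axis=1: with no rows there are no row-wise quantiles,
    # so the result is expected to be empty rather than an error.
    df = pd.DataFrame(columns=["a", "b"], dtype="float64")
    assert len(df.quantile(0.5, axis=1)) == 0


check_empty_numeric_quantiles()
```

In the pandas test suite these checks would more likely be written as separate test functions comparing against an expected Series with a testing helper; the plain assertions here just keep the sketch self-contained.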

mroeschke added the good first issue and Needs Tests (Unit test(s) needed to prevent regressions) labels and removed the Bug, Numeric Operations (Arithmetic, Comparison, and Logical operations), and quantile (quantile method) labels on Apr 5, 2020
jbrockmendel added the API - Consistency (Internal Consistency of API/Behavior) and quantile (quantile method) labels on Sep 20, 2020
mroeschke modified the milestones: Contributions Welcome, 1.3 on May 16, 2021
jreback modified the milestones: 1.3, Contributions Welcome on May 17, 2021