ValueError when trying to compute Quantile #14357

Rubyj · 2016-10-05T18:29:37Z

In [7]: df = pd.DataFrame(np.random.randn(10, 2))

In [8]: df.iloc[1, 1] = np.nan

In [9]: df.quantile(.5)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-66d518aa86c6> in <module>()
----> 1 df.quantile(.5)

/Users/tom.augspurger/Envs/py3/lib/python3.5/site-packages/pandas-0.19.0rc1+21.ge596cbf-py3.5-macosx-10.11-x86_64.egg/pandas/core/frame.py in quantile(self, q, axis, numeric_only, interpolation)
   5152                                      axis=1,
   5153                                      interpolation=interpolation,
-> 5154                                      transposed=is_transposed)
   5155
   5156         if result.ndim == 2:

/Users/tom.augspurger/Envs/py3/lib/python3.5/site-packages/pandas-0.19.0rc1+21.ge596cbf-py3.5-macosx-10.11-x86_64.egg/pandas/core/internals.py in quantile(self, **kwargs)
   3142
   3143     def quantile(self, **kwargs):
-> 3144         return self.reduction('quantile', **kwargs)
   3145
   3146     def setitem(self, **kwargs):

/Users/tom.augspurger/Envs/py3/lib/python3.5/site-packages/pandas-0.19.0rc1+21.ge596cbf-py3.5-macosx-10.11-x86_64.egg/pandas/core/internals.py in reduction(self, f, axis, consolidate, transposed, **kwargs)
   3071         for b in self.blocks:
   3072             kwargs['mgr'] = self
-> 3073             axe, block = getattr(b, f)(axis=axis, **kwargs)
   3074
   3075             axes.append(axe)

/Users/tom.augspurger/Envs/py3/lib/python3.5/site-packages/pandas-0.19.0rc1+21.ge596cbf-py3.5-macosx-10.11-x86_64.egg/pandas/core/internals.py in quantile(self, qs, interpolation, axis, mgr)
   1325             values = _block_shape(values[~mask], ndim=self.ndim)
   1326             if self.ndim > 1:
-> 1327                 values = values.reshape(result_shape)
   1328
   1329         from pandas import Float64Index

ValueError: total size of new array must be unchanged

original post follows

I have a simple dataframe that I created as follows:

df[df['Week of'] == week]

where week is a week name I'm filtering by

I have been taking the quartile values of this dataframe as follows:

df[df['Week of'] == week].quantile(.25)

However, since updating to pandas 0.19 I am receiving the following error (this code worked fine before):

values = values.reshape(result_shape)
ValueError: total size of new array must be unchanged


chris-b1 · 2016-10-05T18:49:03Z

Can you please make this a fully reproducible example with dummy data?

Rubyj · 2016-10-05T19:08:57Z

I have tracked this error down to NaN values in some, but not all, of the columns for a row (2 out of 10 in this case). Computing the quartile of that DataFrame is what raises the error. My workaround is to replace the NaN values with 0.
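A note on that workaround: replacing NaN with 0 adds real observations to the column, so the quantile itself can change. The sketch below uses made-up data (the column names `a` and `b` are illustrative, not from the original report) and compares it with computing the quantile column by column through `Series.quantile`, which, as far as I know, skips missing values. Whether the per-column route avoids the error on an affected 0.19.0 install is an assumption that was not tested here.

```python
import numpy as np
import pandas as pd

# Illustrative data: column "b" has one missing value, column "a" has none.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [10.0, np.nan, 30.0, 40.0],
})

# Workaround from the comment above: fill NaN with 0, then take the quantile.
# The filled-in 0 is treated as a real observation.
filled = df.fillna(0).quantile(0.25)

# Alternative: compute the quantile one column at a time.
# Series.quantile ignores NaN, so no values are invented.
per_column = df.apply(lambda col: col.quantile(0.25))

print(filled)
print(per_column)
```

The two results agree for column `a` but differ for `b`, because the fill value pulls the lower quartile down.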

TomAugspurger · 2016-10-05T21:17:17Z

Edited in a reproducible example. Hard to say for sure, but maybe related to 4de83d2

It's definitely related to a (float) block having some cols with missing values:

In [11]: df = pd.DataFrame(np.random.randint(0, 10, size=(10, 2)))
In [13]: df.iloc[1, 1] = np.nan

In [14]: df.quantile(.5)
Out[14]:
0    4.5
1    7.0
Name: 0.5, dtype: float64

and

In [15]: df = pd.DataFrame(np.random.randn(10, 2))

In [17]: df.iloc[0, :] = np.nan

In [18]: df.quantile(.5)
Out[18]:
0    0.347815
1    0.072105
Name: 0.5, dtype: float64

both work

jreback · 2016-10-06T14:26:11Z

In [10]: pd.__version__
Out[10]: '0.19.0'

In [11]: np.random.seed(1234)

In [12]: df = pd.DataFrame(np.random.randn(10, 2))
    ...:
    ...:
    ...: df.iloc[0, :] = np.nan
    ...:

In [13]: df
Out[13]:
          0         1
0       NaN       NaN
1  1.432707 -0.312652
2 -0.720589  0.887163
3  0.859588 -0.636524
4  0.015696 -2.242685
5  1.150036  0.991946
6  0.953324 -2.021255
7 -0.334077  0.002118
8  0.405453  0.289092
9  1.321158 -1.546906

In [14]: df.median()
Out[14]:
0    0.859588
1   -0.312652
dtype: float64

In [15]: df.quantile(0.5)
Out[15]:
0    0.859588
1   -0.312652
Name: 0.5, dtype: float64

In [16]: df = pd.DataFrame(np.random.randint(0, 10, size=(10, 2)))
    ...: df.iloc[1, 1] = np.nan
    ...:
    ...:

In [17]: df
Out[17]:
   0    1
0  0  3.0
1  2  NaN
2  1  3.0
3  1  3.0
4  7  1.0
5  7  4.0
6  0  5.0
7  1  5.0
8  9  9.0
9  4  0.0

In [18]: df.median()
Out[18]:
0    1.5
1    3.0
dtype: float64

In [19]: df.quantile(0.5)
Out[19]:
0    1.5
1    3.0
Name: 0.5, dtype: float64

jreback · 2016-10-06T14:26:31Z

@Rubyj you'll have to show a complete end-to-end reproducible example. This was a bug in 0.18.1 but is correct in 0.19.0.

TomAugspurger · 2016-10-06T14:45:12Z

@jreback the problem seems to be a DataFrame with a FloatBlock that has at least 1 col with no missing values and at least 1 col with some missing values (see my edit at the top of the OP)
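For readers following the traceback, here is a minimal NumPy-only sketch of why masking followed by a reshape can fail. It is not the pandas internals: the array layout and the target shape used here are assumptions chosen for illustration. The only point is that boolean indexing on a 2-D array returns a flattened 1-D array, so dropping even one NaN leaves too few elements to reshape back to the original shape.

```python
import numpy as np

# Stand-in for a float block: 2 "columns" stored as rows of length 10.
# (This layout is an assumption made for the illustration.)
values = np.arange(20, dtype=float).reshape(2, 10)
values[1, 1] = np.nan  # one missing value in one column only

mask = np.isnan(values)

# Boolean indexing flattens the result: 19 elements, 1-D.
kept = values[~mask]

try:
    kept.reshape(values.shape)  # 19 elements cannot fill a (2, 10) array
    reshape_failed = False
except ValueError as exc:
    reshape_failed = True
    print("ValueError:", exc)
```

The exact wording of the error message depends on the NumPy version.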

Rubyj · 2016-10-06T14:46:29Z

@jreback

@TomAugspurger provided a reproducible example for me in my original post and added the labels that you removed. Not sure if you saw that. Thank you Tom 👍

jreback · 2016-10-07T10:39:49Z

@TomAugspurger your example works, I see that you changed the top of post. thanks.

jreback · 2016-10-07T10:43:33Z

So in this case, the individual dims need to be iterated over (corresponding to the columns), with the per-column quantiles then combined, rather than doing this all at once. numpy doesn't handle the NaNs in the quantiling.
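The per-column approach described above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the fix that was merged: `columnwise_quantile` is a hypothetical helper name, and it uses `np.percentile` with its default linear interpolation.

```python
import numpy as np

def columnwise_quantile(values, q):
    """Hypothetical helper: quantile of each column, ignoring NaN.

    Each column is handled on its own, so columns may have different
    numbers of valid entries. The per-column results are then combined
    into one 1-D array.
    """
    values = np.asarray(values, dtype=float)
    out = np.empty(values.shape[1])
    for j in range(values.shape[1]):
        col = values[:, j]
        col = col[~np.isnan(col)]  # drop missing values for this column only
        # An all-NaN column has no defined quantile; report NaN for it.
        out[j] = np.percentile(col, q * 100) if col.size else np.nan
    return out

data = np.array([
    [1.0, 10.0],
    [2.0, np.nan],
    [3.0, 30.0],
    [4.0, 40.0],
])

result = columnwise_quantile(data, 0.5)
print(result)
```

For this input the result matches `np.nanpercentile(data, 50, axis=0)`, which is a convenient cross-check on NumPy versions that provide it.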

Rubyj changed the title ~~ValueError when trying to compute Quartile~~ ValueError when trying to compute Quantile Oct 5, 2016

TomAugspurger added the Missing-data, Regression, and Numeric Operations labels Oct 5, 2016

TomAugspurger added this to the 0.19.1 milestone Oct 5, 2016

TomAugspurger added Effort Medium and removed Effort Medium labels Oct 5, 2016

jreback removed this from the 0.19.1 milestone Oct 6, 2016

jreback removed the Difficulty Intermediate and Regression labels Oct 6, 2016

jorisvandenbossche added the Regression label Oct 6, 2016

jorisvandenbossche added this to the 0.19.1 milestone Oct 6, 2016

jorisvandenbossche mentioned this issue Oct 29, 2016

BUG: DataFrame.quantile with NaNs (GH14357) #14536

Merged

jorisvandenbossche closed this as completed in #14536 Nov 2, 2016

jreback mentioned this issue Feb 20, 2017

Quantile fails when only NaNs on some rows/columns #15460

Closed
