ValueError when trying to compute Quantile #14357

Closed
Rubyj opened this issue Oct 5, 2016 · 9 comments
Labels
Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate); Numeric Operations (Arithmetic, Comparison, and Logical operations); Regression (Functionality that used to work in a prior pandas version)

@Rubyj

Rubyj commented Oct 5, 2016

In [7]: df = pd.DataFrame(np.random.randn(10, 2))

In [8]: df.iloc[1, 1] = np.nan

In [9]: df.quantile(.5)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-66d518aa86c6> in <module>()
----> 1 df.quantile(.5)

/Users/tom.augspurger/Envs/py3/lib/python3.5/site-packages/pandas-0.19.0rc1+21.ge596cbf-py3.5-macosx-10.11-x86_64.egg/pandas/core/frame.py in quantile(self, q, axis, numeric_only, interpolation)
   5152                                      axis=1,
   5153                                      interpolation=interpolation,
-> 5154                                      transposed=is_transposed)
   5155
   5156         if result.ndim == 2:

/Users/tom.augspurger/Envs/py3/lib/python3.5/site-packages/pandas-0.19.0rc1+21.ge596cbf-py3.5-macosx-10.11-x86_64.egg/pandas/core/internals.py in quantile(self, **kwargs)
   3142
   3143     def quantile(self, **kwargs):
-> 3144         return self.reduction('quantile', **kwargs)
   3145
   3146     def setitem(self, **kwargs):

/Users/tom.augspurger/Envs/py3/lib/python3.5/site-packages/pandas-0.19.0rc1+21.ge596cbf-py3.5-macosx-10.11-x86_64.egg/pandas/core/internals.py in reduction(self, f, axis, consolidate, transposed, **kwargs)
   3071         for b in self.blocks:
   3072             kwargs['mgr'] = self
-> 3073             axe, block = getattr(b, f)(axis=axis, **kwargs)
   3074
   3075             axes.append(axe)

/Users/tom.augspurger/Envs/py3/lib/python3.5/site-packages/pandas-0.19.0rc1+21.ge596cbf-py3.5-macosx-10.11-x86_64.egg/pandas/core/internals.py in quantile(self, qs, interpolation, axis, mgr)
   1325             values = _block_shape(values[~mask], ndim=self.ndim)
   1326             if self.ndim > 1:
-> 1327                 values = values.reshape(result_shape)
   1328
   1329         from pandas import Float64Index

ValueError: total size of new array must be unchanged

original post follows


I have a simple dataframe that I created as follows:

df[df['Week of'] == week]

where week is a week name I'm filtering by

I have been taking the quartile values of this dataframe as follows:

df[df['Week of'] == week].quantile(.25)

However since the update to Pandas 0.19 I am receiving the error (this code worked fine before):

values = values.reshape(result_shape)
ValueError: total size of new array must be unchanged

@Rubyj Rubyj changed the title from "ValueError when trying to compute Quartile" to "ValueError when trying to compute Quantile" Oct 5, 2016
@chris-b1
Contributor

chris-b1 commented Oct 5, 2016

Can you please make this a fully reproducible example with dummy data?

@Rubyj
Author

Rubyj commented Oct 5, 2016

I have tracked this error down to there being NaN values in some, but not all, of the columns for a row (2 out of 10 in this case). I then tried to compute the quartile of that DF and pandas did not like this. My solution is to plug the NaN values with 0.
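
A note on the workaround above: filling NaN with 0 avoids the error but can shift the result, because the zeros are counted as real observations. The following is a minimal sketch on made-up data (the column names `a` and `b` are assumptions, not from the original report). It compares the filled result with a per-column quantile, relying on `Series.quantile` skipping missing values.

```python
import numpy as np
import pandas as pd

# Made-up data: column "b" has one missing value, column "a" has none.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [10.0, np.nan, 30.0, 40.0],
})

# Workaround from the comment: replace NaN with 0 before taking the quantile.
# The inserted 0 is treated as a real data point.
filled = df.fillna(0).quantile(0.5)

# Alternative: take each column's quantile separately.
# Series.quantile ignores NaN, so only the observed values count.
per_column = df.apply(lambda col: col.quantile(0.5))

print(filled)
print(per_column)
```

Here the median of `b` is 20.0 after filling with 0 but 30.0 when the missing value is skipped, so the workaround is only safe when 0 is a meaningful value for the data.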

@TomAugspurger
Contributor

Edited in a reproducible example. Hard to say for sure, but maybe related to 4de83d2

It's definitely related to a (float) block having some cols with missing values:

In [11]: df = pd.DataFrame(np.random.randint(0, 10, size=(10, 2)))
In [13]: df.iloc[1, 1] = np.nan

In [14]: df.quantile(.5)
Out[14]:
0    4.5
1    7.0
Name: 0.5, dtype: float64

and

In [15]: df = pd.DataFrame(np.random.randn(10, 2))

In [17]: df.iloc[0, :] = np.nan

In [18]: df.quantile(.5)
Out[18]:
0    0.347815
1    0.072105
Name: 0.5, dtype: float64

both work

@TomAugspurger TomAugspurger added the Missing-data, Regression, and Numeric Operations labels Oct 5, 2016
@TomAugspurger TomAugspurger added this to the 0.19.1 milestone Oct 5, 2016
@jreback
Contributor

jreback commented Oct 6, 2016

In [10]: pd.__version__
Out[10]: '0.19.0'

In [11]: np.random.seed(1234)

In [12]: df = pd.DataFrame(np.random.randn(10, 2))
    ...:
    ...:
    ...: df.iloc[0, :] = np.nan
    ...:

In [13]: df
Out[13]:
          0         1
0       NaN       NaN
1  1.432707 -0.312652
2 -0.720589  0.887163
3  0.859588 -0.636524
4  0.015696 -2.242685
5  1.150036  0.991946
6  0.953324 -2.021255
7 -0.334077  0.002118
8  0.405453  0.289092
9  1.321158 -1.546906

In [14]: df.median()
Out[14]:
0    0.859588
1   -0.312652
dtype: float64

In [15]: df.quantile(0.5)
Out[15]:
0    0.859588
1   -0.312652
Name: 0.5, dtype: float64

In [16]: df = pd.DataFrame(np.random.randint(0, 10, size=(10, 2)))
    ...: df.iloc[1, 1] = np.nan
    ...:
    ...:

In [17]: df
Out[17]:
   0    1
0  0  3.0
1  2  NaN
2  1  3.0
3  1  3.0
4  7  1.0
5  7  4.0
6  0  5.0
7  1  5.0
8  9  9.0
9  4  0.0

In [18]: df.median()
Out[18]:
0    1.5
1    3.0
dtype: float64

In [19]: df.quantile(0.5)
Out[19]:
0    1.5
1    3.0
Name: 0.5, dtype: float64

@jreback
Contributor

jreback commented Oct 6, 2016

@Rubyj you'll have to show a complete end-to-end reproducible example. This was a bug in 0.18.1 but is correct in 0.19.0.

@jreback jreback removed this from the 0.19.1 milestone Oct 6, 2016
@jreback jreback removed the Difficulty Intermediate and Regression labels Oct 6, 2016
@TomAugspurger
Contributor

TomAugspurger commented Oct 6, 2016

@jreback the problem seems to be a DataFrame with a FloatBlock that has at least 1 col with no missing values and at least 1 col with some missing values (see my edit at the top of the OP)
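
The condition described above can be checked without touching pandas internals. The sketch below is only an approximation: `matches_reported_pattern` is a hypothetical helper name, and it assumes that all float64 columns end up in a single block, which is not guaranteed for every frame.

```python
import numpy as np
import pandas as pd


def matches_reported_pattern(df):
    """Hypothetical helper: True if the float64 columns of df mix
    fully populated columns with columns that contain NaN."""
    floats = df.select_dtypes(include=[np.float64])
    has_nan = floats.isnull().any()  # one boolean per float column
    return bool(has_nan.any() and (~has_nan).any())


rng = np.random.RandomState(0)

# Case from the top of the issue: NaN in one float column only.
mixed = pd.DataFrame(rng.randn(10, 2))
mixed.iloc[1, 1] = np.nan

# Whole row missing: every float column contains NaN.
row_missing = pd.DataFrame(rng.randn(10, 2))
row_missing.iloc[0, :] = np.nan

# Integer column next to a float column with NaN
# (built directly to avoid upcasting on assignment).
int_and_float = pd.DataFrame({0: np.arange(10), 1: [3.0, np.nan] + [1.0] * 8})

for name, frame in [("mixed", mixed),
                    ("row_missing", row_missing),
                    ("int_and_float", int_and_float)]:
    print(name, matches_reported_pattern(frame))
```

Only the first frame matches the pattern, which is consistent with the two examples earlier in the thread that were reported as working.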

@Rubyj
Author

Rubyj commented Oct 6, 2016

@jreback

@TomAugspurger provided a reproducible example for me in my original post and added the labels that you removed. Not sure if you saw that. Thank you Tom 👍

@jorisvandenbossche jorisvandenbossche added the Regression label Oct 6, 2016
@jorisvandenbossche jorisvandenbossche added this to the 0.19.1 milestone Oct 6, 2016
@jreback
Contributor

jreback commented Oct 7, 2016

@TomAugspurger your example works; I see that you changed the top of the post. Thanks.

@jreback
Contributor

jreback commented Oct 7, 2016

So in this case, the individual dims need to be iterated (corresponding to the columns), with the per-column quantiles then combined, rather than doing this all at once. numpy doesn't handle the NaNs in the quantiling.
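
A rough sketch of that idea at the DataFrame level, not the actual fix inside pandas' block code: `quantile_by_column` is a hypothetical helper that iterates the columns, drops the NaNs from each one, calls `np.percentile` on what remains, and combines the results into a Series. The sample data is made up.

```python
import numpy as np
import pandas as pd


def quantile_by_column(df, q):
    """Hypothetical helper: per-column quantile with NaN removed first."""
    results = {}
    for name in df.columns:
        values = np.asarray(df[name], dtype=float)
        values = values[~np.isnan(values)]  # each column keeps its own length
        # np.percentile expects a percentage in [0, 100].
        results[name] = np.percentile(values, q * 100) if values.size else np.nan
    return pd.Series(results, name=q)


# Made-up data: only column "b" has a missing value.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [10.0, np.nan, 30.0, 40.0],
})

medians = quantile_by_column(df, 0.5)
print(medians)
```

Because each column is filtered on its own, columns with different numbers of missing values never have to be reshaped back into one rectangular array, which is the step that raised the `ValueError` in the traceback.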

5 participants