CLN: move nanpercentile functionality to nanops #14562

jorisvandenbossche · 2016-11-02T13:03:28Z

Move the `_nanpercentile` functionality used by `quantile` in internals.py (lines 1319 to 1393 at commit 52f31d4) to nanops.py:

```python
# Excerpt: `self`, `mgr`, `values`, `qs` and `kw` are defined by the
# enclosing quantile method of the Block.
def _nanpercentile1D(values, mask, q, **kw):
    values = values[~mask]

    if len(values) == 0:
        if is_scalar(q):
            return self._na_value
        else:
            return np.array([self._na_value] * len(q),
                            dtype=values.dtype)

    return np.percentile(values, q, **kw)

def _nanpercentile(values, q, axis, **kw):
    mask = isnull(self.values)
    if not is_scalar(mask) and mask.any():
        if self.ndim == 1:
            return _nanpercentile1D(values, mask, q, **kw)
        else:
            # for nonconsolidatable blocks mask is 1D, but values 2D
            if mask.ndim < values.ndim:
                mask = mask.reshape(values.shape)
            if axis == 0:
                values = values.T
                mask = mask.T
            result = [_nanpercentile1D(val, m, q, **kw) for (val, m)
                      in zip(list(values), list(mask))]
            result = np.array(result, dtype=values.dtype, copy=False).T
            return result
    else:
        return np.percentile(values, q, axis=axis, **kw)

from pandas import Float64Index

is_empty = values.shape[axis] == 0
if is_list_like(qs):
    ax = Float64Index(qs)
    if is_empty:
        if self.ndim == 1:
            result = self._na_value
        else:
            # create the array of na_values
            # 2d len(values) * len(qs)
            result = np.repeat(np.array([self._na_value] * len(qs)),
                               len(values)).reshape(len(values),
                                                    len(qs))
    else:
        try:
            result = _nanpercentile(values, np.array(qs) * 100,
                                    axis=axis, **kw)
        except ValueError:
            # older numpies don't handle an array for q
            result = [_nanpercentile(values, q * 100,
                                     axis=axis, **kw) for q in qs]

        result = np.array(result, copy=False)
        if self.ndim > 1:
            result = result.T

else:
    if self.ndim == 1:
        ax = Float64Index([qs])
    else:
        ax = mgr.axes[0]

    if is_empty:
        if self.ndim == 1:
            result = self._na_value
        else:
            result = np.array([self._na_value] * len(self))
    else:
        result = _nanpercentile(values, qs * 100, axis=axis, **kw)
```
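As a minimal sketch of what the moved helper could look like, assume it receives the mask and the missing-value marker as arguments instead of reading them from the Block. The function name `nanpercentile` and its `na_value` and `mask` parameters are hypothetical, not the signature that ended up in nanops. The sketch keeps the structure of the quoted code (drop masked entries per 1-D slice, fall back to `np.percentile` when nothing is missing) but omits the `**kw` pass-through and the `is_empty` branches.

```python
import numpy as np


def nanpercentile(values, q, axis=0, na_value=np.nan, mask=None):
    """Percentile of ``values`` that skips missing entries.

    Hypothetical standalone version of the Block-level helpers:
    ``na_value`` and ``mask`` are passed in explicitly instead of being
    read from ``self``. ``q`` is in percent (0-100), as for np.percentile.
    """
    values = np.asarray(values)
    if mask is None:
        # Assumption: float input where NaN marks a missing value.
        mask = np.isnan(values)

    def _1d(vals, m):
        vals = vals[~m]
        if len(vals) == 0:
            # All-missing slice: one na_value per requested quantile.
            if np.ndim(q) == 0:
                return na_value
            return np.full(len(q), na_value)
        return np.percentile(vals, q)

    if not mask.any():
        # Fast path: nothing missing, let numpy reduce along the axis.
        return np.percentile(values, q, axis=axis)
    if values.ndim == 1:
        return _1d(values, mask)
    if axis == 0:
        # Transpose so that iterating over rows walks the columns.
        values, mask = values.T, mask.T
    result = [_1d(v, m) for v, m in zip(values, mask)]
    # Transpose back so the quantile axis comes first, like np.percentile.
    return np.array(result).T
```

For plain float input, `np.nanpercentile` computes the same values; explicit `mask` and `na_value` parameters are what would let other dtypes, whose missing marker is not NaN, reuse the same path.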


jorisvandenbossche · 2016-11-02T13:10:44Z

Original comment of @jreback

I meant move ALL of this: nanops does everything (based on dtype); its functions are basically per-dtype ufuncs. It's OK for now if you want to merge (to fix the bug), but let's open a new issue to move this code. Everything else for the other ops already lives there. We don't do very much inside the block managers, mainly just assembling blocks; the actual calculations are pushed to other routines (numpy or pandas).

The reason this is not just a copy-paste is that the current implementation relies on the Block's `_try_coerce_args`, `_na_value` and `_try_coerce_result`, which you have to replace (although nanops already has functionality for that).
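A sketch of what replacing the Block coercion could look like for `datetime64[ns]` data, assuming that coercion amounts to viewing datetimes as int64 nanoseconds, treating NaT as the missing marker, and casting the result back. `datetime_nanpercentile` is a hypothetical name, and only 1-D input is handled.

```python
import numpy as np


def datetime_nanpercentile(values, q):
    """Percentile of a 1-D datetime64[ns] array, skipping NaT.

    Hypothetical helper: the int64 view plays the role of the Block's
    ``_try_coerce_args`` and the final cast back plays the role of
    ``_try_coerce_result``. ``q`` is in percent (0-100).
    """
    values = np.asarray(values, dtype="M8[ns]")
    mask = np.isnat(values)                # NaT is the missing-value marker
    valid = values.view("i8")[~mask]       # coerce: datetimes -> int64 ns
    qs = np.atleast_1d(q)

    if len(valid) == 0:
        # Everything missing: one NaT per requested quantile.
        result = np.full(len(qs), np.datetime64("NaT", "ns"))
    else:
        # np.percentile returns float64; round before casting back.
        result = np.round(np.percentile(valid, qs)).astype("i8").view("M8[ns]")

    # Mirror np.percentile: scalar q -> scalar, list-like q -> array.
    return result[0] if np.ndim(q) == 0 else result
```

Because `np.percentile` returns float64 for int64 input, timestamps far from the epoch can lose sub-microsecond precision in interpolated results.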

jbrockmendel · 2019-12-25T23:20:40Z

Closed by #24597

jorisvandenbossche added the Missing-data, Algos, Difficulty Intermediate and Clean labels Nov 2, 2016

jbrockmendel added the Refactor label Jul 23, 2019

jbrockmendel mentioned this issue Jul 27, 2019

CLN: de-kludge quantile, make interpolate_with_fill understand datetime64 #27626

Closed

jbrockmendel removed the Difficulty Intermediate label Oct 21, 2019

jbrockmendel closed this as completed Dec 25, 2019
