CLN: move nanpercentile functionality to nanops #14562

Closed
jorisvandenbossche opened this issue Nov 2, 2016 · 2 comments
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Clean Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Refactor Internal refactoring of code

Comments

@jorisvandenbossche
Member

See comment #14536 (comment)

Move the _nanpercentile functionality used in quantile in internals.py (

    def _nanpercentile1D(values, mask, q, **kw):
        values = values[~mask]

        if len(values) == 0:
            if is_scalar(q):
                return self._na_value
            else:
                return np.array([self._na_value] * len(q),
                                dtype=values.dtype)

        return np.percentile(values, q, **kw)

    def _nanpercentile(values, q, axis, **kw):

        mask = isnull(self.values)
        if not is_scalar(mask) and mask.any():
            if self.ndim == 1:
                return _nanpercentile1D(values, mask, q, **kw)
            else:
                # for nonconsolidatable blocks mask is 1D, but values 2D
                if mask.ndim < values.ndim:
                    mask = mask.reshape(values.shape)
                if axis == 0:
                    values = values.T
                    mask = mask.T
                result = [_nanpercentile1D(val, m, q, **kw) for (val, m)
                          in zip(list(values), list(mask))]
                result = np.array(result, dtype=values.dtype, copy=False).T
                return result
        else:
            return np.percentile(values, q, axis=axis, **kw)

    from pandas import Float64Index
    is_empty = values.shape[axis] == 0
    if is_list_like(qs):
        ax = Float64Index(qs)

        if is_empty:
            if self.ndim == 1:
                result = self._na_value
            else:
                # create the array of na_values
                # 2d len(values) * len(qs)
                result = np.repeat(np.array([self._na_value] * len(qs)),
                                   len(values)).reshape(len(values),
                                                        len(qs))
        else:

            try:
                result = _nanpercentile(values, np.array(qs) * 100,
                                        axis=axis, **kw)
            except ValueError:

                # older numpies don't handle an array for q
                result = [_nanpercentile(values, q * 100,
                                         axis=axis, **kw) for q in qs]

            result = np.array(result, copy=False)
            if self.ndim > 1:
                result = result.T

    else:

        if self.ndim == 1:
            ax = Float64Index([qs])
        else:
            ax = mgr.axes[0]

        if is_empty:
            if self.ndim == 1:
                result = self._na_value
            else:
                result = np.array([self._na_value] * len(self))
        else:
            result = _nanpercentile(values, qs * 100, axis=axis, **kw)
) to nanops.py.
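
To make the request concrete, here is a minimal sketch of what the two helpers might look like as free functions that no longer close over `self`. The names `nanpercentile_1d` and `nanpercentile` and the explicit `na_value` parameter are hypothetical, chosen for illustration; this handles float input only and is not the code that ended up in nanops.py.

```python
import numpy as np


def nanpercentile_1d(values, mask, q, na_value=np.nan, **kw):
    """Percentile of a 1D array, ignoring entries where mask is True."""
    values = values[~mask]
    if len(values) == 0:
        # Nothing left after masking: return the NA marker, shaped like q.
        if np.isscalar(q):
            return na_value
        return np.array([na_value] * len(q), dtype=values.dtype)
    return np.percentile(values, q, **kw)


def nanpercentile(values, q, axis, na_value=np.nan, **kw):
    """Percentile along axis of a 1D or 2D float array, skipping NaN."""
    mask = np.isnan(values)
    if not mask.any():
        # No missing values: defer to NumPy directly.
        return np.percentile(values, q, axis=axis, **kw)
    if values.ndim == 1:
        return nanpercentile_1d(values, mask, q, na_value, **kw)
    if axis == 0:
        # Reduce over rows by iterating over the columns instead.
        values, mask = values.T, mask.T
    result = [nanpercentile_1d(v, m, q, na_value, **kw)
              for v, m in zip(values, mask)]
    return np.array(result, dtype=values.dtype).T


vals = np.array([[1.0, np.nan], [3.0, 4.0], [5.0, 6.0]])
per_column = nanpercentile(vals, 50, axis=0)
```

The main change from the snippet above is that the mask and the NA marker are derived from, or passed alongside, the array rather than read from the Block. For plain float input the output can be cross-checked against `np.nanpercentile`.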

@jorisvandenbossche jorisvandenbossche added Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Difficulty Intermediate Clean labels Nov 2, 2016
@jorisvandenbossche
Member Author

Original comment of @jreback

I meant move ALL of this; the nanops do everything (based on dtype) and are basically ufuncs per dtype. It's OK for now if you want to merge (to fix the bug), but let's open a new issue to move this code. All of the rest of it is there (for other ops). We don't do very much inside the block managers, mainly just assemble blocks; actual calculations are pushed to other routines (numpy or pandas).

The reason it is not just a copy-paste is that the current implementation relies on the Block's _try_coerce_args, _na_value and _try_coerce_result, which have to be replaced (though nanops also has functionality for that).
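
A rough illustration of the coerce, compute, coerce-back pattern for a single dtype, written without any Block methods. It assumes the coercion helpers mainly map datetime-like values to integers and back, which is an assumption about their role rather than a description of their actual code; `nanpercentile_datetime` is a hypothetical name.

```python
import numpy as np


def nanpercentile_datetime(values, q):
    """Percentile of a 1D datetime64[ns] array, skipping NaT."""
    mask = np.isnat(values)
    # "Coerce args": work on the underlying int64 nanosecond counts.
    ints = values.view("i8")[~mask]
    if ints.size == 0:
        # The NA marker depends on the dtype: NaT here, not NaN.
        return np.datetime64("NaT", "ns")
    # np.percentile returns float64, which can lose precision
    # for large nanosecond counts.
    result = np.percentile(ints, q)
    # "Coerce result": convert the number back to the original dtype.
    return np.datetime64(int(result), "ns")


stamps = np.array(
    ["1970-01-01T00:00:10", "NaT", "1970-01-01T00:00:30"], dtype="M8[ns]"
)
median = nanpercentile_datetime(stamps, 50)
```

The point is only that the dtype-specific pieces (how to build the mask, what the NA value is, how to convert back) can be chosen from the array's dtype instead of from the Block that holds it.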

@jbrockmendel
Member

Closed by #24597
