INT: provide helpers for accessing the values of DataFrame columns #33252
Changes from all commits
2afaaad
18203e8
b1434fe
8e29685
f472232
562635c
a25cac8
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -23,6 +23,7 @@ | |
FrozenSet, | ||
Hashable, | ||
Iterable, | ||
Iterator, | ||
List, | ||
Optional, | ||
Sequence, | ||
|
@@ -40,7 +41,16 @@ | |
from pandas._config import get_option | ||
|
||
from pandas._libs import algos as libalgos, lib, properties | ||
from pandas._typing import Axes, Axis, Dtype, FilePathOrBuffer, Label, Level, Renamer | ||
from pandas._typing import ( | ||
ArrayLike, | ||
Axes, | ||
Axis, | ||
Dtype, | ||
FilePathOrBuffer, | ||
Label, | ||
Level, | ||
Renamer, | ||
) | ||
from pandas.compat import PY37 | ||
from pandas.compat._optional import import_optional_dependency | ||
from pandas.compat.numpy import function as nv | ||
|
@@ -2573,6 +2583,21 @@ def _ixs(self, i: int, axis: int = 0): | |
|
||
return result | ||
|
||
def _get_column_array(self, i: int) -> ArrayLike: | ||
""" | ||
Get the values of the i'th column (ndarray or ExtensionArray, as stored | ||
in the Block) | ||
""" | ||
return self._data.iget_values(i) | ||
|
||
def _iter_column_arrays(self) -> Iterator[ArrayLike]: | ||
""" | ||
Iterate over the arrays of all columns in order. | ||
This returns the values as stored in the Block (ndarray or ExtensionArray). | ||
""" | ||
for i in range(len(self.columns)): | ||
yield self._get_column_array(i) | ||
|
||
def __getitem__(self, key): | ||
key = lib.item_from_zerodim(key) | ||
key = com.apply_if_callable(key, self) | ||
|
@@ -8031,8 +8056,12 @@ def _reduce( | |
|
||
assert filter_type is None or filter_type == "bool", filter_type | ||
|
||
dtype_is_dt = self.dtypes.apply( | ||
lambda x: is_datetime64_any_dtype(x) or is_period_dtype(x) | ||
dtype_is_dt = np.array( | ||
[ | ||
is_datetime64_any_dtype(values.dtype) or is_period_dtype(values.dtype) | ||
for values in self._iter_column_arrays() | ||
], | ||
dtype=bool, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. the perf issue here is in the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. as much as im trying to avoid There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Both the
Indeed, that would be even more appropriate here. Now, I am happy to change it to that, but that's not really the core of this PR. I mainly wanted to add one useful case as an illustration, but mainly want the helper function for other PRs. I am happy to merge it without already using it somewhere as well. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
When having a dataframe with only extension blocks, it's actually slower than There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. since we're just going to call |
||
) | ||
if numeric_only is None and name in ["mean", "median"] and dtype_is_dt.any(): | ||
warnings.warn( | ||
|
I think another newline?
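For readers following the `_reduce` change, here is a minimal sketch of the pattern the diff introduces: iterate over the per-column arrays and build a boolean mask from their dtypes, instead of calling `self.dtypes.apply(...)`. It uses only public pandas API. `iter_column_arrays` and `datetime_like_mask` are hypothetical stand-ins written for illustration, not the private `_iter_column_arrays` / `_get_column_array` helpers added by the PR, and `isinstance(..., pd.PeriodDtype)` is used here in place of `is_period_dtype`.

```python
from typing import Iterator

import numpy as np
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype


def iter_column_arrays(df: pd.DataFrame) -> Iterator:
    """Hypothetical public-API stand-in for the PR's private helper.

    Yields one array per column, in column order.
    """
    for i in range(len(df.columns)):
        # Series.array exposes the column's values as an array object
        # without constructing anything beyond the positional Series lookup.
        yield df.iloc[:, i].array


def datetime_like_mask(df: pd.DataFrame) -> np.ndarray:
    """Return a boolean array marking datetime64 or period columns."""
    return np.array(
        [
            is_datetime64_any_dtype(values.dtype)
            or isinstance(values.dtype, pd.PeriodDtype)
            for values in iter_column_arrays(df)
        ],
        dtype=bool,
    )


df = pd.DataFrame(
    {
        "n": [1, 2],
        "ts": pd.to_datetime(["2020-01-01", "2020-01-02"]),
        "p": pd.period_range("2020-01", periods=2, freq="M"),
    }
)
mask = datetime_like_mask(df)
print(mask)
```

Passing `dtype=bool` explicitly keeps the result a boolean array even for a frame with no columns, where the list comprehension is empty and NumPy would otherwise default to `float64`.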