PERF: DataFrame.transpose for pyarrow-backed #54224
pandas/core/frame.py
Outdated
    if isinstance(self._mgr, ArrayManager):
        arrays = self._mgr.arrays
    else:
        arrays = list(self._iter_column_arrays())
Can't this be done unconditionally?
Do you mean use _iter_column_arrays in both cases? The masked array implementation just above this uses the condition as well, presumably for performance:
import pandas as pd
pd.options.mode.data_manager = "array"
df = pd.DataFrame(columns=range(100000), dtype="int64[pyarrow]")
%timeit list(df._iter_column_arrays())
# 57.9 ms ± 2.7 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit df._mgr.arrays
# 110 ns ± 8.79 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
Fair enough, thanks for taking a look
On second thought, I moved the condition into DataFrame._iter_column_arrays to avoid repeating it elsewhere.
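As a rough illustration of that refactor, the sketch below moves the manager check inside the iterator so that callers no longer branch on the manager type. It is not the pandas implementation: `ArrayManagerSketch`, `BlockManagerSketch`, `FrameSketch`, `iget_values`, and `ncols` are hypothetical stand-ins built on plain Python lists, chosen only to show the shape of the change.

```python
from typing import Iterator, List


class ArrayManagerSketch:
    """Hypothetical stand-in: already stores one array per column."""

    def __init__(self, arrays: List[list]) -> None:
        self.arrays = arrays


class BlockManagerSketch:
    """Hypothetical stand-in: stores rows, so each column must be sliced out."""

    def __init__(self, block: List[list]) -> None:
        self.block = block  # list of rows

    @property
    def ncols(self) -> int:
        return len(self.block[0]) if self.block else 0

    def iget_values(self, i: int) -> list:
        # Building a column costs one pass over the rows.
        return [row[i] for row in self.block]


class FrameSketch:
    def __init__(self, mgr) -> None:
        self._mgr = mgr

    def _iter_column_arrays(self) -> Iterator[list]:
        # The manager check lives here once, so callers never repeat it.
        if isinstance(self._mgr, ArrayManagerSketch):
            # Fast path: the per-column arrays already exist.
            yield from self._mgr.arrays
        else:
            for i in range(self._mgr.ncols):
                yield self._mgr.iget_values(i)


by_column = FrameSketch(ArrayManagerSketch([[1, 2], [3, 4]]))
by_block = FrameSketch(BlockManagerSketch([[1, 3], [2, 4]]))

# Call sites look the same for either storage layout.
columns_a = list(by_column._iter_column_arrays())
columns_b = list(by_block._iter_column_arrays())
```

With the check in one place, a call site such as the transpose path can simply write `arrays = list(self._iter_column_arrays())` and still get the cheap path when the arrays are already stored per column.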
Thanks @lukemanley
doc/source/whatsnew/v2.1.0.rst
file if fixing a bug or adding a new feature.
Similar to #52836, but for pyarrow-backed frames.
Renamed the masked implementation from homogenous -> homogeneous. Both may be valid(?), but homogeneous is used elsewhere.