Practical steps towards a simplified BlockManager #34669
Comments
This is now easy to grep for because it is
I lost momentum on this, would be happy to see it picked back up. IIRC
+1
rolling operations also operate block-wise, but removal would be fairly straightforward
where is that implemented? Does it go through BlockManager.apply?
In rolling: https://github.com/pandas-dev/pandas/blob/master/pandas/core/window/rolling.py#L232 Which ultimately hits: https://github.com/pandas-dev/pandas/blob/master/pandas/core/generic.py#L5479 Which would be neat to get rid of since I think this is the only usage of this method in
thanks, I'll take a look
I can volunteer to change rolling to column-wise
I think the thing to do is go through BlockManager.apply, so as to be agnostic to the ongoing 1D/2D issue
In the mailing list discussion on a simplified (non-consolidating) BlockManager, @jbrockmendel brought up the relevant question of how we could get there (incrementally?): https://mail.python.org/pipermail/pandas-dev/2020-May/001223.html
Since that is a more practical discussion, moving that to an issue here.
Longer term, there are multiple options of how such a blockmanager could be enabled by the user (listing 2 here, but there are probably other options as well):
temporary, to the Blocks, so maybe not an ideal path forward.
For the "all extension arrays" option, we could also use light-weight EAs to store numpy arrays (like the "PandasArrays" we have now, only then actually using it), as long as we don't yet have actual extension arrays for all dtypes.
For both, we could have a constructor keyword and/or a global config option to enable it.
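To make the idea a bit more concrete, here is a minimal toy sketch in plain Python. It is not pandas code and does not reflect how the BlockManager is actually implemented: `ToyFrame`, `DEFAULT_CONSOLIDATE`, the `consolidate` keyword, and `n_chunks` are all hypothetical names made up for illustration. It only shows the difference between grouping same-typed columns into shared "blocks" and keeping one independent 1D array per column, with the layout chosen by a constructor keyword that falls back to a global default.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical module-level default, standing in for a global config option.
DEFAULT_CONSOLIDATE = True


class ToyFrame:
    """Toy 2D container; all names here are illustrative, not pandas API."""

    def __init__(self, columns: Dict[str, list], consolidate: Optional[bool] = None):
        # The constructor keyword overrides the global default when given.
        if consolidate is None:
            consolidate = DEFAULT_CONSOLIDATE
        self.consolidate = consolidate
        self.names = list(columns)
        if consolidate:
            # Consolidated layout: columns with the same element type
            # share one 2D "block" (a list of rows, one row per column).
            self.blocks: Dict[type, List[list]] = {}
            self.locs: Dict[str, Tuple[type, int]] = {}
            for name, values in columns.items():
                kind = type(values[0]) if values else object
                rows = self.blocks.setdefault(kind, [])
                self.locs[name] = (kind, len(rows))
                rows.append(list(values))
        else:
            # Simplified layout: one independent 1D array per column.
            self.arrays = {name: list(values) for name, values in columns.items()}

    def n_chunks(self) -> int:
        """Number of separately stored pieces (blocks or columns)."""
        return len(self.blocks) if self.consolidate else len(self.arrays)

    def get(self, name: str) -> list:
        """Return the values of one column, whatever the layout."""
        if self.consolidate:
            kind, i = self.locs[name]
            return self.blocks[kind][i]
        return self.arrays[name]

    def apply(self, func: Callable[[list], list]) -> "ToyFrame":
        """Apply func column by column, independent of the storage layout."""
        result = {name: func(self.get(name)) for name in self.names}
        return ToyFrame(result, consolidate=self.consolidate)


data = {"a": [1, 2], "b": [3, 4], "c": [0.5, 1.5]}
blocked = ToyFrame(data)                      # falls back to the global default
columnar = ToyFrame(data, consolidate=False)  # opt in per object via keyword
print(blocked.n_chunks(), columnar.n_chunks())

doubled = columnar.apply(lambda col: [2 * x for x in col])
print(doubled.get("c"))
```

In this sketch, callers that only go through `get` and `apply` never see which layout is in use, which is the property that would make it easier to swap in an alternative manager.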
Now, on the shorter term, there are probably some work items we could tackle to reduce the API surface / usage of blocks outside of the internals, to make things like the above more realistic to implement (and make it easier to experiment with alternative BlockManager implementations):
REF: BlockManager.delete -> idelete #33332 (removal of BlockManager.get/set/delete); this might actually already be done?
(Maybe actually reducing might not be possible, but the exercise can still be useful to get an explicit overview of what is needed to implement a BlockManager).
It might be that there are already more concrete open issues about some of those aspects.
cc @pandas-dev/pandas-core