-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revert "DEPR: make_block (#56422)" #56481
Conversation
This reverts commit b0ffccd.
Should this go in before the RC? xref comment here #56422 (comment) |
Ideally yes |
I added an expanded comment on this issue (#56422 (comment)). And to clear, this is not a critical blocker for the RC (just for the final release). Nothing will break for people testing the RC if this is not in, but if they are running tests where they also use pyarrow, they will see a flood of additional warnings (that we can easily avoid by merging this), that will distract of actual relevant deprecation warnings we want them to check for the RC. |
cc @pandas-dev/pandas-core for more thoughts on this as this is the last remaining blocker for the release. |
Is there a way to not issue the warning if coming from I understand that @jbrockmendel would like to "force" |
That is incorrect, several reasonable alternatives have been offered in #56422. |
My read of that discussion is that there is debate about whether the 6X performance hit is "reasonable". |
"6x" is less accurate than "100us" |
Deprecating doesn't impose the performance penalty - it only imposes the message on users. Once can always improve performance of the blessed paths if this is a real bottle neck, but I do suspect that 35us vs 135 us is going to be noise in almost all cases. It does seem that the incentive for pyarrow to change their usage is nil until this makes it into a release. |
@jbrockmendel I would recommend that you re-read that discussion then. Yes, there are alternatives, and I also acknowledged that, but only ones (that are zero-copy) that can be used in the future, not right now (note that pyarrow cannot enable optional features like CoW, that's something for the user to do) To be explicit: the alternative that is 400µs slower (I agree that a slowdown like this will mostly be noise, although have to check the impact of more dtypes/columns) is not possible right now for pyarrow to implement (with public APIs). |
I think the current deprecation is OK to leave in for 2.2, but this deprecation should not be enforced (or upgraded to a |
Given that there is a lot of ongoing discussion about this (what's the exact alternative, what should be the timeline, etc), can we simply revert this for now so we can have this discussion before committing to certain changes? This warning will appear in CI for every project that uses pandas and pyarrow (e.g. we already see that in geopandas). So this is annoying a lot of other projects, just because we are committing the deprecation a bit early. |
Reverting it now means delaying enforcing it from 2025 to 2026. That is bad. |
@jorisvandenbossche wrote:
@jbrockmendel wrote:
To me, the timing seems to be the critical difference in opinion. I think there is general agreement that @jorisvandenbossche What timing do you propose for this deprecation and removal? |
would using a PendingDeprecationWarning address some of the CI issues? |
According to our 2-year support window of pyarrow, the earliest moment we can enforce this deprecation would be spring 2026. Even if we are laxer (as we currently are) and only support for 1 year, the earliest moment would be in spring 2025 (because the earliest version of pyarrow that can support this will be in spring 2024). If we only change Deprecation->FutureWarning in 4.0, what does it matter that the DeprecationWarning is added in 2.2 or 3.0 or 3.1?
And can you explain why this is "bad"? What we are deprecating here is a small helper function (only for external usage, not used internally) that has very little maintenance overhead (assumption based on the fact it was hardly touched the last three years). Keeping it a year longer doesn't seem much of an issue. |
Pytest shows both DeprecationWarning and PendingDeprecationWarning by default, so that would be no difference AFAIK
I have no problem with deprecating |
For the uninformed what huge benefit is there to a downstream library using make_block versus say constructing a dataframe with a dict of arrays? |
Zero-copy construction of 2D blocks. We currently don't have a public API that accepts 2D arrays (except for the case of a single dtype for your full DataFrame). |
That was a typo on my part. Should read "from 2025 to 2026". It is bad because 2026 is a long way away and enforcing the deprecation blocks weaning pyarrow off of constructing Managers which blocks any potential changes in Managers (as demonstrated by ArrayManager never being workable in the pyarrow-touching IO tests) |
Kind of. In many (all?) cases pyarrow already does a copy when converting its 1D arrays to 2D arrays. The reasoning that was given for this years ago was that this prevented post-construction performance hits bc pandas did silent consolidation. But silent consolidation was removed in 2.0, and there has been a lot of very frustrating goalpost-shifting. |
So let's revert the deprecation for 2.2 and then reintroduce for 3.0 so that we can enforce this either in 4.0 (less likely) or 5.0 (more likely because of the support window) at the latest. We have to run some tests to see if concat + reindex is sufficient or if we need a more efficient path that takes a list of arrays directly, but that should be relatively easy I guess? |
My concern with that plan is that Joris will infinitely re-litigate "clear alternative for the valid use cases" a few months from now. As bashtage said, pyarrow has no incentive to actually change until this is in a release. pyarrow is not being a good citizen by refusing to use public APIs. |
FWIW, it seems that the people coding that part of pyarrow are @jorisvandenbossche and @wesm so they are representing both communities on this topic..... |
Signing off for the day. I'll consent to reverting for 2.2 if there is firm commitment to deprecating in 3.0 and enforcing in 4.0. pyarrow shouldn't get to abuse our internals indefinitely over a few hundred microseconds. |
merging as we agreed on reverting the deprecation for 2.2 and deprecating again for 3.0 |
Looks like I made CI red with this (should've rebased before merging). Will take a look at the failures. |
…") (#56814) Backport PR #56481: Revert "DEPR: make_block (#56422)" Co-authored-by: Joris Van den Bossche <[email protected]>
This reverts commit b0ffccd.
This reverts commit b0ffccd.
See the discussion in #56422