Add low-level create_dataframe_from_blocks helper function #58197

jorisvandenbossche · 2024-04-09T15:40:41Z

See my explanation at #56815

closes #xxxx (Replace xxxx with the GitHub issue number)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

pandas/api/internals.py

jbrockmendel · 2024-04-09T16:25:27Z

pandas/api/internals.py

@@ -0,0 +1,50 @@
+from pandas import DataFrame


Can we add a docstring (module and/or function level) to the effect of "we discourage this for everyone except pyarrow. If you think you have a use case for this, let us know at [...]"?

@phofl might also have a use case in dask (I don't know if you already have a better idea now of whether that would be the case?)

Yeah, we are working on changing how we shuffle data, where this would be helpful (we will get a huge number of small DataFrames, so the overhead is painful), but I agree that we should strengthen this a little so that it is clear that end users shouldn't need this.

I already added a more generic note "For almost all use cases, you should use the standard pd.DataFrame(..) constructor instead." without naming specific libraries that use this.

What would we gain from an "if you think you have a use case for this, let us know at" note? Learning about use cases where people would use this is certainly valuable, but in the end it will be a public developer API, so if we change or remove it in the future, we will need to go through the normal deprecation process anyway, I think.

Hopefully we'll never have to revisit this again. But if we do, there is evidence that discussions around a deprecation here would be more painful than elsewhere. It would be helpful to know ahead of such a discussion if anyone else was using it. Moreover, the "let us know" is a chance to try to talk anyone out of using this.

Added "If you are planning to use this function, let us know by opening an issue at https://github.com/pandas-dev/pandas/issues."

pandas/core/internals/api.py

jbrockmendel · 2024-04-09T16:26:35Z

Not my favorite thing in the world, but better than the status quo, so I'm on board.

jorisvandenbossche · 2024-04-10T16:01:17Z

I added some basic roundtrip tests and tests for the corner cases that I am aware of (the numpy arrays instead of EA for datetime/timedelta, and passing 1D EAs for cases that are stored 2D internally).
(and using this in pyarrow is also passing the test suite there)
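
A roundtrip check of the kind described above might look like the following sketch. The import path follows the file touched by this PR (`pandas/api/internals.py`). The call signature `(blocks, index, columns)`, with each block given as an `(array, placement)` pair and NumPy block values laid out as `(n_columns, n_rows)`, is an assumption that is not spelled out in this thread. The `ImportError` fallback to the regular constructor is purely illustrative, so that the snippet also runs on pandas versions without the helper.

```python
import numpy as np
import pandas as pd

try:
    # Assumed import path and signature; only available in pandas
    # versions that include this PR.
    from pandas.api.internals import create_dataframe_from_blocks
except ImportError:
    create_dataframe_from_blocks = None

index = pd.RangeIndex(3)
columns = pd.Index(["a", "b"])

# Assumption: NumPy block values are 2D with shape (n_columns, n_rows),
# i.e. transposed relative to how the DataFrame is displayed.
values = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
# Assumption: the placement lists the column positions this block fills.
placement = np.array([0, 1])

# The frame we expect to get back, built with the standard constructor.
expected = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})

if create_dataframe_from_blocks is not None:
    result = create_dataframe_from_blocks(
        [(values, placement)], index=index, columns=columns
    )
else:
    # Illustrative fallback: the standard constructor on the transposed array.
    result = pd.DataFrame(values.T, index=index, columns=columns)

pd.testing.assert_frame_equal(result, expected)
```

As the added docstring note says, the standard `pd.DataFrame(..)` constructor remains the right choice for almost all use cases; the low-level helper only matters when constructor overhead dominates.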

jorisvandenbossche · 2024-04-11T12:38:02Z

@jbrockmendel @phofl could I get a more in-depth review (if there are remaining comments)? I would like to get the change that uses this included in pyarrow 16, so we don't have to worry about deprecation warnings when merging #57754 for 3.0 (for the case of using the latest released version of both libraries), but for that I need to get this merged in the coming days.

jorisvandenbossche · 2024-04-15T16:32:45Z

Small reminder here

phofl · 2024-04-15T17:37:21Z

thx @jorisvandenbossche

Add low-level create_dataframe_from_blocks helper function

58289bb

jorisvandenbossche requested review from jbrockmendel and phofl April 9, 2024 15:40

jorisvandenbossche commented Apr 9, 2024

pandas/api/internals.py (resolved)

jbrockmendel reviewed Apr 9, 2024

pandas/core/internals/api.py (outdated, resolved)

mroeschke added the API Design and Internals (Related to non-user accessible pandas implementation) labels Apr 9, 2024

add tests + type annotations

4a427f7

jorisvandenbossche marked this pull request as ready for review April 10, 2024 13:51

jorisvandenbossche added 3 commits April 10, 2024 16:12

try fix mypy

7ba8441

Merge remote-tracking branch 'upstream/main' into internals-dataframe-from-blocks

493e481

update test_api.py

eb91afc

add note about when you plan to use this

07dbcfb

jorisvandenbossche added this to the 3.0 milestone Apr 11, 2024

mroeschke mentioned this pull request Apr 12, 2024

RLS: 3.0 #57064

Open

phofl approved these changes Apr 15, 2024

phofl merged commit ae246a6 into pandas-dev:main Apr 15, 2024
46 checks passed

jorisvandenbossche deleted the internals-dataframe-from-blocks branch April 15, 2024 19:21

jbrockmendel mentioned this pull request Apr 24, 2024

DEPR: make_block #57754

Merged

5 tasks

pmhatre1 pushed a commit to pmhatre1/pandas-pmhatre1 that referenced this pull request May 7, 2024

Add low-level create_dataframe_from_blocks helper function (pandas-dev#58197)

1183936
