Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-39217: [Python] RecordBatchReader.from_stream constructor for objects implementing the Arrow PyCapsule protocol #39218

Conversation

jorisvandenbossche
Copy link
Member

@jorisvandenbossche jorisvandenbossche commented Dec 13, 2023

Rationale for this change

In contrast to Array, RecordBatch and Schema, for the C Stream (mapping to RecordBatchReader) we haven't an equivalent factory function that can accept any Arrow-compatible object and turn it into a pyarrow object through the PyCapsule Protocol.

For that reason, this proposes an explicit constructor class method for this: RecordBatchReader.from_stream (this is a quite generic name, so other name suggestions are certainly welcome).

Are these changes tested?

TODO

…r objects implementing the Arrow PyCapsule protocol
Copy link

⚠️ GitHub issue #39217 has been automatically assigned in GitHub to PR creator.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good idea. Of course it needs some tests.
Should pyarrow.ipc.open_stream also accept PyCapsule producers?

)

if schema is not None:
requested = schema.__arrow_c_schema__()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also want to first test the presence of this method using hasattr as above?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good idea

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Dec 21, 2023
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Dec 21, 2023
@jorisvandenbossche
Copy link
Member Author

Should pyarrow.ipc.open_stream also accept PyCapsule producers?

My understanding is that at the moment this function is meant to work with file-like objects (or the in-memory buffer representing it) for an IPC encapsulated message. That seems a bit different in scope, and I would say it's fine to keep that scope?

@jorisvandenbossche jorisvandenbossche marked this pull request as ready for review December 21, 2023 15:12
@pitrou
Copy link
Member

pitrou commented Dec 21, 2023

My understanding is that at the moment this function is meant to work with file-like objects (or the in-memory buffer representing it) for an IPC encapsulated message. That seems a bit different in scope, and I would say it's fine to keep that scope?

Fair enough. My concern was that RecordBatchReader.from_stream is less visible...

@jorisvandenbossche jorisvandenbossche merged commit dc40e5f into apache:main Jan 8, 2024
12 checks passed
@jorisvandenbossche jorisvandenbossche removed the awaiting change review Awaiting change review label Jan 8, 2024
@jorisvandenbossche jorisvandenbossche deleted the recordbatchreader-from-stream branch January 8, 2024 15:49
Copy link

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit dc40e5f.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them.

clayburn pushed a commit to clayburn/arrow that referenced this pull request Jan 23, 2024
…r objects implementing the Arrow PyCapsule protocol (apache#39218)

### Rationale for this change

In contrast to Array, RecordBatch and Schema, for the C Stream (mapping to RecordBatchReader) we haven't an equivalent factory function that can accept any Arrow-compatible object and turn it into a pyarrow object through the PyCapsule Protocol.

For that reason, this proposes an explicit constructor class method for this: `RecordBatchReader.from_stream` (this is a quite generic name, so other name suggestions are certainly welcome).

### Are these changes tested?
TODO

* Closes: apache#39217

Authored-by: Joris Van den Bossche <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…r objects implementing the Arrow PyCapsule protocol (apache#39218)

### Rationale for this change

In contrast to Array, RecordBatch and Schema, for the C Stream (mapping to RecordBatchReader) we haven't an equivalent factory function that can accept any Arrow-compatible object and turn it into a pyarrow object through the PyCapsule Protocol.

For that reason, this proposes an explicit constructor class method for this: `RecordBatchReader.from_stream` (this is a quite generic name, so other name suggestions are certainly welcome).

### Are these changes tested?
TODO

* Closes: apache#39217

Authored-by: Joris Van den Bossche <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
zanmato1984 pushed a commit to zanmato1984/arrow that referenced this pull request Feb 28, 2024
…r objects implementing the Arrow PyCapsule protocol (apache#39218)

### Rationale for this change

In contrast to Array, RecordBatch and Schema, for the C Stream (mapping to RecordBatchReader) we haven't an equivalent factory function that can accept any Arrow-compatible object and turn it into a pyarrow object through the PyCapsule Protocol.

For that reason, this proposes an explicit constructor class method for this: `RecordBatchReader.from_stream` (this is a quite generic name, so other name suggestions are certainly welcome).

### Are these changes tested?
TODO

* Closes: apache#39217

Authored-by: Joris Van den Bossche <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Python] RecordBatchReader constructor from stream object implementing the PyCapsule Protocol
2 participants