[Python] Allow disabling more components #33126
Comments
Joris Van den Bossche / @jorisvandenbossche: But it would indeed be good to make more of the others optional as well. Compute would probably give the biggest benefit, although also the most difficult one? In the cython code this is actually already handled using the |
Antoine Pitrou / @pitrou: define_option(ARROW_PYTHON
"Build some components needed by PyArrow.;\
(This is a deprecated option. Use CMake presets instead.)"
OFF
DEPENDS
ARROW_COMPUTE
ARROW_CSV
ARROW_DATASET
ARROW_FILESYSTEM
ARROW_HDFS
ARROW_JSON)
Antoine Pitrou / @pitrou: As for the Compute dependency: perhaps we can factor out the casting code in PyArrow C++ (there's not much of it) and use compilation directives to simply return |
Some users would like to build lightweight versions of PyArrow, for example for use in AWS Lambda or similar systems which constrain the total size of usable libraries.
However, PyArrow currently requires several Arrow C++ components (Compute, CSV, Dataset, Filesystem, HDFS and JSON), which can lead to a very sizable Arrow binary install.
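If some of these components became optional, downstream code would need a way to tell which ones a given install actually ships. A minimal sketch of such a probe follows. It assumes that each optional component corresponds to an importable `pyarrow` submodule and that a missing component surfaces as an `ImportError`; the list of submodule names and the helper `probe_submodules` are illustrative, not part of PyArrow.

```python
import importlib

# Assumed mapping from the components named in the issue to pyarrow
# submodules (HDFS is assumed to live under the filesystem module).
OPTIONAL_SUBMODULES = ["compute", "csv", "dataset", "fs", "json"]


def probe_submodules(package, names):
    """Return {name: bool}, True when `package.name` can be imported."""
    result = {}
    for name in names:
        try:
            importlib.import_module(f"{package}.{name}")
        except ImportError:
            # Covers both a missing submodule and a missing parent package.
            result[name] = False
        else:
            result[name] = True
    return result


# Hypothetical usage against a trimmed-down install:
# probe_submodules("pyarrow", OPTIONAL_SUBMODULES)
```

A caller could use the resulting dictionary to skip features, or to raise a clearer error than a bare import failure when a component was left out of the build.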
Reporter: Antoine Pitrou / @pitrou
Note: This issue was originally created as ARROW-17916. Please see the migration documentation for further details.