Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Image -> Arrow support #8330

Open
wants to merge 25 commits into
base: main
Choose a base branch
from
Open

Image -> Arrow support #8330

wants to merge 25 commits into from

Conversation

wiredfool
Copy link
Member

@wiredfool wiredfool commented Aug 25, 2024

Partial implementation of #8329.

This is for discussion -- there are a number of issues.

Changes proposed in this pull request:

  • Implements basic __arrow_c_array__ and __arrow_c_schema__ support for reading Pillow images in Arrow capable libraries.
  • Implements basic arrow array exporting object -> Pillow image. This supports reading arrays in exactly the same formats that we export from Pillow.
  • Adds a way to force the internal allocator to use single blocks for images. Previously there is core.new_block, but internal operations call ImageNewDirty internally, which only calls the Arena Allocator.
  • Adds C level read-only flag. Currently only exported to python layer, and overrides python read_only.

Issues:

  • Currently fails on large images.
    • Returns a ValueError if the image is allocated in multiple blocks
  • There's extraneous stuff in Arrow.c that should be cleaned up.
  • Some changes in Storage.c aren't required.
  • Unclear if we're exporting the most useful version of the data.

Todo:

  • Alternate binary installable arrow lib for testing, or loopback only testing on troublesome platforms if we can't find one that works.
  • Fix Windows Segfault. I don't have a windows env here, so either I need to figure out a CI based version or I'll need an assist.
    • Windows tests now appear to be passing on 3.10->3.13, with 3.9, pypy3.10 and 3.14 failing on arrow install.
  • Python accessible version (r/w, likely) of the schema capsule and possibly array metadata. Partially for testing, partially for sanity checking at the python layer before passing into the C portion.
  • Fix the UNDONEs in the code, especially around error handling.
    • make sure we're raising an error if we don't accept the array format.
  • Large image support or error out.
  • Wire up enforcement of c level read_only flag
  • Docs

For Review:

  • What CVE is in here that I'm not seeing?
  • There's some pretty serious mucking with the object lifetimes, specifically there's now effectively a refcount on the core C image, so it's lifetime is not limited to the python object's lifetime. I think this is required, but it needs review.
  • I'm not 100% on how so much of the checking is in Storage.c, vs somewhere closer to the surface.
  • The core Imaging item now stores a reference to the array pycapsule, which requires PyINCREF and PyDECREF in Storage.c. This feels like it's finally breaking the wall of libImaging not relying on python headers.

src/PIL/Image.py Outdated Show resolved Hide resolved
* Fixed format, only for 4 channel images
* structs can't be encoded this way, they need to have one child array
per struct member.
* i.e. struct of arrays, rather than an array of structs.
* basic safety included
* only respecting the types we emit
@wiredfool wiredfool force-pushed the arrow branch 3 times, most recently from c10beff to 97eb7c0 Compare January 21, 2025 21:41
@wiredfool
Copy link
Member Author

Added basic fromarrow support, covering exactly the formats that we emit, and testing using round trip support.

Windows is failing, so that's something that's going to need digging. And somehow codecov is failing before most of the runs have gone, so not sure what's up there.

@wiredfool
Copy link
Member Author

Windows AMD64 is failing on pillow arrow tests as well as the python version windows matrix:

Tests/test_arrow.py::test_to_array[L-dtype0-None] DEBUG:   25+ if ( >>>> !$?) { exit $LASTEXITCODE }

Windows x86, Windows pypy, manylinux wheels and MacOS Arm64 and 10.15 X86_64 are failing on pyarrow install/compilation, instead of installing the prebuilt wheel. (I'm not sure why MacOS Arm64 is failing, as that's my dev platform). We're going to have to look at either compile time prerequisites, a lighter library with lots of binaries, or potentially skipping tests if pyarrow doesn't compile.

src/PIL/Image.py Outdated Show resolved Hide resolved
src/PIL/Image.py Outdated Show resolved Hide resolved
src/PIL/Image.py Outdated Show resolved Hide resolved
wiredfool and others added 4 commits January 25, 2025 13:42
Co-authored-by: Andrew Murray <[email protected]>
Co-authored-by: Andrew Murray <[email protected]>
Co-authored-by: Andrew Murray <[email protected]>
Co-authored-by: Andrew Murray <[email protected]>
("HSV", ["L", "F"]),
),
)
def test_invalid_array_type(mode: str, dest_modes: List[str]) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def test_invalid_array_type(mode: str, dest_modes: List[str]) -> None:
def test_invalid_array_type(mode: str, dest_modes: list[str]) -> None:

@wiredfool
Copy link
Member Author

wiredfool commented Feb 1, 2025

Ok. I'm really confused now. Windows AMD64 does seem to be working on a test machine. (win11, freshly downloaded vc community 2020, python 3.12.8). Even pyarrow installs cleanly, which I wasn't expecting from the CI runs. (
https://github.com/python-pillow/Pillow/actions/runs/13061587157)

(vpy) c:\Users\eric\src\Pillow>pytest Tests/test_arrow.py
================================================= test session starts =================================================
platform win32 -- Python 3.12.8, pytest-8.3.4, pluggy-1.5.0
--------------------------------------------------------------------
Pillow 11.1.0
Python 3.12.8 (tags/v3.12.8:2dc476b, Dec  3 2024, 19:30:04) [MSC v.1942 64 bit (AMD64)]
--------------------------------------------------------------------
Python executable is c:\Users\eric\src\vpy\Scripts\python.exe
Environment Python files loaded from c:\Users\eric\src\vpy
System Python files loaded from C:\Users\eric\AppData\Local\Programs\Python\Python312
--------------------------------------------------------------------
Python Pillow modules loaded from c:\Users\eric\src\vpy\Lib\site-packages\PIL
Binary Pillow modules loaded from c:\Users\eric\src\vpy\Lib\site-packages\PIL
--------------------------------------------------------------------
--- PIL CORE support ok, compiled for 11.1.0
--- TKINTER support ok, loaded 8.6
--- FREETYPE2 support ok, loaded 2.13.3
--- LITTLECMS2 support ok, loaded 2.16
--- WEBP support ok, loaded 1.5.0
--- JPEG support ok, compiled for libjpeg-turbo 3.1.0
--- OPENJPEG (JPEG2000) support ok, loaded 2.5.3
--- ZLIB (PNG/ZIP) support ok, loaded 1.3.1.zlib-ng, compiled for zlib-ng 2.2.3
--- LIBTIFF support ok, loaded 4.6.0
*** RAQM (Bidirectional Text) support not installed
--- LIBIMAGEQUANT (Quantization method) support ok, loaded 2.16.0
*** XCB (X protocol) support not installed
--------------------------------------------------------------------

rootdir: c:\Users\eric\src\Pillow
configfile: pyproject.toml
plugins: cov-6.0.0, timeout-2.3.1
collected 26 items

Tests\test_arrow.py ..........................                                                                   [100%]

================================================= 26 passed in 0.35s ==================================================

* Return memory error for allocation errors
* Return value error for invalid layout (block) or bad mode)
* Free children on releasing the array
* Only decrement refcount on the leaf array release
pre-commit-ci bot and others added 4 commits February 3, 2025 15:29
* PyCapsules call the destructor on Del
* Need to make sure that lifetime matches the array lifetime.
* Test that arrow can be exported when the block allocator is forced.
@wiredfool
Copy link
Member Author

wiredfool commented Feb 3, 2025

Ok, so I think this is actually pretty solid now.

  • @nulano I'd appreciate any suggestions you might have as to why the windows CI is failing, where it works on my machine.
    • Looks like windows is now passing, except for places where we can't install pyarrow or one of its dependencies.
    • MacOS/Ubuntu are both passing on all pythons, now that the 3.13t version compiles.
  • @hugo / @radarhere / @homm I'd appreciate a good look at the C level and the decisions there. I think we're good, but more eyes are better.
  • Still need to figure out why some of the CI wheel runs are failing on PyArrow. Perhaps we just install that manually in some systems, and test if it's available. Loopback Image.fromarrow(img) does enable some testing.
  • It might be better if pre-commit.ci didn't aggressively cancel CI runs on a Do Not Merge PR. I've got 30+ emails about cancelled runs from today due to linting, and really, linting is something that is just not required when we're trying to figure out if it's the right approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants