Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[python] Support H5AD ingest directly from S3 in backed mode #1629

Merged
merged 2 commits into from
Aug 22, 2023

Conversation

johnkerl
Copy link
Member

Issue and/or context:

@thetorpedodog has a marvelous change which lets us ingest H5AD -- with memory-friendly "backed" mode -- directly from S3 without needing to make a temp copy to local. Let's use there here too.

Changes:

Notes for Reviewer:

I'm having a problem with type checking.

Copy link
Contributor

@thetorpedodog thetorpedodog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

marvelous

that is definitely one word you could use to describe it

I'm not entirely sure what the deal is with the anndata failing to read properly in some of your test cases, but this is my suggestion to make mypy stop whining.

apis/python/src/tiledbsoma/io/ingest.py Outdated Show resolved Hide resolved
@johnkerl johnkerl requested a review from thetorpedodog August 22, 2023 16:00
Copy link
Contributor

@thetorpedodog thetorpedodog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Based on the changes you made, it seems like we should maybe update tiledb.VFS.open to also accept pathlib.Paths as well. Doing so would make the changes you just did unnecessary (and make the TileDB VFS API play better with the rest of the Python world).

One trivial nit that you are free to ignore.

@@ -219,16 +219,16 @@ def test_resume_mode(adata, resume_mode_h5ad_file):
tempdir1 = tempfile.TemporaryDirectory()
output_path1 = tempdir1.name
tiledbsoma.io.from_h5ad(
output_path1, resume_mode_h5ad_file, "RNA", ingest_mode="write"
output_path1, resume_mode_h5ad_file.as_posix(), "RNA", ingest_mode="write"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think str(resume_mode_h5ad_file) might technically be a better way to express this, since there is a theoretical world where tiledbsoma runs on non-POSIX machines. str(some_pathlib_path) turns it into the system’s default representation. (I’m pretty sure I have myself written .as_posix() elsewhere in tiledbsoma without thinking that part through, so it basically Does Not Matter.)

@johnkerl
Copy link
Member Author

Thanks @thetorpedodog ! I think the follow-on for TileDB-Py (accepting Pathlib) is a good one, thanks!

@johnkerl johnkerl merged commit 1b2c3e7 into main Aug 22, 2023
@johnkerl johnkerl deleted the kerl/s3-h5ad-backed-ingest branch August 22, 2023 16:14
thetorpedodog added a commit to TileDB-Inc/TileDB-Py that referenced this pull request Aug 22, 2023
When working on a change in tiledbsoma [1], we ran into somewhat
mysterious error messages where things had worked before. This was
because, unlike `open` (and other libraries which accept path-likes),
`tiledb.VFS` only accepts `str` (and undocumented `bytes`) paths.
Accepting path-like objects (like `pathlib.Path`s) lets us play better
with other libraries and user code.

[1]: single-cell-data/TileDB-SOMA#1629
thetorpedodog added a commit to TileDB-Inc/TileDB-Py that referenced this pull request Aug 22, 2023
When working on a change in tiledbsoma [1], we ran into somewhat
mysterious error messages where things had worked before. This was
because, unlike `open` (and other libraries which accept path-likes),
`tiledb.VFS` only accepts `str` (and undocumented `bytes`) paths.
Accepting path-like objects (like `pathlib.Path`s) lets us play better
with other libraries and user code.

[1]: single-cell-data/TileDB-SOMA#1629
thetorpedodog added a commit to TileDB-Inc/TileDB-Py that referenced this pull request Aug 22, 2023
When working on a change in tiledbsoma [1], we ran into somewhat
mysterious error messages where things had worked before. This was
because, unlike `open` (and other libraries which accept path-likes),
`tiledb.VFS` only accepts `str` (and undocumented `bytes`) paths.
Accepting path-like objects (like `pathlib.Path`s) lets us play better
with other libraries and user code.

[1]: single-cell-data/TileDB-SOMA#1629
thetorpedodog added a commit to TileDB-Inc/TileDB-Py that referenced this pull request Aug 23, 2023
When working on a change in tiledbsoma [1], we ran into somewhat
mysterious error messages where things had worked before. This was
because, unlike `open` (and other libraries which accept path-likes),
`tiledb.VFS` only accepts `str` (and undocumented `bytes`) paths.
Accepting path-like objects (like `pathlib.Path`s) lets us play better
with other libraries and user code.

[1]: single-cell-data/TileDB-SOMA#1629
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants