-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[python] Support H5AD ingest directly from S3 in backed mode #1629
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
marvelous
that is definitely one word you could use to describe it
I'm not entirely sure what the deal is with the anndata failing to read properly in some of your test cases, but this is my suggestion to make mypy stop whining.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Based on the changes you made, it seems like we should maybe update tiledb.VFS.open
to also accept pathlib.Path
s as well. Doing so would make the changes you just did unnecessary (and make the TileDB VFS API play better with the rest of the Python world).
One trivial nit that you are free to ignore.
@@ -219,16 +219,16 @@ def test_resume_mode(adata, resume_mode_h5ad_file): | |||
tempdir1 = tempfile.TemporaryDirectory() | |||
output_path1 = tempdir1.name | |||
tiledbsoma.io.from_h5ad( | |||
output_path1, resume_mode_h5ad_file, "RNA", ingest_mode="write" | |||
output_path1, resume_mode_h5ad_file.as_posix(), "RNA", ingest_mode="write" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think str(resume_mode_h5ad_file)
might technically be a better way to express this, since there is a theoretical world where tiledbsoma runs on non-POSIX machines. str(some_pathlib_path)
turns it into the system’s default representation. (I’m pretty sure I have myself written .as_posix()
elsewhere in tiledbsoma without thinking that part through, so it basically Does Not Matter.)
Thanks @thetorpedodog ! I think the follow-on for TileDB-Py (accepting Pathlib) is a good one, thanks! |
When working on a change in tiledbsoma [1], we ran into somewhat mysterious error messages where things had worked before. This was because, unlike `open` (and other libraries which accept path-likes), `tiledb.VFS` only accepts `str` (and undocumented `bytes`) paths. Accepting path-like objects (like `pathlib.Path`s) lets us play better with other libraries and user code. [1]: single-cell-data/TileDB-SOMA#1629
When working on a change in tiledbsoma [1], we ran into somewhat mysterious error messages where things had worked before. This was because, unlike `open` (and other libraries which accept path-likes), `tiledb.VFS` only accepts `str` (and undocumented `bytes`) paths. Accepting path-like objects (like `pathlib.Path`s) lets us play better with other libraries and user code. [1]: single-cell-data/TileDB-SOMA#1629
When working on a change in tiledbsoma [1], we ran into somewhat mysterious error messages where things had worked before. This was because, unlike `open` (and other libraries which accept path-likes), `tiledb.VFS` only accepts `str` (and undocumented `bytes`) paths. Accepting path-like objects (like `pathlib.Path`s) lets us play better with other libraries and user code. [1]: single-cell-data/TileDB-SOMA#1629
When working on a change in tiledbsoma [1], we ran into somewhat mysterious error messages where things had worked before. This was because, unlike `open` (and other libraries which accept path-likes), `tiledb.VFS` only accepts `str` (and undocumented `bytes`) paths. Accepting path-like objects (like `pathlib.Path`s) lets us play better with other libraries and user code. [1]: single-cell-data/TileDB-SOMA#1629
Issue and/or context:
@thetorpedodog has a marvelous change which lets us ingest H5AD -- with memory-friendly "backed" mode -- directly from S3 without needing to make a temp copy to local. Let's use there here too.
Changes:
Notes for Reviewer:
I'm having a problem with type checking.