[python] Prototype Dask-backed to_anndata #3740

Open: wants to merge 11 commits into main

@ryan-williams (Member) commented Feb 26, 2025

Changes

  • Experiment.to_anndata and ExperimentAxisQuery.to_anndata can optionally produce an AnnData whose X matrix is a dask.array.Array
    • Its blocks are scipy.sparse.csr_matrix objects, produced lazily from slices of the underlying Experiment or ExperimentAxisQuery
    • CSC is also supported
  • dask is introduced as a "dev" dependency (for tests of this code)
    • It's already listed in requirements_spatial.txt, for similar reasons
    • dask is only imported inside the relevant functions and test cases, so this change should be transparent to users.
  • Direct links to new/relevant docs:
  • #3741 contains benchmarking/profiling code and stats.
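
The lazy-chunk idea in the list above can be sketched with Dask's public API. This is a minimal illustration under stated assumptions, not the PR's implementation: read_csr_rows and lazy_csr_dask_array are hypothetical names, and the loader synthesizes rows instead of reading them from an Experiment or ExperimentAxisQuery. Each row-chunk is wrapped with dask.delayed and dask.array.from_delayed (with an empty csr_matrix as meta), so nothing is read until a chunk is computed, and dask is imported inside the function, in the spirit of the PR's note that dask is only imported where needed.

```python
import numpy as np
import scipy.sparse as sp


def read_csr_rows(start: int, stop: int, n_cols: int) -> sp.csr_matrix:
    """Hypothetical loader for rows [start, stop) of an X matrix.

    A real implementation would read these rows from the SOMA Experiment or
    ExperimentAxisQuery; here we synthesize one nonzero per row so the sketch
    runs without any stored data.
    """
    rows = np.arange(start, stop)
    data = (rows + 1).astype(np.float32)  # value for row r is r + 1
    indices = rows % n_cols               # placed in column r mod n_cols
    indptr = np.arange(len(rows) + 1)     # exactly one nonzero per row
    return sp.csr_matrix((data, indices, indptr), shape=(stop - start, n_cols))


def lazy_csr_dask_array(n_rows: int, n_cols: int, chunk_rows: int):
    """Return a Dask array whose row-chunks are CSR matrices loaded on demand."""
    # Import dask here rather than at module level, so it stays optional.
    import dask
    import dask.array as da

    # An empty CSR matrix as `meta` tells Dask the chunk type without loading data.
    meta = sp.csr_matrix((0, 0), dtype=np.float32)
    chunks = []
    for start in range(0, n_rows, chunk_rows):
        stop = min(start + chunk_rows, n_rows)
        # dask.delayed defers the read until this chunk is actually computed.
        delayed_block = dask.delayed(read_csr_rows)(start, stop, n_cols)
        chunks.append(
            da.from_delayed(
                delayed_block, shape=(stop - start, n_cols), dtype=np.float32, meta=meta
            )
        )
    # Stack the row-chunks; still lazy, nothing has been read yet.
    return da.concatenate(chunks, axis=0)


X = lazy_csr_dask_array(n_rows=5, n_cols=3, chunk_rows=2)
print(X.chunks)  # ((2, 2, 1), (3,))
```

Here X.blocks[0].compute() materializes only the first row-chunk, while X.compute() stacks all chunks into a single scipy sparse matrix (its exact sparse format depends on how scipy stacks the blocks).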

I've also enumerated and reordered the tutorials in tutorials.rst.

sc-64138

codecov bot commented Feb 28, 2025

Codecov Report

Attention: Patch coverage is 81.14286% with 33 lines in your changes missing coverage. Please review.

Project coverage is 88.94%. Comparing base (0b3c43c) to head (73ea4d8).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3740      +/-   ##
==========================================
- Coverage   89.15%   88.94%   -0.22%     
==========================================
  Files          54       56       +2     
  Lines        6420     6587     +167     
==========================================
+ Hits         5724     5859     +135     
- Misses        696      728      +32     

Flag      Coverage Δ
python    88.94% <81.14%> (-0.22%) ⬇️

Flags with carried forward coverage won't be shown.

Components      Coverage Δ
python_api      88.94% <81.14%> (-0.22%) ⬇️
libtiledbsoma   ∅ <ø> (∅)
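
The percentages in the Codecov report can be recomputed from its line counts (coverage = hits / lines). In the sketch below, coverage_pct and floor2 are illustrative helpers; the 142-of-175 patch split is derived from "81.14286% with 33 lines missing" rather than stated in the report, and the round-down-to-two-decimals rule is only inferred from the displayed values (it reproduces 89.15%, 88.94% and -0.22%), not taken from Codecov documentation.

```python
import math


def coverage_pct(hits: int, lines: int) -> float:
    """Coverage as a percentage: covered lines / tracked lines * 100."""
    return 100 * hits / lines


def floor2(x: float) -> float:
    """Round down to two decimals (display rule inferred, not documented)."""
    return math.floor(x * 100) / 100


# Project totals from the Coverage Diff table.
base_hits, base_lines = 5724, 6420  # main (0b3c43c)
head_hits, head_lines = 5859, 6587  # #3740 head (73ea4d8)

base = coverage_pct(base_hits, base_lines)  # ~89.1589
head = coverage_pct(head_hits, head_lines)  # ~88.9479
delta = head - base                         # ~-0.2110

# Patch coverage: 33 changed lines are missed; 81.14286% then implies
# 142 covered lines out of 175 changed lines (derived, not reported).
patch_missed = 33
patch_hits = 142
patch = coverage_pct(patch_hits, patch_hits + patch_missed)

print(floor2(base), floor2(head), floor2(delta), round(patch, 5))
# → 89.15 88.94 -0.22 81.14286
```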

ryan-williams marked this pull request as ready for review March 12, 2025 20:53
ryan-williams changed the title from "[WIP] [python] Prototype Dask-backed to_anndata" to "[python] Prototype Dask-backed to_anndata" Mar 12, 2025

@johnkerl (Member) left a comment

@ryan-williams this is truly gorgeous.

Several relatively minor things flagged. I've nothing "big" to add.

I defer to @ivirshup who's got the Dask-user context which I lack.

Thank you for your time on this!

"id": "510dcf31-f0fb-4652-bcfb-abdf4cb6f1e2",
"metadata": {},
"source": [
"The `X` matrix is a [Dask Array]:\n",

Member

It wouldn't hurt to add type(adata.X) as another cell -- just a thought

Co-authored-by: John Kerl <[email protected]>

@ryan-williams (Member, Author) left a comment

Addressed most comments, a couple still TODO

ryan-williams and others added 2 commits March 12, 2025 18:25
CR: notebook copy fix

Co-authored-by: John Kerl <[email protected]>
Labels
None yet
Projects
None yet
2 participants