Skip to content

Fix tests for dask PCA #3162

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jul 25, 2024
Merged

Fix tests for dask PCA #3162

merged 2 commits into from
Jul 25, 2024

Conversation

flying-sheep
Copy link
Member

@flying-sheep flying-sheep commented Jul 23, 2024

Fixes the current test failures.

I could also do a simpler version where I just rechunk in one way instead of adding parameters here.

Copy link

codecov bot commented Jul 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 76.52%. Comparing base (9fa55b8) to head (87afc4a).
Report is 55 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3162   +/-   ##
=======================================
  Coverage   76.52%   76.52%           
=======================================
  Files         109      109           
  Lines       12483    12483           
=======================================
  Hits         9553     9553           
  Misses       2930     2930           

@flying-sheep flying-sheep requested a review from ilan-gold July 23, 2024 14:47
@ilan-gold
Copy link
Contributor

Sorry what is the provenance of this fix? Why isn't the fix something in AnnData? This seems somewhat complicated

@flying-sheep
Copy link
Member Author

flying-sheep commented Jul 23, 2024

Yeah, I was concerned about the complexity.

But AnnData doesn’t do anything wrong, it just uses chunks in both directions, which that one specific algorithm doesn’t support.

But I think in general we should test for 2D chunks, that’s why I think AnnData’s test helpers work as intended.

Any idea how to do this more simply?

@ilan-gold
Copy link
Contributor

ilan-gold commented Jul 23, 2024

@flying-sheep I mean why this came up all of a sudden in terms of provenance. And furthermore, instead of rechunk, why not make a chunks argument or the like for the anndata function?

@flying-sheep
Copy link
Member Author

I’m not sure why it came up now, I saw changes to the chunking in scverse/anndata#1550 and assumed that introduced it, but maybe not.

regarding complexity and parameters, we could simplify things by doing

def maybe_rechunk_1d(a: NDArray | DaskArray | ...) -> NDArray | DaskArray | ...:
    if isinstance(a, DaskArray):
        return a.rechunk((a.chunksize[0], -1))
    return a

and using that in all the functions that fail (after running things through array_type). Would you prefer that?

@flying-sheep
Copy link
Member Author

I simplified this quite a bit. I decided against blocking this on an upstream anndata helper for multiple reasons:

  1. we test on older anndata versions that won’t have that parameter immediately
  2. needs some designing, e.g. the API should be able to do “use ceil(shape[0] / 2) as chunk size for dim 0, and -1 (=full size) for dim 1”
  3. it would make no sense for the non-dask array types, should we still add it for them?

@flying-sheep flying-sheep enabled auto-merge (squash) July 25, 2024 11:04
@flying-sheep flying-sheep disabled auto-merge July 25, 2024 11:27
@flying-sheep flying-sheep merged commit 71fd59a into main Jul 25, 2024
11 of 14 checks passed
@flying-sheep flying-sheep deleted the fix-dask-svd branch July 25, 2024 11:27
meeseeksmachine pushed a commit to meeseeksmachine/scanpy that referenced this pull request Jul 25, 2024
flying-sheep added a commit that referenced this pull request Jul 25, 2024
kaushalprasadhial pushed a commit to sanchit-misra/scanpy that referenced this pull request Feb 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants