Support Dask in highly_variable_genes #2809
Codecov Report
@@            Coverage Diff             @@
##           master    #2809      +/-   ##
==========================================
+ Coverage   74.07%   74.11%   +0.04%
==========================================
  Files         115      115
  Lines       12652    12685      +33
==========================================
+ Hits         9372     9402      +30
- Misses       3280     3283       +3
ac48b8e to 7ced391
filt, _ = filter_genes(
    _get_obs_rep(adata_subset, layer=layer), min_cells=1, inplace=False
)
This has been broken the whole time by operating on X instead of the selected layer?
I bet we didn't notice since the nonzero values are likely pretty similar, if not the same, for common use cases
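As a rough illustration of the point above, here is a minimal NumPy-only sketch (not scanpy's actual code) of how a `min_cells=1` gene filter can give a different mask depending on whether it runs on X or on a selected layer. The arrays and the helper `genes_with_min_cells` are hypothetical stand-ins.

```python
import numpy as np

# Hypothetical data: 3 cells x 4 genes. `X` stands in for adata.X and
# `layer` for a selected layer; neither is real scanpy data.
X = np.array([
    [0.0, 1.2, 0.0, 0.3],
    [0.0, 0.0, 0.0, 0.9],
    [0.0, 2.1, 0.0, 0.0],
])
layer = np.array([
    [0, 3, 1, 1],
    [0, 0, 0, 2],
    [0, 5, 0, 0],
])


def genes_with_min_cells(matrix, min_cells=1):
    """Boolean mask of genes that are nonzero in at least `min_cells` cells."""
    n_cells_per_gene = (matrix != 0).sum(axis=0)
    return n_cells_per_gene >= min_cells


mask_from_X = genes_with_min_cells(X)
mask_from_layer = genes_with_min_cells(layer)

# The masks agree wherever the nonzero patterns agree, and differ otherwise.
genes_that_differ = np.flatnonzero(mask_from_X != mask_from_layer)
```

In this toy case only one gene differs between the two masks, which fits the guess that the discrepancy would be easy to miss when X and the layer share most of their nonzero pattern.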
Hmm, I found several weirdnesses in hvg:
Apart from the comments, can we get a regression test for "cell_ranger" (e.g. generate results with an older version)? I don't think we have one in the test suite.
Sure! That’s a concrete thing I can do. I’ll do that on Thursday; I did the rest of what you asked today.
I don't think this has happened yet, could we add this?
There are two more lines which aren't covered, but I believe they should be unreachable (both just raise a ValueError saying that the arg should be "cell_ranger" or "seurat"), so it's fine.
I'm a little concerned about changing the return for inplace=False, in case anyone was relying on that. Not too concerned, but what do you think about tracking that for 2.0?
This reverts commit e1f3cca.
Sorry for the accidental push; I was trying to see if we can get inline code coverage on GitHub:
Co-authored-by: Isaac Virshup <[email protected]>
I’ll do that today if you can give me more info. Usually “regression test” refers to testing specific properties that were broken in a bug and subsequently fixed. What properties exactly are you looking for? Why /edit: done in #2851
yeah, lines like that are more defensive coding. I add them even to internal code to force us to look at everything instead of having a
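A minimal sketch of the defensive pattern being discussed, using a hypothetical dispatcher (`describe_flavor` and its return strings are made up for illustration and are not scanpy's code): the final `else` should be unreachable if callers validate `flavor` earlier, but it fails loudly if a new flavor is ever added without updating this branch.

```python
def describe_flavor(flavor: str) -> str:
    """Hypothetical dispatcher with a defensive fallback branch."""
    if flavor == "seurat":
        return "seurat code path"
    elif flavor == "cell_ranger":
        return "cell_ranger code path"
    else:
        # Expected to be unreachable when the argument is validated upstream;
        # kept so that an unhandled value raises instead of passing silently.
        raise ValueError(
            f'flavor should be "cell_ranger" or "seurat", got {flavor!r}'
        )
```

Such branches typically show up as uncovered lines in a coverage report, because no valid input can trigger them.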
You mean the fact that the index makes the dataframe now actually useful? I can’t think of a way in which this breaks things in a way that isn’t immediately obvious and welcome. Of course, code can be infinitely weird, but can you think of a scenario?
I get that it is more useful, but any code that was accessing it with |
We have documented the columns returned by the data frame, not the index, so I’d say the documented behavior stayed the same.
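To make the two positions concrete, here is a small pandas sketch. It assumes (the thread does not spell this out) that the old `inplace=False` result had a default integer index and the new one is indexed by gene name; the column names, gene names, and values are invented for illustration.

```python
import pandas as pd

# Hypothetical result columns; values and gene names are made up.
values = {"means": [0.1, 0.5, 0.9], "highly_variable": [False, True, True]}

old_style = pd.DataFrame(values)  # default RangeIndex 0..n-1 (assumed old behavior)
new_style = pd.DataFrame(values, index=["geneA", "geneB", "geneC"])  # assumed new behavior

# Column access and positional access behave the same for both frames.
same_columns = list(old_style.columns) == list(new_style.columns)
same_positional = old_style.iloc[1]["means"] == new_style.iloc[1]["means"]

# Label-based access with an integer works only with the integer index.
old_value = old_style.loc[1, "means"]
try:
    new_style.loc[1, "means"]
    label_lookup_broke = False
except KeyError:
    label_lookup_broke = True
```

Under these assumptions, code that only uses the documented columns or positional access keeps working, while code that looks rows up by integer label fails immediately with a `KeyError` rather than silently returning different data.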
log1p, normalize_per_cell, filter_cells/filter_genes #2814
TODO:
flavor=
"seurat"
"cell_ranger"
n_top_genes=n
Aggregate when paired with 'median' (dask/dask#10853)
{min,max}_{disp,mean}