Refactored scanpy.tools._sparse_nanmean to eliminate unnecessary data… #3570
base: main
Conversation
@flying-sheep Can you assign a reviewer please? The failed test looks like a testing bug.
Also, I can rewrite the logic without any copying using numba, but it could be slower than the currently implemented methods.
I know that I have a "No milestone" failure, but I don't have permission to set a milestone; I'll probably need a maintainer's help to set it.
@Reovirus I restarted the CI as there should hopefully be no flaky tests at the moment. Don't worry about the milestone, please.
The failure is in scipy.sparse.csr_matrix.count_nonzero, which has only had an axis argument since scipy 1.15.0. I'll rewrite it with numba.
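For context, per-axis nonzero counts can be computed without the scipy >= 1.15 `axis` argument of `count_nonzero`. This is a minimal sketch (not the PR's code), using the fact that for a CSR matrix the stored-element count per row is the difference of consecutive `indptr` entries; it assumes no explicitly stored zeros.

```python
import numpy as np
import scipy.sparse as sp

mat = sp.csr_matrix(np.array([[1.0, 0.0, 2.0], [0.0, 0.0, 3.0]]))

# Per-row stored-element counts for a CSR matrix: row i owns the
# slice data[indptr[i]:indptr[i+1]], so the count is the diff of
# indptr. Equals count_nonzero per row if no explicit zeros exist.
row_counts = np.diff(mat.indptr)
print(row_counts.tolist())  # [2, 1]
```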
@Zethson could you please restart CI? I'm hitting a very strange HTTP bug; it's not my code.
for more information, see https://pre-commit.ci
CI passed; could somebody review it when they have some time, please?
Force-pushed from 68e1e9c to e089438
I made the benchmarks run, let’s see how well this works! Could you please add a release note fragment?
Yes, I'll make a note, thanks!
Benchmark changes
Comparison: https://github.com/scverse/scanpy/compare/813608e3af7e25214f4370dabb31b13a5e8159aa..7e23b19d9a0e89d1820c6fb1854c5408a814fed4 More details: https://github.com/scverse/scanpy/pull/3570/checks?check_run_id=41426505892
According to the benchmarks, this is actually slower than before. The benchmarks are a bit flaky, so it could be wrong, but a factor of 2.5 like this is usually real.
Understood. I'll try to change it and speed it up.
I've replaced np.add.at with np.bincount (it was the slowest part). The results are in the pictures ("old" is the old function, "new" is the new implementation, "new_parallel" is the new one with njit in one case). The matrix was (5000, 5000), and I used 100 random matrices (~20% NaNs, ~20% zeros per matrix) for testing. The x-axis is {input_format: csc|csr}{axis: 0|1}{version: old|new|new_with_numba}; the y-axis is the time _sparse_nanmean spent, in seconds, on my low-resource Ubuntu VM. UPD: I've seen that the benchmark is run automatically.
@flying-sheep
I think these should be parallel numba kernels with mask arguments for the major and minor axis. I have a working implementation in rapids-singlecell that doesn't need to copy at all. Starting from _get_mean_var is, I think, the best way forward.
Refactor kernels to be 0 copy
@Intron7 I've rewritten the logic. But my local benchmarking gives a different result (I just measured peak memory by ), and I'm trying to fix it. Also, I have no idea about some metrics like preprocessing_counts.FastSuite.time_log1p('pbmc3k', 'counts-off-axis'). It changes sign randomly; is that normal?
Fixes #1894. Reduced redundant data copying; the original matrix is now copied once instead of twice. One copy remains necessary to replace NaNs with zeros without modifying the original matrix.
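The one remaining copy described above can be illustrated with a standalone sketch (not the PR's exact code): only the sparse matrix's data buffer is duplicated, NaNs are zeroed in the duplicate, and the caller's matrix is left untouched.

```python
import numpy as np
import scipy.sparse as sp

X = sp.csr_matrix(np.array([[1.0, np.nan], [np.nan, 4.0]]))

# The single necessary copy: duplicate the buffers, zero the NaNs
# in the duplicate, and leave the original matrix unmodified.
data = X.data.copy()
data[np.isnan(data)] = 0.0
X_zeroed = sp.csr_matrix(
    (data, X.indices.copy(), X.indptr.copy()), shape=X.shape
)

assert np.isnan(X.data).sum() == 2        # original keeps its NaNs
assert np.isnan(X_zeroed.data).sum() == 0  # copy has them zeroed
```

Copying only data (plus the small index arrays) rather than round-tripping the whole matrix is what keeps the refactor to a single copy.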