
Refactored scanpy.tools._sparse_nanmean to eliminate unnecessary data… #3570


Open · wants to merge 21 commits into main from _sparse_nanmean_is_inefficient

Conversation

@Reovirus commented Apr 8, 2025

Fixes #1894. Reduced redundant data copying: the original matrix is now copied once instead of twice. One copy remains necessary to replace NaNs with zeros without modifying the original matrix. A minimal sketch of the approach follows the checklist below.

  • Closes #1894
  • Tests included
  • Release notes not necessary because
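
For illustration, here is a minimal sketch of the single-copy approach described above, assuming a CSR input; the function name is hypothetical and this is not the actual scanpy implementation:

```python
import numpy as np
from scipy import sparse


def sparse_nanmean_single_copy(X: sparse.csr_matrix, axis: int) -> np.ndarray:
    """NaN-ignoring mean along `axis`, duplicating only X.data."""
    mask = np.isnan(X.data)
    data = np.where(mask, 0.0, X.data)  # the one unavoidable copy
    # indices/indptr are shared with the input, so nothing else is copied
    clean = sparse.csr_matrix((data, X.indices, X.indptr), shape=X.shape)
    sums = np.asarray(clean.sum(axis=axis)).ravel()
    if axis == 0:
        # per-column NaN counts from the column indices of NaN entries
        n_nan = np.bincount(X.indices[mask], minlength=X.shape[1])
    else:
        # per-row NaN counts: expand indptr into a row index per entry
        rows = np.repeat(np.arange(X.shape[0]), np.diff(X.indptr))
        n_nan = np.bincount(rows[mask], minlength=X.shape[0])
    # implicit zeros stay in the denominator; NaN entries are excluded
    return sums / (X.shape[axis] - n_nan)
```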

@Reovirus (Author) commented Apr 8, 2025

@flying-sheep Could you assign a reviewer, please? The failing test looks like a bug in the test itself.

@Reovirus (Author) commented Apr 8, 2025

I could also rewrite the logic without any copying using numba, but it might be slower than the currently implemented methods.

@Reovirus (Author) commented Apr 9, 2025

I know the "No milestone" check is failing, but I don't have permission to set a milestone; I probably need a maintainer's help with that.

@Zethson (Member) commented Apr 11, 2025

@Reovirus I restarted the CI as there should hopefully be no flaky tests at the moment.

Don't worry about the milestone, please.

@Reovirus (Author)

The failure is in scipy.sparse.csr_matrix.count_nonzero, which only gained its axis argument in scipy 1.15.0. I'll rewrite it with numba.
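
For context, the per-axis counts can also be computed in a version-independent way straight from the CSR buffers, without numba. A hypothetical shim (the name and approach are illustrative, not this PR's code):

```python
import numpy as np
from scipy import sparse


def count_nonzero_along(X: sparse.csr_matrix, axis: int) -> np.ndarray:
    """Per-axis nonzero counts that also work on SciPy < 1.15."""
    nz = X.data != 0  # skips explicit zeros; NaN compares unequal to 0
    if axis == 0:
        return np.bincount(X.indices[nz], minlength=X.shape[1])
    rows = np.repeat(np.arange(X.shape[0]), np.diff(X.indptr))
    return np.bincount(rows[nz], minlength=X.shape[0])
```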

@Reovirus (Author)

@Zethson could you please restart the CI? I'm hitting a very strange HTTP bug that isn't from my code.

@Reovirus (Author) commented Apr 11, 2025

CI passed; could somebody review this when they have time, please?

@Reovirus force-pushed the _sparse_nanmean_is_inefficient branch from 68e1e9c to e089438 on April 11, 2025 14:55
@flying-sheep (Member) commented Apr 11, 2025

I made the benchmarks run, let’s see how well this works!

Could you please add a release note fragment?

hatch run towncrier:create 3570.performance.md

@flying-sheep added this to the 1.11.2 milestone Apr 11, 2025
@Reovirus (Author)

Yes, I'll add the note, thanks!

scverse-benchmark bot commented Apr 11, 2025

Benchmark changes

| Change | Before [813608e] | After [7e23b19] | Ratio | Benchmark (Parameter) |
|--------|------------------|-----------------|-------|-----------------------|
| -      | 10.8±0.2ms       | 8.80±0.4ms      | 0.81  | preprocessing_counts.FastSuite.time_log1p('pbmc3k', 'counts-off-axis') |
| -      | 48.4±3ms         | 43.0±0.9ms      | 0.89  | preprocessing_counts.time_filter_genes('pbmc3k', 'counts-off-axis') |
| +      | 261M             | 308M            | 1.18  | tools.peakmem_score_genes |

Comparison: https://github.com/scverse/scanpy/compare/813608e3af7e25214f4370dabb31b13a5e8159aa..7e23b19d9a0e89d1820c6fb1854c5408a814fed4
Last changed: Wed, 30 Apr 2025 16:08:04 +0000

More details: https://github.com/scverse/scanpy/pull/3570/checks?check_run_id=41426505892

@flying-sheep (Member)

According to the benchmarks, this is actually slower than before.

The benchmarks are a bit flaky, so this could be wrong, but a factor of 2.5 like this one is usually real.

@Reovirus (Author)

Understood. I'll try to change it and speed it up.

@Reovirus (Author) commented Apr 11, 2025

I've replaced np.add.at with np.bincount (it was the slowest part; see the sketch after this comment).

The results are in the pictures below: old is the old function, new is the new implementation, and new_parallel is the new one with njit in one case. The matrix was (5000, 5000), and I used 100 random matrices (~20% NaNs and ~20% zeros per matrix) for testing. The x-axis labels read {input_format: csc|csr}{axis: 0|1}{version: old|new|new_with_numba}; the y-axis is the time spent in _sparse_nanmean, in seconds, on my low-resource Ubuntu VM.

[Figures: test_2000_2000_100_boxplot, test_2000_2000_100_violin]

Update: I see the benchmark results are added automatically.
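
To illustrate the swap with a minimal, self-contained example (synthetic data, not the PR's benchmark): np.add.at performs an unbuffered scatter-add, while np.bincount with weights computes the same grouped sums and is typically much faster for integer indices:

```python
import numpy as np

rng = np.random.default_rng(0)
idx = rng.integers(0, 5_000, size=1_000_000)  # target bins, e.g. row indices
vals = rng.random(1_000_000)                  # values to accumulate

out_slow = np.zeros(5_000)
np.add.at(out_slow, idx, vals)  # unbuffered scatter-add: correct but slow

# same grouped sums via bincount, usually far faster
out_fast = np.bincount(idx, weights=vals, minlength=5_000)

assert np.allclose(out_slow, out_fast)
```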

@Reovirus (Author) commented Apr 11, 2025

@flying-sheep We have a small win in score_genes time (23.4±2ms | 21.8±0.8ms | 0.93 | tools.time_score_genes; the p-value is too high to display because the stderr is high, and the implementation is less effective on data with few zeros) and a small loss in memory. Should I optimize memory usage? The loss is probably caused by mask = np.isnan(data); instead of that I could recompute the mask inside the NaN-replacement pass and save one array, as in the sketch below.
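
One way to save that array, sketched as a hypothetical numba helper (not code from this PR): replace NaNs and tally them per column in a single pass, so the boolean mask is never materialized:

```python
import numba
import numpy as np


@numba.njit(cache=True)
def zero_nans_count_columns(data, indices, n_cols):
    """Zero out NaNs in-place and return per-column NaN counts."""
    n_nan = np.zeros(n_cols, dtype=np.int64)
    for k in range(data.shape[0]):
        if np.isnan(data[k]):
            n_nan[indices[k]] += 1
            data[k] = 0.0  # mutates the buffer; pass a copy to keep X intact
    return n_nan
```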

@Intron7 (Member) commented Apr 14, 2025

I think these should be parallel numba kernels with mask arguments for the major and minor axes. I have a working implementation in rapids-singlecell that doesn't need to copy at all. Starting from _get_mean_var is, I think, the best way forward.
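
To make the suggested direction concrete, here is a rough sketch of such a kernel for the major axis of a CSR matrix. It is an assumption about the shape of that approach, not the rapids-singlecell code; it reads the CSR buffers directly, with no copy of X.data:

```python
import numba
import numpy as np


@numba.njit(parallel=True, cache=True)
def nanmean_major_axis(data, indptr, n_rows, n_cols):
    """Row means of a CSR matrix, skipping NaNs, zero-copy."""
    out = np.zeros(n_rows, dtype=np.float64)
    for i in numba.prange(n_rows):
        s = 0.0
        n_nan = 0
        for k in range(indptr[i], indptr[i + 1]):
            v = data[k]
            if np.isnan(v):
                n_nan += 1
            else:
                s += v
        # implicit zeros contribute to the denominator, NaNs do not
        out[i] = s / (n_cols - n_nan)
    return out


# usage: nanmean_major_axis(X.data, X.indptr, *X.shape) for a CSR matrix X
```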

@Intron7 (Member) left a comment

Refactor the kernels to be zero-copy.

@Reovirus (Author) commented Apr 30, 2025

@Intron7 I've rewritten the logic. But my local benchmarking gives a different result (I just measure peak memory by ), and I'm trying to figure out why. Also, I have no idea what to make of some metrics like preprocessing_counts.FastSuite.time_log1p('pbmc3k', 'counts-off-axis'); the change flips sign randomly. Is that normal?


Successfully merging this pull request may close these issues.

_sparse_nanmean is inefficient
4 participants