`normalize_total` with numba #3135

Intron7 · 2024-07-02T12:12:32Z

What kind of feature would you like to request?

Other?

Please describe your wishes

This would speed up normalization

ilan-gold · 2024-08-08T13:43:18Z

Waiting for potential work already done on this by intel folks (@ashish615 have you guys worked on this?)

ashish615 · 2024-08-08T13:59:28Z

@Intron7 code for below link may work for only csr matrix.
https://github.com/IntelLabs/Open-Omics-Acceleration-Framework/blob/main/pipelines/single-cell-RNA-seq-analysis/notebooks/fastpp.py#L499-L522

ilan-gold · 2024-10-17T13:41:48Z

Implement with or without Intel PR

selmanozleyen · 2025-04-07T13:22:51Z

So the implementation from intel would replace the axis_mul_or_truediv function we use as last step here:

scanpy/src/scanpy/preprocessing/_normalization.py

Lines 28 to 52 in 834159a

    
           def _normalize_data(X, counts, after=None, *, copy: bool = False): 
        
               X = X.copy() if copy else X 
        
               if issubclass(X.dtype.type, int | np.integer): 
        
                   X = X.astype(np.float32)  # TODO: Check if float64 should be used 
        
               if after is None: 
        
                   if isinstance(counts, DaskArray): 
        
                       def nonzero_median(x): 
        
                           return np.ma.median(np.ma.masked_array(x, x == 0)).item() 
        
                       after = da.from_delayed( 
        
                           dask.delayed(nonzero_median)(counts), 
        
                           shape=(), 
        
                           meta=counts._meta, 
        
                           dtype=counts.dtype, 
        
                       ) 
        
                   else: 
        
                       counts_greater_than_zero = counts[counts > 0] 
        
                       after = np.median(counts_greater_than_zero, axis=0) 
        
               counts = counts / after 
        
               out = X if isinstance(X, np.ndarray | CSBase) else None 
        
               return axis_mul_or_truediv( 
        
                   X, counts, op=truediv, out=out, allow_divide_by_zero=False, axis=0 
        
               )

Because everything done until that point is to find median in case there is no fixed target sum given. So I would essentially have to utilize numba in axis_mul_or_truediv, does that sound right @ilan-gold @flying-sheep ? If so I would proceed on that

ilan-gold · 2025-04-07T13:30:51Z

@selmanozleyen That does sound correct, although I have not looked into it closely before just now, and there is not much of a bread crumb trail to pick at since the Intel code is so different. So I don't want to say "yes for sure" but it does seem that way. It's possible axis_sum could be another place to use numba, I'm really not sure. BTW a good place for something like "fast axis multiplication" or "axis sum" could be https://github.com/scverse/fast-array-utils (and in fact maybe should be there). But prototyping here and getting the actual numba function ready would make sense. Maybe @Intron7 wants to add something as to which part of normalize_total would be "numba-able" if not those mentioned?

flying-sheep · 2025-04-08T09:10:05Z

I removed the axis from the name, since extending the functions to allow axis=None wasn’t that hard, and yes, sum is there.

See here for my thoughts about the scope of fast-array-utils: #3449

Intron7 added Enhancement ✨ Triage 🩺 This issue needs to be triaged by a maintainer labels Jul 2, 2024

Intron7 self-assigned this Jul 2, 2024

flying-sheep added Area – Performance 🐌 and removed Triage 🩺 This issue needs to be triaged by a maintainer labels Aug 8, 2024

flying-sheep removed the Enhancement ✨ label Dec 16, 2024

ilan-gold assigned selmanozleyen and Intron7 and unassigned Intron7 Apr 3, 2025

Intron7 linked a pull request Apr 8, 2025 that will close this issue

normalize_total with numba #3571

Open

flying-sheep linked a pull request Apr 17, 2025 that will close this issue

normalize_total with numba #3571

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`normalize_total` with numba #3135

`normalize_total` with numba #3135

Intron7 commented Jul 2, 2024

ilan-gold commented Aug 8, 2024

ashish615 commented Aug 8, 2024

ilan-gold commented Oct 17, 2024

selmanozleyen commented Apr 7, 2025

ilan-gold commented Apr 7, 2025

flying-sheep commented Apr 8, 2025

normalize_total with numba #3135

normalize_total with numba #3135

Comments

Intron7 commented Jul 2, 2024

What kind of feature would you like to request?

Please describe your wishes

ilan-gold commented Aug 8, 2024

ashish615 commented Aug 8, 2024

ilan-gold commented Oct 17, 2024

selmanozleyen commented Apr 7, 2025

ilan-gold commented Apr 7, 2025

flying-sheep commented Apr 8, 2025

`normalize_total` with numba #3135

`normalize_total` with numba #3135