Speed up categorical regressor with numba #3353

Intron7 · 2024-11-11T14:06:20Z

Use numba to create the regressor for categorical regression
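Below is a minimal sketch of what such a kernel can look like. It is not the PR's actual kernel, and the name `create_regressor_categorical` is hypothetical. It assumes the categorical column has already been turned into integer codes `0` to `number_categories - 1`, for example via pandas' `.cat.codes` with no missing values.

The sketch makes two passes. The first accumulates per-category column sums, and the second writes each cell's category mean back, so every entry of `X` is read once rather than once per category. It always accumulates and returns `float64`.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # assumption: fall back to plain Python so the sketch runs without numba
    def njit(func):
        return func


@njit
def create_regressor_categorical(X, number_categories, cat_array):
    """Hypothetical kernel: each row gets the per-gene means of its category.

    X: 2-D array (cells x genes); cat_array: integer category code per cell.
    """
    n_obs, n_vars = X.shape
    sums = np.zeros((number_categories, n_vars), dtype=np.float64)
    counts = np.zeros(number_categories, dtype=np.int64)
    # First pass: accumulate per-category column sums and row counts.
    for i in range(n_obs):
        c = cat_array[i]
        counts[c] += 1
        for j in range(n_vars):
            sums[c, j] += X[i, j]
    # Second pass: write each cell's category mean into the regressor matrix.
    # Empty categories are never indexed here, so no division by zero occurs.
    regressors = np.empty((n_obs, n_vars), dtype=np.float64)
    for i in range(n_obs):
        c = cat_array[i]
        for j in range(n_vars):
            regressors[i, j] = sums[c, j] / counts[c]
    return regressors


X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
codes = np.array([0, 0, 1])
regressors = create_regressor_categorical(X, 2, codes)  # rows 0-1 get [2, 3], row 2 gets [5, 6]
```

With numba installed, `@njit` compiles both loops to machine code on first call. Without it, the same code runs as ordinary (slow) Python.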

codecov · 2024-11-11T14:21:13Z

Codecov Report

Attention: Patch coverage is 42.85714% with 8 lines in your changes missing coverage. Please review.

Project coverage is 75.46%. Comparing base (bdcef41) to head (eedb314).

✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/scanpy/preprocessing/_simple.py | 42.85% | 8 Missing ⚠️ |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #3353      +/-   ##
==========================================
+ Coverage   73.09%   75.46%   +2.36%     
==========================================
  Files         113      113              
  Lines       13139    13148       +9     
==========================================
+ Hits         9604     9922     +318     
+ Misses       3535     3226     -309
```

| Files with missing lines | Coverage Δ |
|---|---|
| src/scanpy/preprocessing/_simple.py | `88.50% <42.85%> (+0.52%)` ⬆️ |

... and 22 files with indirect coverage changes

ilan-gold · 2024-11-11T15:13:59Z

tests/test_preprocessing.py

+    np.testing.assert_array_almost_equal(adata.X, tester)
+
+
+def test_regressor_categorical():


I would:

1. explain why this test exists (to test against a previous implementation? Honestly, I'm indifferent as to whether it's necessary, since we are already testing for reproducibility; I could see getting rid of it)
2. refactor the "Create org regressors" code into a helper function like create_original

I can see your point here

Do you have an opinion on the first point? Is this test necessary? If so, perhaps add a comment?
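Below is a sketch of the helper the reviewer suggests. The name `create_original` comes from the comment, but the signature is an assumption. The body is assumed to mirror the earlier loop-over-categories approach that the removed `np.zeros(X.shape, dtype="float32")` line belonged to. It builds one boolean mask per category and fills each masked block with that column's mean, giving the new kernel an explicit reference to compare against in the test.

```python
import numpy as np


def create_original(X, categories):
    """Hypothetical reference helper: per-category column means via boolean masks.

    X: 2-D float array (cells x genes); categories: 1-D array with one label per cell.
    """
    regressors = np.zeros(X.shape, dtype="float32")
    for category in np.unique(categories):
        mask = categories == category
        # Fill every masked row of column ix with that column's mean within the category.
        for ix, x in enumerate(X.T):
            regressors[mask, ix] = x[mask].mean()
    return regressors


X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
labels = np.array(["a", "a", "b"])
reference = create_original(X, labels)
```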

tests/test_preprocessing.py

ilan-gold

I think this is missing: #3353 (comment) and the first part of https://github.com/scverse/scanpy/pull/3353/files#r1836830351

ilan-gold · 2024-11-12T13:32:38Z

src/scanpy/preprocessing/_simple.py

@@ -722,13 +737,13 @@ def regress_out(
                "we regress on the mean for each category."
            )
        logg.debug("... regressing on per-gene means within categories")
-        regressors = np.zeros(X.shape, dtype="float32")
+        # Create numpy array's from categorical variable
+        cats = np.int64(len(adata.obs[keys[0]].cat.categories))


Also, add a comment explaining why np.int64 is needed.

It has to be done because of weird typing from pandas; this ensures that it works within the kernel.

so len doesn’t return a Python int? That’s a pandas bug.
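For context on the `np.int64` cast, note that Python's `len()` always returns a built-in `int`. The category count itself is therefore not where pandas' typing differs. A more likely culprit (an assumption, since the thread does not say) is that pandas stores categorical codes in the smallest signed integer dtype that fits, such as `int8` for fewer than 128 categories. Numba compiles a separate specialization for each combination of argument types it sees, so mixed integer widths can matter. The snippet checks both facts and normalizes the inputs to 64-bit integers.

```python
import numpy as np
import pandas as pd

obs = pd.Series(["b", "a", "b", "c"], dtype="category")

n_categories = len(obs.cat.categories)
codes = obs.cat.codes.to_numpy()

# len() always yields a built-in int, regardless of pandas.
print(type(n_categories))  # <class 'int'>
# pandas picks the narrowest signed integer dtype for the codes.
print(codes.dtype)  # int8

# Normalize both to fixed 64-bit integers before handing them to a kernel.
number_categories = np.int64(n_categories)
cat_array = codes.astype(np.int64)
```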

ilan-gold · 2024-11-12T15:53:37Z

tests/test_preprocessing.py

+    np.testing.assert_array_almost_equal(adata.X, tester)
+
+
+def test_regressor_categorical():


Do you have an opinion on the first point? Is this test necessary? If so, perhaps add a comment?

Intron7 · 2025-02-10T16:17:59Z

I renamed one variable to make it clearer what it is, and added some comments that should give more context on what the code is doing.

ilan-gold · 2025-02-11T10:38:05Z

src/scanpy/preprocessing/_simple.py

+    X: np.ndarray, number_categories: int, cat_array: np.ndarray
+) -> np.ndarray:
+    # create regressor matrix for categorical variables
+    regressors = np.zeros(X.shape, dtype=X.dtype)


Check the dtype behavior with integer input, i.e., we need to ensure this is a floating-point matrix.

ilan-gold

Why is there no test for the dtype if we're also fixing that bug here (or in #3461)?
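Below is one way to guarantee a floating-point regressor matrix. It assumes integer input should be promoted to `float32`, matching the `dtype="float32"` the removed `np.zeros` call used, while float input keeps its own precision. `regressor_dtype` is a hypothetical helper name.

```python
import numpy as np


def regressor_dtype(X: np.ndarray) -> np.dtype:
    """Hypothetical helper: dtype for the regressor matrix built from X."""
    if np.issubdtype(X.dtype, np.floating):
        # float32 / float64 input: keep the caller's precision.
        return X.dtype
    # Integer (or bool) input: np.zeros(X.shape, dtype=X.dtype) would truncate the means.
    return np.dtype(np.float32)


X_int = np.array([[1, 2], [4, 6]], dtype=np.int64)
regressors = np.zeros(X_int.shape, dtype=regressor_dtype(X_int))
regressors[:, 0] = X_int[:, 0].mean()  # 2.5 is stored exactly, not truncated to 2
```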

ilan-gold · 2025-02-13T15:19:36Z

tests/test_preprocessing.py

+        (["bulk_labels"], "regress_test_small_cat.npy", 1e-6),
+    ],
+)
+def test_regress_out_reproducible(keys, test_file, atol):


Shouldn't we add integer and float inputs as parameters to this test?
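Below is a sketch of the suggested dtype cases. It is written as a plain loop so it runs without pytest; in the test suite the same cases would go into `@pytest.mark.parametrize`. The function `categorical_regressor_reference` is a hypothetical NumPy stand-in for the kernel, built with `np.add.at` and `np.bincount`. The check asserts that integer input yields a floating-point result matching the same data supplied as floats.

```python
import numpy as np


def categorical_regressor_reference(X, cat_array, number_categories):
    """Hypothetical stand-in: per-category column means, vectorized with NumPy."""
    sums = np.zeros((number_categories, X.shape[1]), dtype=np.float64)
    np.add.at(sums, cat_array, X)  # row i of X is added to sums[cat_array[i]]
    counts = np.bincount(cat_array, minlength=number_categories)
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts[:, None]  # empty categories become NaN but are never indexed
    return means[cat_array]


def check_dtypes():
    rng = np.random.default_rng(0)
    X_float = rng.integers(0, 10, size=(20, 5)).astype(np.float64)
    cat_array = rng.integers(0, 3, size=20)
    expected = categorical_regressor_reference(X_float, cat_array, 3)
    # The same values as integer and float input must give the same float result.
    for dtype in (np.int32, np.int64, np.float32, np.float64):
        result = categorical_regressor_reference(X_float.astype(dtype), cat_array, 3)
        assert np.issubdtype(result.dtype, np.floating)
        np.testing.assert_allclose(result, expected, atol=1e-6)


check_dtypes()
```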

Intron7 added 3 commits November 11, 2024 14:35

add function and test

086f70d

add test

37244a9

add test for regressor

b4ecb0a

Intron7 added this to the 1.11.0 milestone Nov 11, 2024

Intron7 and others added 2 commits November 11, 2024 15:54

add release note

36858d9

Merge branch 'main' into create_cat_regressor

be1bccc

Intron7 requested review from flying-sheep and ilan-gold November 11, 2024 14:56

ilan-gold requested changes Nov 11, 2024

Intron7 added 2 commits November 11, 2024 16:25

update typing

a1a59ae

update test

7b41bc8

Intron7 requested a review from ilan-gold November 11, 2024 15:36

ilan-gold requested changes Nov 12, 2024

Intron7 added 2 commits November 12, 2024 13:45

update test

119a142

update dtype

d77fa9c

ilan-gold requested changes Nov 12, 2024

Intron7 and others added 4 commits November 12, 2024 14:44

rename cats

236e356

Update tests/test_preprocessing.py

bb9cde4

Co-authored-by: Ilan Gold <[email protected]>

Update tests/test_preprocessing.py

bbb5035

Co-authored-by: Ilan Gold <[email protected]>

Update tests/test_preprocessing.py

2a92193

Co-authored-by: Ilan Gold <[email protected]>

Intron7 requested a review from ilan-gold November 12, 2024 15:18

ilan-gold requested changes Nov 12, 2024

ilan-gold and others added 4 commits November 12, 2024 16:53

Update tests/test_preprocessing.py

c7b78c0

remove test

b001c0e

update kernel

c3ce03e

remove test

c50226a

Intron7 requested a review from ilan-gold November 13, 2024 10:55

flying-sheep requested changes Nov 14, 2024

src/scanpy/preprocessing/_simple.py (outdated, resolved)

src/scanpy/preprocessing/_simple.py (outdated, resolved)

tests/test_preprocessing.py (outdated, resolved)

make test together

c6665f4

flying-sheep removed their request for review November 21, 2024 11:39

Merge branch 'main' into create_cat_regressor

2e16c45

flying-sheep modified the milestones: 1.11.0, 1.12.0 Dec 20, 2024

Intron7 and others added 3 commits January 23, 2025 15:57

Merge branch 'main' into create_cat_regressor

1b7d7e1

Merge branch 'main' into create_cat_regressor

3b7fe6e

update doc strings and clean up names

726a625

Intron7 requested review from flying-sheep and ilan-gold February 10, 2025 16:16

ilan-gold reviewed Feb 11, 2025

src/scanpy/preprocessing/_simple.py (outdated, resolved)

tests/test_preprocessing.py (resolved)

src/scanpy/preprocessing/_simple.py (outdated, resolved)

Update src/scanpy/preprocessing/_simple.py

104a0f3

Co-authored-by: Ilan Gold <[email protected]>

ilan-gold requested changes Feb 11, 2025

Intron7 added 3 commits February 11, 2025 13:11

update dtypes

f9b13be

update atol for test

6eafd04

remove int fix

1dae8f4

Intron7 requested a review from ilan-gold February 13, 2025 14:47

ilan-gold requested changes Feb 13, 2025

flying-sheep reviewed Feb 18, 2025

docs/release-notes/3353.performance.md (outdated, resolved)

flying-sheep modified the milestones: 1.12.0, 1.11.1 Feb 18, 2025

flying-sheep added the Area – Performance 🐌 label Feb 18, 2025

flying-sheep changed the title from "Create cat regressor" to "Speed up categorical regressor with numba" Feb 18, 2025

flying-sheep assigned Intron7 Feb 18, 2025

flying-sheep modified the milestones: 1.11.1, 1.11.2 Mar 31, 2025

Intron7 and others added 4 commits April 14, 2025 10:02

Update docs/release-notes/3353.performance.md

39ad1c0

Co-authored-by: Philipp A. <[email protected]>

Merge branch 'main' into create_cat_regressor

4f3db86

Update src/scanpy/preprocessing/_simple.py

2d578c8

Co-authored-by: Ilan Gold <[email protected]>

Fix sparse check

eedb314
