Draft PR: Add modularity and modularity_adata functions to scanpy.metrics #3613


Open

amalia-k510 wants to merge 3 commits into scverse:main from amalia-k510:main

+67 −3

amalia-k510 commented Apr 25, 2025

This adds two functions that compute modularity scores from a given graph and a clustering such as Leiden or Louvain. The goal is to make it easier to compare different community detection methods using an external metric. This follows up on issue #2908. To my knowledge, there is currently no built-in way to compare clustering results or to calculate a modularity score.

Functions:

modularity(): accepts a connectivity matrix and a label array, builds an igraph graph from them, and returns the modularity score
modularity_adata(): AnnData wrapper that pulls the graph (from adata.obsp) and the clustering labels (from adata.obs) and passes them to modularity()
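For reference, the score both functions return is Newman's modularity. Below is the standard definition as a sketch, with the resolution parameter fixed at 1; to my understanding this is what igraph's Graph.modularity computes by default, with the second form used for directed graphs. A_ij is the (weighted) connectivity matrix, c_i the cluster of node i, and δ is 1 when both nodes share a cluster and 0 otherwise.

$$
Q = \frac{1}{2m} \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j),
\qquad 2m = \sum_{i,j} A_{ij}, \quad k_i = \sum_j A_{ij}
$$

$$
Q_{\text{dir}} = \frac{1}{m} \sum_{i,j} \left( A_{ij} - \frac{k_i^{\text{out}} k_j^{\text{in}}}{m} \right) \delta(c_i, c_j),
\qquad m = \sum_{i,j} A_{ij}
$$

Worked example: two triangles joined by a single edge, with one cluster per triangle. Then 2m = 14, each cluster has internal weight 6 (three edges, each counted twice) and total degree 7, so Q = 2(6/14 − (7/14)²) = 6/7 − 1/2 = 5/14 ≈ 0.357. Putting all nodes into one cluster gives Q = 14/14 − 1 = 0.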

amalia-k510 added 3 commits (April 25, 2025 12:57)

Modularity score functions with comments (f76dc7b)
typo fix (f092469)
Merge branch 'scverse:main' into main (7ffa1ec)

amalia-k510 marked this pull request as ready for review (April 25, 2025 11:24)

flying-sheep added this to the 1.12.0 milestone

flying-sheep requested changes


Member

flying-sheep left a comment (edited)

Hi! Apart from the issue with igraph being an optional dependency, this looks good!

I think we already have get_igraph_from_adjacency, which might be useful here, but maybe not.

Could you please add tests? We have many examples of how to write them. Probably best would be:

for the direct variant, manually create very small graphs to run this on, so you can be sure the results are correct
for the AnnData version, use neighbors to create the connectivity matrix.

Please add @needs.igraph so the test only runs when igraph is installed.

If you’re unsure about anything, please search the code for examples or ask me!

If you end up implementing a non-igraph flavor for this, please test using parametrization, e.g. @pytest.mark.parametrize("directed", [True, False], ids=["directed", "undirected"])
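A sketch of what the requested tests could look like, following the suggestions above: a tiny hand-built graph whose modularity can be checked on paper, parametrized over directedness. The dense modularity function defined here is a hypothetical stand-in so the snippet runs on its own; in the PR, the tests would import the real function instead and carry @needs.igraph, and the AnnData variant would build its graph with neighbors.

```python
import numpy as np
import pytest


def modularity(connectivities, labels, *, is_directed: bool) -> float:
    """Hypothetical dense stand-in for the function under test."""
    a = np.asarray(connectivities, dtype=float)
    codes = np.unique(np.asarray(labels), return_inverse=True)[1]
    same = codes[:, None] == codes[None, :]  # delta(c_i, c_j)
    if is_directed:
        m = a.sum()
        expected = np.outer(a.sum(axis=1), a.sum(axis=0)) / m  # k_out_i * k_in_j / m
        return float(((a - expected) * same).sum() / m)
    two_m = a.sum()
    k = a.sum(axis=1)
    expected = np.outer(k, k) / two_m  # k_i * k_j / 2m
    return float(((a - expected) * same).sum() / two_m)


def two_triangles() -> np.ndarray:
    """Triangles 0-1-2 and 3-4-5, joined by the single edge 2-3."""
    a = np.zeros((6, 6))
    for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
        a[i, j] = a[j, i] = 1.0
    return a


# In scanpy, these tests would also be marked @needs.igraph.
@pytest.mark.parametrize("directed", [True, False], ids=["directed", "undirected"])
def test_two_triangles(directed):
    # Hand-computed: Q = 2 * (6/14 - (7/14)**2) = 5/14. A symmetric matrix
    # gives the same value whether it is read as directed or undirected.
    q = modularity(two_triangles(), [0, 0, 0, 1, 1, 1], is_directed=directed)
    assert q == pytest.approx(5 / 14)


@pytest.mark.parametrize("directed", [True, False], ids=["directed", "undirected"])
def test_single_cluster_is_zero(directed):
    q = modularity(two_triangles(), [0] * 6, is_directed=directed)
    assert q == pytest.approx(0.0)
```

Checking against a symmetric graph in both modes is a cheap way to catch a wrong normalization (m vs. 2m) in one of the two code paths.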

src/scanpy/metrics/_metrics.py

@@ -89,3 +95,61 @@ def confusion_matrix(
                   df = df.loc[np.array(orig_idx), np.array(new_idx)]
                   return df
+              def modularity(connectivities, labels, mode="UNDIRECTED") -> float:

Member

flying-sheep Apr 25, 2025

This comment also applies to the other function.

Please use type annotations, and remove the free-text types (the part after the colon) from the parameters in the docstring.

E.g. instead of connectivities : array-like or sparse matrix in the docstring, it should be connectivities: ArrayLike | CSBase in the function definition (search the code for examples if you’re unsure where to import them from).

Also, since there are only a few options, use Literal["UNDIRECTED", "DIRECTED"] instead of str. Text for defaults is added automatically, so don’t write defaults into the docstring either. But here an is_directed: bool parameter would be better anyway, since there will never be more than two options.
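A sketch of how these suggestions could look together: type annotations in the signature, an is_directed flag instead of a mode string, and a numpydoc docstring without free-text types or defaults. Its body computes modularity with scipy.sparse only, which is one possible shape for the non-igraph flavor mentioned in the review. Two assumptions: scipy's spmatrix stands in for scanpy's internal CSBase alias, and symmetrizing undirected input by averaging is a choice of this sketch, not something the PR specifies.

```python
from __future__ import annotations

import numpy as np
import pandas as pd
from numpy.typing import ArrayLike
from scipy import sparse


def modularity(
    connectivities: ArrayLike | sparse.spmatrix,  # scanpy would use CSBase here
    labels: ArrayLike,
    *,
    is_directed: bool = False,
) -> float:
    """Compute the modularity of a partition of a weighted graph.

    Parameters
    ----------
    connectivities
        Weighted adjacency matrix of the graph.
    labels
        Cluster label of each node, e.g. from Leiden or Louvain.
        Must not contain missing values.
    is_directed
        Whether to treat `connectivities` as a directed graph.

    Returns
    -------
    The modularity score.
    """
    adj = sparse.csr_matrix(connectivities, dtype=float)  # never densified
    if not is_directed:
        # Assumption of this sketch: symmetrize by averaging A and A.T.
        adj = ((adj + adj.T) / 2).tocsr()
    codes = pd.Categorical(labels).codes
    n, n_clusters = adj.shape[0], int(codes.max()) + 1
    # One-hot membership: onehot[i, c] == 1 iff node i belongs to cluster c.
    onehot = sparse.csr_matrix(
        (np.ones(n), (np.arange(n), codes)), shape=(n, n_clusters)
    )
    total = adj.sum()  # equals 2m (undirected) or m (directed)
    # Sum of A_ij over all pairs (i, j) that share a cluster.
    within = (onehot.T @ adj @ onehot).diagonal().sum()
    # Per-cluster out- and in-degree totals (identical for symmetric input).
    out_deg = onehot.T @ np.asarray(adj.sum(axis=1)).ravel()
    in_deg = onehot.T @ np.asarray(adj.sum(axis=0)).ravel()
    # Undirected: within/2m - sum_c (d_c/2m)^2; directed: within/m - sum_c out_c*in_c/m^2.
    return float(within / total - np.dot(out_deg, in_deg) / total**2)
```

With a symmetric matrix the directed and undirected formulas coincide, so is_directed only changes the result for asymmetric input.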

src/scanpy/metrics/_metrics.py

		@@ -4,11 +4,15 @@

		from typing import TYPE_CHECKING

		import igraph as ig

Member

flying-sheep Apr 25, 2025

This is an optional dependency and therefore can’t be imported on the top-level.

Optimally we’d have multiple implementations so this works without igraph, but it’s OK to just import igraph in the function and have it fail when igraph can’t be imported.
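A sketch of the pattern suggested here: igraph is imported inside the function body, so importing the metrics module still works without igraph and only calling this function fails, with a clearer message. The name modularity_igraph is hypothetical; the graph is built from the matrix's non-zero entries through the igraph Graph constructor, so nothing is densified.

```python
import pandas as pd
from scipy import sparse


def modularity_igraph(connectivities, labels, *, is_directed: bool = False) -> float:
    """Modularity via igraph, importing the optional dependency lazily."""
    try:
        import igraph as ig  # optional dependency: only needed when called
    except ImportError as e:
        raise ImportError(
            "modularity_igraph requires igraph; install it with `pip install igraph`."
        ) from e

    coo = sparse.coo_matrix(connectivities)
    rows, cols, weights = coo.row, coo.col, coo.data
    if not is_directed:
        # Assumes a symmetric matrix: keep each undirected edge once.
        keep = rows <= cols
        rows, cols, weights = rows[keep], cols[keep], weights[keep]
    graph = ig.Graph(
        n=coo.shape[0],
        edges=list(zip(rows.tolist(), cols.tolist())),
        directed=is_directed,
    )
    graph.es["weight"] = weights.tolist()
    membership = pd.Categorical(labels).codes.tolist()
    return float(graph.modularity(membership, weights="weight"))
```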

src/scanpy/metrics/_metrics.py

               from natsort import natsorted
               from pandas.api.types import CategoricalDtype
+              from scanpy._compat import CSRBase

Member

flying-sheep Apr 25, 2025

Please use relative imports (from .._compat).

src/scanpy/metrics/_metrics.py



		def modularity(connectivities, labels, mode="UNDIRECTED") -> float:
		# default mode is undirected?? can be specified as directed or undirected

Member

flying-sheep Apr 25, 2025

Why is this a question? 😃

src/scanpy/metrics/_metrics.py

+                  if isinstance(connectivities, CSRBase):
+                      # Convert sparse matrix to dense format so that igraph can handle it
+                      # Weighted_Adjacency expects with nested lists or numpy arrays and not sparse matrices
+                      dense_connectivities = connectivities.toarray()

Member

flying-sheep Apr 25, 2025

The graphs we use are inherently sparse, so we should use APIs that operate on sparse graphs and never densify if possible.

Are you sure there’s no support for initializing igraph from a sparse matrix?
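One way to avoid toarray(): python-igraph's Graph constructor accepts an edge list, so the edges can be read straight from the sparse matrix's stored entries, and memory then scales with the number of non-zeros rather than n². The helper sparse_to_edges is hypothetical, and this is not necessarily how scanpy's get_igraph_from_adjacency does it.

```python
from scipy import sparse


def sparse_to_edges(connectivities, *, is_directed: bool = False):
    """Return (edges, weights) suitable for igraph from a sparse adjacency matrix.

    Uses O(nnz) memory; the matrix is never densified.
    """
    coo = sparse.coo_matrix(connectivities)
    coo.sum_duplicates()  # merge repeated (i, j) entries, if any
    rows, cols, weights = coo.row, coo.col, coo.data
    if not is_directed:
        # Assumes a symmetric matrix: keep each undirected edge once.
        keep = rows <= cols
        rows, cols, weights = rows[keep], cols[keep], weights[keep]
    edges = list(zip(rows.tolist(), cols.tolist()))
    return edges, weights.tolist()


# Usage (requires igraph):
#   import igraph as ig
#   edges, weights = sparse_to_edges(adj)
#   g = ig.Graph(n=adj.shape[0], edges=edges, directed=False)
#   g.es["weight"] = weights
```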

src/scanpy/metrics/_metrics.py

Comment on lines +126 to +129

+                  if isinstance(labels, pd.Series):
+                      labels = labels.values
+                  # making sure labels are in the right format, i.e., a list of integers
+                  labels = pd.Categorical(np.asarray(labels)).codes

Member

flying-sheep Apr 25, 2025

That’s a lot of converting back and forth! Are you sure you can’t just pass labels into pd.Categorical directly?
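To illustrate the reviewer's point: pd.Categorical accepts lists, NumPy arrays, and pandas Series (including categorical ones) as they are, so the Series check and np.asarray step collapse into a single call. label_codes is a hypothetical helper name used only for this demonstration.

```python
import numpy as np
import pandas as pd


def label_codes(labels) -> np.ndarray:
    """Integer cluster codes for any 1-D label container.

    Missing labels (NaN/None) get the code -1, so callers that need
    complete labels should check for that.
    """
    return pd.Categorical(labels).codes
```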

src/scanpy/metrics/_metrics.py

+                  label_array = adata.obs[labels] if isinstance(labels, str) else labels
+                  # extracting the connectivities from adata.obsp["connectivities"]
+                  connectivities = adata.obsp[obsp]
+                  return modularity(connectivities, label_array)

Member

flying-sheep Apr 25, 2025 (edited)

Hmm, we don’t currently store whether the connectivity calculation assumes a directed or an undirected graph …

Just a thought.
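Until directedness is recorded somewhere, one possible heuristic (an assumption, not something proposed in this thread) is to check whether the connectivity matrix is symmetric and to treat it as undirected only when it is. is_symmetric is a hypothetical helper.

```python
from scipy import sparse


def is_symmetric(connectivities, *, atol: float = 1e-12) -> bool:
    """Return True if the sparse adjacency matrix equals its transpose (within atol)."""
    adj = sparse.csr_matrix(connectivities)
    diff = abs(adj - adj.T)  # stays sparse
    return bool(diff.nnz == 0 or diff.max() <= atol)
```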


Labels

None yet