Skip to content

Draft PR: Add modularity and modularity_adata functions to scanpy.metrics #3613

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

amalia-k510
Copy link

This adds two functions to compute modularity scores from a given graph and a clustering like Leiden or Louvain. The goal is to make it easier to compare different community detection methods using an external metric. This follows up on issue #2908. To my knowledge, there is no built-in way to compare clustering results nor ways to calculate modularity score.

Functions:

  • modularity(): accepts a connectivity matrix and label array, generates an igraph, and returns the modularity score
  • modularity_adata(): AnnData wrapper that pulls graph and clustering labels

@amalia-k510 amalia-k510 marked this pull request as ready for review April 25, 2025 11:24
@flying-sheep flying-sheep added this to the 1.12.0 milestone Apr 25, 2025
Copy link
Member

@flying-sheep flying-sheep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! Apart from the issue with igraph being an optional dependency, this looks good!

I think we have get_igraph_from_adjacency which might be useful, but maybe not.

Could you please add tests? We have many examples on how, best would probably be

  1. for the direct variant, manually create very small graphs to run this on so you can be sure the results are correct
  2. for the anndata version, use neighbors to create the connectivity matrix.

Please add @needs.igraph so the test only runs when igraph is installed

if you’re unsure about anything, please search the code for examples or ask me!

If you end up implementing a non-igraph flavor for this, please test using parametrization, e.g.: @pytest.mark.parametrize("directed", [True, False], ids=["directed", "undirected"])

@@ -89,3 +95,61 @@ def confusion_matrix(
df = df.loc[np.array(orig_idx), np.array(new_idx)]

return df


def modularity(connectivities, labels, mode="UNDIRECTED") -> float:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment also applies to the other function.

please use type annotations and remove the : free-text type from the parameters in the docstring.

E.g. instead of connectivities : array-like or sparse matrix in the docstring, it should be connectivities: ArrayLike | CSBase in the function definition. (search the code for examples if you’re unsure where to import them from).

Also since there are limited options, use Literal["UNDIRECTED", "DIRECTED"] instead of str. Text for defaults is added automatically, so don‘t write them into the docstring either. But here a is_directed: bool parameter would be better anyway, there will never be more than two options.

@@ -4,11 +4,15 @@

from typing import TYPE_CHECKING

import igraph as ig
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an optional dependency and therefore can’t be imported on the top-level.

Optimally we’d have multiple implementations so this works without igraph, but it’s OK to just import igraph in the function and have it fail when igraph can’t be imported.

from natsort import natsorted
from pandas.api.types import CategoricalDtype

from scanpy._compat import CSRBase
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use relative imports (from .._compat)



def modularity(connectivities, labels, mode="UNDIRECTED") -> float:
# default mode is undirected?? can be specified as directed or undirected
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this a question? 😃

if isinstance(connectivities, CSRBase):
# Convert sparse matrix to dense format so that igraph can handle it
# Weighted_Adjacency expects with nested lists or numpy arrays and not sparse matrices
dense_connectivities = connectivities.toarray()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The graphs we use are inherently sparse, so we should use APIs that operate on sparse graphs and never densify if possible.

Are you sure there’s no support for initializing igraph from a sparse matrix?

Comment on lines +126 to +129
if isinstance(labels, pd.Series):
labels = labels.values
# making sure labels are in the right format, i.e., a list of integers
labels = pd.Categorical(np.asarray(labels)).codes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that’s a lot of converting around! Are you sure you can’t just pass labels into pd.Categorical directly?

label_array = adata.obs[labels] if isinstance(labels, str) else labels
# extracting the connectivities from adata.obsp["connectivities"]
connectivities = adata.obsp[obsp]
return modularity(connectivities, label_array)
Copy link
Member

@flying-sheep flying-sheep Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, we don’t currently store if the connectivity calculation assumes a directed or undirected graph …

Just a thought.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants