-
Notifications
You must be signed in to change notification settings - Fork 627
Draft PR: Add modularity and modularity_adata functions to scanpy.metrics #3613
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi! Apart from the issue with igraph
being an optional dependency, this looks good!
I think we have get_igraph_from_adjacency
which might be useful, but maybe not.
Could you please add tests? We have many examples on how, best would probably be
- for the direct variant, manually create very small graphs to run this on so you can be sure the results are correct
- for the anndata version, use
neighbors
to create the connectivity matrix.
Please add @needs.igraph
so the test only runs when igraph is installed
if you’re unsure about anything, please search the code for examples or ask me!
If you end up implementing a non-igraph flavor for this, please test using parametrization, e.g.: @pytest.mark.parametrize("directed", [True, False], ids=["directed", "undirected"])
@@ -89,3 +95,61 @@ def confusion_matrix( | |||
df = df.loc[np.array(orig_idx), np.array(new_idx)] | |||
|
|||
return df | |||
|
|||
|
|||
def modularity(connectivities, labels, mode="UNDIRECTED") -> float: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment also applies to the other function.
please use type annotations and remove the : free-text type
from the parameters in the docstring.
E.g. instead of connectivities : array-like or sparse matrix
in the docstring, it should be connectivities: ArrayLike | CSBase
in the function definition. (search the code for examples if you’re unsure where to import them from).
Also since there are limited options, use Literal["UNDIRECTED", "DIRECTED"]
instead of str
. Text for defaults is added automatically, so don‘t write them into the docstring either. But here a is_directed: bool
parameter would be better anyway, there will never be more than two options.
@@ -4,11 +4,15 @@ | |||
|
|||
from typing import TYPE_CHECKING | |||
|
|||
import igraph as ig |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an optional dependency and therefore can’t be imported on the top-level.
Optimally we’d have multiple implementations so this works without igraph, but it’s OK to just import igraph in the function and have it fail when igraph can’t be imported.
from natsort import natsorted | ||
from pandas.api.types import CategoricalDtype | ||
|
||
from scanpy._compat import CSRBase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use relative imports (from .._compat
)
|
||
|
||
def modularity(connectivities, labels, mode="UNDIRECTED") -> float: | ||
# default mode is undirected?? can be specified as directed or undirected |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this a question? 😃
if isinstance(connectivities, CSRBase): | ||
# Convert sparse matrix to dense format so that igraph can handle it | ||
# Weighted_Adjacency expects with nested lists or numpy arrays and not sparse matrices | ||
dense_connectivities = connectivities.toarray() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The graphs we use are inherently sparse, so we should use APIs that operate on sparse graphs and never densify if possible.
Are you sure there’s no support for initializing igraph from a sparse matrix?
if isinstance(labels, pd.Series): | ||
labels = labels.values | ||
# making sure labels are in the right format, i.e., a list of integers | ||
labels = pd.Categorical(np.asarray(labels)).codes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that’s a lot of converting around! Are you sure you can’t just pass labels
into pd.Categorical
directly?
label_array = adata.obs[labels] if isinstance(labels, str) else labels | ||
# extracting the connectivities from adata.obsp["connectivities"] | ||
connectivities = adata.obsp[obsp] | ||
return modularity(connectivities, label_array) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, we don’t currently store if the connectivity calculation assumes a directed or undirected graph …
Just a thought.
This adds two functions to compute modularity scores from a given graph and a clustering like Leiden or Louvain. The goal is to make it easier to compare different community detection methods using an external metric. This follows up on issue #2908. To my knowledge, there is no built-in way to compare clustering results nor ways to calculate modularity score.
Functions: