Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change default leiden clustering backend to igraph, and reduce default number of iterations #2865

Open
ivirshup opened this issue Feb 19, 2024 · 6 comments
Assignees
Milestone

Comments

@ivirshup
Copy link
Member

ivirshup commented Feb 19, 2024

What kind of feature would you like to request?

Additional function parameters / changed functionality / changed defaults?

Please describe your wishes

On the recommendation of the library authors, we should change the default leiden backend to igraph (#1053) now that this has been made available in #2815).

At the same time, we should change the number of default iterations. Right now, we iterate until convergence. This can be very slow, especially for large datasets. We should probably just stick with the default of the underlying library (which is 2).

@ilan-gold
Copy link
Contributor

FutureWarning for the next release to begin the process.

@aeisenbarth
Copy link

I first understood the warning that the default will be switched so that we have to be explicit (flavor="leidenalg") if the old default works for us. But the warning persists, maybe suggesting that we should switch to flavor="igraph".

When switching to igraph, we get a deprecation warning resolution_parameter keyword argument is deprecated, use resolution=... instead which has been deprecated in igraph some years ago (7848bcb). Scanpy has not yet updated the parameter since the initial leiden implementation, or the deprecation was not noticed. Can users ignore the warning, or should users put an upper cap on the igraph version to be safe?

@ilan-gold
Copy link
Contributor

@aeisenbarth Could you provide a small code sample or point to one of our unit tests where this happens? I am not seeing what you are referring to. For example:

import scanpy as sc
adata = sc.datasets.pbmc3k()
sc.pp.pca(adata)
sc.pp.neighbors(adata)
sc.tl.leiden(adata, flavor="igraph", resolution=5, directed=False, n_iterations=2, copy=True)
sc.tl.leiden(adata, flavor="leidenalg", resolution=5, copy=True)

do not seem to yield the warnings you describe. Thanks!

@aeisenbarth
Copy link

DeprecationWarning is silenced by default, for not annoying users. But I get all when running unit tests with pytest.

Here with all warnings enabled:

>>> import warnings
>>> warnings.filterwarnings("always")
>>> import scanpy as sc
>>> adata = sc.datasets.pbmc3k()
>>> sc.pp.pca(adata)
>>> sc.pp.neighbors(adata)
>>> sc.tl.leiden(adata, flavor="igraph", n_iterations=2)
…/lib/python3.9/site-packages/scanpy/tools/_leiden.py:185: DeprecationWarning: resolution_parameter keyword argument is deprecated, use resolution=... instead
  part = g.community_leiden(**clustering_args)
$ pip freeze | grep -E "anndata|scanpy|igraph|python-igraph|leidenalg"
anndata==0.10.7
igraph==0.11.5
leidenalg==0.10.2
python-igraph==0.11.5
scanpy==1.10.1

@ilan-gold
Copy link
Contributor

@aeisenbarth I still cannot reproduce this warning, which is very frustrating. But I also don't understand how this could be happening. Looking at the code:

clustering_args["n_iterations"] = n_iterations
if flavor == "leidenalg":
if resolution is not None:
clustering_args["resolution_parameter"] = resolution
directed = True if directed is None else directed
g = _utils.get_igraph_from_adjacency(adjacency, directed=directed)
if partition_type is None:
partition_type = leidenalg.RBConfigurationVertexPartition
if use_weights:
clustering_args["weights"] = np.array(g.es["weight"]).astype(np.float64)
clustering_args["seed"] = random_state
part = leidenalg.find_partition(g, partition_type, **clustering_args)
else:
g = _utils.get_igraph_from_adjacency(adjacency, directed=False)
if use_weights:
clustering_args["weights"] = "weight"
if resolution is not None:
clustering_args["resolution"] = resolution
clustering_args.setdefault("objective_function", "modularity")
with _utils.set_igraph_random_state(random_state):
part = g.community_leiden(**clustering_args)
# store output into adata.obs

I cannot see how resolution_parameter would end up in the community_leiden call. I'll keep looking into this a bit more.

@ilan-gold
Copy link
Contributor

ilan-gold commented May 22, 2024

Ah I see @aeisenbarth: #2999 solved this, so this will be resolved in an upcoming release: https://github.com/scverse/scanpy/blob/main/docs/release-notes/1.10.2.md

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants