Fixed bug: Updated parse_gtdbtk.Snakefile #19

Open: wants to merge 1 commit into base: main


shiraz-shah

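MAGinator had a bug caused by the coverage mode used for mmseqs gene clustering, which was left at the mmseqs default (coverage mode 0). That mode requires the alignment to cover a minimum fraction of both sequences, so a short gene fragment from an incomplete assembly could not pass the threshold against its full-length counterpart and ended up as its own gene cluster. This inflated the total number of gene clusters, with downstream consequences for signature gene selection and abundance estimation that are hard to predict.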
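
One way to see this inflation is to count clusters and singletons in the cluster table that mmseqs easy-cluster (and easy-linclust) writes as <prefix>_cluster.tsv, which has one "representative, member" pair per line. The sketch below assumes that two-column tab-separated layout. The function names and gene IDs are hypothetical and do not come from MAGinator.

```python
import csv
import io
from collections import defaultdict


def cluster_sizes(tsv_text: str) -> dict[str, int]:
    """Count members per representative in an mmseqs-style cluster TSV.

    Assumes each line is "representative<TAB>member" (hypothetical helper).
    """
    sizes: dict[str, int] = defaultdict(int)
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        representative = row[0]
        sizes[representative] += 1
    return dict(sizes)


def singleton_count(sizes: dict[str, int]) -> int:
    """Number of clusters with a single member, e.g. unmerged fragments."""
    return sum(1 for n in sizes.values() if n == 1)


if __name__ == "__main__":
    # Hypothetical output in which a fragment (fragC) became its own cluster.
    example = "geneA\tgeneA\ngeneA\tgeneB\nfragC\tfragC\n"
    sizes = cluster_sizes(example)
    print(len(sizes), singleton_count(sizes))
```

A sharp rise in singletons between two runs on the same genes is one hint that fragments are not being merged.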

We have fixed this bug by switching mmseqs to coverage mode 1 (--cov-mode 1), which only requires the alignment to cover a minimum fraction of the target sequence. Gene fragments therefore no longer end up as separate clusters but are merged with their full-length counterparts.
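
The difference between the coverage modes can be illustrated with a simplified model. This is a sketch, not the mmseqs implementation. It assumes that coverage is simply aligned residues divided by sequence length, that the full-length gene acts as the query (cluster representative) and the fragment as the target (member), and it only models modes 0 to 2. The Alignment class and passes_coverage function are hypothetical. The 0.8 default mirrors the mmseqs default for -c.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Simplified pairwise alignment (hypothetical model, not mmseqs code).

    query_len: length of the representative, e.g. a full-length gene
    target_len: length of the candidate member, e.g. a gene fragment
    aligned_len: residues covered by the alignment, assumed equal on both
    """
    query_len: int
    target_len: int
    aligned_len: int


def passes_coverage(aln: Alignment, cov_mode: int, min_cov: float = 0.8) -> bool:
    """Return True if the pair meets the coverage criterion for cov_mode."""
    query_cov = aln.aligned_len / aln.query_len
    target_cov = aln.aligned_len / aln.target_len
    if cov_mode == 0:  # both query and target must be covered
        return query_cov >= min_cov and target_cov >= min_cov
    if cov_mode == 1:  # only the target must be covered
        return target_cov >= min_cov
    if cov_mode == 2:  # only the query must be covered
        return query_cov >= min_cov
    raise ValueError(f"cov_mode {cov_mode} is not modelled in this sketch")


if __name__ == "__main__":
    # A 300-residue fragment fully aligned to a 1000-residue gene.
    fragment_hit = Alignment(query_len=1000, target_len=300, aligned_len=300)
    for mode in (0, 1, 2):
        print(mode, passes_coverage(fragment_hit, mode))
```

In this model the fragment covers only 30% of the full-length gene, so mode 0 rejects the pair and the fragment is left to seed its own cluster. Mode 1 accepts it because the fragment itself is fully covered.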

In addition, the mmseqs clustering workflow has been changed from easy-linclust to easy-cluster. The latter is fast enough for this use (about 20 minutes for a deep 500-sample metagenome data set), whereas easy-linclust relies on a number of heuristics that gain speed at the cost of accuracy.
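
For reference, the combined change corresponds roughly to an mmseqs invocation of the form mmseqs easy-cluster <input> <output prefix> <tmp dir> --cov-mode 1. The sketch below builds such a command as an argument list, for example for use in a Snakemake shell or run block. The function name, file names, and the -c and --threads values are illustrative assumptions. The actual rule in MAGinator may use different inputs, thresholds, and options.

```python
import shlex


def build_easy_cluster_cmd(
    genes_fasta: str,
    out_prefix: str,
    tmp_dir: str,
    cov_mode: int = 1,      # target coverage, so fragments join full-length genes
    min_cov: float = 0.8,   # illustrative -c threshold (the mmseqs default)
    threads: int = 8,       # illustrative thread count
) -> list[str]:
    """Return an argv list for mmseqs easy-cluster (hypothetical helper)."""
    return [
        "mmseqs", "easy-cluster", genes_fasta, out_prefix, tmp_dir,
        "--cov-mode", str(cov_mode),
        "-c", str(min_cov),
        "--threads", str(threads),
    ]


if __name__ == "__main__":
    # Hypothetical paths; prints the shell-quoted command without running it.
    print(shlex.join(build_easy_cluster_cmd("genes.fna", "clusters", "tmp")))
```

Building an argument list instead of a single string avoids shell-quoting problems if it is later passed to subprocess.run.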

Changed mmseqs gene clustering to coverage mode 1, so gene fragments do not end up as separate clusters.