Fixed bug: Updated parse_gtdbtk.Snakefile #19
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
MAGinator had a bug because of the default mmseqs gene clustering mode used. Due to this bug, gene fragments from incomplete assemblies would end up as their own gene clusters. This would inflate the total number of gene clusters, with unforeseen downstream consequences for signature gene selection and abundance estimation.
We have fixed this bug by changing the mmseqs clustering mode to coverage mode 1, so gene fragments do not end up as separate clusters, but instead get merged with their full-length counterparts.
In addition, the mmseqs clustering workflow has been changed from easy-linclust to easy-cluster, because the latter is fast enough (20 minutes for a deep 500-sample metagenome data set), while easy-linclust employs a number of heuristics to improve speed at the cost of accuracy.