Meta-CAMP

Welcome to the CAMP!

The Core Analysis Modular Pipeline, the CAMP, is a software toolkit designed for dynamic and educational analyses of metagenomes, bacterial isolates, and, in general, all things microbial. The CAMP is broadly applicable and can be the main analytic workflow for many future projects. It is currently the primary analytic workflow for the Microbiome-in-a-Bottle project and the MetaSUB Consortium.

The core philosophy of the CAMP is anchored in modularity, which is meant to stand in stark contrast to the popular bioinformatic toolkits of "one-click pipelines." By defining every step in an analytic workflow as a single, consistently documented and parameterized codebase, we aim to enable users to gain total control over and a deep understanding of their bioinformatic analyses.

Please post questions and issues related to CAMP tools on the GitHub repository of the specific module in question.

Overview

An overview of the available metagenomics analysis modules in the Core Analysis Modular Pipeline (CAMP). All modules share the same internal architecture, but wrap a different set of algorithms (shown to the left of each box) customized to its particular analysis goals. Modules that are typically the beginning of analysis projects are coloured light blue, modules that are typically intermediate steps are coloured medium blue, and modules that are typically terminal analysis steps are coloured dark blue.

Note: The CAMP is under active development.

Available Analysis Modules

General-Purpose

Raw sequencing datasets are filtered for low-quality bases, low-complexity regions in reads, and extremely short reads using fastp (1). Reads can optionally be deduplicated. Filtered reads are trimmed of adapters using Trimmomatic (2). If host read removal is selected, trimmed and filtered reads are mapped to the host reference genome (here, the human reference genome assembly GRCh38) using Bowtie2 with the 'very-sensitive' flag and Samtools, and mapped reads are removed (3,4). As a final pass, BayesHammer is used to correct sequencing errors (5). FastQC and MultiQC are used to generate overviews (e.g. per-base quality scores, sequence duplication levels) of processed dataset quality (6,7).
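To make the filtering step concrete, here is a minimal sketch of length- and quality-based read filtering. It is not fastp's implementation: the function names, the thresholds, and the use of mean Phred score as the quality criterion are all assumptions chosen for illustration.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    if not quality:
        return 0.0
    return sum(ord(c) - offset for c in quality) / len(quality)


def filter_reads(records, min_length=50, min_mean_quality=20.0):
    """Yield (name, sequence, quality) tuples that pass both filters.

    The default thresholds are illustrative, not the defaults of any tool.
    """
    for name, seq, qual in records:
        if len(seq) < min_length:
            continue  # extremely short read
        if mean_phred(qual) < min_mean_quality:
            continue  # low-quality read
        yield name, seq, qual


reads = [
    ("read1", "ACGT" * 20, "I" * 80),  # long, high quality (Phred 40)
    ("read2", "ACGT" * 5, "I" * 20),   # too short
    ("read3", "ACGT" * 20, "#" * 80),  # long, low quality (Phred 2)
]
kept = [name for name, _, _ in filter_reads(reads)]
print(kept)
```

Only `read1` survives both filters in this toy input.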

The processed sequencing reads can be assembled using MetaSPAdes (with optional flags for metaviral and/or plasmid assembly also available), MegaHIT, or both (8,9). Here, only MetaSPAdes was used. The assembly is subsequently summarized using MetaQUAST (10).

MAG Inference and Quality-Checking

Processed sequencing reads are mapped back to the de novo assembled contigs using Bowtie2 and Samtools. This read coverage information, along with the contig sequences themselves, is used as input for the following binning algorithms: MetaBAT2, CONCOCT, SemiBin2, MaxBin2, VAMB, and MetaBinner (11 – 16). The sets of MAGs inferred by each algorithm are used as input for DAS Tool, an ensemble binning algorithm, to generate a set of consensus MAGs scored based on the presence/absence of single-copy genes (SCGs) (17).

The consensus refined MAGs are quality-checked using an array of parameters. CheckM2 calculates completeness, which is based on the number of lineage-specific marker gene sets present in a MAG, and contamination, which is the number of over-represented multiple copies of a marker gene in a MAG (18). GUNC is also used to assess contamination (19). MAGs are classified using GTDB-Tk, which relies on approximately calculating average nucleotide identity (ANI) to a database of reference genomes (20). For MAGs with a species classification, their contig content is compared to the species' reference genome, and genome-based completion, misassembly, and non-alignment statistics are calculated using QUAST (21).
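As a rough illustration of the completeness and contamination ideas described above, the sketch below counts marker genes in a toy MAG. It is a deliberate simplification and not how CheckM2 computes its estimates; the function name, the marker gene set, and the percentage scaling are assumptions.

```python
from collections import Counter


def completeness_contamination(observed_markers, expected_markers):
    """Return (completeness %, contamination %) from marker gene hits.

    completeness: share of expected marker genes seen at least once.
    contamination: extra copies of marker genes, relative to the number
    of expected markers.
    """
    counts = Counter(m for m in observed_markers if m in expected_markers)
    present = len(counts)
    extra_copies = sum(n - 1 for n in counts.values())
    total = len(expected_markers)
    return 100 * present / total, 100 * extra_copies / total


# Hypothetical marker set and hits for one MAG.
expected = {"rpoB", "gyrA", "recA", "rplB"}
observed = ["rpoB", "gyrA", "gyrA", "recA"]

completeness, contamination = completeness_contamination(observed, expected)
print(completeness, contamination)
```

Here three of four markers are present and one marker appears twice, so the toy MAG scores 75% completeness and 25% contamination.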

Other Analysis Goals

The processed sequencing reads can be classified using MetaPhlan4, Kraken2/Bracken, and XTree (22 – 25). All three tools were used here. To estimate the relative abundance of a taxon, MetaPhlan4 calculates marker gene coverage, Bracken calculates the proportion of reads assigned to a taxon with k-mer uniqueness-based scaling, and XTree estimates directly from unique k-mer proportions. Since the output reports of these tools differ in format, the raw reports from each algorithm are converted to a standard format for easier comparisons downstream.
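The standardization step can be pictured as mapping each tool's values onto one shared table layout. The sketch below assumes each report has already been parsed into a taxon-to-value dictionary; the column names, the example values, and the `standardize` helper are hypothetical and do not reflect the module's actual output format.

```python
def standardize(tool, raw_values):
    """Convert one tool's per-taxon values into rows with fractions summing to 1."""
    total = sum(raw_values.values())
    if total == 0:
        raise ValueError(f"{tool}: report contains no abundance signal")
    return [
        {"tool": tool, "taxon": taxon, "relative_abundance": value / total}
        for taxon, value in sorted(raw_values.items())
    ]


# Hypothetical parsed reports: one in percentages, one in read counts.
percent_report = {"Escherichia coli": 60.0, "Bacteroides fragilis": 40.0}
count_report = {"Escherichia coli": 1500, "Bacteroides fragilis": 500}

table = standardize("tool_a", percent_report) + standardize("tool_b", count_report)
for row in table:
    print(row["tool"], row["taxon"], round(row["relative_abundance"], 3))
```

Once every report shares the same columns and scale, per-taxon estimates from different profilers can be compared side by side.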

The processed sequencing reads are assembled with MetaSPAdes, and viral contigs are subsequently identified using the output assembly graph and ViralVerify (8,26). Contigs containing putative viral genetic material are also identified using VIBRANT, VirSorter, and VirFinder (27 – 29). The aggregated lists of contigs from the three inference algorithms are dereplicated using VirClust (30) and merged with the ViralVerify list, and the overall quality of the putative viruses is assessed using CheckV (31).
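The aggregation of per-tool contig lists can be sketched as follows. This only combines contig identifiers and records which tools flagged each one; it does not perform the sequence-based dereplication that VirClust does. The contig names and the `aggregate_calls` helper are made up for illustration.

```python
def aggregate_calls(calls_by_tool):
    """Map each contig ID to the set of tools that flagged it as viral."""
    support = {}
    for tool, contigs in calls_by_tool.items():
        for contig in contigs:
            support.setdefault(contig, set()).add(tool)
    return support


# Hypothetical per-tool outputs (contig identifiers only).
calls = {
    "VIBRANT": ["contig_1", "contig_2"],
    "VirSorter": ["contig_1"],
    "VirFinder": ["contig_2", "contig_3"],
}

support = aggregate_calls(calls)
for contig in sorted(support):
    print(contig, sorted(support[contig]))
```

Keeping the supporting tools alongside each contig makes it easy to see where the inference algorithms agree and where they do not.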

Open reading frames (ORFs) are identified in the de novo assembly using Bakta, and clustered using MMSeqs (32, 33). Genes are identified from these ORFs by alignment with DIAMOND to obtain the functional profile of the sample (34).

Raw sequencing datasets are trimmed using PoreChop, and then low-quality bases are filtered out using NanoFilt (35, 36). Host reads are optionally removed using Minimap2 (37). FastQC and MultiQC are used to generate overviews (e.g. per-base quality scores, sequence duplication levels) of processed dataset quality (6,7).

Decontamination

The decontamination module is still under construction and is not currently publicly available. A feature table of relative abundances (e.g. operational taxonomic units (OTUs), taxa, metagenome-assembled genomes) is provided to Decontam and Recentrifuge, which estimate contamination from feature abundances within and between samples, respectively (38, 39).

Making New Analysis Modules

For full instructions on how to set up custom CAMP-style modules, please see the module template.

The CAMP Core Principles

The CAMP is meant as an alternative to one-click approaches, built on three core principles.

  1. One module, one job, one output

Going from short reads to, say, binned metagenome-assembled genomes requires many intermediate steps and file types. In a single monolithic pipeline, this means that if one software dependency breaks, if a user's system is incompatible with just one underlying tool, or if one bug pops up in the code, the whole thing can fall apart.

Each module executes a single analytic task and provides the user with fully flexible parameters. Most run multiple different pieces of software, encouraging comparisons in how, say, different taxonomic profilers can yield different results.

Additionally, every module takes a standardized set of inputs and outputs, allowing them to be easily strung together.

  2. Designed for algorithmic understanding

One of the first steps in using a module is manually setting the parameters.yaml file. While an extra bit of effort, this encourages the user to think about what they’re running, instead of just pushing go. We’ve tried to walk the line between ease of use and encouraging understanding of the underlying process.
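As a sketch of what checking a hand-edited parameter file might look like, the snippet below validates a dictionary standing in for the loaded contents of parameters.yaml. The parameter names, their types, and the `check_parameters` helper are hypothetical; each module documents its own parameters.

```python
# Hypothetical required parameters and their expected types.
REQUIRED = {"min_read_length": int, "deduplicate": bool}


def check_parameters(params):
    """Return a list of human-readable problems; an empty list means all is well."""
    problems = []
    for key, expected_type in REQUIRED.items():
        if key not in params:
            problems.append(f"missing parameter: {key}")
        elif type(params[key]) is not expected_type:
            problems.append(
                f"{key} should be {expected_type.__name__}, "
                f"got {type(params[key]).__name__}"
            )
    return problems


# Stand-in for the contents of a parameters file after loading.
params = {"min_read_length": "50", "deduplicate": True}
problems = check_parameters(params)
print(problems)
```

In this example the length threshold was entered as a string, so the check reports a type problem before any analysis starts.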

  3. Flexible development

Who are we to presume what your needs are, bioinformatically speaking? By separating tasks into modules, we’ve aimed to generate a toolkit that is maximally flexible. Further, if you need to build something else, constructing a new module based on existing pieces of software takes only a couple of hours for an experienced developer. This is in large part due to the automated module-structure generation that every repository uses. Once you understand how one module is put together, you understand them all.

Citing the CAMP

If you use the CAMP, please cite it as below, as well as the software it wraps! For a list of the software, please see the Methods section in the manuscript.

CAMP: A modular metagenomics analysis system for integrated multi-step data exploration. Lauren Mak, Braden Tierney, Cynthia Ronkowski, Rodolfo Brizola Toscan, Berk Turhan, Michael Toomey, Juan Sebastian Andrade Martinez, Chenlian Fu, Alexander G Lucaci, Arthur Henrique Barrios Solano, João Carlos Setubal, James R Henriksen, Sam Zimmerman, Malika Kopbayeva, Anna Noyvert, Zana Iwan, Shraman Kar, Nikita Nakazawa, Dmitry Meleshko, Dmytro Horyslavets, Valeriia Kantsypa, Alina Frolova, Andre Kahles, David Danko, Eran Elhaik, Pawel Labaj, Serghei Mangul, The International MetaSUB Consortium, Christopher E. Mason, Iman Hajirasouliha. bioRxiv 2023.04.09.536171; doi: https://doi.org/10.1101/2023.04.09.536171
