Workflows

Ira Zibbu edited this page Jan 14, 2025 · 4 revisions

Common workflows are described here.

Data Handling

brefito download-data

Downloads the files specified for each sample in data.csv.

Inputs: data.csv
Outputs: nanopore-reads, illumina-reads
Options: --config data_csv=<path> (Default is data.csv)
         --resources connections=<int>

The connections resource controls how many download jobs run simultaneously. The default is 1. Avoid setting it so high that it overloads your system!
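
As a rough sketch of how the two options combine, the snippet below composes a download-data call that allows four simultaneous downloads. The value 4 and the single-line invocation form are assumptions for illustration; data.csv is the documented default.

```shell
# Assumed values for illustration; adjust for your system.
DATA_CSV="data.csv"   # documented default for data_csv
CONNECTIONS=4         # assumption: four simultaneous downloads (default is 1)

# Compose the command from the options listed above and print it for review.
CMD="brefito download-data --config data_csv=${DATA_CSV} --resources connections=${CONNECTIONS}"
echo "$CMD"

# Run it only if brefito is installed and the sample sheet exists.
if command -v brefito >/dev/null 2>&1 && [ -f "$DATA_CSV" ]; then
    $CMD
fi
```

Printing the command first makes it easy to confirm the connections value before any downloads start.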

brefito merge-reads

Merges the raw reads corresponding to each sample into one file per type of read.

Inputs: data.csv
Outputs: merged-reads
Options: <same options as download-data>
         --config breseq_options="<breseq_options>"

brefito merge-trimmed-reads

Merges the trimmed reads corresponding to each sample into one file per type of read.

Inputs: data.csv
Outputs: merged-reads-trimmed
Options: <same options as download-data>
         --config breseq_options="<breseq_options>"

Predicting Mutations by Mapping Reads to a Reference

brefito predict-mutations-breseq

Runs breseq using the reference files and trimmed read files.

Inputs: data.csv
Outputs: breseq-references/data, breseq-references/html, breseq-references/gd 
Options: <same options as download-data>
         --config BRESEQ_OPTIONS="<breseq_options>"
            Options that get passed to breseq
         --config BRESEQ_THREADS=<int>
            Override the default number of threads for each breseq job.
         --config No_DEFAULT_BRESEQ_OPTIONS=<bool>
            Don't pass the default option of -x to breseq when using nanopore reads
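
For instance, overriding the thread count might look like the sketch below. The value 8 is an assumption, and only one --config key is shown because this page does not say how several keys are combined in a single call.

```shell
# Assumption: 8 threads per breseq job suits the machine.
BRESEQ_THREADS=8

# Compose the command from the documented BRESEQ_THREADS option and print it.
CMD="brefito predict-mutations-breseq --config BRESEQ_THREADS=${BRESEQ_THREADS}"
echo "$CMD"

# Run it only if brefito is installed and the default sample sheet exists.
if command -v brefito >/dev/null 2>&1 && [ -f data.csv ]; then
    $CMD
fi
```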

brefito tabulate-ssrs-breseq

Runs breseq using the reference files and trimmed read files. Then runs breseq CL-TABULATE on the aligned reads to create a CSV file that counts how many reads have different numbers of bases in each mononucleotide repeat with at least a certain minimum length in the reference file.

Inputs: data.csv
Outputs: breseq-references/ssrs
Options: <same options as download-data>
         --config ssr_minimum_length=<int>
            Minimum length (--minimum-length) parameter passed to `breseq CL-TABULATE`
         --config ssr_strict_mode=<bool>
            Pass the `--strict` parameter to `breseq CL-TABULATE`.

brefito compare-mutations-breseq

Runs predict-mutations-breseq and then generates HTML compare tables to summarize similarities and differences between samples. Different compare table files are created for each set of samples that were compared against different reference sequences.

Inputs: data.csv
Outputs: breseq-references/compare[_#].html, breseq-references/html, breseq-references/gd 
Options: <same options as predict-mutations-breseq>

brefito coverage-plots-breseq

Runs breseq BAM2COV to create coverage plots tiling the reference genome.

Inputs: breseq-references/data
Outputs: breseq-references/cov
Options: <same options as predict-mutations-breseq>

brefito mutate-genomes-gdtools

Uses gdtools from breseq to apply the GenomeDiff files in genome-diffs to the reference genomes, generating updated reference genomes that include those mutations. One GenomeDiff file with the .gd extension is expected per sample. These could be copied from a breseq-*/gd directory and then manually edited to curate the mutations they describe.

Inputs: data.csv, genome-diffs/*.gd
Outputs: mutants
Options: <same options as download-data>

You can use predict-mutations-breseq-mutants after this command to re-run breseq using the input reads against the hypothesized mutant genome sequences. If the lists of mutations are correct and complete, the output should show no predicted mutations.
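
The curate-and-verify loop described above could be staged roughly as follows. This is a sketch that assumes the GenomeDiff files come from breseq-references/gd (one possible breseq-*/gd directory) and that the commands are run from the project folder; it only copies files and prints the next steps, because the manual editing happens in between.

```shell
# Directory names follow the Inputs and Outputs listed on this page.
GD_SOURCE="breseq-references/gd"   # assumption: predictions from predict-mutations-breseq
GD_TARGET="genome-diffs"           # input folder for mutate-genomes-gdtools

mkdir -p "$GD_TARGET"

# Copy predicted GenomeDiff files, if a previous breseq run produced any.
if [ -d "$GD_SOURCE" ]; then
    cp "$GD_SOURCE"/*.gd "$GD_TARGET"/
fi

# Remind the user of the remaining steps.
echo "Next: edit ${GD_TARGET}/*.gd to curate the mutations, then run:"
echo "  brefito mutate-genomes-gdtools"
echo "  brefito predict-mutations-breseq-mutants"
```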

Evaluating Assemblies

brefito align-reads

Generates files that can be loaded in IGV to view sequences (FASTA/FAI), reads (BAM/BAI), and annotations (GFF). Maps reads to the provided reference using minimap2 for nanopore reads and bowtie2 for Illumina reads.

Inputs: data.csv
Outputs: align-reads-references/data
Options: <same options as download-data>

brefito check-soft-clipping

Analyzes and plots soft-clipped reads after mapping.

Inputs: align-reads-references/data
Outputs: align-reads-references/soft-clipping
Options: <same options as download-data>

brefito annotate-genomes

Combines gene annotations from Prokka with IS element annotations from ISEScan into a final GenBank file for each sample.

Inputs: data.csv, references
Outputs: annotated-references
Options: <same options as download-data>

Assembling Genomes with Autocycler

Requirements

  • Autocycler is not available on bioconda. Download the release for your OS from the Autocycler GitHub.
  • DO NOT download the binary into the brefito folder. This interferes with the execution of Snakemake.
  • Add the path to the folder that contains the autocycler binary to your $PATH variable.
  • Clone the Autocycler repository anywhere on your system. DO NOT clone it into the brefito folder:
      git clone https://github.com/rrwick/Autocycler.git
  • Add the path to the scripts/ folder of the cloned Autocycler repository to your $PATH.
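
The two $PATH additions above might look like this in a shell startup file such as ~/.bashrc. Both directory locations are hypothetical; substitute wherever you placed the binary and the clone, keeping both outside the brefito folder.

```shell
# Hypothetical locations, both outside the brefito folder.
AUTOCYCLER_BIN_DIR="$HOME/tools/autocycler"   # folder holding the autocycler binary
AUTOCYCLER_REPO_DIR="$HOME/src/Autocycler"    # clone of the Autocycler repository

# Prepend the binary folder and the repository's scripts/ folder to PATH.
export PATH="${AUTOCYCLER_BIN_DIR}:${AUTOCYCLER_REPO_DIR}/scripts:${PATH}"

# Report whether the binary can now be found.
if command -v autocycler >/dev/null 2>&1; then
    echo "autocycler found at $(command -v autocycler)"
else
    echo "autocycler not found; check AUTOCYCLER_BIN_DIR"
fi
```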

brefito autocycler-assemble

Uses Autocycler to generate a consensus assembly for each sample.

Inputs: data.csv
Outputs: autocycler/{sample}/output/consensus_assembly.fasta
Options: --config genome_size=<int>
            Required. Estimated genome size in bases (e.g., 4600000).
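
A call that supplies the required genome size might look like the sketch below. It reuses the 4600000 example value from above; the single-line invocation form is an assumption.

```shell
# Example value from this page; replace with an estimate for your organism.
GENOME_SIZE=4600000

# Compose the command with the required genome_size option and print it.
CMD="brefito autocycler-assemble --config genome_size=${GENOME_SIZE}"
echo "$CMD"

# Run it only if both brefito and autocycler are on PATH and data.csv exists.
if command -v brefito >/dev/null 2>&1 \
    && command -v autocycler >/dev/null 2>&1 \
    && [ -f data.csv ]; then
    $CMD
fi
```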