Workflows

Common workflows are described here.

Data Handling

`brefito download-data`

Downloads the files specified for each sample in the sample sheet (data.csv).

Inputs: data.csv
Outputs: nanopore-reads, illumina-reads
Options: --config data_csv=<path> (Default is data.csv)
         --resources connections=<int>

The connections resource controls how many download jobs run simultaneously. The default is 1. Avoid setting it so high that it overloads your system or network connection.
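As a rough sketch of how the two options above fit together, the snippet below assembles a download-data call that points at a sample sheet and allows four simultaneous downloads. The file name `my_samples.csv` and the value 4 are assumptions chosen for illustration, not brefito defaults. The snippet only prints the command so it can be reviewed before it is run.

```shell
# Illustrative values (assumptions, not defaults from brefito)
DATA_CSV="my_samples.csv"   # hypothetical sample sheet path
CONNECTIONS=4               # number of simultaneous download jobs

# Assemble the command from the options documented above
DOWNLOAD_CMD="brefito download-data --config data_csv=${DATA_CSV} --resources connections=${CONNECTIONS}"

# Print the command for review; paste the printed line into your shell to run it
echo "${DOWNLOAD_CMD}"
```

Starting with a small connections value and raising it gradually is a cautious way to find what your system tolerates.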

`brefito merge-reads`

Merges the raw reads corresponding to each sample into one file per type of read.

Inputs: data.csv
Outputs: merged-reads
Options: <same options as download-data>
         --config breseq_options="<breseq_options>"

`brefito merge-trimmed-reads`

Merges the trimmed reads corresponding to each sample into one file per type of read.

Inputs: data.csv
Outputs: merged-reads-trimmed
Options: <same options as download-data>
         --config breseq_options="<breseq_options>"

Predicting Mutations by Mapping Reads to a Reference

`brefito predict-mutations-breseq`

Runs breseq using the reference files and trimmed read files.

Inputs: data.csv
Outputs: breseq-references/data, breseq-references/html, breseq-references/gd 
Options: <same options as download-data>
         --config BRESEQ_OPTIONS="<breseq_options>"
            Options that get passed to breseq
         --config BRESEQ_THREADS=<int>
            Override the default number of threads for each breseq job.
         --config NO_DEFAULT_BRESEQ_OPTIONS=<bool>
            Don't pass the default option of -x to breseq when using nanopore reads
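A minimal sketch of passing an extra option through to breseq, assuming the quoting shown in the option list above. The choice of `-p` (breseq's polymorphism prediction mode) is only an example and is not something this workflow requires. The snippet prints the command rather than running it.

```shell
# Example breseq option (an assumption for illustration): -p enables polymorphism mode
BRESEQ_OPTIONS="-p"

# Keep the inner quotes so the option string reaches brefito as one value
PREDICT_CMD="brefito predict-mutations-breseq --config BRESEQ_OPTIONS=\"${BRESEQ_OPTIONS}\""

# Print the command for review before running it
echo "${PREDICT_CMD}"
```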

`brefito tabulate-ssrs-breseq`

Runs breseq using the reference files and trimmed read files. Then runs breseq CL-TABULATE on the aligned reads to create a CSV file that counts how many reads have different numbers of bases in each mononucleotide repeat with at least a certain minimum length in the reference file.

Inputs: data.csv
Outputs: breseq-references/ssrs
Options: <same options as download-data>
         --config ssr_minimum_length=<int>
            Minimum length (--minimum-length) parameter passed to `breseq CL-TABULATE`
         --config ssr_strict_mode=<bool>
            Pass the `--strict` parameter to `breseq CL-TABULATE`.

`brefito compare-mutations-breseq`

Runs predict-mutations-breseq and then generates HTML compare tables to summarize similarities and differences between samples. Different compare table files are created for each set of samples that were compared against different reference sequences.

Inputs: data.csv
Outputs: breseq-references/compare[_#].html, breseq-references/html, breseq-references/gd 
Options: <same options as predict-mutations-breseq>

`brefito coverage-plots-breseq`

Runs breseq BAM2COV to create coverage plots tiling the reference genome.

Inputs: breseq-references/data
Outputs: breseq-references/cov
Options: <same options as predict-mutations-breseq>

`brefito mutate-genomes-gdtools`

Uses gdtools from breseq to apply the GenomeDiff files in the genome-diffs directory and generate updated reference genomes that include those mutations. One GenomeDiff file with the .gd extension is expected per sample. These files can be copied from a breseq-*/gd directory and then manually edited to curate the mutations they describe.

Inputs: data.csv, genome-diffs/*.gd
Outputs: mutants
Options: <same options as download-data>

You can run predict-mutations-breseq-mutants after this command to re-run breseq with the input reads against the hypothesized mutant genome sequences. If the lists of mutations are correct and complete, the output should show no predicted mutations.
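The curation loop described above can be sketched as follows. It assumes the GenomeDiff files produced by predict-mutations-breseq sit in `breseq-references/gd` and are named so that brefito can match them to samples. That naming is an assumption, so check it against your own output before relying on this.

```shell
# Directory names follow the Inputs/Outputs listed above
SRC_DIR="breseq-references/gd"
DEST_DIR="genome-diffs"

mkdir -p "${DEST_DIR}"

# Copy each GenomeDiff file; the guard skips the body if the glob matched nothing
for gd in "${SRC_DIR}"/*.gd; do
    [ -e "${gd}" ] || continue
    cp "${gd}" "${DEST_DIR}/"
done

# After manually curating the copied files, the follow-up steps would be:
echo "brefito mutate-genomes-gdtools"
echo "brefito predict-mutations-breseq-mutants"
```

Copying the files instead of editing them in place keeps the original breseq predictions available for comparison.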

Evaluating Assemblies

`brefito align-reads`

Generates files that can be loaded in IGV to view sequences (FASTA/FAI), reads (BAM/BAI), and annotations (GFF). Maps reads to the provided reference using minimap2 for nanopore reads and bowtie2 for Illumina reads.

Inputs: data.csv
Outputs: align-reads-references/data
Options: <same options as download-data>

`brefito check-soft-clipping`

Analyzes and plots soft-clipped reads after mapping.

Inputs: align-reads-references/data
Outputs: align-reads-references/soft-clipping
Options: <same options as download-data>

`brefito annotate-genomes`

Combines gene annotations from Prokka with IS element annotations from ISEScan into a final GenBank file for each sample.

Inputs: data.csv, references
Outputs: annotated-references
Options: <same options as download-data>

Assembling Genomes with Autocycler

Requirements

Autocycler is not available on bioconda. Download the release for your OS from the Autocycler GitHub.
DO NOT download the binary into the brefito folder. This interferes with the execution of Snakemake.
Add the path to the folder that contains the autocycler binary to your $PATH variable.
Clone the Autocycler repository anywhere on your system. DO NOT clone it into the brefito folder.

git clone https://github.com/rrwick/Autocycler.git

Add the path to the scripts/ folder of this repository to your $PATH.
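The PATH changes described above might look like the following sketch. Both directory locations are hypothetical placeholders. Substitute the folder where you saved the autocycler binary and the location of your Autocycler clone, keeping both outside the brefito folder.

```shell
# Hypothetical install locations; replace with wherever you placed the files
AUTOCYCLER_BIN_DIR="${HOME}/tools/autocycler"
AUTOCYCLER_SCRIPTS_DIR="${HOME}/src/Autocycler/scripts"

# Prepend both directories to PATH for the current shell session
export PATH="${AUTOCYCLER_BIN_DIR}:${AUTOCYCLER_SCRIPTS_DIR}:${PATH}"

# Report whether the binary can now be found
command -v autocycler || echo "autocycler not found on PATH"
```

To make the change persist across sessions, the `export` line can be added to your shell startup file, such as `~/.bashrc` or `~/.zshrc`.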

`brefito autocycler-assemble`

Uses Autocycler to generate a consensus assembly for each sample.

Inputs: data.csv
Outputs: autocycler/{sample}/output/consensus_assembly.fasta
Options: --config genome_size=<int>
            Required. Estimated genome size in bases (e.g., 4600000).
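As a small sketch, the snippet below builds an autocycler-assemble call with the example genome size and shows where the consensus assembly would be expected for one sample. The sample name `sample1` is a hypothetical placeholder. The command is printed, not executed.

```shell
GENOME_SIZE=4600000   # estimated genome size in bases (example value from above)
SAMPLE="sample1"      # hypothetical sample name

# Assemble the command with the required genome_size option
ASSEMBLE_CMD="brefito autocycler-assemble --config genome_size=${GENOME_SIZE}"

# Output path pattern taken from the Outputs line above
EXPECTED_OUTPUT="autocycler/${SAMPLE}/output/consensus_assembly.fasta"

echo "${ASSEMBLE_CMD}"
echo "Expected output: ${EXPECTED_OUTPUT}"
```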
