Bioinformatic pipeline for SARS-CoV-2 sequence analysis used at Folkehelseinstituttet (FHI, the Norwegian Institute of Public Health)
Docker-based solution for sequence analysis of SARS-CoV-2 Nanopore samples
git clone
cd FHI_SC2_Pipeline_Nanopore
docker build -t garcianacho/fhisc2:Nanopore .
Note that building the image for the first time can take up to two hours.
Alternatively, it is possible to pull updated builds from Docker Hub:
docker pull garcianacho/fhisc2:Nanopore
docker run -it --rm -v $(pwd):/home/docker/Fastq garcianacho/fhisc2:Nanopore ArticV4
docker run -it --rm -v $(pwd):/home/docker/Fastq garcianacho/fhisc2:Nanopore ArticV3
docker run -it --rm -v $(pwd):/home/docker/Fastq garcianacho/fhisc2:Nanopore Midnight
Note that older versions of Docker might require the flag --privileged, and that multi-user systems might require the flag -u 1000 to run.
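As a minimal sketch of how these optional flags fit into the run commands above, the helper below assembles the command line and prints it instead of executing it, so the result can be inspected first. The function name `fhisc2_cmd` is hypothetical and not part of the pipeline; the image name, mount point and primer-scheme arguments are taken from the commands above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the pipeline): prints the docker run
# command for a given primer scheme plus any optional docker flags.
# Usage: fhisc2_cmd PRIMER_SCHEME [extra docker flags...]
fhisc2_cmd() {
    scheme="$1"
    shift
    cmd="docker run -it --rm"
    # Optional flags (e.g. --privileged, -u 1000) go before the image name.
    for flag in "$@"; do
        cmd="$cmd $flag"
    done
    echo "$cmd -v $(pwd):/home/docker/Fastq garcianacho/fhisc2:Nanopore $scheme"
}

# Example: ArticV4 run on an older Docker version and a multi-user system.
fhisc2_cmd ArticV4 --privileged -u 1000
```

Whether either flag is needed depends on the local Docker installation, so it is probably best to try the plain command first and add flags only if the run fails.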
The script expects the following folder structure, where the fastq.gz files are placed inside independent folders for each sample:
./_
  |-ExperimentXX.xlsx
  |-GridXXX
    |-OppsettXXX
      |-XXXXXXXXFAXXXXXXXXXX
        |-sequencing_summary_FAXXXXX.txt
        |-fastq_pass
          |-barcode1
            |-XXXX_pass_barcode01_XXXX.fastq
            |-YYYY_pass_barcode01_YYYY.fastq
          |-barcode2
          |-barcode3
          |-....
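Before starting a run, the layout described above could be checked with a small pre-flight script. The sketch below is an assumption about what is worth checking, not something the pipeline ships: the function name `check_layout` is hypothetical, and it only verifies that an .xlsx file sits at the top level and that FASTQ files (plain or gzipped) exist inside `fastq_pass/barcode*` folders somewhere below it.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of the pipeline).
# Usage: check_layout [RUN_FOLDER]   (defaults to the current directory)
check_layout() {
    root="${1:-.}"
    status=0

    # An .xlsx file is expected directly inside the run folder.
    if [ -z "$(find "$root" -maxdepth 1 -type f -name '*.xlsx')" ]; then
        echo "missing: .xlsx file in $root"
        status=1
    fi

    # Barcode folders are expected inside a fastq_pass directory.
    barcodes="$(find "$root" -type d -name 'barcode*' -path '*/fastq_pass/*')"
    if [ -z "$barcodes" ]; then
        echo "missing: fastq_pass/barcode* folders under $root"
        status=1
    fi

    # At least one FASTQ file should exist inside those barcode folders.
    fastqs="$(find "$root" -type f -path '*/fastq_pass/barcode*/*' \
        \( -name '*.fastq' -o -name '*.fastq.gz' \))"
    if [ -z "$fastqs" ]; then
        echo "missing: FASTQ files inside the barcode folders"
        status=1
    fi

    return "$status"
}
```

Calling `check_layout .` from the folder that will be mounted into the container returns 0 when all three checks pass, and otherwise prints what is missing and returns 1.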
The script also expects an .xlsx file that contains the position of the samples on a 96-well plate, the links between barcodes and sequenceID, and the DNA concentration (alternatively, this column can be used for Ct values). It is possible to download a template of the .xlsx file here
-Summary including mutations found, Pangolin lineage, number of reads, coverage, depth, etc.
-Bam files
-Consensus sequences
-Aligned consensus sequences
-Consensus nucleotide sequence for gene S
-Indels and frameshift identification run against FHI's frameshift database
-Quality-control plot for the plate to detect possible contamination
-Phylogenetic-tree plot of the samples
-Noise during variant calling across the genome
-Quality control for contamination and low-quality samples
-Amplicon efficacy of the selected primer-set for all the samples