Filtering bug #172

Closed
wants to merge 45 commits into from
46444a7
Update default_params.config
LennertVerboven Aug 10, 2023
2df215a
Create variant_table_to_fasta.py
LennertVerboven Aug 10, 2023
9f09556
Update default_params.config
TimHHH Aug 10, 2023
7c871f7
Added the resistance database for TBProfiler version 5
LennertVerboven Aug 12, 2023
9735abd
Update magma-env-1.yml
LennertVerboven Aug 12, 2023
340a98f
Update setup_conda_envs.sh
LennertVerboven Aug 12, 2023
a943ad7
Update default_params.config
LennertVerboven Aug 13, 2023
e2ba4c3
Update build.sh
LennertVerboven Aug 13, 2023
7e4d928
Update Dockerfile
LennertVerboven Aug 13, 2023
8c7bf6c
Update summarize_resistance.py
LennertVerboven Aug 13, 2023
e590d85
replace sed -> python scripts [ci skip]
abhi18av Aug 13, 2023
8a0288f
tweak for python2 [ci skip]
abhi18av Aug 13, 2023
25f343e
document the different GVCF files
abhi18av Aug 13, 2023
2e9d1d0
fix typo [ci skip]
abhi18av Aug 13, 2023
fe42e9c
Added the structural variant workflow and the resistance profiling of…
LennertVerboven Aug 16, 2023
9f2a7ec
Merge branch 'update_tbprofiler_v5' of https://github.com/TORCH-Conso…
LennertVerboven Aug 16, 2023
79dfc4a
cleanup
LennertVerboven Aug 16, 2023
dbbad0d
Update setup_conda_envs.sh
LennertVerboven Aug 16, 2023
7c35e55
Update default_params.config
LennertVerboven Aug 16, 2023
33f8be5
Update rename_vcf_chrom.py
LennertVerboven Aug 16, 2023
c6e4704
Update rename_vcf_chrom.py
LennertVerboven Aug 16, 2023
4e7d69f
interim commit [ci skip]
abhi18av Aug 17, 2023
8e3e1f3
tweak variants to fasta [ci skip]
abhi18av Aug 17, 2023
80e9273
accommodate the new design for structural variants [ci skip]
abhi18av Aug 17, 2023
4e2995c
fix imports [ci skip]
abhi18av Aug 17, 2023
0d5f1dc
fix input to workflow [ci skip]
abhi18av Aug 17, 2023
294243f
dev [ci skip]
abhi18av Aug 17, 2023
f349e50
dev [ci skip]
abhi18av Aug 17, 2023
d43b341
build and push new containers for v1.1.1 [ci skip]
abhi18av Aug 17, 2023
6427271
Fixed the summarize resistance script and added the strcutural varian…
LennertVerboven Aug 17, 2023
6a4bfe4
fixed some merge conflicts
LennertVerboven Aug 17, 2023
dfbfe61
Merge pull request #168 from TORCH-Consortium/replace_sed
LennertVerboven Aug 17, 2023
af14690
add back the bc dependency [ci skip]
abhi18av Aug 17, 2023
f60c184
Update magma-env-1.yml
LennertVerboven Aug 17, 2023
72e4a3c
minimal change, add bc to container-2 only [ci skip]
abhi18av Aug 17, 2023
655da91
Update CHANGELOG.md
LennertVerboven Aug 17, 2023
f759670
Changed the permissions on some files
LennertVerboven Aug 17, 2023
a37479d
Merge branch 'update_tbprofiler_v5' of https://github.com/TORCH-Conso…
LennertVerboven Aug 17, 2023
d495bb8
Fixed a typo in the script causing structural variants to not shoiw up
LennertVerboven Aug 17, 2023
fa14386
Merge pull request #169 from TORCH-Consortium/update_tbprofiler_v5
LennertVerboven Aug 17, 2023
e0dbea2
add the default lineage reference files GVCF [ci skip]
abhi18av Aug 18, 2023
85722a0
tweak comments [ci skip]
abhi18av Aug 18, 2023
b218fb6
tweak comments in the config file [ci skip]
abhi18av Aug 18, 2023
fe82c3d
fixed the filtering bug
LennertVerboven Aug 20, 2023
c2af534
finilize filtering bug fix
LennertVerboven Aug 20, 2023
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,5 @@
Created a parallel workflow for mapping without using the strict seed length for use in the structural variant workflow.

Updated TBProfiler to version 5.0.0 and recreated the resistance database to work with the new version

Updated the summarize resistance script to include the structural variants in the excel output
29 changes: 22 additions & 7 deletions README.md
@@ -15,9 +15,24 @@ MAGMA (**M**aximum **A**ccessible **G**enome for **M**tb **A**nalysis) is a pipe
- An (optional) GVCF reference dataset for ~600 samples is provided for augmenting smaller datasets


# (Optional) GVCF for analyzing small number of samples
# (Optional) GVCF datasets

We also provide some reference GVCF files which you could use for specific use-cases.

- For small datasets (20 samples or fewer), we recommend that you download the `EXIT_RIF GVCF` files from https://zenodo.org/record/8054182

- For including Mtb lineages and outgroup (M. canettii) in the phylogenetic tree, you can download the `LineagesAndOutgroup` files from https://zenodo.org/record/8233518


```
use_ref_exit_rif_gvcf = false
ref_exit_rif_gvcf = "/path/to/FILE.g.vcf.gz"
ref_exit_rif_gvcf_tbi = "/path/to/FILE.g.vcf.gz.tbi"
```
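The three keys above are ordinary pipeline parameters, so they can also be supplied through the `-params-file` option used later in this README. Nextflow accepts JSON as well as YAML for that file, and the sketch below writes such a file from Python. The `write_params` helper, the placeholder paths and the `.tbi` sanity check are illustrative assumptions, not part of MAGMA.

```python
import json

# Placeholder paths (assumption): point these at the downloaded GVCF and its index.
params = {
    "use_ref_exit_rif_gvcf": True,
    "ref_exit_rif_gvcf": "/path/to/FILE.g.vcf.gz",
    "ref_exit_rif_gvcf_tbi": "/path/to/FILE.g.vcf.gz.tbi",
}


def write_params(path, params):
    """Write pipeline parameters to a JSON file usable with -params-file (hypothetical helper)."""
    # Illustrative check only: a tabix index is normally named <file>.tbi.
    if params["use_ref_exit_rif_gvcf"] and not params["ref_exit_rif_gvcf_tbi"].endswith(".tbi"):
        raise ValueError("ref_exit_rif_gvcf_tbi should point to a .tbi index")
    with open(path, "w") as fh:
        json.dump(params, fh, indent=2)
```

Calling `write_params("my_parameters_gvcf.json", params)` and then passing `-params-file my_parameters_gvcf.json` to `nextflow run` should set the same three values as the config snippet.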

> :note: **Custom GVCF dataset**:
For creating a custom GVCF dataset, you can refer to the discussion [here](https://github.com/TORCH-Consortium/MAGMA/issues/162).

You can download the `EXIT_RIF GVCF` files from https://zenodo.org/record/8054182

## Tutorials and Presentations

@@ -91,7 +106,7 @@ Which could be provided to the pipeline using `-params-file` parameter as shown
```console
nextflow run 'https://github.com/TORCH-Consortium/MAGMA' \
-profile conda_local \
-r v1.0.1 \
-r v1.1.1 \
-params-file my_parameters_1.yml

```
@@ -139,9 +154,9 @@ We provide [two docker containers](https://github.com/orgs/TORCH-Consortium/pack
You don't need to pull the containers manually, but should you need to, you can use the following commands to pull the pre-built containers:

```console
docker pull ghcr.io/torch-consortium/magma/magma-container-1:1.1.0
docker pull ghcr.io/torch-consortium/magma/magma-container-1:1.1.1

docker pull ghcr.io/torch-consortium/magma/magma-container-2:1.1.0
docker pull ghcr.io/torch-consortium/magma/magma-container-2:1.1.1
```


@@ -154,7 +169,7 @@ Here's the command which should be used
nextflow run 'https://github.com/torch-consortium/magma' \
-params-file my_parameters_2.yml \
-profile docker \
-r v1.0.1
-r v1.1.1
```

> :bulb: **Hint**: <br>
@@ -189,7 +204,7 @@ You can then include this configuration as part of the pipeline invocation comma
```console
nextflow run 'https://github.com/torch-consortium/magma' \
-profile docker \
-r v1.0.1 \
-r v1.1.1 \
-c custom.config \
-params-file my_parameters_2.yml
```
1 change: 1 addition & 0 deletions bin/reformat_lofreq.py
@@ -45,6 +45,7 @@ def write_vcf(filename, df, header):
args = vars(parser.parse_args())

vcf, header, not_empty = read_vcf(args['lofreq_vcf_file'])
header = '\n'.join([i for i in header.split('\n') if 'lofreq' not in i])
if not_empty:
vcf['FORMAT'] = 'GT:AD:DP:GQ:PL'

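The single added line in `reformat_lofreq.py` keeps only the header lines that do not contain the substring `lofreq`. A standalone sketch of the same filter, run on a made-up header (the header lines are illustrative, not copied from real LoFreq output), shows its effect. Because it is a plain case-sensitive substring test, any header line that merely mentions `lofreq`, for example inside a description, is dropped as well. The function name `drop_header_lines` is hypothetical.

```python
def drop_header_lines(header, keyword="lofreq"):
    """Return the VCF header without the lines containing `keyword` (case-sensitive)."""
    return "\n".join(line for line in header.split("\n") if keyword not in line)


# Made-up header for illustration only.
example_header = "\n".join([
    "##fileformat=VCFv4.2",
    "##source=lofreq call",
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw Depth">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
])

print(drop_header_lines(example_header))
```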
6 changes: 3 additions & 3 deletions bin/rename_vcf_chrom.py
100644 → 100755
@@ -1,4 +1,4 @@
#! /usr/bin/env python3
#! /usr/bin/env python2

'''Original author Jody Phelan at https://github.com/jodyphelan/pathogen-profiler/blob/master/scripts/rename_vcf_chrom.py'''
import sys
@@ -30,11 +30,11 @@ def cmd_out(cmd,verbose=1):
stderr.close()

def main(args):
    generator = cmd_out(f"bcftools view {args.vcf}") if args.vcf else sys.stdin
    generator = cmd_out("bcftools view " + args.vcf) if args.vcf else sys.stdin
    convert = dict(zip(args.source,args.target))
    for l in generator:
        if l[0]=="#":
            sys.stdout.write(l)
            sys.stdout.write(l.strip()+"\n")
        else:
            row = l.strip().split()
            row[0] = convert[row[0]]
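In `rename_vcf_chrom.py` the shebang now points at Python 2, and the f-string (a syntax error before Python 3.6) is replaced by string concatenation; header lines appear to be rewritten as `l.strip()+"\n"`. The loop body can be sketched as a function written without f-strings. Re-joining each record with tabs is an assumption, since the rest of the loop is collapsed in this view; `rename_chroms` is a hypothetical name, and `NC_000962.3` and `Chromosome` are only example contig names.

```python
import io


def rename_chroms(lines, source, target, out):
    """Rename the CHROM field of VCF records using parallel source/target lists."""
    convert = dict(zip(source, target))
    for l in lines:
        if l[0] == "#":
            # Header lines: pass through with a single trailing newline.
            out.write(l.strip() + "\n")
        else:
            row = l.strip().split()
            # Raises KeyError if a contig is missing from `source`, as in the script.
            row[0] = convert[row[0]]
            # Assumption: records are re-joined with tabs.
            out.write("\t".join(row) + "\n")


buf = io.StringIO()
rename_chroms(["#CHROM\tPOS\tREF\tALT\n", "NC_000962.3\t1977\tA\tG\n"],
              ["NC_000962.3"], ["Chromosome"], buf)
print(buf.getvalue(), end="")
```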
318 changes: 165 additions & 153 deletions bin/summarize_resistance.py

Large diffs are not rendered by default.

30 changes: 30 additions & 0 deletions bin/variant_table_to_fasta.py
@@ -0,0 +1,30 @@
#! /usr/bin/env python3

import sys
import argparse

def main(args):
    table = []
    with open(args.table, 'r') as table_file:
        table.append(table_file.readline().strip().split('\t')) # Get the headerline without modifying
        # Process the actual variants
        for idx, l in enumerate(table_file):
            l = l.strip().split('\t')
            l = [i.replace('*', '-').replace('.', '-') for i in l]
            if l.count('-')/len(l) < (1-args.site_representation_cutoff):
                table.append(l)
            else:
                pass
    with open(args.output_fasta, 'w') as fasta_file:
        for l in list(map(list, zip(*table))):
            fasta_file.write('>{}\n{}\n'.format(l[0].replace('.GT', ''), ''.join(l[1:])))



parser = argparse.ArgumentParser(description='tbprofiler script',formatter_class=argparse.ArgumentDefaultsHelpFormatter)
parser.add_argument('table', type=str, help='The input table to convert (stdin if empty)')
parser.add_argument('output_fasta', type=str, help='The output fasta file')
parser.add_argument('site_representation_cutoff', type=float, help='Minimum fraction of samples that need to have a call at a site before it is considered')
parser.set_defaults(func=main)
args = parser.parse_args()
args.func(args)
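The new `variant_table_to_fasta.py` does three things: it maps `*` and `.` to the gap character `-`, drops sites whose missing fraction is not below `1 - site_representation_cutoff`, and transposes the remaining table so that each sample becomes one FASTA record. The in-memory sketch below reproduces that logic on lists instead of files, assuming one single-character allele per cell; `table_to_alignment` is a hypothetical name and the sample names and alleles in the example are made up.

```python
def table_to_alignment(rows, site_representation_cutoff):
    """Convert a genotype table (header row + one row per site) into FASTA text.

    rows[0] holds one '<sample>.GT' name per column; every later row holds the
    alleles of one site. '*' and '.' count as missing and are written as '-'.
    """
    header, sites = rows[0], []
    for row in rows[1:]:
        row = [a.replace("*", "-").replace(".", "-") for a in row]
        # Keep the site only if its missing fraction is strictly below the threshold.
        if row.count("-") / len(row) < (1 - site_representation_cutoff):
            sites.append(row)
    # Transpose: each sample becomes one sequence made of its alleles at the kept sites.
    records = []
    for i, name in enumerate(header):
        seq = "".join(site[i] for site in sites)
        records.append(">{}\n{}\n".format(name.replace(".GT", ""), seq))
    return "".join(records)


rows = [["S1.GT", "S2.GT", "S3.GT", "S4.GT"],
        ["A", "A", "G", "."],
        ["C", ".", "*", "T"]]
print(table_to_alignment(rows, 0.7), end="")
```

Note the strict `<`: with four samples and a cutoff of 0.75, a site with exactly one missing call (missing fraction 0.25) is dropped.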
26 changes: 12 additions & 14 deletions conda_envs/magma-env-1.yml
@@ -2,18 +2,16 @@ name: magma-env-1
channels:
- conda-forge
- bioconda
- defaults
dependencies:
- bioconda::gatk4=4.2.6.1
- conda-forge::r-ggplot2=3.3.5
- conda-forge::pandas=1.5.1
- conda-forge::xlsxwriter=3.0.3
- bioconda::datamash=1.1.0
- bioconda::delly=0.8.7
- bioconda::lofreq=2.1.5
- bioconda::tb-profiler=4.1.1
- bioconda::multiqc=1.11
- bioconda::fastqc=0.11.8
- bioconda::fastq_utils=0.25.1
- conda-forge::bc=1.07.1
- conda-forge::sed=4.8
- conda-forge::grep=3.11
- gatk4=4.2.6.1
- r-ggplot2=3.3.5
- pandas=1.5.1
- xlsxwriter=3.1.1
- datamash=1.1.0
- delly=0.8.7
- lofreq=2.1.5
- tb-profiler=5.0.0
- multiqc=1.11
- fastqc=0.11.8
- fastq_utils=0.25.1
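In this view the environment hunk has lost its `+`/`-` markers. Going by the hunk header (18 old lines, 16 new), the block with `channel::` prefixes is the old dependency list and the unprefixed block is the new one. A small sketch that pairs the two lists by package name makes the actual version changes explicit; `parse_pins` and `diff_pins` are hypothetical helpers that only understand `name=version` pins like the ones in these files.

```python
def parse_pins(specs):
    """Map package name -> pinned version, ignoring any 'channel::' prefix."""
    pins = {}
    for spec in specs:
        spec = spec.split("::", 1)[-1]        # 'bioconda::gatk4=4.2.6.1' -> 'gatk4=4.2.6.1'
        name, _, version = spec.partition("=")
        pins[name] = version.lstrip("=")      # also tolerates 'name==version'
    return pins


def diff_pins(old_specs, new_specs):
    """Return (changed, removed, added) between two dependency lists."""
    old, new = parse_pins(old_specs), parse_pins(new_specs)
    changed = {n: (old[n], new[n]) for n in old.keys() & new.keys() if old[n] != new[n]}
    return changed, sorted(old.keys() - new.keys()), sorted(new.keys() - old.keys())


OLD_ENV1 = ["bioconda::gatk4=4.2.6.1", "conda-forge::r-ggplot2=3.3.5", "conda-forge::pandas=1.5.1",
            "conda-forge::xlsxwriter=3.0.3", "bioconda::datamash=1.1.0", "bioconda::delly=0.8.7",
            "bioconda::lofreq=2.1.5", "bioconda::tb-profiler=4.1.1", "bioconda::multiqc=1.11",
            "bioconda::fastqc=0.11.8", "bioconda::fastq_utils=0.25.1", "conda-forge::bc=1.07.1",
            "conda-forge::sed=4.8", "conda-forge::grep=3.11"]
NEW_ENV1 = ["gatk4=4.2.6.1", "r-ggplot2=3.3.5", "pandas=1.5.1", "xlsxwriter=3.1.1", "datamash=1.1.0",
            "delly=0.8.7", "lofreq=2.1.5", "tb-profiler=5.0.0", "multiqc=1.11", "fastqc=0.11.8",
            "fastq_utils=0.25.1"]

changed, removed, added = diff_pins(OLD_ENV1, NEW_ENV1)
print(changed)
print(removed, added)
```

For `magma-env-1` this reports `tb-profiler` (4.1.1 to 5.0.0) and `xlsxwriter` (3.0.3 to 3.1.1) as changed, and `bc`, `grep` and `sed` as removed; the same approach applies to `magma-env-2`.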
23 changes: 11 additions & 12 deletions conda_envs/magma-env-2.yml
@@ -2,16 +2,15 @@ name: magma-env-2
channels:
- conda-forge
- bioconda
- defaults
dependencies:
- conda-forge::python=2.7
- bioconda::bwa=0.7.17
- bioconda::samtools=1.9
- bioconda::iqtree=2.1.2
- bioconda::snp-dists=0.8.2
- bioconda::snp-sites=2.4.0
- bioconda::bcftools=1.9
- bioconda::snpeff=4.3.1t
- bioconda::clusterpicker=1.2.3
- conda-forge::bc=1.07.1
- conda-forge::sed=4.8
- conda-forge::grep=3.11
- python=2.7
- bwa=0.7.17
- samtools=1.9
- iqtree=2.1.2
- snp-dists=0.8.2
- snp-sites=2.4.0
- bcftools=1.9
- snpeff=4.3.1t
- clusterpicker=1.2.3
- bc=1.07.1
2 changes: 1 addition & 1 deletion conda_envs/setup_conda_envs.sh
100644 → 100755
@@ -20,7 +20,7 @@ cp -r ../resources/resistance_db_who ./
cd resistance_db_who

echo "INFO: Load the database within tb-profiler"
tb-profiler load_library resistance_db_who
tb-profiler load_library ./resistance_db_who

echo "INFO: Remove the local copy of the database folder"
cd ..
4 changes: 2 additions & 2 deletions conf/docker.config
@@ -6,12 +6,12 @@ process {

withName:
'GATK.*|LOFREQ.*|DELLY.*|TBPROFILER.*|MULTIQC.*|FASTQC.*|UTILS.*|FASTQ.*|SAMPLESHEET.*' {
container = "ghcr.io/torch-consortium/magma/magma-container-1:1.1.0"
container = "ghcr.io/torch-consortium/magma/magma-container-1:1.1.1"
}

withName:
'BWA.*|IQTREE.*|SNPDISTS.*|SNPSITES.*|BCFTOOLS.*|BGZIP.*|SAMTOOLS.*|SNPEFF.*|CLUSTERPICKER.*' {
container = "ghcr.io/torch-consortium/magma/magma-container-2:1.1.0"
container = "ghcr.io/torch-consortium/magma/magma-container-2:1.1.1"
}

}
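Both `docker.config` and `podman.config` route processes to one of the two containers through `withName` regular expressions, and this PR only bumps the image tags from 1.1.0 to 1.1.1. The sketch below mimics that routing, assuming whole-name matching (`re.fullmatch`), which is our understanding of how Nextflow applies `withName` patterns; `container_for` is a hypothetical helper and the process names in the example are hypothetical too.

```python
import re

# Selector patterns and image names copied from conf/docker.config above.
SELECTORS = [
    ("GATK.*|LOFREQ.*|DELLY.*|TBPROFILER.*|MULTIQC.*|FASTQC.*|UTILS.*|FASTQ.*|SAMPLESHEET.*",
     "ghcr.io/torch-consortium/magma/magma-container-1"),
    ("BWA.*|IQTREE.*|SNPDISTS.*|SNPSITES.*|BCFTOOLS.*|BGZIP.*|SAMTOOLS.*|SNPEFF.*|CLUSTERPICKER.*",
     "ghcr.io/torch-consortium/magma/magma-container-2"),
]


def container_for(process_name, tag="1.1.1"):
    """Return the image a process name would be assigned, or None if no selector matches."""
    for pattern, image in SELECTORS:
        # Assumption: the whole process name must match the pattern.
        if re.fullmatch(pattern, process_name):
            return "{}:{}".format(image, tag)
    return None


for name in ["GATK_HAPLOTYPE_CALLER", "BWA_MEM", "CUSTOM_STEP"]:  # hypothetical names
    print(name, "->", container_for(name))
```

The returned references have the same form as the images in the README's `docker pull` commands, which gives a quick way to check that the configs and the documentation were bumped together.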
4 changes: 2 additions & 2 deletions conf/podman.config
@@ -6,12 +6,12 @@ process {

withName:
'GATK.*|LOFREQ.*|DELLY.*|TBPROFILER.*|MULTIQC.*|FASTQC.*|UTILS.*|FASTQ.*|SAMPLESHEET.*' {
container = "ghcr.io/torch-consortium/magma/magma-container-1:1.1.0"
container = "ghcr.io/torch-consortium/magma/magma-container-1:1.1.1"
}

withName:
'BWA.*|IQTREE.*|SNPDISTS.*|SNPSITES.*|BCFTOOLS.*|BGZIP.*|SAMTOOLS.*|SNPEFF.*|CLUSTERPICKER.*' {
container = "ghcr.io/torch-consortium/magma/magma-container-2:1.1.0"
container = "ghcr.io/torch-consortium/magma/magma-container-2:1.1.1"
}

}