Bundle lineage-references and outgroup #162

Closed
abhi18av opened this issue Jul 27, 2023 · 2 comments · Fixed by #170

abhi18av commented Jul 27, 2023

Based on some recent uses of the pipeline, analyses needed reference strains from different lineages as well as an outgroup.

I was wondering whether we should add a database similar to EXIT-RIF so that people can just turn on that option.

We've had some conversations over email about this, which boil down to:

  1. Using -k 66 with BWA_MEM since older
  2. Reducing the 4 quality thresholds to push through the majority of samples

abhi18av commented Aug 1, 2023

Following up on the previous discussions (please feel free to add your thoughts here)

  1. I used the GVCFs generated by the CALL_WF GATK Haplotype process to create a minimal combined GVCF for LineagesAndOutgroups, as well as one for LineagesAndOutgroupsAndEXITRIF.
gatk CombineGVCFs --java-options "-Xmx4G" \
    -R NC-000962-3-H37Rv.fa \
    -G StandardAnnotation -G AS_StandardAnnotation \
    --variant MTb_L9.ERR4162024.g.vcf.gz --variant MTb_L1.SAMN10185847.g.vcf.gz \
    --variant MTb_L7.ERR1971849.g.vcf.gz --variant MTb_L6.SAMEA1877150.g.vcf.gz \
    --variant MTb_L8.SRR10828835.g.vcf.gz --variant MTb_L410.ERR216945.g.vcf.gz \
    --variant MTb_L5.SAMEA1877169.g.vcf.gz --variant MTb_L2.SAMEA1877219.g.vcf.gz \
    --variant MTb_L43.ERR1193883.g.vcf.gz --variant MTb_L3.SAMEA1877181.g.vcf.gz \
    --variant Mcanettii.ERR5104570.g.vcf.gz \
    --variant EXIT-RIF.g.vcf.gz \
    -O LineagesAndOutgroupsAndEXITRIF.g.vcf.gz


  2. I then supplied this GVCF file instead of any FASTQ files, so these samples were incorporated directly into MERGE_WF, skipping the filtration processes.
use_ref_exit_rif_gvcf: true
ref_exit_rif_gvcf: /MAGMA_MTB_PROJECTS/data/exit-rif/LineagesAndOutgroups.g.vcf.gz
ref_exit_rif_gvcf_tbi: /MAGMA_MTB_PROJECTS/data/exit-rif/LineagesAndOutgroups.g.vcf.gz.tbi
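
For anyone rebuilding the bundle with a different set of strains, the CombineGVCFs call from step 1 could be assembled from a list of per-sample GVCFs roughly as below. This is only a sketch: the helper names `variant_args` and `combine_gvcfs_cmd` are made up for illustration, the command is printed rather than executed so it can be reviewed first, and it assumes file names without spaces. It also assumes that the LineagesAndOutgroups variant is the same command without the EXIT-RIF input, which is not spelled out above.

```shell
# Emit " --variant <file>" for each per-sample GVCF given as an argument.
variant_args() {
    for gvcf in "$@"; do
        printf ' --variant %s' "$gvcf"
    done
}

# Print (not run) a CombineGVCFs command for: <reference> <output> <gvcf>...
# Hypothetical helper; flags mirror the command used in step 1.
combine_gvcfs_cmd() {
    ref="$1"
    out="$2"
    shift 2
    printf 'gatk CombineGVCFs --java-options "-Xmx4G" -R %s -G StandardAnnotation -G AS_StandardAnnotation%s -O %s\n' \
        "$ref" "$(variant_args "$@")" "$out"
}

# Example with a subset of the strains listed above (assumed to be in the
# current directory); review the printed command, then run it by hand.
combine_gvcfs_cmd NC-000962-3-H37Rv.fa LineagesAndOutgroups.g.vcf.gz \
    MTb_L1.SAMN10185847.g.vcf.gz \
    MTb_L2.SAMEA1877219.g.vcf.gz \
    Mcanettii.ERR5104570.g.vcf.gz
```

Printing the command first keeps the sketch safe to try on a machine without GATK installed.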

UPDATE (08-AUG-2023): Tim confirmed that, using this GVCF, the phylogeny looks spot-on (see https://www.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.000477).
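
Since the parameters in step 2 point at both the GVCF and its `.tbi` index, a quick pre-flight check before launching the pipeline might look like the following. The function name `check_gvcf_pair` is hypothetical, and the check only confirms that both files exist and are non-empty; it does not validate their contents.

```shell
# Succeed only if the GVCF and its tabix index both exist and are non-empty.
# The index path defaults to "<gvcf>.tbi" when not given explicitly.
check_gvcf_pair() {
    gvcf="$1"
    tbi="${2:-$gvcf.tbi}"
    for f in "$gvcf" "$tbi"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            return 1
        fi
    done
    echo "ok: $gvcf"
}

# Path taken from the parameters above; adjust to your own data directory.
check_gvcf_pair /MAGMA_MTB_PROJECTS/data/exit-rif/LineagesAndOutgroups.g.vcf.gz \
    || echo "fix ref_exit_rif_gvcf paths before launching" >&2
```

If only the index is missing, it can usually be regenerated for a bgzipped GVCF with `tabix -p vcf <file>`.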


abhi18av commented Aug 9, 2023

Temporary location and permanent MD5 sums of the files documented above

https://transfer.sh/AHUxP8LoVa/LineagesAndOutgroups.g.vcf.gz
https://transfer.sh/hul8Y16tyV/LineagesAndOutgroups.g.vcf.gz.tbi

And here are the MD5 sums

767eeb74076df4edfa6188e9af8e6a98  LineagesAndOutgroups.g.vcf.gz
ba260ee60652e5a04918bbba7bf02082  LineagesAndOutgroups.g.vcf.gz.tbi
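
Because the download location is temporary, it is worth checking any copy of these files against the sums above. A minimal sketch, assuming GNU coreutils `md5sum` is available (macOS ships `md5` instead) and using a made-up helper name `verify_md5`:

```shell
# Compare a file's MD5 sum with an expected value: verify_md5 <md5> <file>
verify_md5() {
    expected="$1"
    file="$2"
    actual=$(md5sum "$file" | cut -d' ' -f1)
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (expected $expected, got $actual)" >&2
        return 1
    fi
}

# Check the two files if they have been downloaded to the current directory.
if [ -f LineagesAndOutgroups.g.vcf.gz ] && [ -f LineagesAndOutgroups.g.vcf.gz.tbi ]; then
    verify_md5 767eeb74076df4edfa6188e9af8e6a98 LineagesAndOutgroups.g.vcf.gz
    verify_md5 ba260ee60652e5a04918bbba7bf02082 LineagesAndOutgroups.g.vcf.gz.tbi
fi
```

Equivalently, the two checksum lines above can be saved to a file and checked in one go with `md5sum -c <file>`.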

abhi18av linked a pull request on Aug 18, 2023 that will close this issue