Refactor #574

Merged
merged 7 commits into from
Jun 26, 2024
Changes from 6 commits
8 changes: 0 additions & 8 deletions .github/workflows/download_pipeline.yml
@@ -69,11 +69,3 @@ jobs:

- name: Inspect download
run: tree ./${{ env.REPOTITLE_LOWERCASE }}

-      - name: Run the downloaded pipeline (stub)
-        id: stub_run_pipeline
-        continue-on-error: true
-        env:
-          NXF_SINGULARITY_CACHEDIR: ./
-          NXF_SINGULARITY_HOME_MOUNT: true
-        run: nextflow run ./${{ env.REPOTITLE_LOWERCASE }}/$( sed 's/\W/_/g' <<< ${{ env.REPO_BRANCH }}) -stub -profile test,singularity --outdir ./results
Comment on lines -72 to -79
Contributor

So now neither the stub run nor the regular run of the downloaded pipeline is tested? As a user of an offline cluster, it's very nice to know that nf-core download works smoothly. But if there's no space maybe there's nothing to do.

Collaborator Author

Yeah, that's right. The downloaded pipeline is not tested anymore. I can make sure that the downloaded pipeline works locally before a release, but the GitHub runners just don't have enough space for both the downloaded containers and the tests :(

21 changes: 14 additions & 7 deletions CHANGELOG.md
@@ -7,12 +7,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### `Added`

+- A new parameter `skip_smncopynumbercaller` to skip smncopynumbercaller module [#574](https://github.com/nf-core/raredisease/pull/574)
- A new parameter `skip_sv_calling` to skip sv calling workflow [#572](https://github.com/nf-core/raredisease/pull/572)
- Two new parameters `skip_snv_calling` and `skip_repeat_analysis` to skip snv calling and repeat analysis respectively [#571](https://github.com/nf-core/raredisease/pull/571)
- Two new parameters `mbuffer_mem` and `samtools_sort_threads` to control resources given to mbuffer and samtools sort in the bwameme module [#570](https://github.com/nf-core/raredisease/pull/570)

### `Changed`

+- Remove several skip parameters that had been included in the pipeline to avoid failed CI tests (see parameters table below) [#574](https://github.com/nf-core/raredisease/pull/574)
- `readcount_intervals` parameter is now mandatory for running germlinecnvcaller. [#570](https://github.com/nf-core/raredisease/pull/570)
- Turn off CNVnator, TIDDIT, SMNCopyNumberCaller, Gens, and Vcf2cytosure for targeted analysis [#573](https://github.com/nf-core/raredisease/pull/573)
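
For anyone carrying an older params file across this change, a pre-flight check along the following lines might help. This is only a sketch in plain Groovy, not pipeline code: `findRemovedParams` is a hypothetical helper, and the list of names is taken from the "Old parameter" column of the parameters table in this changelog.

```groovy
// Parameters listed as removed in the CHANGELOG parameters table for #574.
def removedParams = ['skip_eklipse', 'skip_fastqc', 'skip_haplocheck', 'skip_qualimap']

// Hypothetical helper (not part of the pipeline): return every removed
// parameter name that the user-supplied params map still sets.
def findRemovedParams(Map userParams, List<String> removed) {
    return removed.findAll { userParams.containsKey(it) }
}

// Example: an old params file that still sets skip_fastqc.
def stale = findRemovedParams([skip_fastqc: true, skip_fastp: false], removedParams)
stale.each { println "params.${it} no longer exists after #574" }
```

The helper only reports names; what to do about a stale entry (warn, fail, or ignore) is left to the caller.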

@@ -23,13 +25,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Parameters

-| Old parameter | New parameter |
-| ------------- | --------------------- |
-| | mbuffer_mem |
-| | samtools_sort_threads |
-| | skip_repeat_analysis |
-| | skip_snv_calling |
-| | skip_sv_calling |
+| Old parameter | New parameter |
+| --------------- | ------------------------ |
+| | mbuffer_mem |
+| | samtools_sort_threads |
+| | skip_repeat_analysis |
Contributor

I think skip_repeat_calling would be more in line with the subworkflow name, the current output and usage docs ("Variant calling - repeat expansions"), and the parameters below.

Collaborator Author

hmmm.. good point. The workflow includes stranger which, as you know already, is used to annotate STRs. So it does perform more than repeat calling. Perhaps I need to change the name of the subworkflow, and its references in the pipeline 😅

Contributor

I agree skip_repeat_analysis is more in line with what it actually does.

What about splitting the subworkflow into call and annotate, like you do for SNVs and SVs? Then we could use the same annotation subworkflow in both raredisease and Nallo, and you can have skip_repeat_calling and skip_repeat_annotation.

Collaborator Author

Great minds think alike 😆
That's what I am actually doing right now :D It's much easier, and like you said, it is in line with what we are doing for other variant types.

Contributor

Very nice! ⭐

+| | skip_snv_calling |
+| | skip_sv_calling |
+| skip_eklipse | |
+| skip_fastqc | |
+| skip_haplocheck | |
+| skip_qualimap | |
+| | skip_smncopynumbercaller |
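
The split proposed in the review thread on `skip_repeat_analysis` (separate call and annotate steps, each with its own skip flag) could be gated roughly as below. This is a plain-Groovy sketch under stated assumptions: `skip_repeat_calling` and `skip_repeat_annotation` are the names suggested in the thread and are not parameters added by this PR, `repeatStages` is a hypothetical helper, and the rule that annotation is skipped whenever calling is skipped is borrowed from how the schema describes `skip_me_calling`.

```groovy
// Hypothetical helper: list the repeat-expansion stages that would run
// for a given params map. Missing flags are treated as "do not skip".
def repeatStages(Map params) {
    def stages = []
    if (!params.skip_repeat_calling) {
        stages << 'call'
        // Assumption: annotation needs the calls, so it only runs after calling.
        if (!params.skip_repeat_annotation) {
            stages << 'annotate'
        }
    }
    return stages
}

println repeatStages([skip_repeat_annotation: true])
```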

## 2.1.0 - Obelix [2024-05-29]

6 changes: 2 additions & 4 deletions conf/modules/qc_bam.config
@@ -35,10 +35,8 @@ process {
ext.prefix = { "${meta.id}_hsmetrics" }
}

-if (!params.skip_qualimap) {
-    withName: '.*QC_BAM:QUALIMAP_BAMQC' {
-        ext.prefix = { "${meta.id}_qualimap" }
-    }
+withName: '.*QC_BAM:QUALIMAP_BAMQC' {
+    ext.prefix = { "${meta.id}_qualimap" }
}

withName: '.*QC_BAM:TIDDIT_COV' {
4 changes: 0 additions & 4 deletions conf/test.config
@@ -24,11 +24,7 @@ params {
mito_name = 'MT'

// analysis params
-skip_eklipse = System.getenv("GITHUB_ACTIONS").equals(null) ? false : true // skip tool on Github CI
-skip_fastqc = System.getenv("GITHUB_ACTIONS").equals(null) ? false : true // skip tool on Github CI
skip_germlinecnvcaller = true
-skip_haplocheck = System.getenv("GITHUB_ACTIONS").equals(null) ? false : true // skip tool on Github CI
-skip_qualimap = System.getenv("GITHUB_ACTIONS").equals(null) ? false : true // skip tool on Github CI
skip_mt_annotation = System.getenv("GITHUB_ACTIONS").equals(null) ? false : true // skip annotation on Github CI
skip_mt_subsample = System.getenv("GITHUB_ACTIONS").equals(null) ? false : true // skip subsample on Github CI
skip_peddy = true
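
The `System.getenv("GITHUB_ACTIONS").equals(null) ? false : true` idiom used by the remaining `skip_mt_*` lines appears to evaluate to `true` whenever the `GITHUB_ACTIONS` environment variable is set, which GitHub does on its runners. A more direct way to express the same test, as a plain-Groovy sketch (`runningOnGithubCi` is a hypothetical name, and the environment is passed in as a map so the check can be exercised without touching the real environment):

```groovy
// Hypothetical helper: true when the GITHUB_ACTIONS variable is present.
// Defaults to the real process environment when no map is supplied.
def runningOnGithubCi(Map<String, String> env = System.getenv()) {
    return env.get('GITHUB_ACTIONS') != null
}

// A test profile could then read, for example:
//   skip_mt_annotation = runningOnGithubCi()
println runningOnGithubCi([GITHUB_ACTIONS: 'true'])
println runningOnGithubCi([:])
```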
4 changes: 0 additions & 4 deletions conf/test_one_sample.config
@@ -24,11 +24,7 @@ params {
mito_name = 'MT'

// analysis params
-skip_eklipse = System.getenv("GITHUB_ACTIONS").equals(null) ? false : true // skip tool on Github CI
-skip_fastqc = System.getenv("GITHUB_ACTIONS").equals(null) ? false : true // skip tool on Github CI
skip_germlinecnvcaller = true
-skip_haplocheck = System.getenv("GITHUB_ACTIONS").equals(null) ? false : true // skip tool on Github CI
-skip_qualimap = System.getenv("GITHUB_ACTIONS").equals(null) ? false : true // skip tool on Github CI
skip_mt_annotation = System.getenv("GITHUB_ACTIONS").equals(null) ? false : true // skip annotation on Github CI
skip_mt_subsample = System.getenv("GITHUB_ACTIONS").equals(null) ? false : true // skip subsample on Github CI
skip_peddy = true
5 changes: 1 addition & 4 deletions nextflow.config
@@ -27,18 +27,15 @@ params {
run_mt_for_wes = false
run_rtgvcfeval = false
save_mapped_as_cram = false
-skip_eklipse = false
skip_fastp = false
-skip_fastqc = false
skip_gens = true
skip_germlinecnvcaller = false
-skip_haplocheck = false
skip_peddy = false
skip_me_calling = false
skip_me_annotation = false
skip_mt_annotation = false
-skip_qualimap = false
skip_repeat_analysis = false
+skip_smncopynumbercaller = false
skip_snv_annotation = false
skip_snv_calling = false
skip_sv_annotation = false
25 changes: 5 additions & 20 deletions nextflow_schema.json
@@ -508,21 +508,11 @@
"description": "Specifies whether to generate and publish alignment files as cram instead of bam",
"fa_icon": "fas fa-toggle-on"
},
-"skip_fastqc": {
-    "type": "boolean",
-    "description": "Specifies whether or not to skip FASTQC.",
-    "fa_icon": "fas fa-toggle-on"
-},
"skip_fastp": {
"type": "boolean",
"description": "Specifies whether or not to skip trimming with fastp.",
"fa_icon": "fas fa-toggle-on"
},
-"skip_haplocheck": {
-    "type": "boolean",
-    "description": "Specifies whether or not to skip haplocheck.",
-    "fa_icon": "fas fa-toggle-on"
-},
"skip_gens": {
"type": "boolean",
"description": "Specifies whether or not to skip gens preprocessing subworkflow.",
@@ -533,21 +523,11 @@
"description": "Specifies whether or not to skip CNV calling using GATK's GermlineCNVCaller",
"fa_icon": "fas fa-toggle-on"
},
-"skip_eklipse": {
-    "type": "boolean",
-    "description": "Specifies whether or not to skip eKLIPse.",
-    "fa_icon": "fas fa-toggle-on"
-},
"skip_peddy": {
"type": "boolean",
"description": "Specifies whether or not to skip peddy.",
"fa_icon": "fas fa-toggle-on"
},
-"skip_qualimap": {
-    "type": "boolean",
-    "description": "Specifies whether or not to skip Qualimap.",
-    "fa_icon": "fas fa-toggle-on"
-},
"skip_me_calling": {
"type": "boolean",
"description": "Specifies whether or not to skip calling mobile elements, and the subsequent annotation step.",
@@ -573,6 +553,11 @@
"description": "Specifies whether or not to skip calling and annotation of repeat expansions.",
"fa_icon": "fas fa-toggle-on"
},
+"skip_smncopynumbercaller": {
+    "type": "boolean",
+    "description": "Specifies whether or not to skip smncopynumbercaller.",
+    "fa_icon": "fas fa-toggle-on"
+},
"skip_snv_annotation": {
"type": "boolean",
"description": "Specifies whether or not to skip annotate SNV subworkflow.",
6 changes: 2 additions & 4 deletions subworkflows/local/qc_bam.nf
@@ -45,10 +45,8 @@ workflow QC_BAM {

PICARD_COLLECTHSMETRICS (ch_hsmetrics_in, ch_genome_fasta, ch_genome_fai, [[],[]])

-if (!params.skip_qualimap) {
-    ch_qualimap = QUALIMAP_BAMQC (ch_bam, []).results
-    ch_versions = ch_versions.mix(QUALIMAP_BAMQC.out.versions.first())
-}
+ch_qualimap = QUALIMAP_BAMQC (ch_bam, []).results
+ch_versions = ch_versions.mix(QUALIMAP_BAMQC.out.versions.first())

TIDDIT_COV (ch_bam, [[],[]]) // 2nd pos. arg is req. only for cram input

12 changes: 6 additions & 6 deletions subworkflows/local/utils_nfcore_raredisease_pipeline/main.nf
@@ -224,12 +224,12 @@ def toolCitationText() {
variant_call_text = [
params.variant_caller.equals("deepvariant") ? "DeepVariant (Poplin et al., 2018)," : "",
params.variant_caller.equals("sentieon") ? "Sentieon DNAscope (Freed et al., 2022)," : "",
-params.skip_haplocheck ? "" : "Haplocheck (Weissensteiner et al., 2021),",
+"Haplocheck (Weissensteiner et al., 2021),",
"CNVnator (Abyzov et al., 2011),",
"TIDDIT (Eisfeldt et al., 2017),",
"Manta (Chen et al., 2016),",
"GLnexus (Yun et al., 2021),",
-params.skip_eklipse ? "" : "eKLIPse (Goudenge et al., 2019),",
+"eKLIPse (Goudenge et al., 2019),",
]
repeat_call_text = [
"ExpansionHunter (Dolzhenko et al., 2019),",
@@ -278,7 +278,7 @@ def toolCitationText() {
"RetroSeq (Keane et al., 2013),",
]
preprocessing_text = [
-params.skip_fastqc ? "" : "FastQC (Andrews 2010),",
+"FastQC (Andrews 2010),",
params.skip_fastp ? "" : "Fastp (Chen, 2023),",
]
other_citation_text = [
@@ -333,12 +333,12 @@ def toolBibliographyText() {
variant_call_text = [
params.variant_caller.equals("deepvariant") ? "<li>Poplin, R., Chang, P.-C., Alexander, D., Schwartz, S., Colthurst, T., Ku, A., Newburger, D., Dijamco, J., Nguyen, N., Afshar, P. T., Gross, S. S., Dorfman, L., McLean, C. Y., & DePristo, M. A. (2018). A universal SNP and small-indel variant caller using deep neural networks. Nature Biotechnology, 36(10), 983–987. https://doi.org/10.1038/nbt.4235</li>" : "",
params.variant_caller.equals("sentieon") ? "<li>Freed, D., Pan, R., Chen, H., Li, Z., Hu, J., & Aldana, R. (2022). DNAscope: High accuracy small variant calling using machine learning [Preprint]. Bioinformatics. https://doi.org/10.1101/2022.05.20.492556</li>" : "",
-params.skip_haplocheck ? "" : "<li>Weissensteiner, H., Forer, L., Fendt, L., Kheirkhah, A., Salas, A., Kronenberg, F., & Schoenherr, S. (2021). Contamination detection in sequencing studies using the mitochondrial phylogeny. Genome Research, 31(2), 309–316. https://doi.org/10.1101/gr.256545.119</li>",
+"<li>Weissensteiner, H., Forer, L., Fendt, L., Kheirkhah, A., Salas, A., Kronenberg, F., & Schoenherr, S. (2021). Contamination detection in sequencing studies using the mitochondrial phylogeny. Genome Research, 31(2), 309–316. https://doi.org/10.1101/gr.256545.119</li>",
"<li>Abyzov, A., Urban, A. E., Snyder, M., & Gerstein, M. (2011). CNVnator: An approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Research, 21(6), 974–984. https://doi.org/10.1101/gr.114876.110</li>",
"<li>Eisfeldt, J., Vezzi, F., Olason, P., Nilsson, D., & Lindstrand, A. (2017). TIDDIT, an efficient and comprehensive structural variant caller for massive parallel sequencing data. F1000Research, 6, 664. https://doi.org/10.12688/f1000research.11168.2</li>",
"<li>Chen, X., Schulz-Trieglaff, O., Shaw, R., Barnes, B., Schlesinger, F., Källberg, M., Cox, A. J., Kruglyak, S., & Saunders, C. T. (2016). Manta: Rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics, 32(8), 1220–1222. https://doi.org/10.1093/bioinformatics/btv710</li>",
"<li>Yun, T., Li, H., Chang, P.-C., Lin, M. F., Carroll, A., & McLean, C. Y. (2021). Accurate, scalable cohort variant calls using DeepVariant and GLnexus. Bioinformatics, 36(24), 5582–5589. https://doi.org/10.1093/bioinformatics/btaa1081</li>",
-params.skip_eklipse ? "" : "<li>Goudenège, D., Bris, C., Hoffmann, V., Desquiret-Dumas, V., Jardel, C., Rucheton, B., Bannwarth, S., Paquis-Flucklinger, V., Lebre, A. S., Colin, E., Amati-Bonneau, P., Bonneau, D., Reynier, P., Lenaers, G., & Procaccio, V. (2019). eKLIPse: A sensitive tool for the detection and quantification of mitochondrial DNA deletions from next-generation sequencing data. Genetics in Medicine, 21(6), 1407–1416. https://doi.org/10.1038/s41436-018-0350-8</li>",
+"<li>Goudenège, D., Bris, C., Hoffmann, V., Desquiret-Dumas, V., Jardel, C., Rucheton, B., Bannwarth, S., Paquis-Flucklinger, V., Lebre, A. S., Colin, E., Amati-Bonneau, P., Bonneau, D., Reynier, P., Lenaers, G., & Procaccio, V. (2019). eKLIPse: A sensitive tool for the detection and quantification of mitochondrial DNA deletions from next-generation sequencing data. Genetics in Medicine, 21(6), 1407–1416. https://doi.org/10.1038/s41436-018-0350-8</li>",
]
repeat_call_text = [
"<li>Dolzhenko, E., Deshpande, V., Schlesinger, F., Krusche, P., Petrovski, R., Chen, S., Emig-Agius, D., Gross, A., Narzisi, G., Bowman, B., Scheffler, K., van Vugt, J. J. F. A., French, C., Sanchis-Juan, A., Ibáñez, K., Tucci, A., Lajoie, B. R., Veldink, J. H., Raymond, F. L., … Eberle, M. A. (2019). ExpansionHunter: A sequence-graph-based tool to analyze variation in short tandem repeat regions. Bioinformatics, 35(22), 4754–4756. https://doi.org/10.1093/bioinformatics/btz431</li>",
@@ -389,7 +389,7 @@ def toolBibliographyText() {
"<li>Keane, T. M., Wong, K., & Adams, D. J. (2013). RetroSeq: Transposable element discovery from next-generation sequencing data. Bioinformatics, 29(3), 389–390. https://doi.org/10.1093/bioinformatics/bts697</li>",
]
preprocessing_text = [
-params.skip_fastqc ? "" : "<li>Andrews S, (2010) FastQC, URL: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/</li>",
+"<li>Andrews S, (2010) FastQC, URL: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/</li>",
params.skip_fastp ? "" : "<li>Chen, S. (2023). Ultrafast one-pass FASTQ data preprocessing, quality control, and deduplication using fastp. iMeta, 2(2), e107. https://doi.org/10.1002/imt2.107</li>",
]
other_citation_text = [
10 changes: 4 additions & 6 deletions subworkflows/local/variant_calling/call_snv_MT.nf
@@ -23,12 +23,10 @@ workflow CALL_SNV_MT {

GATK4_MUTECT2_MT (ch_bam_bai_int, ch_fasta, ch_fai, ch_dict, [], [], [],[])

-if (!params.skip_haplocheck) {
-    HAPLOCHECK_MT (GATK4_MUTECT2_MT.out.vcf).set { ch_haplocheck }
-    ch_versions = ch_versions.mix(HAPLOCHECK_MT.out.versions.first())
-    ch_haplocheck_txt = HAPLOCHECK_MT.out.txt
-    ch_haplocheck_html = HAPLOCHECK_MT.out.html
-}
+HAPLOCHECK_MT (GATK4_MUTECT2_MT.out.vcf).set { ch_haplocheck }
+ch_versions = ch_versions.mix(HAPLOCHECK_MT.out.versions.first())
+ch_haplocheck_txt = HAPLOCHECK_MT.out.txt
+ch_haplocheck_html = HAPLOCHECK_MT.out.html

// Filter Mutect2 calls
ch_mutect_vcf = GATK4_MUTECT2_MT.out.vcf.join(GATK4_MUTECT2_MT.out.tbi, failOnMismatch:true, failOnDuplicate:true)
12 changes: 5 additions & 7 deletions subworkflows/local/variant_calling/call_sv_MT.nf
@@ -16,13 +16,11 @@ workflow CALL_SV_MT {
ch_eklipse_genes = Channel.empty()
ch_eklipse_circos = Channel.empty()

-if (!params.skip_eklipse){
-    EKLIPSE(ch_bam_bai,[])
-    ch_eklipse_del = EKLIPSE.out.deletions
-    ch_eklipse_genes = EKLIPSE.out.genes
-    ch_eklipse_circos = EKLIPSE.out.circos
-    ch_versions = ch_versions.mix(EKLIPSE.out.versions.first())
-}
+EKLIPSE(ch_bam_bai,[])
+ch_eklipse_del = EKLIPSE.out.deletions
+ch_eklipse_genes = EKLIPSE.out.genes
+ch_eklipse_circos = EKLIPSE.out.circos
+ch_versions = ch_versions.mix(EKLIPSE.out.versions.first())

MT_DELETION(ch_bam_bai, ch_fasta)

15 changes: 7 additions & 8 deletions workflows/raredisease.nf
@@ -90,6 +90,9 @@ if (!params.skip_gens) {
mandatoryParams += ["gens_gnomad_pos", "gens_interval_list", "gens_pon_female", "gens_pon_male"]
}

+if (!params.skip_smncopynumbercaller) {
+    mandatoryParams += ["genome"]
+}
for (param in mandatoryParams.unique()) {
if (params[param] == null) {
println("params." + param + " not set.")
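
The mandatory-parameter pattern in this hunk (extend the list when a tool is enabled, then report every entry that is still `null`) can be tried out in isolation. The following is a plain-Groovy sketch, not the pipeline's code: `missingParams` is a hypothetical helper, and `input` and `outdir` are placeholder entries standing in for whatever the real `mandatoryParams` list holds.

```groovy
// Hypothetical helper: return the mandatory parameter names, de-duplicated,
// whose value in the params map is null or absent.
def missingParams(Map params, List<String> mandatory) {
    // unique(false) returns a de-duplicated copy and leaves the input untouched
    return mandatory.unique(false).findAll { params[it] == null }
}

def mandatoryParams = ['input', 'outdir']   // placeholder names
def params = [input: 'samples.csv', outdir: 'results',
              skip_smncopynumbercaller: false, genome: null]

// Same gate as in the diff: "genome" is only required when the
// SMNCopyNumberCaller step is not skipped.
if (!params.skip_smncopynumbercaller) {
    mandatoryParams += ['genome']
}

missingParams(params, mandatoryParams).each { println("params." + it + " not set.") }
```

Setting `skip_smncopynumbercaller: true` in the example map leaves `genome` out of the list, so nothing is reported.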
@@ -370,10 +373,8 @@ workflow RAREDISEASE {
//
// Input QC
//
-if (!params.skip_fastqc) {
-    FASTQC (ch_samplesheet)
-    ch_versions = ch_versions.mix(FASTQC.out.versions.first())
-}
+FASTQC (ch_samplesheet)
+ch_versions = ch_versions.mix(FASTQC.out.versions.first())

//
// Create chromosome bed and intervals for splitting and gathering operations
@@ -695,7 +696,7 @@ workflow RAREDISEASE {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/

-if ( params.analysis_type.equals("wgs") ) {
+if ( params.analysis_type.equals("wgs") && !params.skip_smncopynumbercaller ) {
RENAME_BAM_FOR_SMNCALLER(ch_mapped.genome_marked_bam, "bam").output
.collect{it}
.toList()
@@ -830,9 +831,7 @@ workflow RAREDISEASE {
)
)

-if (!params.skip_fastqc) {
-    ch_multiqc_files = ch_multiqc_files.mix(FASTQC.out.zip.collect{it[1]}.ifEmpty([]))
-}
+ch_multiqc_files = ch_multiqc_files.mix(FASTQC.out.zip.collect{it[1]}.ifEmpty([]))
ch_multiqc_files = ch_multiqc_files.mix(QC_BAM.out.multiple_metrics.map{it[1]}.collect().ifEmpty([]))
ch_multiqc_files = ch_multiqc_files.mix(QC_BAM.out.hs_metrics.map{it[1]}.collect().ifEmpty([]))
ch_multiqc_files = ch_multiqc_files.mix(QC_BAM.out.qualimap_results.map{it[1]}.collect().ifEmpty([]))