Update pb fq2bam #7357
base: master
Changes from all commits
952e0a1
687b3bb
b1e2802
bb4eace
ea878a2
b4d9f88
71075d9
ad07605
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -2,24 +2,23 @@ process PARABRICKS_FQ2BAM { | |||||
tag "$meta.id" | ||||||
label 'process_high' | ||||||
label 'process_gpu' | ||||||
label 'gpu' | ||||||
stageInMode 'copy' | ||||||
|
||||||
container "nvcr.io/nvidia/clara/clara-parabricks:4.4.0-1" | ||||||
|
||||||
input: | ||||||
tuple val(meta), path(reads) | ||||||
tuple val(meta2), path(fasta) | ||||||
tuple val(meta3), path(index) | ||||||
tuple val(meta4), path(interval_file) | ||||||
path(known_sites) | ||||||
|
||||||
tuple val(meta), val(read_group), path (r1_fastq, stageAs: "?/*"), path (r2_fastq, stageAs: "?/*"), path(interval_file) | ||||||
Review comment: Can we somehow get the [...] Why are the reads staged as you indicated there?
Review comment: Does this also mean that you always expect the interval file in the input channel with the reads? Why not leave it separated as before? I think that would be less confusing!
||||||
tuple path(fasta), path(fai) | ||||||
Suggested change
|
||||||
tuple val(meta1), path(index) | ||||||
Suggested change
|
||||||
path known_sites | ||||||
|
||||||
output: | ||||||
tuple val(meta), path("*.bam") , emit: bam | ||||||
tuple val(meta), path("*.bai") , emit: bai | ||||||
tuple val(meta), path("*.table"), emit: bqsr_table , optional:true | ||||||
path("versions.yml") , emit: versions | ||||||
path("qc_metrics") , emit: qc_metrics , optional:true | ||||||
path("duplicate-metrics.txt") , emit: duplicate_metrics , optional:true | ||||||
tuple val(meta), path("*.bam"), path("*.bai") , emit: bam_bai | ||||||
Review comment: Usually every file gets its own output channel. See the reasoning here: https://nf-co.re/docs/guidelines/components/modules#compression-of-input-and-output-files
||||||
tuple val(meta), path("qc_metrics/*"), optional: true, emit: qc_metrics | ||||||
tuple val(meta), path("*.table"), optional: true, emit: bqsr_table | ||||||
tuple val(meta), path("*.duplicate-metrics.txt"), optional: true, emit: duplicate_metrics | ||||||
path "versions.yml", emit: versions | ||||||
Comment on lines +17 to +21
Review comment: Can you align this as it was before? :)
||||||
|
||||||
when: | ||||||
task.ext.when == null || task.ext.when | ||||||
|
@@ -29,26 +28,42 @@ process PARABRICKS_FQ2BAM { | |||||
if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) { | ||||||
error "Parabricks module does not support Conda. Please use Docker / Singularity / Podman instead." | ||||||
} | ||||||
|
||||||
def args = task.ext.args ?: '' | ||||||
def prefix = task.ext.prefix ?: "${meta.id}" | ||||||
def in_fq_command = meta.single_end ? "--in-se-fq $reads" : "--in-fq $reads" | ||||||
def known_sites_command = known_sites ? known_sites.collect{"--knownSites $it"}.join(' ') : "" | ||||||
def prefix = task.ext.suffix ? "${meta.id}${task.ext.suffix}" : "${meta.id}" | ||||||
def known_sites_command = known_sites ? (known_sites instanceof List ? known_sites.collect { "--knownSites $it" }.join(' ') : "--knownSites ${known_sites}") : "" | ||||||
def known_sites_output = known_sites ? "--out-recal-file ${prefix}.table" : "" | ||||||
def interval_file_command = interval_file ? interval_file.collect{"--interval-file $it"}.join(' ') : "" | ||||||
def interval_file_command = interval_file ? (interval_file instanceof List ? interval_file.collect { "--interval-file $it" }.join(' ') : "--interval-file ${interval_file}") : "" | ||||||
def num_gpus = task.accelerator ? "--num-gpus $task.accelerator.request" : '' | ||||||
|
||||||
def readgroups_string = read_group.collect { rg -> "@RG\\tID:${rg.read_group}__${rg.sample}\\tSM:${rg.sample}\\tPL:${rg.platform}\\tLB:${rg.sample}\\tPU:${rg.read_group}" } | ||||||
|
||||||
def in_fq_command = meta.single_end | ||||||
? (r1_fastq instanceof List | ||||||
? r1_fastq.collect { "--in-se-fq $it" }.join(' ') | ||||||
: "--in-se-fq ${r1_fastq}" | ||||||
) | ||||||
: (r1_fastq instanceof List && r2_fastq instanceof List && readgroups_string instanceof List | ||||||
? (r1_fastq.indexed().collect { idx, r1 -> "--in-fq $r1 ${r2_fastq[idx]} \"${readgroups_string[idx]}\"" }).join(' ') | ||||||
: "--in-fq ${r1_fastq} ${r2_fastq} \"${readgroups_string.join(' ')}\"" | ||||||
) | ||||||
|
||||||
Comment on lines +39 to +50
Review comment: Can you maybe explain in a comment what exactly you are doing here? At least for me it's not directly clear what is going on (even though it probably makes a lot of sense!!!) :)
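One possible reading of the `in_fq_command` logic questioned above: for paired-end input spread over several lanes, each R1 file is paired with the R2 file and the read-group string at the same list index, giving one `--in-fq` argument group per lane. The following is a minimal plain-Groovy sketch of that multi-lane, paired-end branch only. The file names and read-group values are hypothetical placeholders, and the sketch assumes all three lists have the same length and order.

```groovy
// Hypothetical inputs: two lanes of one paired-end sample (not from the PR).
def r1_fastq = ['1/sample_L001_R1.fastq.gz', '2/sample_L002_R1.fastq.gz']
def r2_fastq = ['1/sample_L001_R2.fastq.gz', '2/sample_L002_R2.fastq.gz']
def read_group = [
    [read_group: 'L001', sample: 'sampleA', platform: 'ILLUMINA'],
    [read_group: 'L002', sample: 'sampleA', platform: 'ILLUMINA'],
]

// One @RG header string per lane; "\\t" keeps a literal backslash-t in the
// string so that the tab escape is passed through to the command line.
def readgroups_string = read_group.collect { rg ->
    "@RG\\tID:${rg.read_group}__${rg.sample}\\tSM:${rg.sample}\\tPL:${rg.platform}\\tLB:${rg.sample}\\tPU:${rg.read_group}"
}

// indexed() yields index -> element, so R1, R2 and the read group
// at the same position end up in the same --in-fq group.
def in_fq_command = r1_fastq.indexed().collect { idx, r1 ->
    "--in-fq ${r1} ${r2_fastq[idx]} \"${readgroups_string[idx]}\""
}.join(' ')

println in_fq_command
```

If the three lists ever differ in length or order, the pairing silently goes wrong, which may be worth guarding against in the module itself.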
||||||
""" | ||||||
INDEX=`find -L ./ -name "*.amb" | sed 's/\\.amb\$//'` | ||||||
cp $fasta \$INDEX | ||||||
|
||||||
Suggested change
Review comment: (trailing whitespace needs to be removed or else linting fails)
||||||
pbrun \\ | ||||||
fq2bam \\ | ||||||
--ref \$INDEX \\ | ||||||
$in_fq_command \\ | ||||||
--read-group-sm $meta.id \\ | ||||||
--out-bam ${prefix}.bam \\ | ||||||
$num_gpus \\ | ||||||
$known_sites_command \\ | ||||||
$known_sites_output \\ | ||||||
$interval_file_command \\ | ||||||
$num_gpus \\ | ||||||
--out-qc-metrics-dir qc_metrics \\ | ||||||
--out-duplicate-metrics ${prefix}.duplicate-metrics.txt \\ | ||||||
$args | ||||||
|
||||||
cat <<-END_VERSIONS > versions.yml | ||||||
|
@@ -62,11 +77,38 @@ process PARABRICKS_FQ2BAM { | |||||
if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) { | ||||||
error "Parabricks module does not support Conda. Please use Docker / Singularity / Podman instead." | ||||||
} | ||||||
def prefix = task.ext.prefix ?: "${meta.id}" | ||||||
|
||||||
def args = task.ext.args ?: '' | ||||||
def prefix = task.ext.suffix ? "${meta.id}${task.ext.suffix}" : "${meta.id}" | ||||||
def known_sites_command = known_sites ? (known_sites instanceof List ? known_sites.collect { "--knownSites $it" }.join(' ') : "--knownSites ${known_sites}") : "" | ||||||
def known_sites_output = known_sites ? "--out-recal-file ${prefix}.table" : "" | ||||||
def interval_file_command = interval_file ? (interval_file instanceof List ? interval_file.collect { "--interval-file $it" }.join(' ') : "--interval-file ${interval_file}") : "" | ||||||
|
||||||
def readgroups_string = read_group.collect { rg -> "@RG\\tID:${rg.read_group}__${rg.sample}\\tSM:${rg.sample}\\tPL:${rg.platform}\\tLB:${rg.sample}\\tPU:${rg.read_group}" } | ||||||
|
||||||
def in_fq_command = meta.single_end | ||||||
? (r1_fastq instanceof List | ||||||
? r1_fastq.collect { "--in-se-fq $it" }.join(' ') | ||||||
: "--in-se-fq ${r1_fastq}" | ||||||
) | ||||||
: (r1_fastq instanceof List && r2_fastq instanceof List && readgroups_string instanceof List | ||||||
? (r1_fastq.indexed().collect { idx, r1 -> "--in-fq $r1 ${r2_fastq[idx]} \"${readgroups_string[idx]}\"" }).join(' ') | ||||||
: "--in-fq ${r1_fastq} ${r2_fastq} \"${readgroups_string.join(' ')}\"" | ||||||
) | ||||||
|
||||||
def metrics_output_command = args = "--out-duplicate-metrics duplicate-metrics.txt" ? "touch duplicate-metrics.txt" : "" | ||||||
def known_sites_output_command = known_sites ? "touch ${prefix}.table" : "" | ||||||
def qc_metrics_output_command = args = "--out-qc-metrics-dir qc_metrics " ? "mkdir qc_metrics && touch qc_metrics/alignment.txt" : "" | ||||||
""" | ||||||
|
||||||
echo $in_fq_command | ||||||
|
||||||
touch run.log | ||||||
touch ${prefix}.bam | ||||||
touch ${prefix}.bam.bai | ||||||
|
||||||
$metrics_output_command | ||||||
$known_sites_output_command | ||||||
$qc_metrics_output_command | ||||||
cat <<-END_VERSIONS > versions.yml | ||||||
"${task.process}": | ||||||
pbrun: \$(echo \$(pbrun version 2>&1) | sed 's/^Please.* //' ) | ||||||
|
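A note on the stub block's `metrics_output_command` and `qc_metrics_output_command` definitions: they use `args = "..." ? ... : ...`, which Groovy parses as an assignment rather than a comparison. The condition is a non-empty string literal, so it is always truthy, and `args` is overwritten with the result. The sketch below reproduces that behaviour in plain Groovy and shows one possible alternative based on `contains`. The `--some-flag` value and the `sample` prefix are hypothetical, and whether the stub should depend on `args` at all is a design decision for the module authors.

```groovy
def args = '--some-flag'   // hypothetical stand-in for task.ext.args

// Same shape as the stub line. Groovy reads this as:
//   metrics_output_command = (args = ("..." ? "touch ..." : ""))
def metrics_output_command = args = "--out-duplicate-metrics duplicate-metrics.txt" ? "touch duplicate-metrics.txt" : ""

// Both variables now hold the touch command; the original args value is gone.
println metrics_output_command
println args

// One possible alternative (an assumption, not the PR's code):
// only create the stub file when the flag was actually requested.
def user_args = '--some-flag'
def prefix = 'sample'      // hypothetical prefix
def metrics_checked = user_args.contains('--out-duplicate-metrics')
    ? "touch ${prefix}.duplicate-metrics.txt"
    : ""

println "checked: '${metrics_checked}'"
```

Separately, the declared output pattern is `*.duplicate-metrics.txt`, so a stub file named plain `duplicate-metrics.txt` would not match it; using `${prefix}.duplicate-metrics.txt` as in the sketch would.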
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -20,61 +20,69 @@ input: | |
description: | | ||
Groovy Map containing sample information | ||
e.g. [ id:'test', single_end:false ] | ||
- reads: | ||
type: file | ||
description: fastq.gz files | ||
pattern: "*.fastq.gz" | ||
- - meta2: | ||
- read_group: | ||
type: map | ||
description: | | ||
Groovy Map containing fasta information | ||
- fasta: | ||
Groovy Map containing sample information | ||
e.g. [ id:'test', single_end:false ] | ||
- r1_fastq: | ||
type: file | ||
description: R1 fastq file | ||
pattern: "*.fastq.gz" | ||
- r2_fastq: | ||
type: file | ||
description: R2 fastq file | ||
pattern: "*.fastq.gz" | ||
Comment on lines +28 to +35
Review comment: Does that mean the module only works with paired-end FASTQs now?
||
- interval_file: | ||
type: file | ||
description: file or files containing genomic intervals for use in base quality | ||
score recalibration. | ||
pattern: "*.{bed,interval_list,picard,list,intervals}" | ||
- - fasta: | ||
type: file | ||
description: reference fasta file - must be unzipped | ||
pattern: "*.fasta" | ||
- - meta3: | ||
pattern: "*.{fasta,fa}" | ||
- fai: | ||
type: file | ||
description: reference fasta fai index | ||
pattern: "*.fai" | ||
- - meta1: | ||
type: map | ||
description: | | ||
Groovy Map containing index information | ||
Groovy Map containing sample information | ||
e.g. [ id:'test', single_end:false ] | ||
- index: | ||
type: file | ||
description: reference BWA index | ||
pattern: "*.{amb,ann,bwt,pac,sa}" | ||
- - meta4: | ||
type: map | ||
description: | | ||
Groovy Map containing index information | ||
- interval_file: | ||
type: file | ||
description: (optional) file(s) containing genomic intervals for use in base | ||
quality score recalibration (BQSR) | ||
pattern: "*.{bed,interval_list,picard,list,intervals}" | ||
- - known_sites: | ||
type: file | ||
description: (optional) known sites file(s) for calculating BQSR. markdups must | ||
be true to perform BQSR. | ||
pattern: "*.vcf.gz" | ||
output: | ||
- bam: | ||
- bam_bai: | ||
- meta: | ||
type: map | ||
description: | | ||
Groovy Map containing sample information | ||
e.g. [ id:'test', single_end:false ] | ||
e.g. [ id:'test'] | ||
- "*.bam": | ||
type: file | ||
description: Sorted BAM file | ||
description: output bam file | ||
pattern: "*.bam" | ||
- bai: | ||
- meta: | ||
type: map | ||
description: | | ||
Groovy Map containing sample information | ||
e.g. [ id:'test', single_end:false ] | ||
- "*.bai": | ||
type: file | ||
description: index corresponding to sorted BAM file | ||
description: output bam bai file | ||
pattern: "*.bai" | ||
- qc_metrics: | ||
- meta: | ||
type: map | ||
description: (optional) optional directory of qc metrics | ||
- qc_metrics/*: | ||
type: directory | ||
description: (optional) optional directory of qc metrics | ||
pattern: "qc_metrics" | ||
- bqsr_table: | ||
- meta: | ||
type: map | ||
|
@@ -83,25 +91,21 @@ output: | |
e.g. [ id:'test'] | ||
- "*.table": | ||
type: file | ||
description: (optional) table from base quality score recalibration calculation, | ||
to be used with parabricks/applybqsr | ||
description: bqsr table | ||
pattern: "*.table" | ||
- versions: | ||
- versions.yml: | ||
type: file | ||
description: File containing software versions | ||
pattern: "versions.yml" | ||
- qc_metrics: | ||
- qc_metrics: | ||
type: directory | ||
description: (optional) optional directory of qc metrics | ||
pattern: "qc_metrics" | ||
- duplicate_metrics: | ||
- duplicate-metrics.txt: | ||
- meta: | ||
type: map | ||
description: (optional) metrics calculated from marking duplicates in the bam | ||
- "*.duplicate-metrics.txt": | ||
type: file | ||
description: (optional) metrics calculated from marking duplicates in the bam | ||
file | ||
pattern: "*-duplicate-metrics.txt" | ||
- versions: | ||
- versions.yml: | ||
type: file | ||
description: File containing software versions. | ||
pattern: "versions.yml" | ||
authors: | ||
- "@bsiranosian" | ||
- "@adamrtalbot" | ||
|
Review comment: This should not be part of this PR :) It needs to go into #7363, I think!