Update pb fq2bam #7357

Open: wants to merge 8 commits into master
5 changes: 3 additions & 2 deletions modules/nf-core/parabricks/deepvariant/meta.yml
Contributor: This should not be part of this PR :) It needs to go into #7363, I think!

@@ -53,10 +53,11 @@ output:
e.g. [ id:'test' ]
- "*.vcf":
type: file
description: vcf file created with deepvariant (does not support .gz for normal vcf), optional
description: vcf file created with deepvariant (does not support .gz for normal
vcf), optional
pattern: "*.vcf"
- gvcf:
- meta:
- meta:
type: map
description: |
Groovy Map containing sample information.
82 changes: 62 additions & 20 deletions modules/nf-core/parabricks/fq2bam/main.nf
@@ -2,24 +2,23 @@ process PARABRICKS_FQ2BAM {
tag "$meta.id"
label 'process_high'
label 'process_gpu'
label 'gpu'
stageInMode 'copy'

container "nvcr.io/nvidia/clara/clara-parabricks:4.4.0-1"

input:
tuple val(meta), path(reads)
tuple val(meta2), path(fasta)
tuple val(meta3), path(index)
tuple val(meta4), path(interval_file)
path(known_sites)

tuple val(meta), val(read_group), path (r1_fastq, stageAs: "?/*"), path (r2_fastq, stageAs: "?/*"), path(interval_file)
Contributor: Can we somehow get the read_group value from meta? I think that would be more practical later on in the pipeline.

Why are the reads staged this way (stageAs: "?/*")?
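A hedged sketch of one option: if each lane's read-group fields were carried in meta (the keys sample, read_group and platform below are hypothetical, chosen to mirror the fields the PR reads from read_group), the module could derive the @RG line itself and the separate read_group input would no longer be needed. The function produces the same @RG layout as readgroups_string in the PR:

```groovy
// Hypothetical per-lane meta map; key names are assumptions, not an
// established nf-core convention.
def meta = [
    id        : 'sample1_L001',
    sample    : 'sample1',
    read_group: 'L001',
    platform  : 'ILLUMINA',
    single_end: false
]

// Builds the same @RG layout as the PR's readgroups_string, but from meta.
// "\\t" is a backslash followed by t (the characters written into the
// generated shell script), not a real tab character.
String readGroupFromMeta(Map m) {
    "@RG\\tID:${m.read_group}__${m.sample}\\tSM:${m.sample}\\tPL:${m.platform}\\tLB:${m.sample}\\tPU:${m.read_group}"
}

println readGroupFromMeta(meta)
```

If several lanes of one sample are passed to a single fq2bam call, meta would instead need one such entry per lane (for example a list of maps) kept in the same order as the FASTQ lists; how to shape that is a pipeline design choice this sketch does not settle.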

Contributor: This also means that you always expect the interval file in the input channel together with the reads? Why not keep it separate as before? I think that would be less confusing!

tuple path(fasta), path(fai)
Contributor, suggested change:
tuple path(fasta), path(fai)
tuple val(meta2), path(fasta), path(fai)

tuple val(meta1), path(index)
Contributor, suggested change:
tuple val(meta1), path(index)
tuple val(meta3), path(index)

path known_sites

output:
tuple val(meta), path("*.bam") , emit: bam
tuple val(meta), path("*.bai") , emit: bai
tuple val(meta), path("*.table"), emit: bqsr_table , optional:true
path("versions.yml") , emit: versions
path("qc_metrics") , emit: qc_metrics , optional:true
path("duplicate-metrics.txt") , emit: duplicate_metrics , optional:true
tuple val(meta), path("*.bam"), path("*.bai") , emit: bam_bai
Contributor:

tuple val(meta), path("qc_metrics/*"), optional: true, emit: qc_metrics
tuple val(meta), path("*.table"), optional: true, emit: bqsr_table
tuple val(meta), path("*.duplicate-metrics.txt"), optional: true, emit: duplicate_metrics
path "versions.yml", emit: versions
Comment on lines +17 to +21
Contributor: Can you align this as it was before? :)


when:
task.ext.when == null || task.ext.when
@@ -29,26 +28,42 @@ process PARABRICKS_FQ2BAM {
if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
error "Parabricks module does not support Conda. Please use Docker / Singularity / Podman instead."
}

def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"
def in_fq_command = meta.single_end ? "--in-se-fq $reads" : "--in-fq $reads"
def known_sites_command = known_sites ? known_sites.collect{"--knownSites $it"}.join(' ') : ""
def prefix = task.ext.suffix ? "${meta.id}${task.ext.suffix}" : "${meta.id}"
def known_sites_command = known_sites ? (known_sites instanceof List ? known_sites.collect { "--knownSites $it" }.join(' ') : "--knownSites ${known_sites}") : ""
def known_sites_output = known_sites ? "--out-recal-file ${prefix}.table" : ""
def interval_file_command = interval_file ? interval_file.collect{"--interval-file $it"}.join(' ') : ""
def interval_file_command = interval_file ? (interval_file instanceof List ? interval_file.collect { "--interval-file $it" }.join(' ') : "--interval-file ${interval_file}") : ""
def num_gpus = task.accelerator ? "--num-gpus $task.accelerator.request" : ''

def readgroups_string = read_group.collect { rg -> "@RG\\tID:${rg.read_group}__${rg.sample}\\tSM:${rg.sample}\\tPL:${rg.platform}\\tLB:${rg.sample}\\tPU:${rg.read_group}" }

def in_fq_command = meta.single_end
? (r1_fastq instanceof List
? r1_fastq.collect { "--in-se-fq $it" }.join(' ')
: "--in-se-fq ${r1_fastq}"
)
: (r1_fastq instanceof List && r2_fastq instanceof List && readgroups_string instanceof List
? (r1_fastq.indexed().collect { idx, r1 -> "--in-fq $r1 ${r2_fastq[idx]} \"${readgroups_string[idx]}\"" }).join(' ')
: "--in-fq ${r1_fastq} ${r2_fastq} \"${readgroups_string.join(' ')}\""
)

Comment on lines +39 to +50
Contributor: Could you explain what you are doing here in a comment? At least for me it's not immediately clear what is going on (even though it probably makes a lot of sense!) :)
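For reference, the logic in lines +39 to +50 can be read as the standalone Groovy function below (a sketch that mirrors the PR code; file names and read-group values are made up, and the single-file, non-List fallback is left out). Paired-end input yields one --in-fq <R1> <R2> "<@RG line>" group per lane: indexed() turns the R1 list into an index-to-file map, and that index picks the matching R2 file and @RG string. Single-end input yields one --in-se-fq per file and attaches no read group. The pairing assumes the R1, R2 and read-group lists have the same length and order; the PR code does not check this.

```groovy
// Mirrors the in_fq_command logic from the PR as a plain function so it can
// be run outside Nextflow. All inputs below are hypothetical.
String buildInFq(Map meta, List r1, List r2, List readGroups) {
    // One @RG line per lane; "\\t" keeps a literal backslash-t in the output.
    def rgs = readGroups.collect { rg ->
        "@RG\\tID:${rg.read_group}__${rg.sample}\\tSM:${rg.sample}\\tPL:${rg.platform}\\tLB:${rg.sample}\\tPU:${rg.read_group}"
    }
    if (meta.single_end) {
        // Single-end: one --in-se-fq per file, no per-file read group.
        return r1.collect { "--in-se-fq $it" }.join(' ')
    }
    // Paired-end: indexed() gives [0: r1_0, 1: r1_1, ...], so idx selects the
    // matching R2 file and @RG line for each R1 file.
    return r1.indexed().collect { idx, fq1 ->
        "--in-fq $fq1 ${r2[idx]} \"${rgs[idx]}\""
    }.join(' ')
}

// Two lanes of one sample; the 1/ and 2/ prefixes only imitate per-file
// staging directories.
def readGroups = [
    [read_group: 'L001', sample: 'sample1', platform: 'ILLUMINA'],
    [read_group: 'L002', sample: 'sample1', platform: 'ILLUMINA']
]
def r1 = ['1/s1_L001_R1.fastq.gz', '2/s1_L002_R1.fastq.gz']
def r2 = ['1/s1_L001_R2.fastq.gz', '2/s1_L002_R2.fastq.gz']

println buildInFq([id: 'sample1', single_end: false], r1, r2, readGroups)
println buildInFq([id: 'sample1', single_end: true], r1, [], readGroups)
```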

"""
INDEX=`find -L ./ -name "*.amb" | sed 's/\\.amb\$//'`
cp $fasta \$INDEX

Contributor, suggested change: (empty, i.e. remove the blank line above)

Contributor: (Trailing whitespace needs to be removed, or else linting fails.)

pbrun \\
fq2bam \\
--ref \$INDEX \\
$in_fq_command \\
--read-group-sm $meta.id \\
--out-bam ${prefix}.bam \\
$num_gpus \\
$known_sites_command \\
$known_sites_output \\
$interval_file_command \\
$num_gpus \\
--out-qc-metrics-dir qc_metrics \\
--out-duplicate-metrics ${prefix}.duplicate-metrics.txt \\
$args

cat <<-END_VERSIONS > versions.yml
@@ -62,11 +77,38 @@ process PARABRICKS_FQ2BAM {
if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
error "Parabricks module does not support Conda. Please use Docker / Singularity / Podman instead."
}
def prefix = task.ext.prefix ?: "${meta.id}"

def args = task.ext.args ?: ''
def prefix = task.ext.suffix ? "${meta.id}${task.ext.suffix}" : "${meta.id}"
def known_sites_command = known_sites ? (known_sites instanceof List ? known_sites.collect { "--knownSites $it" }.join(' ') : "--knownSites ${known_sites}") : ""
def known_sites_output = known_sites ? "--out-recal-file ${prefix}.table" : ""
def interval_file_command = interval_file ? (interval_file instanceof List ? interval_file.collect { "--interval-file $it" }.join(' ') : "--interval-file ${interval_file}") : ""

def readgroups_string = read_group.collect { rg -> "@RG\\tID:${rg.read_group}__${rg.sample}\\tSM:${rg.sample}\\tPL:${rg.platform}\\tLB:${rg.sample}\\tPU:${rg.read_group}" }

def in_fq_command = meta.single_end
? (r1_fastq instanceof List
? r1_fastq.collect { "--in-se-fq $it" }.join(' ')
: "--in-se-fq ${r1_fastq}"
)
: (r1_fastq instanceof List && r2_fastq instanceof List && readgroups_string instanceof List
? (r1_fastq.indexed().collect { idx, r1 -> "--in-fq $r1 ${r2_fastq[idx]} \"${readgroups_string[idx]}\"" }).join(' ')
: "--in-fq ${r1_fastq} ${r2_fastq} \"${readgroups_string.join(' ')}\""
)

def metrics_output_command = args = "--out-duplicate-metrics duplicate-metrics.txt" ? "touch duplicate-metrics.txt" : ""
def known_sites_output_command = known_sites ? "touch ${prefix}.table" : ""
def qc_metrics_output_command = args = "--out-qc-metrics-dir qc_metrics " ? "mkdir qc_metrics && touch qc_metrics/alignment.txt" : ""
"""

echo $in_fq_command

touch run.log
touch ${prefix}.bam
touch ${prefix}.bam.bai

$metrics_output_command
$known_sites_output_command
$qc_metrics_output_command
cat <<-END_VERSIONS > versions.yml
"${task.process}":
pbrun: \$(echo \$(pbrun version 2>&1) | sed 's/^Please.* //' )
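A note on the stub lines that set metrics_output_command and qc_metrics_output_command: args = "..." ? ... : ... is an assignment, not a comparison. Because ?: binds tighter than =, Groovy evaluates the ternary first; a non-empty string literal is truthy, so both variables always get the touch/mkdir command, and args is overwritten along the way. The sketch below reproduces this in plain Groovy (task.ext.args is replaced by an empty string) and shows a contains() check, which is an assumption about what was intended.

```groovy
// Stand-in for `task.ext.args ?: ''` with no extra arguments configured.
def args = ''

// Copied from the stub: parses as args = ("--out-duplicate-metrics ..." ? "touch ..." : "").
// A non-empty string literal is truthy, so the first branch always wins, and
// that value is assigned to args as well as to metrics_output_command.
def metrics_output_command = args = "--out-duplicate-metrics duplicate-metrics.txt" ? "touch duplicate-metrics.txt" : ""
println "metrics_output_command = '${metrics_output_command}'"
println "args is now            = '${args}'"

// A membership test instead of an assignment (assumed intent): only create the
// file if the flag was actually passed through task.ext.args.
def user_args = ''
def intended = user_args.contains('--out-duplicate-metrics') ? 'touch duplicate-metrics.txt' : ''
println "intended               = '${intended}'"
```

Since the script section always passes --out-qc-metrics-dir and --out-duplicate-metrics, creating both outputs unconditionally in the stub may well be the desired behaviour, in which case plain touch/mkdir lines would say so directly. Note also that the stub touches duplicate-metrics.txt, whereas the script writes ${prefix}.duplicate-metrics.txt.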
90 changes: 47 additions & 43 deletions modules/nf-core/parabricks/fq2bam/meta.yml
@@ -20,61 +20,69 @@ input:
description: |
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
- reads:
type: file
description: fastq.gz files
pattern: "*.fastq.gz"
- - meta2:
- read_group:
type: map
description: |
Groovy Map containing fasta information
- fasta:
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
- r1_fastq:
type: file
description: R1 fastq file
pattern: "*.fastq.gz"
- r2_fastq:
type: file
description: R2 fastq file
pattern: "*.fastq.gz"
Comment on lines +28 to +35
Contributor: Does that mean the module now only works with paired-end FASTQs?

- interval_file:
type: file
description: file or files containing genomic intervals for use in base quality
score recalibration.
pattern: "*.{bed,interval_list,picard,list,intervals}"
- - fasta:
type: file
description: reference fasta file - must be unzipped
pattern: "*.fasta"
- - meta3:
pattern: "*.{fasta,fa}"
- fai:
type: file
description: reference fasta fai index
pattern: "*.fai"
- - meta1:
type: map
description: |
Groovy Map containing index information
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
- index:
type: file
description: reference BWA index
pattern: "*.{amb,ann,bwt,pac,sa}"
- - meta4:
type: map
description: |
Groovy Map containing index information
- interval_file:
type: file
description: (optional) file(s) containing genomic intervals for use in base
quality score recalibration (BQSR)
pattern: "*.{bed,interval_list,picard,list,intervals}"
- - known_sites:
type: file
description: (optional) known sites file(s) for calculating BQSR. markdups must
be true to perform BQSR.
pattern: "*.vcf.gz"
output:
- bam:
- bam_bai:
- meta:
type: map
description: |
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
e.g. [ id:'test']
- "*.bam":
type: file
description: Sorted BAM file
description: output bam file
pattern: "*.bam"
- bai:
- meta:
type: map
description: |
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
- "*.bai":
type: file
description: index corresponding to sorted BAM file
description: output bam bai file
pattern: "*.bai"
- qc_metrics:
- meta:
type: map
description: (optional) optional directory of qc metrics
- qc_metrics/*:
type: directory
description: (optional) optional directory of qc metrics
pattern: "qc_metrics"
- bqsr_table:
- meta:
type: map
@@ -83,25 +91,21 @@ output:
e.g. [ id:'test']
- "*.table":
type: file
description: (optional) table from base quality score recalibration calculation,
to be used with parabricks/applybqsr
description: bqsr table
pattern: "*.table"
- versions:
- versions.yml:
type: file
description: File containing software versions
pattern: "versions.yml"
- qc_metrics:
- qc_metrics:
type: directory
description: (optional) optional directory of qc metrics
pattern: "qc_metrics"
- duplicate_metrics:
- duplicate-metrics.txt:
- meta:
type: map
description: (optional) metrics calculated from marking duplicates in the bam
- "*.duplicate-metrics.txt":
type: file
description: (optional) metrics calculated from marking duplicates in the bam
file
pattern: "*-duplicate-metrics.txt"
- versions:
- versions.yml:
type: file
description: File containing software versions.
pattern: "versions.yml"
authors:
- "@bsiranosian"
- "@adamrtalbot"