Update pb fq2bam #7357

gburnett-nvidia · 2025-01-23T19:15:18Z

PR checklist

Closes #XXX

gburnett-nvidia · 2025-01-23T19:18:22Z

This PR includes testing for fastq merging, as well as changes to the input and output spec @blajoie

blajoie · 2025-01-23T22:59:37Z

Please let us know if any changes are needed to the inputs/outputs; we implemented a small refactor there with the goal of simplifying things. We would also appreciate a look at how we handled multiple input FASTQ pairs and kept the RGs organized.

A separate PR for parabricks/deepvariant is coming ~tomorrow.

cc @gburnett-nvidia


famosab

Here are my two cents on the proposed changes :)

famosab · 2025-02-03T01:10:29Z

modules/nf-core/parabricks/fq2bam/main.nf

-    tuple val(meta4), path(interval_file)
-    path(known_sites)
-
+    tuple val(meta), val(read_group), path (r1_fastq, stageAs: "?/*"), path (r2_fastq, stageAs: "?/*"), path(interval_file)


Can we somehow get the read_group value from the meta? I think that would be more practical in the pipeline later on.

Why are the reads staged as you indicated there?
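On the first point, a minimal Groovy sketch of what reading the read group from `meta` could look like. The `read_group` and `platform` keys are assumptions made for illustration; the PR does not define them on `meta`, and the `@RG` layout simply mirrors the one built elsewhere in this PR.

```groovy
// Hypothetical meta map: 'read_group' and 'platform' are illustrative keys,
// not fields the module currently defines.
def meta = [id: 'S1', single_end: false, read_group: 'lane1', platform: 'ILLUMINA']

// Same @RG layout as in the PR, but sourced from meta instead of a separate input.
// "\\t" keeps a literal backslash-t in the resulting string.
def rgLine = "@RG\\tID:${meta.read_group}__${meta.id}\\tSM:${meta.id}\\tPL:${meta.platform}\\tLB:${meta.id}\\tPU:${meta.read_group}"
println rgLine
```

With several FASTQ pairs per sample, `meta.read_group` would have to hold a list, which may be why the PR passes the read groups as a separate value.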

famosab · 2025-02-03T01:10:54Z

modules/nf-core/parabricks/fq2bam/main.nf

+    tuple val(meta), path("*.bam"), path("*.bai") , emit: bam_bai
+    tuple val(meta), path("qc_metrics/*"), optional: true, emit: qc_metrics
+    tuple val(meta), path("*.table"), optional: true, emit: bqsr_table
+    tuple val(meta), path("*.duplicate-metrics.txt"), optional: true, emit: duplicate_metrics
+    path "versions.yml", emit: versions


Can you align this as it was before? :)

famosab · 2025-02-03T01:12:29Z

modules/nf-core/parabricks/fq2bam/main.nf

+    def readgroups_string = read_group.collect { rg -> "@RG\\tID:${rg.read_group}__${rg.sample}\\tSM:${rg.sample}\\tPL:${rg.platform}\\tLB:${rg.sample}\\tPU:${rg.read_group}" }
+
+    def in_fq_command = meta.single_end 
+        ? (r1_fastq instanceof List 
+            ? r1_fastq.collect { "--in-se-fq $it" }.join(' ') 
+            : "--in-se-fq ${r1_fastq}"
+        )
+        : (r1_fastq instanceof List && r2_fastq instanceof List && readgroups_string instanceof List
+            ? (r1_fastq.indexed().collect { idx, r1 -> "--in-fq $r1 ${r2_fastq[idx]} \"${readgroups_string[idx]}\"" }).join(' ')
+            : "--in-fq ${r1_fastq} ${r2_fastq} \"${readgroups_string.join(' ')}\""
+        )
+


Can you maybe explain what you are doing here exactly in a comment? At least for me it's not directly clear what is going on (even though it probably makes a lot of sense!!!) :)
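For reference, a standalone Groovy sketch of what the paired-end, multi-file branch above appears to do: build one `@RG` line per read group, then emit one `--in-fq R1 R2 "RG"` triple per FASTQ pair. The sample names, lanes, and file paths are made up for illustration, and `transpose()` is used here in place of the PR's `indexed()` lookup.

```groovy
// Standalone sketch with made-up values; not the module code itself.
def readGroups = [
    [read_group: 'lane1', sample: 'S1', platform: 'ILLUMINA'],
    [read_group: 'lane2', sample: 'S1', platform: 'ILLUMINA'],
]
def r1 = ['1/S1_L1_R1.fastq.gz', '2/S1_L2_R1.fastq.gz']
def r2 = ['1/S1_L1_R2.fastq.gz', '2/S1_L2_R2.fastq.gz']

// One @RG header line per read group; "\\t" keeps a literal backslash-t.
def rgStrings = readGroups.collect { rg ->
    "@RG\\tID:${rg.read_group}__${rg.sample}\\tSM:${rg.sample}\\tPL:${rg.platform}\\tLB:${rg.sample}\\tPU:${rg.read_group}"
}

// Pair up the i-th R1, R2 and @RG line, then format one --in-fq argument each.
def inFq = [r1, r2, rgStrings].transpose()
    .collect { a, b, rg -> "--in-fq ${a} ${b} \"${rg}\"" }
    .join(' ')
println inFq
```

The three lists must be in the same order for the pairing to be correct, which is the part a code comment in the module could spell out.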

famosab · 2025-02-03T01:12:38Z

modules/nf-core/parabricks/fq2bam/main.nf

    """
    INDEX=`find -L ./ -name "*.amb" | sed 's/\\.amb\$//'`
    cp $fasta \$INDEX
-
+    


Suggested change

(trailing whitespace needs to be removed or else linting fails)

famosab · 2025-02-03T01:14:02Z

modules/nf-core/parabricks/fq2bam/meta.yml

+    - r1_fastq:
+        type: file
+        description: R1 fastq file
+        pattern: "*.fastq.gz"
+    - r2_fastq:
+        type: file
+        description: R2 fastq file
+        pattern: "*.fastq.gz"


Does that mean the module only works with paired-end FASTQs now?

famosab · 2025-02-03T01:19:42Z

modules/nf-core/parabricks/fq2bam/main.nf

-    path(known_sites)
-
+    tuple val(meta), val(read_group), path (r1_fastq, stageAs: "?/*"), path (r2_fastq, stageAs: "?/*"), path(interval_file)
+    tuple path(fasta), path(fai)


Suggested change

-    tuple path(fasta), path(fai)
+    tuple val(meta2), path(fasta), path(fai)

famosab · 2025-02-03T01:20:59Z

modules/nf-core/parabricks/fq2bam/main.nf

-    tuple val(meta4), path(interval_file)
-    path(known_sites)
-
+    tuple val(meta), val(read_group), path (r1_fastq, stageAs: "?/*"), path (r2_fastq, stageAs: "?/*"), path(interval_file)


Does this also mean that you always expect the interval file in the input channel with the reads? Why not leave it separated as before? I think that would be less confusing!

famosab · 2025-02-03T01:21:05Z

modules/nf-core/parabricks/fq2bam/main.nf

-
+    tuple val(meta), val(read_group), path (r1_fastq, stageAs: "?/*"), path (r2_fastq, stageAs: "?/*"), path(interval_file)
+    tuple path(fasta), path(fai)
+    tuple val(meta1), path(index)


Suggested change

-    tuple val(meta1), path(index)
+    tuple val(meta3), path(index)

famosab · 2025-02-03T01:22:32Z

modules/nf-core/parabricks/fq2bam/tests/main.nf.test

+            params {
+                module_args = '--low-memory'
+                // Ref: https://forums.developer.nvidia.com/t/problem-with-gpu/256825/6
+                // Parabricks’s fq2bam requires 24GB of memory.
+                // Using --low-memory for testing
+            }


I think this can be removed for a stub test

Suggested change

-            params {
-                module_args = '--low-memory'
-                // Ref: https://forums.developer.nvidia.com/t/problem-with-gpu/256825/6
-                // Parabricks’s fq2bam requires 24GB of memory.
-                // Using --low-memory for testing
-            }

famosab · 2025-02-03T01:23:29Z

modules/nf-core/parabricks/fq2bam/tests/main.nf.test

+                { assert snapshot(
+                    bam(process.out.bam_bai[0][1]).getReadsMD5(),
+                    file(process.out.bam_bai[0][2]).name,
+                    process.out.versions,
+                    path(process.out.versions[0]).yaml
+                ).match() }


For a stub we should be able to assert like:

Suggested change

-                { assert snapshot(
-                    bam(process.out.bam_bai[0][1]).getReadsMD5(),
-                    file(process.out.bam_bai[0][2]).name,
-                    process.out.versions,
-                    path(process.out.versions[0]).yaml
-                ).match() }
+                { assert snapshot(process.out).match() }

gburnett-nvidia and others added 5 commits January 16, 2025 15:31

adding new main.nf for parabricks fq2bam

952e0a1

update pb fq2bam to support merging fastq, improve inputs/outputs

687b3bb

update snaps for PE fix

b1e2802

merging

bb4eace

updating version in main.nf

ea878a2

move interval_bed back with FQ

b4d9f88

gburnett-nvidia requested a review from sateeshperi January 23, 2025 19:25

blajoie-elembio and others added 2 commits January 24, 2025 09:37

fix linting

71075d9

Merge branch 'update-pb-fq2bam' of github.com:blajoie/modules into update-pb-fq2bam

ad07605

blajoie requested a review from famosab January 31, 2025 21:02

famosab reviewed Feb 3, 2025
