-
Notifications
You must be signed in to change notification settings - Fork 752
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update pb fq2bam #7357
base: master
Are you sure you want to change the base?
Update pb fq2bam #7357
Conversation
This PR includes testing for fastq merging, as well as changes to the input and output spec @blajoie |
Please let us know if any changes are needed to the inputs/outputs, we did implement a small refactor there with a goal of simplifying things. Would also appreciate a look at how we handled multiple input fastq pairs and keeping the RGs organized. A separate PR for parabricks/deepvariant is coming ~tomorrow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are my two cents on the proposed changes :)
tuple val(meta4), path(interval_file) | ||
path(known_sites) | ||
|
||
tuple val(meta), val(read_group), path (r1_fastq, stageAs: "?/*"), path (r2_fastq, stageAs: "?/*"), path(interval_file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we somehow get the read_group
value from the meta? I think that would be more practical in the pipeline later on.
Why are the reads staged as you indicated there?
tuple val(meta), path("*.bam"), path("*.bai") , emit: bam_bai | ||
tuple val(meta), path("qc_metrics/*"), optional: true, emit: qc_metrics | ||
tuple val(meta), path("*.table"), optional: true, emit: bqsr_table | ||
tuple val(meta), path("*.duplicate-metrics.txt"), optional: true, emit: duplicate_metrics | ||
path "versions.yml", emit: versions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you align this as it was before? :)
def readgroups_string = read_group.collect { rg -> "@RG\\tID:${rg.read_group}__${rg.sample}\\tSM:${rg.sample}\\tPL:${rg.platform}\\tLB:${rg.sample}\\tPU:${rg.read_group}" } | ||
|
||
def in_fq_command = meta.single_end | ||
? (r1_fastq instanceof List | ||
? r1_fastq.collect { "--in-se-fq $it" }.join(' ') | ||
: "--in-se-fq ${r1_fastq}" | ||
) | ||
: (r1_fastq instanceof List && r2_fastq instanceof List && readgroups_string instanceof List | ||
? (r1_fastq.indexed().collect { idx, r1 -> "--in-fq $r1 ${r2_fastq[idx]} \"${readgroups_string[idx]}\"" }).join(' ') | ||
: "--in-fq ${r1_fastq} ${r2_fastq} \"${readgroups_string.join(' ')}\"" | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you maybe explain what you are doin here exactly in a comment? At least for me its not directly clear what is going on (even though it probably makes a lot of sense!!!) :)
""" | ||
INDEX=`find -L ./ -name "*.amb" | sed 's/\\.amb\$//'` | ||
cp $fasta \$INDEX | ||
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(trailing whitespace needs to be removed or else linting fails)
- r1_fastq: | ||
type: file | ||
description: R1 fastq file | ||
pattern: "*.fastq.gz" | ||
- r2_fastq: | ||
type: file | ||
description: R2 fastq file | ||
pattern: "*.fastq.gz" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That means that the module only works with paired end fastqs now?
path(known_sites) | ||
|
||
tuple val(meta), val(read_group), path (r1_fastq, stageAs: "?/*"), path (r2_fastq, stageAs: "?/*"), path(interval_file) | ||
tuple path(fasta), path(fai) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tuple path(fasta), path(fai) | |
tuple val(meta2), path(fasta), path(fai) |
tuple val(meta4), path(interval_file) | ||
path(known_sites) | ||
|
||
tuple val(meta), val(read_group), path (r1_fastq, stageAs: "?/*"), path (r2_fastq, stageAs: "?/*"), path(interval_file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also means that you always expect the interval file in the input channel with the reads? Why not leave it seperated as before? I think that would be less confusing!
|
||
tuple val(meta), val(read_group), path (r1_fastq, stageAs: "?/*"), path (r2_fastq, stageAs: "?/*"), path(interval_file) | ||
tuple path(fasta), path(fai) | ||
tuple val(meta1), path(index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tuple val(meta1), path(index) | |
tuple val(meta3), path(index) |
params { | ||
module_args = '--low-memory' | ||
// Ref: https://forums.developer.nvidia.com/t/problem-with-gpu/256825/6 | ||
// Parabricks’s fq2bam requires 24GB of memory. | ||
// Using --low-memory for testing | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be removed for a stub test
params { | |
module_args = '--low-memory' | |
// Ref: https://forums.developer.nvidia.com/t/problem-with-gpu/256825/6 | |
// Parabricks’s fq2bam requires 24GB of memory. | |
// Using --low-memory for testing | |
} |
{ assert snapshot( | ||
bam(process.out.bam_bai[0][1]).getReadsMD5(), | ||
file(process.out.bam_bai[0][2]).name, | ||
process.out.versions, | ||
path(process.out.versions[0]).yaml | ||
).match() } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a stub we should be able to assert like:
{ assert snapshot( | |
bam(process.out.bam_bai[0][1]).getReadsMD5(), | |
file(process.out.bam_bai[0][2]).name, | |
process.out.versions, | |
path(process.out.versions[0]).yaml | |
).match() } | |
{ assert snapshot(process.out).match() } |
PR checklist
Closes #XXX
versions.yml
file.label
nf-core modules test <MODULE> --profile docker
nf-core modules test <MODULE> --profile singularity
nf-core modules test <MODULE> --profile conda
nf-core subworkflows test <SUBWORKFLOW> --profile docker
nf-core subworkflows test <SUBWORKFLOW> --profile singularity
nf-core subworkflows test <SUBWORKFLOW> --profile conda