Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Module parameter inheritance and parameter wrapping #32

Closed
chris-cheshire opened this issue Apr 7, 2020 · 4 comments
Closed

Module parameter inheritance and parameter wrapping #32

chris-cheshire opened this issue Apr 7, 2020 · 4 comments

Comments

@chris-cheshire
Copy link
Contributor

chris-cheshire commented Apr 7, 2020

Copied from the slack channel:


Hi guys,

Can I get your feedback on a custom parameter inheritance model we have built-in for our modules?

Our user story is such that we wanted a set of default params defined inside the module to run the process in the case that the user imports the module and does nothing else.
We then wanted to be able to override the params with those from the parent nf file, but without making large boilerplate calls using addParams or by passing arguments as channels as we feel these should be retained for data.
Finally, we wanted to be able to set group parameters on multiple includes of the same module but retaining the ability to override the module params individually if we wanted to.
We found during our testing that any module params defined actually override the global parameters which is the opposite of what we wanted. This forces either the route via addParams or the route via channels, neither of which we wanted to use.

I constructed a custom groovy class which automatically overrides the params by matching names. First, the module params are prefixed with internal_* - then any parameter in the parent nf file can override an internal param by prefixing with the module name (e.g for cutadapt params.cutadapt_adapter_seq would override params.internal_adapter_seq inside the module.
This provides a model where defaults are used unless explicitly overridden in the parent. The same param is overridden in all module instances unless specifically overridden using addParams. This gives us the flexibility for example to define a global adapter sequence for cutadapt, but define separate output directories for each module instance.

The functionality requires 3 lines of code per module to implement.

I have posted the code below - please ignore the rest of the module parameter wise as we are still building out and generalising (we also know there is a cutadapt module, its just an easy example)

#!/usr/bin/env nextflow
// Include NfUtils
Class groovyClass = new GroovyClassLoader(getClass().getClassLoader()).parseClass(new File("groovy/NfUtils.groovy"));
GroovyObject nfUtils = (GroovyObject) groovyClass.newInstance();
// Define internal params
module_name = 'cutadapt'
// Specify DSL2
nextflow.preview.dsl = 2
// TODO check version of cutadapt in host process
// Define default nextflow internals
params.internal_outdir = './results'
params.internal_process_name = 'cutadapt'
params.internal_output_prefix = ''
params.internal_min_quality = 10
params.internal_min_length = 16
params.internal_adapter_sequence = 'AGATCGGAAGAGC'
// Check if globals need to 
nfUtils.check_internal_overrides(module_name, params)
// Trimming reusable component
process cutadapt {
    // Tag
    tag "${sample_id}"
    publishDir "${params.internal_outdir}/${params.internal_process_name}",
        mode: "copy", overwrite: true
    input:
        //tuple val(sample_id), path(reads)
        path(reads)
    output:
        //tuple val(sample_id), path("${reads.simpleName}.trimmed.fq.gz")
        path("${params.internal_output_prefix}${reads.simpleName}.trimmed.fq.gz")
    shell:
    """
    cutadapt \
        -j ${task.cpus} \
        -q ${params.internal_min_quality} \
        --minimum-length ${params.internal_min_length} \
        -a ${params.internal_adapter_sequence} \
        -o ${params.internal_output_prefix}${reads.simpleName}.trimmed.fq.gz $reads
    """
}
class NfUtils{
    def check_internal_overrides(String moduleName, Map params)
    {
        // get params set of keys
        Set paramsKeySet = params.keySet()
        // Interate through and set internals to the correct parameter at runtime
        paramsKeySet.each {
            if(it.startsWith("internal_")) {
                def searchString = moduleName + '_' + it.replace('internal_', '');
                if(paramsKeySet.contains(searchString)) {
                    params.replace(it, params.get(searchString))
                }
            }
        }
    }
}
#!/usr/bin/env nextflow
// Define DSL2
nextflow.preview.dsl=2
// Log
log.info ("Starting Cutadapt trimming test pipeline")
/* Define global params
--------------------------------------------------------------------------------------*/
params.cutadapt_output_prefix = 'trimmed_'
/* Module inclusions 
--------------------------------------------------------------------------------------*/
include cutadapt from './trim-reads.nf' addParams(cutadapt_process_name: 'cutadapt1')
include cutadapt as cutadapt2 from './trim-reads.nf' addParams(cutadapt_process_name: 'cutadapt2')
/*------------------------------------------------------------------------------------*/
/* Define input channels
--------------------------------------------------------------------------------------*/
testPaths = [
  ['Sample 1', "$baseDir/input/readfile1.fq.gz"],
  ['Sample 2', "$baseDir/input/readfile2.fq.gz"],
  ['Sample 3', "$baseDir/input/readfile3.fq.gz"],
  ['Sample 4', "$baseDir/input/readfile4.fq.gz"],
  ['Sample 5', "$baseDir/input/readfile5.fq.gz"],
  ['Sample 6', "$baseDir/input/readfile6.fq.gz"]
]
// Create channel of test data (excluding the sample ID)
 Channel
  .from(testPaths)
  .map { row -> file(row[1]) }
  .set {ch_test_inputs}
  Channel
  .from(testPaths)
  .map { row -> file(row[1]) }
  .set {ch_test_inputs2}
/*------------------------------------------------------------------------------------*/
// Run workflow
workflow {
    // Run cutadapt
    cutadapt( ch_test_inputs )
    // Run cutadapt
    cutadapt2( ch_test_inputs2 )
    // Collect file names and view output
    //cutadapt.out | view 
}
@chris-cheshire chris-cheshire changed the title A model for parameter inheritance Module parameter inheritance and parameter wrapping Jun 15, 2020
@drpatelh
Copy link
Member

Copied from Slack:

As ugly as it sounds I think the easiest and possibly the most flexible option is to only define inputs and outputs in the actual module file and no other parameter definitions (other than --threads $task.cpus maybe). This follows the NF philosophy quite well.

Additional parameters would then be provided by a string of some sort with the hope that the downstream tool validates these parameters! Would be nice if we can make the variable name for optional parameters standardised e.g. --<MODULE_NAME>_options.

@ewels
Copy link
Member

ewels commented Jun 15, 2020

Agreed - my way of saying basically the same thing (I think) is this:

I think that the default module command should be as simple as possible, with dedicated parameters for the required options (will often not be any if the essential options are covered by input and output channels). Then any additional optional flags could go in a generic string param as @drpatelh suggests above.

So using the cutadapt example above, if it was impossible to run cutadapt with the -a flag then we would have params.cutadapt_adapter_sequence and then --cutadapt_options for everything else.

As mentioned on slack, the addParams thing is cool, but I can see it struggling with edge cases such as hyphens in command-line parameters (hyphens are not allowed in groovy variable names) and other random stuff. I agree that although the generic string approach is ugly, it's also the simplest to maintain and least likely to go wrong.

Remember that pipeline authors can make these generic strings however they like, so can define their own convenience params for users where applicable.

Phil

@ewels
Copy link
Member

ewels commented Jun 15, 2020

Note that @FelixKrueger is using the "simple string" approach in his DSL2 modules. See #30 for example.

Also note that starting with the "simple string" approach is not incompatible with manually adding other params - these can be added on to that string as done in #28 - though maybe we should avoid this and keep things simple (at risk of being a complete hypocrite here, sorry).

@drpatelh
Copy link
Member

We have agreed on passing all optional parameters via an options.args string and this seems to be working really well. Especially as it allows for optional parameters to be overwritten at both the user and developer level via custom configuration files.

muffato added a commit that referenced this issue Dec 8, 2022
* Adding module for miniprot_index.

* Adding module for miniprot_index.

* Adding module for miniprot_index.

* Adding module for miniprot_index.

* Adding module for miniprot_index.

* Adding module for miniprot_index.

* Adding module for miniprot_index.

* update the wrong file name

* put back the test data path

* change index file name

Co-authored-by: Guoying Qi <[email protected]>

* Adding module for miniprot/align (#31)

* Adding module for miniprot/align. Closes #1

* Adding module for miniprot/align. Closes #1

* Adding module for <software/tool>. Closes #<issue_number>.

* removed gtf flag from main.nf and meta.yml

* removed gtf flag from main.nf and meta.yml

* incorporate comments

* incorporate comments

* fixed a bug, swapped the order of reference and protein (#32)

* Fixed the paths for the new modules structure

* Switched to the nf-core test data and the biocontainer

* This output is actually named "index"

* linting

* Fixed the tool name

* Added a meta map to the reference index too, as per the latest nf-core usage

* Added another keyword

Co-authored-by: YSims <[email protected]>
Co-authored-by: Guoying Qi <[email protected]>
Co-authored-by: Matthias De Smet <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants