From 49b598b9e8dade03c08223a35e8b19713791090d Mon Sep 17 00:00:00 2001
From: Harshil Patel
Date: Thu, 1 Feb 2024 10:01:43 +0000
Subject: [PATCH 1/5] Remove Synapse workflow from pipeline

---
 CHANGELOG.md | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 856667de..0c8c3a31 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,6 +5,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unpublished Version / DEV]
 
+### :warning: Major enhancements
+
+- The Aspera CLI was recently added to [Bioconda](https://anaconda.org/bioconda/aspera-cli) and we have added it as another way of downloading FastQ files on top of the existing FTP and sra-tools support. In our limited benchmarks on all public Clouds we found ~50% speed-up in download times compared to FTP! We are not aware of any obvious downsides and have made this the default download method in the pipeline. You can, however, revert to using FTP and sra-tools via the `--force_ftp_download` and `--force_sratools_download` parameters, respectively. We would love to have your feedback!
+- Support for Synapse ids has been dropped in this release. We haven't had any feedback from users on whether it is being used. Users can run earlier versions of the pipeline if required.
+
 ### Credits
 
 Special thanks to the following for their contributions to the release:
@@ -48,9 +53,11 @@ Thank you to everyone else that has contributed by reporting bugs, enhancements
 
 ### Parameters
 
-| Old parameter | New parameter          |
-| ------------- | ---------------------- |
-|               | `--force_ftp_download` |
+| Old parameter      | New parameter          |
+| ------------------ | ---------------------- |
+|                    | `--force_ftp_download` |
+| `--input_type`     |                        |
+| `--synapse_config` |                        |
 
 > **NB:** Parameter has been **updated** if both old and new parameter information is present.
 > **NB:** Parameter has been **added** if just the new parameter information is present.

From 4472842c87be594d3008b6ded001a587e05b4844 Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Thu, 1 Feb 2024 10:02:46 +0000 Subject: [PATCH 2/5] Remove Synapse workflow from pipeline --- README.md | 7 - assets/schema_input.json | 2 +- conf/test_synapse.config | 25 ---- docs/output.md | 25 +--- docs/usage.md | 37 ++--- main.nf | 15 +- modules/local/synapse_get/main.nf | 37 ----- modules/local/synapse_get/nextflow.config | 16 --- modules/local/synapse_list/main.nf | 36 ----- modules/local/synapse_list/nextflow.config | 10 -- .../local/synapse_merge_samplesheet/main.nf | 28 ---- .../synapse_merge_samplesheet/nextflow.config | 9 -- modules/local/synapse_show/main.nf | 36 ----- modules/local/synapse_show/nextflow.config | 9 -- modules/local/synapse_to_samplesheet/main.nf | 55 -------- .../synapse_to_samplesheet/nextflow.config | 8 -- nextflow.config | 13 +- nextflow_schema.json | 13 -- .../utils_nfcore_fetchngs_pipeline/main.nf | 130 +----------------- .../tests/main.function.nf.test | 19 --- .../main.workflow_pipeline_completion.test | 14 +- ...n.workflow_pipeline_initialisation.nf.test | 4 +- workflows/synapse/main.nf | 125 ----------------- workflows/synapse/nextflow.config | 5 - 24 files changed, 25 insertions(+), 653 deletions(-) delete mode 100644 conf/test_synapse.config delete mode 100644 modules/local/synapse_get/main.nf delete mode 100644 modules/local/synapse_get/nextflow.config delete mode 100644 modules/local/synapse_list/main.nf delete mode 100644 modules/local/synapse_list/nextflow.config delete mode 100644 modules/local/synapse_merge_samplesheet/main.nf delete mode 100644 modules/local/synapse_merge_samplesheet/nextflow.config delete mode 100644 modules/local/synapse_show/main.nf delete mode 100644 modules/local/synapse_show/nextflow.config delete mode 100644 modules/local/synapse_to_samplesheet/main.nf delete mode 100644 modules/local/synapse_to_samplesheet/nextflow.config delete mode 100644 workflows/synapse/main.nf delete mode 100644 workflows/synapse/nextflow.config diff --git a/README.md b/README.md index bf69b682..f652bd66 100644 --- a/README.md +++ b/README.md @@ -72,13 +72,6 @@ Via a single file of ids, provided one-per-line (see [example input file](https: - Otherwise use [`sra-tools`](https://github.com/ncbi/sra-tools) to download `.sra` files and convert them to FastQ. Use `--force_sratools_download` to force this behaviour. 4. Collate id metadata and paths to FastQ files in a single samplesheet -### Synapse ids - -1. Resolve Synapse directory ids to their corresponding FastQ files ids via the `synapse list` command. -2. Retrieve FastQ file metadata including FastQ file names, md5sums, etags, annotations and other data provenance via the `synapse show` command. -3. Download FastQ files in parallel via `synapse get` -4. 
Collate paths to FastQ files in a single samplesheet - ## Pipeline output The columns in the output samplesheet can be tailored to be accepted out-of-the-box by selected nf-core pipelines (see [usage docs](https://nf-co.re/fetchngs/usage#samplesheet-format)), these currently include: diff --git a/assets/schema_input.json b/assets/schema_input.json index 13044b1b..db9ffc00 100644 --- a/assets/schema_input.json +++ b/assets/schema_input.json @@ -9,7 +9,7 @@ "properties": { "": { "type": "string", - "pattern": "^(((SR|ER|DR)[APRSX])|(SAM(N|EA|D))|(PRJ(NA|EB|DB))|(GS[EM])|(syn))(\\d+)$", + "pattern": "^(((SR|ER|DR)[APRSX])|(SAM(N|EA|D))|(PRJ(NA|EB|DB))|(GS[EM]))(\\d+)$", "errorMessage": "Please provide a valid SRA, ENA, DDBJ or GEO identifier" } } diff --git a/conf/test_synapse.config b/conf/test_synapse.config deleted file mode 100644 index 1ac1388a..00000000 --- a/conf/test_synapse.config +++ /dev/null @@ -1,25 +0,0 @@ -/* -======================================================================================== - Nextflow config file for running minimal tests -======================================================================================== - Defines input files and everything required to run a fast and simple pipeline test. - - Use as follows: - nextflow run nf-core/fetchngs -profile test_synapse, - ----------------------------------------------------------------------------------------- -*/ - -params { - config_profile_name = 'Test profile using Synapse ids' - config_profile_description = 'Minimal test dataset to check pipeline function' - - // Limit resources so that this can run on GitHub Actions - max_cpus = 2 - max_memory = '6.GB' - max_time = '6.h' - - // Input data - input = 'https://raw.githubusercontent.com/nf-core/test-datasets/fetchngs/synapse_ids_test.csv' - input_type = 'synapse' -} diff --git a/docs/output.md b/docs/output.md index aad48fe8..5a27bfca 100644 --- a/docs/output.md +++ b/docs/output.md @@ -8,9 +8,7 @@ This document describes the output produced by the pipeline. The directories lis The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data depending on the type of ids provided: -- Download FastQ files and create samplesheet from: - 1. [SRA / ENA / DDBJ / GEO ids](#sra--ena--ddbj--geo-ids) - 2. [Synapse ids](#synapse-ids) +- Download FastQ files and create samplesheet from [SRA / ENA / DDBJ / GEO ids](#sra--ena--ddbj--geo-ids) - [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution Please see the [usage documentation](https://nf-co.re/fetchngs/usage#introduction) for a list of supported public repository identifiers and how to provide them to the pipeline. @@ -36,27 +34,6 @@ Please see the [usage documentation](https://nf-co.re/fetchngs/usage#introductio The final sample information for all identifiers is obtained from the ENA which provides direct download links for FastQ files as well as their associated md5 sums. If download links exist, the files will be downloaded in parallel by FTP. Otherwise they are downloaded using sra-tools. -### Synapse ids - -
-Output files - -- `fastq/` - - `*.fastq.gz`: Paired-end/single-end reads downloaded from Synapse. -- `fastq/md5/` - - `*.md5`: Files containing `md5` sum for FastQ files downloaded from the Synapse platform. -- `samplesheet/` - - `samplesheet.csv`: Auto-created samplesheet with collated metadata and paths to downloaded FastQ files. -- `metadata/` - - `*.metadata.txt`: Original metadata file generated using the `synapse show` command. - - `*.list.txt`: Original output of the `synapse list` command, containing the Synapse ids, file version numbers, file names, and other file-specific data for the Synapse directory ID provided. - -
- -FastQ files and corresponding sample information for `Synapse` identifiers are downloaded in parallel directly from the [Synapse](https://www.synapse.org/#) platform. A [configuration file](http://python-docs.synapse.org/build/html/Credentials.html#use-synapseconfig) containing valid login credentials is required for Synapse downloads. - -The final sample information for the FastQ files downloaded from `Synapse` is obtained from the file name itself. The file names are parsed according to the glob pattern `*{1,2}*`. This returns the sample name, presumed to be the longest possible string matching the glob pattern, with the fewest number of wildcard insertions. Further information on sample name parsing can be found in the [usage documentation](https://nf-co.re/fetchngs/usage#introduction). - ### Pipeline information
diff --git a/docs/usage.md b/docs/usage.md index 42d134f3..06ff7802 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -8,15 +8,15 @@ The pipeline has been set-up to automatically download and process the raw FastQ files from both public and private repositories. Identifiers can be provided in a file, one-per-line via the `--input` parameter. Currently, the following types of example identifiers are supported: -| `SRA` | `ENA` | `DDBJ` | `GEO` | `Synapse` | -| ------------ | ------------ | ------------ | ---------- | ----------- | -| SRR11605097 | ERR4007730 | DRR171822 | GSM4432381 | syn26240435 | -| SRX8171613 | ERX4009132 | DRX162434 | GSE147507 | | -| SRS6531847 | ERS4399630 | DRS090921 | | | -| SAMN14689442 | SAMEA6638373 | SAMD00114846 | | | -| SRP256957 | ERP120836 | DRP004793 | | | -| SRA1068758 | ERA2420837 | DRA008156 | | | -| PRJNA625551 | PRJEB37513 | PRJDB4176 | | | +| `SRA` | `ENA` | `DDBJ` | `GEO` | +| ------------ | ------------ | ------------ | ---------- | +| SRR11605097 | ERR4007730 | DRR171822 | GSM4432381 | +| SRX8171613 | ERX4009132 | DRX162434 | GSE147507 | +| SRS6531847 | ERS4399630 | DRS090921 | | +| SAMN14689442 | SAMEA6638373 | SAMD00114846 | | +| SRP256957 | ERP120836 | DRP004793 | | +| SRA1068758 | ERA2420837 | DRA008156 | | +| PRJNA625551 | PRJEB37513 | PRJDB4176 | | ### SRR / ERR / DRR ids @@ -34,25 +34,6 @@ If you have a GEO accession (found in the data availability section of published This downloads a text file called `SRR_Acc_List.txt` that can be directly provided to the pipeline once renamed with a .csv extension e.g. `--input SRR_Acc_List.csv`. -### Synapse ids - -[Synapse](https://www.synapse.org/#) is a collaborative research platform created by [Sage Bionetworks](https://sagebionetworks.org/). Its aim is to promote reproducible research and responsible data sharing throughout the biomedical community. To download data from `Synapse`, the Synapse id of the _directory_ containing all files to be downloaded should be provided. The Synapse id should be an eleven-characters beginning with `syn`. - -This Synapse id will then be resolved to the Synapse id of the corresponding FastQ files contained within the directory. The individual FastQ files are then downloaded in parellel using the `synapse get` command. All Synapse metadata, annotations and data provenance are also downloaded using the `synapse show` command, and are outputted to a separate metadata file. By default, only the md5sums, file sizes, etags, Synapse ids, file names, and file versions are shown. - -In order to download data from Synapse, an account must be created and a user configuration file provided via the parameter `--synapse_config`. For more information about Synapse configuration, please see the [Synapse client configuration](https://help.synapse.org/docs/Client-Configuration.1985446156.html) documentation. - -The final sample information for the FastQ files used for samplesheet generation is obtained from the file name itself. The file names are parsed according to the glob pattern `*{1,2}*`, which returns the sample name, presumed to be the longest possible string matching the glob pattern, with the fewest number of wildcard insertions. - -
-Supported File Names - -- Files named `SRR493366_1.fastq` and `SRR493366_2.fastq` will have a sample name of `SRR493366` -- Files named `SRR_493_367_1.fastq` and `SRR_493_367_2.fastq` will have a sample name of `SRR_493_367` -- Files named `filename12_1.fastq` and `filename12_2.fastq` will have a sample name of `filename12` - -
- ### Samplesheet format As a bonus, the columns in the auto-created samplesheet can be tailored to be accepted out-of-the-box by selected nf-core pipelines, these currently include: diff --git a/main.nf b/main.nf index d3779d16..b5b499d5 100644 --- a/main.nf +++ b/main.nf @@ -17,8 +17,7 @@ nextflow.enable.dsl = 2 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ -if (params.input_type == 'sra') include { SRA } from './workflows/sra' -if (params.input_type == 'synapse') include { SYNAPSE } from './workflows/synapse' +include { SRA } from './workflows/sra' // // WORKFLOW: Run main nf-core/fetchngs analysis pipeline depending on type of identifier provided @@ -33,15 +32,7 @@ workflow NFCORE_FETCHNGS { // // WORKFLOW: Download FastQ files for SRA / ENA / GEO / DDBJ ids // - if (params.input_type == 'sra') { - SRA ( ids ) - - // - // WORKFLOW: Download FastQ files for Synapse ids - // - } else if (params.input_type == 'synapse') { - SYNAPSE ( ids ) - } + SRA ( ids ) } @@ -69,7 +60,6 @@ workflow { params.monochrome_logs, params.outdir, params.input, - params.input_type, params.ena_metadata_fields ) @@ -84,7 +74,6 @@ workflow { // SUBWORKFLOW: Run completion tasks // PIPELINE_COMPLETION ( - params.input_type, params.email, params.email_on_fail, params.plaintext_email, diff --git a/modules/local/synapse_get/main.nf b/modules/local/synapse_get/main.nf deleted file mode 100644 index c8a6d7a4..00000000 --- a/modules/local/synapse_get/main.nf +++ /dev/null @@ -1,37 +0,0 @@ - -process SYNAPSE_GET { - tag "$meta.id" - label 'process_low' - label 'error_retry' - - conda "bioconda::synapseclient=2.7.1" - container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? - 'https://depot.galaxyproject.org/singularity/synapseclient:2.7.1--pyh7cba7a3_0' : - 'biocontainers/synapseclient:2.7.1--pyh7cba7a3_0' }" - - input: - val meta - path config - - output: - tuple val(meta), path("*.fastq.gz"), emit: fastq - tuple val(meta), path("*md5") , emit: md5 - path "versions.yml" , emit: versions - - script: - def args = task.ext.args ?: '' - """ - synapse \\ - -c $config \\ - get \\ - $args \\ - $meta.id - - echo "${meta.md5} \t ${meta.name}" > ${meta.id}.md5 - - cat <<-END_VERSIONS > versions.yml - "${task.process}": - synapse: \$(synapse --version | sed -e "s/Synapse Client //g") - END_VERSIONS - """ -} diff --git a/modules/local/synapse_get/nextflow.config b/modules/local/synapse_get/nextflow.config deleted file mode 100644 index 9c42e741..00000000 --- a/modules/local/synapse_get/nextflow.config +++ /dev/null @@ -1,16 +0,0 @@ -process { - withName: 'SYNAPSE_GET' { - publishDir = [ - [ - path: { "${params.outdir}/fastq" }, - mode: params.publish_dir_mode, - pattern: "*.fastq.gz" - ], - [ - path: { "${params.outdir}/fastq/md5" }, - mode: params.publish_dir_mode, - pattern: "*.md5" - ] - ] - } -} diff --git a/modules/local/synapse_list/main.nf b/modules/local/synapse_list/main.nf deleted file mode 100644 index 0c03f8b2..00000000 --- a/modules/local/synapse_list/main.nf +++ /dev/null @@ -1,36 +0,0 @@ - -process SYNAPSE_LIST { - tag "$id" - label 'process_low' - - conda "bioconda::synapseclient=2.7.1" - container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
- 'https://depot.galaxyproject.org/singularity/synapseclient:2.7.1--pyh7cba7a3_0' : - 'biocontainers/synapseclient:2.7.1--pyh7cba7a3_0' }" - - input: - val id - path config - - output: - path "*.txt" , emit: txt - path "versions.yml", emit: versions - - script: - def args = task.ext.args ?: '' - def args2 = task.ext.args2 ?: '' - """ - synapse \\ - -c $config \\ - list \\ - $args \\ - $id \\ - $args2 \\ - > ${id}.list.txt - - cat <<-END_VERSIONS > versions.yml - "${task.process}": - syanpse: \$(synapse --version | sed -e "s/Synapse Client //g") - END_VERSIONS - """ -} diff --git a/modules/local/synapse_list/nextflow.config b/modules/local/synapse_list/nextflow.config deleted file mode 100644 index 15124234..00000000 --- a/modules/local/synapse_list/nextflow.config +++ /dev/null @@ -1,10 +0,0 @@ -process { - withName: SYNAPSE_LIST { - ext.args = '--long' - publishDir = [ - path: { "${params.outdir}/metadata" }, - mode: params.publish_dir_mode, - saveAs: { filename -> filename.equals('versions.yml') ? null : filename } - ] - } -} diff --git a/modules/local/synapse_merge_samplesheet/main.nf b/modules/local/synapse_merge_samplesheet/main.nf deleted file mode 100644 index 4cb2abc3..00000000 --- a/modules/local/synapse_merge_samplesheet/main.nf +++ /dev/null @@ -1,28 +0,0 @@ - -process SYNAPSE_MERGE_SAMPLESHEET { - - conda "conda-forge::sed=4.7" - container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? - 'https://depot.galaxyproject.org/singularity/ubuntu:20.04' : - 'nf-core/ubuntu:20.04' }" - - input: - path ('samplesheets/*') - - output: - path "samplesheet.csv", emit: samplesheet - path "versions.yml" , emit: versions - - script: - """ - head -n 1 `ls ./samplesheets/* | head -n 1` > samplesheet.csv - for fileid in `ls ./samplesheets/*`; do - awk 'NR>1' \$fileid >> samplesheet.csv - done - - cat <<-END_VERSIONS > versions.yml - "${task.process}": - sed: \$(echo \$(sed --version 2>&1) | sed 's/^.*GNU sed) //; s/ .*\$//') - END_VERSIONS - """ -} diff --git a/modules/local/synapse_merge_samplesheet/nextflow.config b/modules/local/synapse_merge_samplesheet/nextflow.config deleted file mode 100644 index c94c53dd..00000000 --- a/modules/local/synapse_merge_samplesheet/nextflow.config +++ /dev/null @@ -1,9 +0,0 @@ -process { - withName: SYNAPSE_MERGE_SAMPLESHEET { - publishDir = [ - path: { "${params.outdir}/samplesheet" }, - mode: params.publish_dir_mode, - saveAs: { filename -> filename.equals('versions.yml') ? null : filename } - ] - } -} diff --git a/modules/local/synapse_show/main.nf b/modules/local/synapse_show/main.nf deleted file mode 100644 index e1f756a5..00000000 --- a/modules/local/synapse_show/main.nf +++ /dev/null @@ -1,36 +0,0 @@ - -process SYNAPSE_SHOW { - tag "$id" - label 'process_low' - - conda "bioconda::synapseclient=2.7.1" - container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
- 'https://depot.galaxyproject.org/singularity/synapseclient:2.7.1--pyh7cba7a3_0' : - 'biocontainers/synapseclient:2.7.1--pyh7cba7a3_0' }" - - input: - val id - path config - - output: - path "*.txt" , emit: metadata - path "versions.yml", emit: versions - - script: - def args = task.ext.args ?: '' - def args2 = task.ext.args2 ?: '' - """ - synapse \\ - -c $config \\ - show \\ - $args \\ - $id \\ - $args2 \\ - > ${id}.metadata.txt - - cat <<-END_VERSIONS > versions.yml - "${task.process}": - synapse: \$(synapse --version | sed -e "s/Synapse Client //g") - END_VERSIONS - """ -} diff --git a/modules/local/synapse_show/nextflow.config b/modules/local/synapse_show/nextflow.config deleted file mode 100644 index 5b9167b3..00000000 --- a/modules/local/synapse_show/nextflow.config +++ /dev/null @@ -1,9 +0,0 @@ -process { - withName: 'SYNAPSE_SHOW' { - publishDir = [ - path: { "${params.outdir}/metadata" }, - mode: params.publish_dir_mode, - saveAs: { filename -> filename.equals('versions.yml') ? null : filename } - ] - } -} diff --git a/modules/local/synapse_to_samplesheet/main.nf b/modules/local/synapse_to_samplesheet/main.nf deleted file mode 100644 index 393203de..00000000 --- a/modules/local/synapse_to_samplesheet/main.nf +++ /dev/null @@ -1,55 +0,0 @@ - -process SYNAPSE_TO_SAMPLESHEET { - tag "$meta.id" - - executor 'local' - memory 100.MB - - input: - tuple val(meta), path(fastq) - val pipeline - val strandedness - - output: - tuple val(meta), path("*.csv"), emit: samplesheet - - exec: - - // Remove custom keys - def meta_map = meta.clone() - meta_map.remove("id") - - def fastq_1 = "${params.outdir}/fastq/${fastq}" - def fastq_2 = '' - if (fastq instanceof List && fastq.size() == 2) { - fastq_1 = "${params.outdir}/fastq/${fastq[0]}" - fastq_2 = "${params.outdir}/fastq/${fastq[1]}" - } - - // Add relevant fields to the beginning of the map - pipeline_map = [ - sample : "${meta.id}", - fastq_1 : fastq_1, - fastq_2 : fastq_2 - ] - - // Add nf-core pipeline specific entries - if (pipeline) { - if (pipeline == 'rnaseq') { - pipeline_map << [ strandedness: strandedness ] - } else if (pipeline == 'atacseq') { - pipeline_map << [ replicate: 1 ] - } else if (pipeline == 'taxprofiler') { - pipeline_map << [ fasta: '' ] - } - } - pipeline_map << meta_map - - // Create a samplesheet - samplesheet = pipeline_map.keySet().collect{ '"' + it + '"'}.join(",") + '\n' - samplesheet += pipeline_map.values().collect{ '"' + it + '"'}.join(",") - - // Write samplesheet to file - def samplesheet_file = task.workDir.resolve("${meta.id}.samplesheet.csv") - samplesheet_file.text = samplesheet -} diff --git a/modules/local/synapse_to_samplesheet/nextflow.config b/modules/local/synapse_to_samplesheet/nextflow.config deleted file mode 100644 index 83af86b6..00000000 --- a/modules/local/synapse_to_samplesheet/nextflow.config +++ /dev/null @@ -1,8 +0,0 @@ -process { - withName: SYNAPSE_TO_SAMPLESHEET { - publishDir = [ - path: { "${params.outdir}/samplesheet" }, - enabled: false - ] - } -} diff --git a/nextflow.config b/nextflow.config index 593f9ad0..c9b67f04 100644 --- a/nextflow.config +++ b/nextflow.config @@ -11,12 +11,10 @@ params { // Input options input = null - input_type = 'sra' nf_core_pipeline = null nf_core_rnaseq_strandedness = 'auto' ena_metadata_fields = null sample_mapping_fields = 'experiment_accession,run_accession,sample_accession,experiment_alias,run_alias,sample_alias,experiment_title,sample_title,sample_description' - synapse_config = null force_ftp_download = false force_sratools_download = false 
skip_fastq_download = false @@ -67,11 +65,7 @@ try { } // Workflow specific configs -if (params.input_type == 'sra') { - includeConfig './workflows/sra/nextflow.config' -} else if (params.input_type == 'synapse') { - includeConfig './workflows/synapse/nextflow.config' -} +includeConfig './workflows/sra/nextflow.config' // Load nf-core/fetchngs custom profiles from different institutions. // Warning: Uncomment only if a pipeline-specific institutional config already exists on nf-core/configs! @@ -173,9 +167,8 @@ profiles { executor.cpus = 4 executor.memory = 8.GB } - test { includeConfig 'conf/test.config' } - test_synapse { includeConfig 'conf/test_synapse.config' } - test_full { includeConfig 'conf/test_full.config' } + test { includeConfig 'conf/test.config' } + test_full { includeConfig 'conf/test_full.config' } } // Set default registry for Apptainer, Docker, Podman and Singularity independent of -profile diff --git a/nextflow_schema.json b/nextflow_schema.json index bdf37cd7..77ae6673 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -22,13 +22,6 @@ "fa_icon": "fas fa-file-excel", "description": "File containing SRA/ENA/GEO/DDBJ identifiers one per line to download their associated metadata and FastQ files." }, - "input_type": { - "type": "string", - "default": "sra", - "description": "Specifies the type of identifier provided via `--input` - available options are 'sra' and 'synapse'.", - "fa_icon": "fas fa-keyboard", - "enum": ["sra", "synapse"] - }, "ena_metadata_fields": { "type": "string", "fa_icon": "fas fa-columns", @@ -90,12 +83,6 @@ "fa_icon": "fas fa-envelope", "help_text": "Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (`~/.nextflow/config`) then you don't need to specify this on the command line for every run.", "pattern": "^([a-zA-Z0-9_\\-\\.]+)@([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5})$" - }, - "synapse_config": { - "type": "string", - "description": "Path to Synapse configuration file", - "fa_icon": "fas fa-users-cog", - "hidden": true } } }, diff --git a/subworkflows/local/utils_nfcore_fetchngs_pipeline/main.nf b/subworkflows/local/utils_nfcore_fetchngs_pipeline/main.nf index 821344d0..7b46200f 100644 --- a/subworkflows/local/utils_nfcore_fetchngs_pipeline/main.nf +++ b/subworkflows/local/utils_nfcore_fetchngs_pipeline/main.nf @@ -35,7 +35,6 @@ workflow PIPELINE_INITIALISATION { monochrome_logs // boolean: Do not use coloured log outputs outdir // string: The output directory where the results will be saved input // string: File containing SRA/ENA/GEO/DDBJ identifiers one per line to download their associated metadata and FastQ files - input_type // string: Specifies the type of identifier provided via `--input` - available options are 'sra' and 'synapse' ena_metadata_fields // string: Comma-separated list of ENA metadata fields to fetch before downloading data main: @@ -74,18 +73,10 @@ workflow PIPELINE_INITIALISATION { // Auto-detect input id type // ch_input = file(input) - def inferred_input_type = '' if (isSraId(ch_input)) { - inferred_input_type = 'sra' sraCheckENAMetadataFields(ena_metadata_fields) - } else if (isSynapseId(ch_input)) { - inferred_input_type = 'synapse' } else { - error('Ids provided via --input not recognised please make sure they are either SRA / ENA / GEO / DDBJ or Synapse ids!') - } - - if (input_type != inferred_input_type) { - error("Ids auto-detected as ${inferred_input_type}. 
Please provide '--input_type ${inferred_input_type}' as a parameter to the pipeline!") + error('Ids provided via --input not recognised please make sure they are either SRA / ENA / GEO / DDBJ ids!') } // Read in ids from --input file @@ -109,7 +100,6 @@ workflow PIPELINE_INITIALISATION { workflow PIPELINE_COMPLETION { take: - input_type // string: 'sra' or 'synapse' email // string: email address email_on_fail // string: email address sent on pipeline failure plaintext_email // boolean: Send plain-text email instead of HTML @@ -135,11 +125,7 @@ workflow PIPELINE_COMPLETION { imNotification(summary_params, hook_url) } - if (input_type == 'sra') { - sraCurateSamplesheetWarn() - } else if (input_type == 'synapse') { - synapseCurateSamplesheetWarn() - } + sraCurateSamplesheetWarn() } } @@ -175,32 +161,6 @@ def isSraId(input) { return is_sra } -// -// Check if input ids are from the Synapse platform -// -def isSynapseId(input) { - def is_synapse = false - def total_ids = 0 - def no_match_ids = [] - def pattern = /^syn\d{8}$/ - input.eachLine { line -> - total_ids += 1 - if (!(line =~ pattern)) { - no_match_ids << line - } - } - - def num_match = total_ids - no_match_ids.size() - if (num_match > 0) { - if (num_match == total_ids) { - is_synapse = true - } else { - error("Mixture of ids provided via --input: ${no_match_ids.join(', ')}\nPlease provide either SRA / ENA / GEO / DDBJ or Synapse ids!") - } - } - return is_synapse -} - // // Check and validate parameters // @@ -226,89 +186,3 @@ def sraCurateSamplesheetWarn() { " running nf-core/other pipelines.\n" + "===================================================================================" } - -// -// Convert metadata obtained from the 'synapse show' command to a Groovy map -// -def synapseShowToMap(synapse_file) { - def meta = [:] - def category = '' - synapse_file.eachLine { line -> - def entries = [null, null] - if (!line.startsWith(' ') && !line.trim().isEmpty()) { - category = line.tokenize(':')[0] - } else { - entries = line.trim().tokenize('=') - } - meta["${category}|${entries[0]}"] = entries[1] - } - meta.id = meta['properties|id'] - meta.name = meta['properties|name'] - meta.md5 = meta['File|md5'] - return meta.findAll{ it.value != null } -} - -// -// Print a warning after pipeline has completed -// -def synapseCurateSamplesheetWarn() { - log.warn "=============================================================================\n" + - " Please double-check the samplesheet that has been auto-created by the pipeline.\n\n" + - " Where applicable, default values will be used for sample-specific metadata\n" + - " such as strandedness, controls etc as this information is not provided\n" + - " in a standardised manner when uploading data to Synapse.\n" + - "===================================================================================" -} - -// -// Obtain Sample ID from File Name -// -def synapseSampleNameFromFastQ(input_file, pattern) { - - def sampleids = "" - - def filePattern = pattern.toString() - int p = filePattern.lastIndexOf('/') - if( p != -1 ) - filePattern = filePattern.substring(p+1) - - input_file.each { - String fileName = input_file.getFileName().toString() - - String indexOfWildcards = filePattern.findIndexOf { it=='*' || it=='?' 
} - String indexOfBrackets = filePattern.findIndexOf { it=='{' || it=='[' } - if( indexOfWildcards==-1 && indexOfBrackets==-1 ) { - if( fileName == filePattern ) - return actual.getSimpleName() - throw new IllegalArgumentException("Not a valid file pair globbing pattern: pattern=$filePattern file=$fileName") - } - - int groupCount = 0 - for( int i=0; i WorkflowMain.synapseShowToMap(it) } - .set { ch_samples_meta } - - // - // MODULE: Download FastQs by synapse id - // - SYNAPSE_GET ( - ch_samples_meta, - ch_synapse_config - ) - ch_versions = ch_versions.mix(SYNAPSE_GET.out.versions.first()) - - // Combine channels for PE/SE FastQs: [ [ id:SRR6357070, synapse_ids:syn26240474;syn26240477 ], [ fastq_1, fastq_2 ] ] - SYNAPSE_GET - .out - .fastq - .map { meta, fastq -> [ WorkflowMain.synapseSampleNameFromFastQ( fastq , "*{1,2}*"), fastq ] } - .groupTuple(sort: { it -> it.baseName }) - .set { ch_fastq } - - SYNAPSE_GET - .out - .fastq - .map { meta, fastq -> [ WorkflowMain.synapseSampleNameFromFastQ( fastq , "*{1,2}*"), meta.id ] } - .groupTuple() - .join(ch_fastq) - .map { id, synids, fastq -> - def meta = [ id:id, synapse_ids:synids.join(';') ] - [ meta, fastq ] - } - .set { ch_fastq } - - // - // MODULE: Create samplesheet per sample - // - SYNAPSE_TO_SAMPLESHEET ( - ch_fastq, - params.nf_core_pipeline ?: '', - params.nf_core_rnaseq_strandedness ?: 'auto' - ) - - // - // MODULE: Merge samplesheets - // - SYNAPSE_MERGE_SAMPLESHEET ( - SYNAPSE_TO_SAMPLESHEET.out.samplesheet.collect{ it[1] } - ) - ch_versions = ch_versions.mix(SYNAPSE_MERGE_SAMPLESHEET.out.versions) - - // - // Collate and save software versions - // - softwareVersionsToYAML(ch_versions) - .collectFile(storeDir: "${params.outdir}/pipeline_info", name: 'nf_core_fetchngs_software_mqc_versions.yml', sort: true, newLine: true) - - emit: - fastq = ch_fastq - samplesheet = SYNAPSE_MERGE_SAMPLESHEET.out.samplesheet - versions = ch_versions.unique() -} - -/* -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - THE END -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -*/ diff --git a/workflows/synapse/nextflow.config b/workflows/synapse/nextflow.config deleted file mode 100644 index ad9c69ed..00000000 --- a/workflows/synapse/nextflow.config +++ /dev/null @@ -1,5 +0,0 @@ -includeConfig "../../modules/local/synapse_get/nextflow.config" -includeConfig "../../modules/local/synapse_to_samplesheet/nextflow.config" -includeConfig "../../modules/local/synapse_list/nextflow.config" -includeConfig "../../modules/local/synapse_merge_samplesheet/nextflow.config" -includeConfig "../../modules/local/synapse_show/nextflow.config" From cd4b6bf98336db21c84c52cb1303a0da5405a5f6 Mon Sep 17 00:00:00 2001 From: Harshil Patel Date: Thu, 1 Feb 2024 10:45:33 +0000 Subject: [PATCH 3/5] Update CHANGELOG --- CHANGELOG.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index b0061622..b54f463c 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - The Aspera CLI was recently added to [Bioconda](https://anaconda.org/bioconda/aspera-cli) and we have added it as another way of downloading FastQ files on top of the existing FTP and sra-tools support. In our limited benchmarks on all public Clouds we found ~50% speed-up in download times compared to FTP! We are not aware of any obvious downsides and have made this the default download method in the pipeline. 
You can, however, revert to using FTP and sra-tools via the `--force_ftp_download` and `--force_sratools_download` parameters, respectively. We would love to have your feedback!
 - Support for Synapse ids has been dropped in this release. We haven't had any feedback from users on whether it is being used. Users can run earlier versions of the pipeline if required.
+- We have significantly refactored and standardised the way we are using nf-test within this pipeline. This pipeline is now the current, best-practice implementation for nf-test usage on nf-core. We required a number of features to be added to nf-test and a huge shoutout to [Lukas Forer](https://github.com/lukfor) for entertaining our requests and implementing them within upstream :heart:!
 
 ### Credits
 
@@ -17,6 +18,7 @@ Special thanks to the following for their contributions to the release:
 - [Adam Talbot](https://github.com/adamrtalbot)
 - [Alexandru Mizeranschi](https://github.com/nicolae06)
 - [Alexander Blaessle](https://github.com/alexblaessle)
+- [Lukas Forer](https://github.com/lukfor)
 - [Maxime Garcia](https://github.com/maxulysse)
 - [Sebastian Uhrig](https://github.com/suhrig)

From 76b5fd9ebb0473b9da93bf54d59212504bfd7441 Mon Sep 17 00:00:00 2001
From: Harshil Patel
Date: Thu, 1 Feb 2024 10:47:23 +0000
Subject: [PATCH 4/5] Bump pipeline version to 1.12.0

---
 CHANGELOG.md    | 2 +-
 nextflow.config | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index b54f463c..3807610a 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -3,7 +3,7 @@
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-## [Unpublished Version / DEV]
+## [[1.12.0](https://github.com/nf-core/fetchngs/releases/tag/1.12.0)] - 2024-02-02
 
 ### :warning: Major enhancements
 
diff --git a/nextflow.config b/nextflow.config
index c9b67f04..fad805bf 100644
--- a/nextflow.config
+++ b/nextflow.config
@@ -226,7 +226,7 @@ manifest {
     description     = """Pipeline to fetch metadata and raw FastQ files from public databases"""
     mainScript      = 'main.nf'
     nextflowVersion = '!>=23.04.0'
-    version         = '1.12.0dev'
+    version         = '1.12.0'
     doi             = 'https://doi.org/10.5281/zenodo.5070524'
 }

From 3e825cca662854cde363876444a2d404f8bd43c9 Mon Sep 17 00:00:00 2001
From: Harshil Patel
Date: Thu, 1 Feb 2024 11:15:48 +0000
Subject: [PATCH 5/5] Update CHANGELOG

---
 CHANGELOG.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 3807610a..750abeff 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -41,6 +41,7 @@ Thank you to everyone else that has contributed by reporting bugs, enhancements
 - [PR #261](https://github.com/nf-core/fetchngs/pull/261) - Revert sratools fasterqdump version ([#221](https://github.com/nf-core/fetchngs/issues/221))
 - [PR #262](https://github.com/nf-core/fetchngs/pull/262) - Use nf-test version v0.8.4 and remove implicit tags
 - [PR #263](https://github.com/nf-core/fetchngs/pull/263) - Refine tags used for workflows
+- [PR #264](https://github.com/nf-core/fetchngs/pull/264) - Remove synapse workflow from pipeline
 
 ### Software dependencies