Merge pull request #832 from ewels/download-dsl2-containers
Download: Get DSL2 singularity containers
ewels authored Feb 8, 2021
2 parents bbd26ab + e4a397e commit e95872c
Showing 6 changed files with 478 additions and 77 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -14,6 +14,10 @@
### Tools helper code

* Fixed some bugs in the command line interface for `nf-core launch` and improved formatting [[#829](https://github.com/nf-core/tools/pull/829)]
* New functionality for `nf-core download` to make it compatible with DSL2 pipelines [[#832](https://github.com/nf-core/tools/pull/832)]
* Singularity images in module files are now discovered and fetched
* Direct downloads of Singularity images within Python are now supported (much faster than running `singularity pull`)
* Downloads now work with `$NXF_SINGULARITY_CACHEDIR` so that pipelines sharing containers have efficient downloads

### Linting

78 changes: 58 additions & 20 deletions README.md
@@ -276,9 +276,11 @@ Do you want to run this command now? [y/N]: n

## Downloading pipelines for offline use

-Sometimes you may need to run an nf-core pipeline on a server or HPC system that has no internet connection. In this case you will need to fetch the pipeline files first, then manually transfer them to your system.
+Sometimes you may need to run an nf-core pipeline on a server or HPC system that has no internet connection.
+In this case you will need to fetch the pipeline files first, then manually transfer them to your system.

-To make this process easier and ensure accurate retrieval of correctly versioned code and software containers, we have written a download helper tool. Simply specify the name of the nf-core pipeline and it will be downloaded to your current working directory.
+To make this process easier and ensure accurate retrieval of correctly versioned code and software containers, we have written a download helper tool.
+Simply specify the name of the nf-core pipeline and it will be downloaded to your current working directory.

By default, the pipeline will download the pipeline code and the [institutional nf-core/configs](https://github.com/nf-core/configs) files.
If you specify the flag `--singularity`, it will also download any singularity image files that are required.
@@ -297,9 +299,9 @@ $ nf-core download methylseq -r 1.4 --singularity
nf-core/tools version 1.10

INFO Saving methylseq
-Pipeline release: 1.4
-Pull singularity containers: No
-Output file: nf-core-methylseq-1.4.tar.gz
+Pipeline release: '1.4'
+Pull singularity containers: 'No'
+Output file: 'nf-core-methylseq-1.4.tar.gz'
INFO Downloading workflow files from GitHub
INFO Downloading centralised configs from GitHub
INFO Compressing download..
@@ -311,7 +313,7 @@ The tool automatically compresses all of the resulting files into a `.tar.gz` archive.
You can choose other formats (`.tar.bz2`, `zip`) or to not compress (`none`) with the `-c`/`--compress` flag.
The console output provides the command you need to extract the files.
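The archiving behaviour described above maps directly onto Python's standard library. A minimal sketch of the idea, for illustration only (the function name is hypothetical and this is not the actual `nf-core download` implementation):

```python
import tarfile
import zipfile
from pathlib import Path


def compress_download(outdir, compress_type="tar.gz"):
    """Sketch: compress a download directory like the -c/--compress flag.

    Supports the same choices as the CLI: tar.gz, tar.bz2, zip, none.
    Returns the archive path, or None when no compression is requested.
    """
    if compress_type == "none":
        return None
    src = Path(outdir)
    if compress_type in ("tar.gz", "tar.bz2"):
        mode = "w:gz" if compress_type == "tar.gz" else "w:bz2"
        archive = "{}.{}".format(src, compress_type)
        with tarfile.open(archive, mode) as tar:
            tar.add(src, arcname=src.name)
    elif compress_type == "zip":
        archive = "{}.zip".format(src)
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
            for path in src.rglob("*"):
                zf.write(path, path.relative_to(src.parent))
    else:
        raise ValueError("Unknown compression type: {}".format(compress_type))
    return archive
```

As the note at the end of this section says, `tar.gz` is a sensible default for code-only downloads but slow for multi-GB Singularity images.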

-Once uncompressed, you will see the following file structure for the downloaded pipeline:
+Once uncompressed, you will see something like the following file structure for the downloaded pipeline:

```console
$ tree -L 2 nf-core-methylseq-1.4/
@@ -326,8 +328,6 @@ nf-core-methylseq-1.4
│   ├── nextflow.config
│   ├── nfcore_custom.config
│   └── README.md
-├── singularity-images
-│   └── nf-core-methylseq-1.4.simg
└── workflow
├── assets
├── bin
@@ -342,25 +342,63 @@ nf-core-methylseq-1.4
├── nextflow.config
├── nextflow_schema.json
└── README.md

10 directories, 15 files
```

-The pipeline files are automatically updated so that the local copy of institutional configs is available when running the pipeline.
+The pipeline files are automatically updated (`params.custom_config_base` is set to `../configs`), so that the local copy of institutional configs is available when running the pipeline.
So using `-profile <NAME>` should work if available within [nf-core/configs](https://github.com/nf-core/configs).

-You can run the pipeline by simply providing the directory path for the `workflow` folder.
-Note that if using Singularity, you will also need to provide the path to the Singularity image.
-For example:
+You can run the pipeline by simply providing the directory path for the `workflow` folder to your `nextflow run` command.

-```bash
-nextflow run /path/to/nf-core-methylseq-1.4/workflow/ \
-    -profile singularity \
-    -with-singularity /path/to/nf-core-methylseq-1.4/singularity-images/nf-core-methylseq-1.4.simg \
-    # .. other normal pipeline parameters from here on..
-    --input '*_R{1,2}.fastq.gz' --genome GRCh38
-```
+By default, the download will not run if a target directory or archive already exists. Use the `--force` flag to overwrite / delete any existing download files _(not including those in the Singularity cache directory, see below)_.

### Downloading singularity containers

If you're using Singularity, the `nf-core download` command can also fetch the required Singularity container images for you.
To do this, specify the `--singularity` option.
Your archive / target output directory will then include three folders: `workflow`, `configs` and also `singularity-images`.

The downloaded workflow files are again edited to add the following line to the end of the pipeline's `nextflow.config` file:

```nextflow
singularity.cacheDir = "${projectDir}/../singularity-images/"
```

This tells Nextflow to use the `singularity-images` directory relative to the workflow as the Singularity image cache directory.
All images should be downloaded there, so Nextflow will use them instead of trying to pull them from the internet.
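The config edit itself amounts to appending one line to the downloaded pipeline's `nextflow.config`. A minimal sketch, mirroring the cache line shown above (the helper name is hypothetical; this is not nf-core's actual code):

```python
from pathlib import Path


def append_singularity_cachedir(workflow_dir):
    """Sketch: append a singularity.cacheDir setting to nextflow.config.

    Mirrors the line quoted above so that Nextflow resolves images from
    the sibling singularity-images directory instead of the internet.
    """
    config_path = Path(workflow_dir) / "nextflow.config"
    with config_path.open("a") as fh:
        fh.write('\nsingularity.cacheDir = "${projectDir}/../singularity-images/"\n')
```

Appending (rather than rewriting) keeps the pipeline's own configuration untouched while the new setting takes effect.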

### Singularity cache directory

We highly recommend setting the `$NXF_SINGULARITY_CACHEDIR` environment variable on your system, even if that is a different system from the one where you will be running Nextflow.

If found, the tool will fetch the Singularity images into this directory first, before copying them to the target output archive / directory.
Any images previously fetched will be found there and copied directly; this includes images that may be shared with other pipelines, previous pipeline versions, or earlier download attempts.

If you are running the download on the same system where you will be running the pipeline (e.g. a shared filesystem where Nextflow won't have an internet connection at a later date), you can specify `--singularity-cache`.
This instructs `nf-core download` to fetch all Singularity images to the `$NXF_SINGULARITY_CACHEDIR` directory but does _not_ copy them to the workflow archive / directory.
The workflow config file is _not_ edited. This means that when you later run the workflow, Nextflow will just use the cache folder directly.
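The decision about where each image ends up can be sketched as follows. This is illustrative only (the function and argument names are hypothetical), reflecting the behaviour described above:

```python
import os


def image_output_path(image_name, outdir, use_cache_only=False):
    """Sketch: decide where a Singularity image should land.

    - with $NXF_SINGULARITY_CACHEDIR set and --singularity-cache
      (use_cache_only=True), the image stays in the shared cache;
    - otherwise it is placed in the download's singularity-images folder
      (fetched via the cache first when the cache variable is set).
    """
    cache_dir = os.environ.get("NXF_SINGULARITY_CACHEDIR")
    if cache_dir and use_cache_only:
        return os.path.join(cache_dir, image_name)
    return os.path.join(outdir, "singularity-images", image_name)
```

The key point is that `--singularity-cache` skips the copy step entirely, so the only long-lived copy of each image is the shared one in the cache.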

### How the Singularity image downloads work

The Singularity image download finds containers using two methods:

1. It runs `nextflow config` on the downloaded workflow to look for a `process.container` statement for the whole pipeline.
This is the typical method used for DSL1 pipelines.
2. It scrapes any files it finds with a `.nf` file extension in the workflow `modules` directory for lines
that look like `container = "xxx"`. This is the typical method for DSL2 pipelines, which have one container per process.

Some DSL2 modules have container addresses for Docker (e.g. `quay.io/biocontainers/fastqc:0.11.9--0`) and also URLs for direct downloads of a Singularity container (e.g. `https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0`).
Where both are found, the download URL is preferred.
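The module-scraping step described above boils down to a regular-expression search over `.nf` files. A rough sketch under those assumptions (the pattern and function name are illustrative, not the ones nf-core uses):

```python
import re
from pathlib import Path

# Matches lines like: container = "quay.io/biocontainers/fastqc:0.11.9--0"
CONTAINER_RE = re.compile(r'container\s*=?\s*["\']([^"\']+)["\']')


def find_containers(modules_dir):
    """Sketch: scrape .nf module files for container addresses.

    Where a module lists both a Docker address and a direct-download
    http(s) URL, the URL is preferred (faster to fetch directly).
    """
    containers = set()
    for nf_file in Path(modules_dir).rglob("*.nf"):
        found = CONTAINER_RE.findall(nf_file.read_text())
        urls = [c for c in found if c.startswith("http")]
        containers.update(urls if urls else found)
    return sorted(containers)
```

One container string per process is collected, de-duplicated across modules, before moving on to the fetch stage below.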

Once a full list of containers is found, they are processed in the following order:

1. If the target image already exists, nothing is done (e.g. when `$NXF_SINGULARITY_CACHEDIR` and `--singularity-cache` are specified)
2. If found in `$NXF_SINGULARITY_CACHEDIR` and `--singularity-cache` is _not_ specified, they are copied to the output directory
3. If they start with `http` they are downloaded directly within Python (default 4 at a time, you can customise this with `--parallel-downloads`)
4. If they look like a Docker image name, they are fetched using a `singularity pull` command
* This requires Singularity to be installed on the system and is substantially slower
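Put together, the per-image cascade above might look like the sketch below. This is an assumption-laden outline, not nf-core's implementation: `download_image` and `pull_image` are hypothetical stand-ins for the real fetch logic (a direct HTTP download and a `singularity pull` subprocess respectively).

```python
import os
import shutil


def download_image(url, out_path):
    """Stand-in for a direct HTTP download (step 3; parallelised in the real tool)."""
    raise NotImplementedError


def pull_image(address, out_path):
    """Stand-in for running `singularity pull` as a subprocess (step 4)."""
    raise NotImplementedError


def fetch_image(container, out_path, cache_path=None, cache_only=False):
    """Sketch: apply the four-step order above to one container address."""
    # 1. Target image already exists: nothing to do
    target = cache_path if (cache_only and cache_path) else out_path
    if os.path.exists(target):
        return "exists"
    # 2. Already in the Singularity cache and --singularity-cache not set:
    #    copy it straight into the output directory
    if cache_path and os.path.exists(cache_path) and not cache_only:
        shutil.copyfile(cache_path, out_path)
        return "copied from cache"
    # 3. Direct-download URL: fetch within Python
    if container.startswith("http"):
        download_image(container, target)
        return "downloaded"
    # 4. Docker-style address: fall back to `singularity pull` (slower,
    #    requires Singularity to be installed)
    pull_image(container, target)
    return "pulled"
```

Because steps 1 and 2 short-circuit before any network access, re-running a download against a warm cache is close to free.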

Note that compressing many GBs of binary files can be slow, so specifying `--compress none` is recommended when downloading Singularity images.

## Pipeline software licences

Sometimes it's useful to see the software licences of the tools used in a pipeline. You can use the `licences` subcommand to fetch and print the software licence from each conda / PyPI package used in an nf-core pipeline.
Expand Down
25 changes: 18 additions & 7 deletions nf_core/__main__.py
@@ -202,23 +202,34 @@ def launch(pipeline, id, revision, command_only, params_in, params_out, save_all
@nf_core_cli.command(help_priority=3)
@click.argument("pipeline", required=True, metavar="<pipeline name>")
@click.option("-r", "--release", type=str, help="Pipeline release")
-@click.option("-s", "--singularity", is_flag=True, default=False, help="Download singularity containers")
@click.option("-o", "--outdir", type=str, help="Output directory")
@click.option(
    "-c",
    "--compress",
    type=click.Choice(["tar.gz", "tar.bz2", "zip", "none"]),
    default="tar.gz",
-    help="Compression type",
+    help="Archive compression type",
)
-def download(pipeline, release, singularity, outdir, compress):
+@click.option("-f", "--force", is_flag=True, default=False, help="Overwrite existing files")
+@click.option("-s", "--singularity", is_flag=True, default=False, help="Download singularity images")
+@click.option(
+    "-c",
+    "--singularity-cache",
+    is_flag=True,
+    default=False,
+    help="Don't copy images to the output directory, don't set 'singularity.cacheDir' in workflow",
+)
+@click.option("-p", "--parallel-downloads", type=int, default=4, help="Number of parallel image downloads")
+def download(pipeline, release, outdir, compress, force, singularity, singularity_cache, parallel_downloads):
    """
-    Download a pipeline, configs and singularity container.
+    Download a pipeline, nf-core/configs and pipeline singularity images.

-    Collects all workflow files and shared configs from nf-core/configs.
-    Configures the downloaded workflow to use the relative path to the configs.
+    Collects all files in a single archive and configures the downloaded
+    workflow to use relative paths to the configs and singularity images.
    """
-    dl = nf_core.download.DownloadWorkflow(pipeline, release, singularity, outdir, compress)
+    dl = nf_core.download.DownloadWorkflow(
+        pipeline, release, outdir, compress, force, singularity, singularity_cache, parallel_downloads
+    )
    dl.download_workflow()

