03. GENERA structure

pierrebecker edited this page May 16, 2022 · 3 revisions

GENERA structure

The GENERA tools are all written in Nextflow and run inside Singularity images.

1. What is Nextflow

Nextflow is a widely used workflow system. It lets you write a computational workflow by chaining bioinformatic steps together, and code in most common scripting languages can be embedded in a single Nextflow script. For users of the GENERA project, a Nextflow script is used like any other bioinformatic tool: every GENERA Nextflow script can be executed with a single command.

https://www.nextflow.io

In practice, a Nextflow script is divided into parts called processes. A process corresponds to a series of related commands. Files and variables are passed between processes through channels.
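As a rough illustration of processes and channels, the shell snippet below writes a tiny Nextflow script to disk and runs it if Nextflow is installed. It is not a GENERA script: the file name `hello.nf`, the process name `SayHello` and the channel values are made up for this example, and it assumes a Nextflow version that uses the DSL2 syntax. One channel carries two values, and the process is run once for each of them.

```shell
# Write a minimal, hypothetical Nextflow script (not part of GENERA).
cat > hello.nf <<'EOF'
// A process: a series of related commands, run once per input value.
process SayHello {
    input:
    val name

    output:
    stdout

    script:
    """
    echo "Hello from ${name}"
    """
}

workflow {
    // A channel carrying two values; each value triggers one task.
    sources = Channel.of('RefSeq', 'GenBank')
    SayHello(sources)
    SayHello.out.view()
}
EOF

# Run the script only if Nextflow is available on this machine.
if command -v nextflow >/dev/null 2>&1; then
    nextflow run hello.nf
else
    echo "nextflow not found; hello.nf was written but not run"
fi
```

The heredoc delimiter is quoted (`'EOF'`) so that the shell leaves `${name}` untouched and Nextflow does the substitution.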

1.1 Nextflow log

executor >  local (6)
[34/bee8e8] process > Taxonomy (1)          [100%] 1 of 1 ✔
[6d/2c81bd] process > RefSeq (1)            [100%] 1 of 1 ✔
[c4/cb3690] process > GenBank (1)           [100%] 1 of 1 ✔
[42/e9131a] process > GetGenomesRefseq (1)  [100%] 1 of 1 ✔
[ba/b200bf] process > GetGenomesGenbank (1) [100%] 1 of 1 ✔
[7d/b3b754] process > CombineGenomes (1)    [100%] 1 of 1 ✔
Completed at: 08-nov.-2021 15:10:14
Duration    : 8m 35s
CPU hours   : 0.2
Succeeded   : 6

1.2 Nextflow directories

While a Nextflow script runs, a directory called work is created. Each process is executed there: the identifiers shown in square brackets in the log (for example 34/bee8e8) correspond to subdirectories of the work directory.

ls -lrt work/
drwxrwxr-x 3 lcornet lcornet 1 nov  8 14:45 c1
drwxrwxr-x 3 lcornet lcornet 1 nov  8 14:53 d3
drwxrwxr-x 3 lcornet lcornet 1 nov  8 14:53 3c
drwxrwxr-x 3 lcornet lcornet 1 nov  8 14:55 b7
drwxrwxr-x 3 lcornet lcornet 1 nov  8 14:58 6e
drwxrwxr-x 3 lcornet lcornet 1 nov  8 15:01 34
drwxrwxr-x 3 lcornet lcornet 1 nov  8 15:01 c4
drwxrwxr-x 3 lcornet lcornet 1 nov  8 15:01 6d
drwxrwxr-x 3 lcornet lcornet 1 nov  8 15:03 42
drwxrwxr-x 3 lcornet lcornet 1 nov  8 15:06 ba
drwxrwxr-x 3 lcornet lcornet 1 nov  8 15:09 7d
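The link between a log entry and its work subdirectory can be sketched with a small shell helper. `task_dir` is a hypothetical function written for this page, not a Nextflow command. It assumes the log shows only the beginning of the directory name, and that the task directory contains the hidden files `.command.sh` and `.exitcode` that Nextflow normally writes there.

```shell
# Resolve the work subdirectory for a task identifier shown in the log,
# e.g. "34/bee8e8" from the line "[34/bee8e8] process > Taxonomy (1)".
# task_dir is a hypothetical helper, not a Nextflow command.
task_dir() {
    task_hash=$1          # shortened identifier from the log, e.g. 34/bee8e8
    task_workdir=${2:-work} # work directory, defaults to ./work
    # The log shows only a prefix of the directory name, so complete it with a glob.
    for task_candidate in "$task_workdir"/"$task_hash"*/; do
        if [ -d "$task_candidate" ]; then
            printf '%s\n' "${task_candidate%/}"
            return 0
        fi
    done
    echo "no task directory found for $task_hash in $task_workdir" >&2
    return 1
}

# Example: inspect the Taxonomy task from the log, if it exists here.
if dir=$(task_dir 34/bee8e8); then
    cat "$dir/.command.sh"   # the commands Nextflow ran for this task
    cat "$dir/.exitcode"     # exit status of the task
fi
```

Looking at these files is often the quickest way to see why a single process failed.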

2. What is Singularity

A Singularity container is a box with programs installed inside; let's call it an image. An image can be executed on any system where Singularity is installed, such as NIC5. This means that if a bioinformatic program is installed in a Singularity image, you can use it without having to install it yourself.

singularity exec /scratch/ulg/GENERA/Nextflow-scripts/Genome-downloader.sif ali2fasta.pl

On an HPC system, an option that binds (associates) the current directory with the container frequently needs to be added. The binding looks like this:

singularity exec --bind /scratch/ulg/bioec/lcornet/GENOME-DW/:/mnt /home/ulg/bioec/lcornet/GENERA/Genome-downloader.sif ali2fasta.pl
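The shape of the bind option can be made explicit with a short shell sketch. `bind_cmd` is a hypothetical helper that only prints the command instead of running it, so it can be tried on a machine without Singularity. It assumes, as in the command above, that the host directory is bound to /mnt inside the container.

```shell
# Print (without running) a singularity exec command that binds a host
# directory to /mnt inside the container. bind_cmd is a hypothetical helper.
bind_cmd() {
    host_dir=$1   # directory on the cluster to make visible in the container
    image=$2      # path to the .sif image
    shift 2       # remaining arguments: the program to run and its options
    # --bind takes host_path:container_path
    printf 'singularity exec --bind %s:/mnt %s' "$host_dir" "$image"
    printf ' %s' "$@"
    printf '\n'
}

# Show the command for the current directory and the Genome-downloader image.
bind_cmd "$PWD" /scratch/ulg/GENERA/Nextflow-scripts/Genome-downloader.sif ali2fasta.pl
```

The printed line is meant for reading or copy-pasting; paths containing spaces would need quoting before being run.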

3. How to use Singularity with Nextflow

Although Singularity images can be executed on their own, Nextflow can also use them as the environment in which its processes run. One or more Singularity images are associated with each GENERA script. A file called nextflow.config makes the connection between Nextflow and Singularity.

3.1 Nextflow config file

process.container = '/scratch/ulg/GENERA/Nextflow-scripts/Genome-downloader.sif'
singularity.enabled = true
singularity.cacheDir = "$PWD"
singularity.autoMounts = false
singularity.runOptions = '-B /scratch/ulg/bioec/lcornet/GENOME-DW -B /scratch/users/l/c/lcornet/GENOME-DW'

The last line of the file corresponds to the bind option of Singularity (-B is the short form of --bind).
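To adapt such a file to another run directory, one possible approach is to generate it from the shell. This is only a sketch: the variables `RUN_DIR` and `IMAGE` are placeholders introduced for this example, the settings are copied from the file above, and the snippet overwrites any nextflow.config already present in the current directory.

```shell
# Write a nextflow.config for a run directory of your own.
# RUN_DIR and IMAGE are placeholders chosen for this example.
RUN_DIR=${RUN_DIR:-$PWD}
IMAGE=${IMAGE:-/scratch/ulg/GENERA/Nextflow-scripts/Genome-downloader.sif}

# Unquoted heredoc: $IMAGE and $RUN_DIR are filled in by the shell,
# while \$PWD is kept literally for Nextflow to resolve.
cat > nextflow.config <<EOF
process.container = '$IMAGE'
singularity.enabled = true
singularity.cacheDir = "\$PWD"
singularity.autoMounts = false
singularity.runOptions = '-B $RUN_DIR'
EOF

# Display the result for checking.
cat nextflow.config
```

Because autoMounts is set to false in this configuration, every directory the processes need to read or write has to appear in a -B option on the runOptions line.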

4. GENERA on NIC5

The Singularity images and the Nextflow scripts are available on NIC5, in a shared folder.

4.1 Shared folder

/scratch/ulg/GENERA
-rw-r--r-- 1 lcornet lcornet 3971080192 jui 27 06:57 antismash-6.0.1.sif
-rwxr-xr-x 1 lcornet lcornet 3438620672 sep 15 16:52 smrtlink-tools_pbipa.sif
-rwxr-xr-x 1 lcornet lcornet  580304896 nov  5 18:01 Genome-downloader.sif
drwxrwxr-x 2 lcornet lcornet        111 nov  8 14:29 Nextflow-scripts
lcornet@nic5-login1 ~/GENERA/Nextflow-scripts $ ls -lrt
-rw-r--r-- 1 lcornet lcornet 9905 nov  8 13:46 Genome-downloader.nf
-rw-rw-r-- 1 lcornet lcornet  213 nov  8 14:30 Genome-downloader.config
-rw-rw-r-- 1 lcornet lcornet  290 nov  8 14:31 Genome-downloader.job