03. GENERA structure
The GENERA tools are all developed in Nextflow and use Singularity images.
Nextflow is a widely used workflow system. It lets you write a computational workflow that chains all the bioinformatic steps together, and the most common scripting languages can be embedded in a single Nextflow script. For GENERA users, a Nextflow script is no different from any other bioinformatic tool: every GENERA Nextflow script can be executed with a single command.
In practice, a Nextflow script is divided into parts called processes. A process corresponds to a series of related commands. Files and variables are passed between processes through channels. When a script runs, Nextflow prints one line per process, as in the following example log:
executor > local (6)
[34/bee8e8] process > Taxonomy (1) [100%] 1 of 1 ✔
[6d/2c81bd] process > RefSeq (1) [100%] 1 of 1 ✔
[c4/cb3690] process > GenBank (1) [100%] 1 of 1 ✔
[42/e9131a] process > GetGenomesRefseq (1) [100%] 1 of 1 ✔
[ba/b200bf] process > GetGenomesGenbank (1) [100%] 1 of 1 ✔
[7d/b3b754] process > CombineGenomes (1) [100%] 1 of 1 ✔
Completed at: 08-nov.-2021 15:10:14
Duration : 8m 35s
CPU hours : 0.2
Succeeded : 6
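To make processes and channels concrete, here is a minimal, hypothetical two-process pipeline; it is not one of the GENERA scripts. The shell snippet writes it to a file named hello.nf (an illustrative name), and the DSL2 syntax and the process names SayHello and CountChars are assumptions chosen for the example. SayHello writes one file for each value it receives from a channel, and its output channel feeds CountChars.

```shell
#!/bin/sh
# Write a minimal, hypothetical Nextflow (DSL2) pipeline to hello.nf.
# The quoted 'EOF' stops the shell from expanding ${name} and ${greeting}.
cat > hello.nf <<'EOF'
nextflow.enable.dsl = 2

// Process 1: receives a value from a channel and writes it to a file
process SayHello {
    input:
    val name

    output:
    path 'greeting.txt'

    script:
    """
    echo "Hello, ${name}" > greeting.txt
    """
}

// Process 2: receives the files emitted by SayHello through a channel
process CountChars {
    input:
    path greeting

    output:
    stdout

    script:
    """
    wc -c < ${greeting}
    """
}

workflow {
    names = Channel.of('GENERA', 'NIC5')   // a channel carrying two values
    greetings = SayHello(names)           // output channel of SayHello
    CountChars(greetings).view()          // print what CountChars emits
}
EOF
echo "Wrote hello.nf"
```

Running `nextflow run hello.nf` executes SayHello and CountChars once per value in the channel, and each of these tasks gets its own subdirectory under work.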
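When a Nextflow script runs, it creates a directory called work. Each process is executed in its own subdirectory of work, and the code in square brackets in the log (for example 34/bee8e8) identifies that subdirectory: the first two characters are the name of a directory directly under work, and the remaining characters are the beginning of the task subdirectory's name inside it.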
ls -lrt work/
drwxrwxr-x 3 lcornet lcornet 1 nov 8 14:45 c1
drwxrwxr-x 3 lcornet lcornet 1 nov 8 14:53 d3
drwxrwxr-x 3 lcornet lcornet 1 nov 8 14:53 3c
drwxrwxr-x 3 lcornet lcornet 1 nov 8 14:55 b7
drwxrwxr-x 3 lcornet lcornet 1 nov 8 14:58 6e
drwxrwxr-x 3 lcornet lcornet 1 nov 8 15:01 34
drwxrwxr-x 3 lcornet lcornet 1 nov 8 15:01 c4
drwxrwxr-x 3 lcornet lcornet 1 nov 8 15:01 6d
drwxrwxr-x 3 lcornet lcornet 1 nov 8 15:03 42
drwxrwxr-x 3 lcornet lcornet 1 nov 8 15:06 ba
drwxrwxr-x 3 lcornet lcornet 1 nov 8 15:09 7d
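To see what a process actually ran, you can locate its task directory from the code printed in the log. The function below is a hypothetical helper (find_task_dir is not part of Nextflow or GENERA) that expands a log code such as 34/bee8e8 to the full subdirectory path under work. Inside each task directory, Nextflow keeps hidden files such as .command.sh (the commands that were run) and .command.err (their error output).

```shell
#!/bin/sh
# Hypothetical helper: print the task directory matching a log code.
# $1: code from the Nextflow log, e.g. 34/bee8e8
# $2: work directory (defaults to ./work)
find_task_dir() {
    code="$1"
    workdir="${2:-work}"
    for dir in "$workdir/$code"*; do
        # If nothing matches, the pattern stays unexpanded and -d fails
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
        fi
    done
}
```

For example, `cat "$(find_task_dir 34/bee8e8)/.command.sh"` shows the commands executed by the Taxonomy process of the log above.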
A Singularity container is a box with programs installed inside; let's call it an image. These images can be executed on any system where Singularity is installed, such as NIC5. This means that if a bioinformatic program is installed in a Singularity image, you can use it without having to install it yourself, for example:
singularity exec /scratch/ulg/GENERA/Nextflow-scripts/Genome-downloader.sif ali2fasta.pl
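On HPC systems, you will often need to bind (mount) the current directory into the container so that the programs inside it can read and write your files. The binding looks like this, where the host directory before the colon is made available as /mnt inside the container: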
singularity exec --bind /scratch/ulg/bioec/lcornet/GENOME-DW/:/mnt /home/ulg/bioec/lcornet/GENERA/Genome-downloader.sif ali2fasta.pl
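If you run tools this way often, a small wrapper saves typing. The function below is a hypothetical sketch (genera_exec, IMAGE and DRY_RUN are illustrative names, not part of GENERA): it binds the current directory to /mnt, starts the program in /mnt using Singularity's --pwd option, and, when DRY_RUN is set, only prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical wrapper around `singularity exec` that binds $PWD to /mnt.
# IMAGE defaults to the Genome-downloader image path used above.
IMAGE="${IMAGE:-/scratch/ulg/GENERA/Nextflow-scripts/Genome-downloader.sif}"

genera_exec() {
    if [ -n "${DRY_RUN:-}" ]; then
        # Only show the command that would be run
        echo singularity exec --bind "$PWD:/mnt" --pwd /mnt "$IMAGE" "$@"
    else
        # Run the program inside the container, starting in /mnt
        singularity exec --bind "$PWD:/mnt" --pwd /mnt "$IMAGE" "$@"
    fi
}
```

Run from your data directory, `genera_exec ali2fasta.pl` then behaves like the binding command above, with the current directory in place of a fixed path.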
Although Singularity images can be executed on their own, Nextflow can also run each process inside an image. One or more Singularity images are associated with each GENERA script, and a file called nextflow.config makes the connection between Nextflow and Singularity:
process.container = '/scratch/ulg/GENERA/Nextflow-scripts/Genome-downloader.sif'
singularity.enabled = true
singularity.cacheDir = "$PWD"
singularity.autoMounts = false
singularity.runOptions = '-B /scratch/ulg/bioec/lcornet/GENOME-DW -B /scratch/users/l/c/lcornet/GENOME-DW'
The last line (singularity.runOptions) passes the bind options (-B) to Singularity, in the same way as --bind on the command line.
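Because runOptions holds the bind directories, it is the line you are most likely to adapt to your own paths. As a hypothetical sketch (write_genera_config is an illustrative name, not a GENERA tool), the function below writes a nextflow.config with the same settings as above, taking the image path followed by any number of directories to bind.

```shell
#!/bin/sh
# Hypothetical helper: write a nextflow.config for one image and a list
# of host directories to bind.
# $1: path to the Singularity image; remaining args: directories to bind
write_genera_config() {
    image="$1"
    shift
    binds=""
    for d in "$@"; do
        # Append "-B <dir>", separated by a space from previous entries
        binds="${binds:+$binds }-B $d"
    done
    # \$PWD is escaped so that the literal text $PWD ends up in the file
    cat > nextflow.config <<EOF
process.container = '$image'
singularity.enabled = true
singularity.cacheDir = "\$PWD"
singularity.autoMounts = false
singularity.runOptions = '$binds'
EOF
}
```

Nextflow reads a file named nextflow.config from the launch directory automatically, so no extra option is needed when the script is started from that directory.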
The Singularity images and the Nextflow scripts are available on NIC5 in a shared folder:
/scratch/ulg/GENERA
-rw-r--r-- 1 lcornet lcornet 3971080192 jui 27 06:57 antismash-6.0.1.sif
-rwxr-xr-x 1 lcornet lcornet 3438620672 sep 15 16:52 smrtlink-tools_pbipa.sif
-rwxr-xr-x 1 lcornet lcornet 580304896 nov 5 18:01 Genome-downloader.sif
drwxrwxr-x 2 lcornet lcornet 111 nov 8 14:29 Nextflow-scripts
lcornet@nic5-login1 ~/GENERA/Nextflow-scripts $ ls -lrt
-rw-r--r-- 1 lcornet lcornet 9905 nov 8 13:46 Genome-downloader.nf
-rw-rw-r-- 1 lcornet lcornet 213 nov 8 14:30 Genome-downloader.config
-rw-rw-r-- 1 lcornet lcornet 290 nov 8 14:31 Genome-downloader.job
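One way to use the shared files is to copy a script and its config into your own run directory, so that you can adapt the bind paths without touching the shared copies. This is a hypothetical sketch: genera_prepare and GENERA_SCRIPTS are illustrative names, and the default folder is assumed from the paths used earlier on this page.

```shell
#!/bin/sh
# Hypothetical helper: copy a GENERA script and its config into a run directory.
# $1: tool name, e.g. Genome-downloader
# $2: run directory to create (default: current directory)
# GENERA_SCRIPTS: shared folder (assumed default below)
GENERA_SCRIPTS="${GENERA_SCRIPTS:-/scratch/ulg/GENERA/Nextflow-scripts}"

genera_prepare() {
    tool="$1"
    rundir="${2:-.}"
    mkdir -p "$rundir" || return 1
    # Copy <tool>.nf and <tool>.config; stop at the first failure
    for ext in nf config; do
        cp "$GENERA_SCRIPTS/$tool.$ext" "$rundir/" || return 1
    done
}
```

After `genera_prepare Genome-downloader mydir`, the script can be launched from mydir with `nextflow -c Genome-downloader.config run Genome-downloader.nf`, where -c adds the given file to the Nextflow configuration.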