- use one or more references
- control adapters and the insert size
- set the exact number of sequences
- apply an error model derived from your sequencer
- optionally add SNPs to introduce diversity
- generate around 1000 sequences per second
conda install -c bioconda hmnrandomread
HmnRandomRead \
-r/--reference <string, required> <double, required> \
-r1/--read-forward <string, required> \
-r2/--read-reverse <string, required> \
-lengthReads/--length-reads <int, optional, 150> \
-meanInsert/--mean-insert-size <int, optional, 500> \
-stdInsert/--std-insert-size <int, optional, 50> \
-profileDiversity/--profile-diversity <string, optional> \
-profileError/--profile-error <string, optional> \
-profileErrorId/--profile-error-id <string, optional> \
-s/--seed <int, optional, 0>
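As a minimal sketch, the options above can be assembled into an argument list from Python. The file names and the sequence count are placeholders, and passing the count as a second value right after the reference path is an assumption based on the synopsis. The command is only built and printed here, not executed.

```python
import shlex


def build_command(reference, count, forward, reverse, length_reads=150,
                  mean_insert=500, std_insert=50, seed=0):
    """Assemble an HmnRandomRead argument list from the documented options."""
    return [
        "HmnRandomRead",
        # Assumption: the reference path is followed by the number of sequences.
        "--reference", str(reference), str(count),
        "--read-forward", str(forward),
        "--read-reverse", str(reverse),
        "--length-reads", str(length_reads),
        "--mean-insert-size", str(mean_insert),
        "--std-insert-size", str(std_insert),
        "--seed", str(seed),
    ]


# Placeholder file names, for illustration only.
command = build_command("reference.fasta", 1000,
                        "reads_R1.fastq.gz", "reads_R2.fastq.gz")
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)` once the tool is installed.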
Provide one or more FASTA files as reference sequences, and give the number of sequences to generate for each reference.
`read-forward` and `read-reverse` are required, gzip-compressed or not.
`length-reads`: the length of the reads as produced by the sequencer.
`mean-insert-size` and `std-insert-size`: the mean and standard deviation of the Gaussian distribution used to model the fragment size.
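To illustrate what the two Gaussian parameters mean, the sketch below draws fragment sizes with Python's standard library. It is not the tool's own sampling code: the function name is hypothetical, and clamping short fragments to the read length is an assumption made only for this example.

```python
import random


def sample_insert_sizes(n, mean_insert=500, std_insert=50, length_reads=150, seed=0):
    """Draw n fragment sizes from a Gaussian, never shorter than one read."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(n):
        size = round(rng.gauss(mean_insert, std_insert))
        # Assumption: fragments shorter than a read are clamped to the read length.
        sizes.append(max(size, length_reads))
    return sizes


sizes = sample_insert_sizes(10_000)
print(sum(sizes) / len(sizes))
```

With the default values, most fragments fall between 400 and 600 bases, that is, within two standard deviations of the mean.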
`profile-diversity`: a comma-separated CSV file with the following columns:
- identifier: ID of the FASTA file
- Mutation Rate: probability of changing the sequence
- Indel Fraction: proportion of indels relative to single-base mutations
- Indel Extend: probability of extending an indel with each added base
- Maximum Insertion Size: maximum size of an insertion

The header is mandatory.
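A diversity profile with this layout can be produced with Python's `csv` module, as in the sketch below. The exact spelling of the header names is an assumption taken from the list above, and the numeric values are placeholders rather than recommended settings.

```python
import csv
import io

# Assumption: header names are spelled exactly as in the column list above.
FIELDS = ["identifier", "Mutation Rate", "Indel Fraction",
          "Indel Extend", "Maximum Insertion Size"]


def write_diversity_profile(rows):
    """Serialize diversity rows to comma-separated text with the mandatory header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder values, for illustration only.
text = write_diversity_profile([
    {"identifier": "reference.fasta", "Mutation Rate": 0.001,
     "Indel Fraction": 0.15, "Indel Extend": 0.3, "Maximum Insertion Size": 10},
])
print(text)
```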
`profile-error`: a comma-separated CSV file with the following columns:
- identifier: an ID, selected with `profile-error-id`
- sequencer: name of the sequencer
- flowcell: type of flowcell
- version: the kit version
- strand: `forward` or `reverse`
- cycles total: number of cycles per strand
- error by cycle: error rate for each cycle, semicolon-separated; the number of values must equal `cycles total`

The header is mandatory.
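Before running a simulation, it can help to check that each row of an error profile is consistent. The sketch below assumes the header names are spelled exactly as in the list above. The sample rows, including the sequencer and flowcell names, are made up, and the second row is deliberately one value short.

```python
import csv
import io

# Assumption: header names are spelled exactly as in the column list above.
SAMPLE = (
    "identifier,sequencer,flowcell,version,strand,cycles total,error by cycle\n"
    "demo,ExampleSeq,standard,v1,forward,4,0.001;0.001;0.002;0.004\n"
    "demo,ExampleSeq,standard,v1,reverse,4,0.002;0.002;0.003\n"
)


def check_error_profile(text):
    """Return (identifier, strand, problem) tuples for inconsistent rows."""
    problems = []
    for row in csv.DictReader(io.StringIO(text)):
        rates = [float(x) for x in row["error by cycle"].split(";")]
        if row["strand"] not in ("forward", "reverse"):
            problems.append((row["identifier"], row["strand"], "unknown strand"))
        if len(rates) != int(row["cycles total"]):
            problems.append((row["identifier"], row["strand"],
                             "rate count differs from cycles total"))
    return problems


problems = check_error_profile(SAMPLE)
print(problems)
```

Only the reverse-strand row is reported, because it lists three error rates for four cycles.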
pytest is required to run the tests:
make test
- htslib - Write FASTQ files
- Guillaume Gricourt