Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

iss generate does not simulate the number of reads requested #195

Closed
sergioSEa opened this issue Nov 2, 2020 · 3 comments
Closed

iss generate does not simulate the number of reads requested #195

sergioSEa opened this issue Nov 2, 2020 · 3 comments
Assignees
Labels

Comments

@sergioSEa
Copy link

Dear Hadrien,

Thank you very much for developing this tool.
I had been using it for some time when I realised that the number of reads that the tool is generating does not correspond to the number of reads I specify with n_reads.
I did realised about this issue using some complex designs, but I still see the same behaviour in the simpler cases.
For instance, let's use this reference genome as an example: https://www.ncbi.nlm.nih.gov/assembly/GCF_005845145.1

Here we have multiple contigs, so in principle I would like to use the "--draft" command, but we can also assume that each of the contigs is in fact a complete assembly, and provide the "--genome" command. The result is the same.

Here there are some examples:

Using the bioconda installation (v1.5.1)

iss generate --n_reads 1000 --draft Bacteroides_ovatus.fa  --model hiseq --output TEST_sim --seed 10 --cpus 1
wc -l TEST_sim_R1.fastq
1936

(assuming that the 1,000 reads are divided between the two files, I would expect to get 2,000 lines)

If I repeat the process with the --genome command:

wc -l TEST_sim_R1.fastq
1916

I wondered if it was due to the conda version, so I tried directly with the Git repository.

python InSilicoSeq/iss/__main__.py -n_reads 1000 --draft Bacteroides_ovatus.fa  --model hiseq --output TEST_sim --seed 10 --cpus 1 
wc -l TEST_sim_R1.fastq
1936

This is an example in which there were not that many reads missing, but I have run this over multiple files and I find a bigger percentage of missing reads.
Any clue of what is going on? Also tried with --gc_bias and there is no difference.

Here there is the list of installed python packages, which should be the only dependence. Python itself is v3.7.7

pip list:

Package      Version
------------ -------------------
biopython    1.77
brotlipy     0.7.0
certifi      2020.6.20
cffi         1.14.2
chardet      3.0.4
cryptography 3.0
future       0.18.2
idna         2.10
InSilicoSeq  1.5.1
joblib       0.16.0
mkl-fft      1.1.0
mkl-random   1.1.1
mkl-service  2.3.0
numpy        1.19.1
pip          20.2.2
pycparser    2.20
pyOpenSSL    19.1.0
pysam        0.15.3
PySocks      1.7.1
requests     2.24.0
scipy        1.5.2
setuptools   49.6.0.post20200814
six          1.15.0
urllib3      1.25.10
wheel        0.35.1

Thank you very much for your help,

Cheers,

Sergio.

@HadrienG HadrienG self-assigned this Nov 2, 2020
@HadrienG
Copy link
Owner

HadrienG commented Nov 2, 2020

Hi!

I haven't been able to reproduce the bug yet. Is the number of reads you get wrong every time?
Can you re-run your example with the --debug flag and post the output?

Best,
/Hadrien

@sergioSEa
Copy link
Author

sergioSEa commented Nov 3, 2020

Hi Hadrien,

Thanks for the fast answer. The number I get is always below the amount I specified.

Find attached the information I get using the --debug option. It seems that is only generating 410 paired-end reads instead of 500.

DEBUG_info.txt

Thanks for your help!

Sergio.

EDIT: This is the code I ran:
iss generate --n_reads 1000 --draft Bacteroides_ovatus.fa --model hiseq --output TEST_sim_R1.fastq --seed 10 --cpus 1 --debug 2> DEBUG_info.txt

@HadrienG
Copy link
Owner

Hi!

This will be fixed in the next minor release 🎉

@HadrienG HadrienG added the bug label Jan 29, 2021
@HadrienG HadrienG mentioned this issue Jan 29, 2021
@HadrienG HadrienG closed this as completed Feb 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants