Issue with inserting known variants with VCF file. #134
Comments
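I will have time around the holidays to dig into the code and see what is happening!
I had the same problem. I didn't look into it too deeply; the superficial issue was that in paired-end reads, for some reason, read_2's self.quality_array is an array of strings such as ['28', '31', ...] rather than an array of ints. My temporary solution was to use ... chr(int(x) + ...) on line 344 of read.py.
Thanks zlian! Will give these changes a go and see if it resolves my issue.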
That's the fix we're thinking as well. We'll hopefully post an updated version soon!
-Josh
On Tuesday, January 7, 2025 7:09 PM, zlian1758 wrote:
I had the same problem. I didn't look into it too deeply, the superficial issue was that in paired end reads, for some reason read_2's self.quality_array is an array of strings such as ['28', '31',...] rather than an array of ints. My temporary solution was to use ... chr(int(x) + ...) on line 344 of read.py
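For illustration, the workaround described above could look roughly like the sketch below. This is not the NEAT code: the function name `qualities_to_fastq_string` is hypothetical, and the offset of 33 (the standard Phred+33 FASTQ encoding) is an assumption, because the comment above elides the actual offset.

```python
from typing import Iterable, Union

PHRED_OFFSET = 33  # assumption: standard Phred+33 FASTQ encoding


def qualities_to_fastq_string(
    quality_array: Iterable[Union[int, str]], offset: int = PHRED_OFFSET
) -> str:
    """Encode quality scores as a FASTQ quality string.

    Accepts ints or numeric strings such as '28', so a list like
    ['28', '31'] is handled the same way as [28, 31].
    """
    # int(q) is the coercion that the chr(int(x) + ...) workaround relies on.
    return "".join(chr(int(q) + offset) for q in quality_array)


if __name__ == "__main__":
    print(qualities_to_fastq_string(["28", "31", "40"]))
    print(qualities_to_fastq_string([28, 31, 40]))
```

A conversion like this only makes the output step tolerant of string scores; it does not explain why read_2's array holds strings in the first place.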
I can confirm that the fix suggested by zlian did work. However, it did identify another problem: if the VCF file includes any insertions or deletions, it will error out and not implement them, with the error message below:
I have tested whether this same VCF file works with NEAT v3.4, which it does, and it inserts both insertions and deletions. The VCF file will work if the insertion/deletion is removed and all that are left are specific SNPs. I have attached both these VCF files below so you can replicate the issue (please use the same ref file from the original issue). Problem vcf: variants_ins.txt
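To narrow down which records trigger the failure, a VCF can be split by variant type before it is passed to the simulator. The sketch below is not part of NEAT: `classify_record` and `split_vcf_lines` are hypothetical helper names, the example records are made up, and it assumes tab-separated VCF data lines with the standard leading columns (CHROM, POS, ID, REF, ALT).

```python
from typing import Dict, Iterable, List


def classify_record(ref: str, alt: str) -> str:
    """Classify a single-ALT VCF record by comparing REF and ALT lengths."""
    if len(ref) == 1 and len(alt) == 1:
        return "snp"
    if len(ref) < len(alt):
        return "insertion"
    if len(ref) > len(alt):
        return "deletion"
    return "other"  # e.g. a multi-base substitution of equal length


def split_vcf_lines(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group VCF lines by variant type; header lines go under 'header'."""
    groups: Dict[str, List[str]] = {
        "header": [], "snp": [], "insertion": [], "deletion": [], "other": []
    }
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            groups["header"].append(line)
            continue
        fields = line.split("\t")
        ref, alt = fields[3], fields[4]
        # Multi-allelic records (comma-separated ALT) are left as 'other'.
        kind = "other" if "," in alt else classify_record(ref, alt)
        groups[kind].append(line)
    return groups


if __name__ == "__main__":
    # Made-up records for illustration only.
    example = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t100\t.\tA\tG\t.\tPASS\t.",
        "chr1\t200\t.\tA\tATT\t.\tPASS\t.",
        "chr1\t300\t.\tGCC\tG\t.\tPASS\t.",
    ]
    for kind, records in split_vcf_lines(example).items():
        print(kind, len(records))
```

Writing the header plus only the `snp` group to a new file would give a SNP-only VCF comparable to the working file described above.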
Right, that fix will only patch the output, it doesn't fix the underlying problem. We'll try to get a fix as soon as we can.
On Friday, January 10, 2025 4:43 AM, Matthew Bird wrote:
I can confirm that the fix suggested by zlian did work. However, it did identify another problem: if the VCF file includes any insertions or deletions, it will error out and not implement them, with the error message below:
(neat4) matt:/mnt/c/Users/Matt/Desktop/UKHSA/Projects/Current/UKHSA_TB/amr_syn/neat4$ neat read-simulator -c neat_config.yml -o test_3
NEAT run log: /mnt/c/Users/Matt/Desktop/UKHSA/Projects/Current/UKHSA_TB/amr_syn/neat4/1736505231.2002442_NEAT.log
2025-01-10 10:33:51,620:INFO:neat.common.logging:writing log to: /mnt/c/Users/Matt/Desktop/UKHSA/Projects/Current/UKHSA_TB/amr_syn/neat4/1736505231.2002442_NEAT.log
2025-01-10 10:33:51,620:INFO:neat.read_simulator.runner:Using configuration file neat_config.yml
2025-01-10 10:33:51,622:INFO:neat.read_simulator.runner:Saving output files to .
2025-01-10 10:33:51,628:INFO:neat.read_simulator.utils.options:Run Configuration...
2025-01-10 10:33:51,628:INFO:neat.read_simulator.utils.options:Input fasta: refs/Mycobacterium_tuberculosis_H37Rv.fasta
2025-01-10 10:33:51,628:INFO:neat.read_simulator.utils.options:Producing the following files:
- /mnt/c/Users/Matt/Desktop/UKHSA/Projects/Current/UKHSA_TB/amr_syn/neat4/test_3_r1.fastq.gz
- /mnt/c/Users/Matt/Desktop/UKHSA/Projects/Current/UKHSA_TB/amr_syn/neat4/test_3_r2.fastq.gz
2025-01-10 10:33:51,628:INFO:neat.read_simulator.utils.options:Single threading - 1 thread.
2025-01-10 10:33:51,628:INFO:neat.read_simulator.utils.options:Using a read length of 150
2025-01-10 10:33:51,629:INFO:neat.read_simulator.utils.options:Generating fragments based on mean=300, stand. dev=30
2025-01-10 10:33:51,629:INFO:neat.read_simulator.utils.options:Running in paired-ended mode.
2025-01-10 10:33:51,629:INFO:neat.read_simulator.utils.options:Average coverage: 20
2025-01-10 10:33:51,629:INFO:neat.read_simulator.utils.options:Using default error model.
2025-01-10 10:33:51,629:INFO:neat.read_simulator.utils.options:User defined average sequencing error rate: 0.1.
2025-01-10 10:33:51,629:INFO:neat.read_simulator.utils.options:Ploidy value: 2
2025-01-10 10:33:51,629:INFO:neat.read_simulator.utils.options:Vcf of variants to include: test_dir/variants.vcf
2025-01-10 10:33:51,630:INFO:neat.read_simulator.utils.options:RNG seed value for run: 4652726890925496
2025-01-10 10:33:51,630:INFO:neat.read_simulator.runner:Reading Models...
2025-01-10 10:33:51,630:INFO:neat.read_simulator.runner:Reading refs/Mycobacterium_tuberculosis_H37Rv.fasta.
2025-01-10 10:33:51,735:INFO:neat.read_simulator.runner:Reading input VCF: test_dir/variants.vcf.
2025-01-10 10:33:51,736:INFO:neat.read_simulator.utils.vcf_func:Parsing input vcf test_dir/variants.vcf
2025-01-10 10:33:53,351:INFO:neat.read_simulator.utils.vcf_func:Found 15 variants in input VCF.
2025-01-10 10:33:53,352:INFO:neat.read_simulator.utils.vcf_func:Skipped 0 variants because of multiples at the same location
2025-01-10 10:33:53,352:INFO:neat.read_simulator.utils.vcf_func:Skipped 0 variants because of a mismatch between Ref and reference.
2025-01-10 10:33:53,677:INFO:neat.read_simulator.runner:Beginning simulation.
2025-01-10 10:33:53,788:INFO:neat.read_simulator.runner:Generating variants for ChrI
2025-01-10 10:33:53,902:INFO:neat.read_simulator.utils.generate_variants:Finished generating random mutations in 0.00 minutes
2025-01-10 10:33:53,903:INFO:neat.read_simulator.utils.generate_variants:Added 0 mutations to ChrI
2025-01-10 10:33:53,903:INFO:neat.read_simulator.utils.generate_reads:Sampling reads...
2025-01-10 10:34:38,842:ERROR:neat:read-simulator failed, see the traceback below
Traceback (most recent call last):
  File "/home/matt/anaconda3/envs/neat4/lib/python3.10/site-packages/neat/cli/cli.py", line 131, in main
    cmd(args)
  File "/home/matt/anaconda3/envs/neat4/lib/python3.10/site-packages/neat/cli/commands/read_simulator.py", line 47, in execute
    read_simulator_runner(arguments.config, arguments.output)
  File "/home/matt/anaconda3/envs/neat4/lib/python3.10/site-packages/neat/read_simulator/runner.py", line 313, in read_simulator_runner
    read1_fastq_paired, read1_fastq_single, read2_fastq_paired, read2_fastq_single = generate_reads(
  File "/home/matt/anaconda3/envs/neat4/lib/python3.10/site-packages/neat/read_simulator/utils/generate_reads.py", line 345, in generate_reads
    read_1.finalize_read_and_write(
  File "/home/matt/anaconda3/envs/neat4/lib/python3.10/site-packages/neat/read_simulator/utils/read.py", line 342, in finalize_read_and_write
    self.apply_variants_for_final_output(qual_model, rng)
  File "/home/matt/anaconda3/envs/neat4/lib/python3.10/site-packages/neat/read_simulator/utils/read.py", line 261, in apply_variants_for_final_output
    self.apply_mutations(list(quality_model.quality_scores), rng)
  File "/home/matt/anaconda3/envs/neat4/lib/python3.10/site-packages/neat/read_simulator/utils/read.py", line 224, in apply_mutations
    reference_length = variant_to_apply.get_ref_len()
  File "/home/matt/anaconda3/envs/neat4/lib/python3.10/site-packages/neat/variants/unknown_variant.py", line 62, in get_ref_len
    return len(self.metadata['REF'])
KeyError: 'REF'
ERROR: read-simulator failed, showing the last error
Traceback (most recent call last):
  File "/home/matt/anaconda3/envs/neat4/lib/python3.10/site-packages/neat/cli/cli.py", line 131, in main
    cmd(args)
  File "/home/matt/anaconda3/envs/neat4/lib/python3.10/site-packages/neat/cli/commands/read_simulator.py", line 47, in execute
    read_simulator_runner(arguments.config, arguments.output)
  File "/home/matt/anaconda3/envs/neat4/lib/python3.10/site-packages/neat/read_simulator/runner.py", line 313, in read_simulator_runner
    read1_fastq_paired, read1_fastq_single, read2_fastq_paired, read2_fastq_single = generate_reads(
  File "/home/matt/anaconda3/envs/neat4/lib/python3.10/site-packages/neat/read_simulator/utils/generate_reads.py", line 345, in generate_reads
    read_1.finalize_read_and_write(
  File "/home/matt/anaconda3/envs/neat4/lib/python3.10/site-packages/neat/read_simulator/utils/read.py", line 342, in finalize_read_and_write
    self.apply_variants_for_final_output(qual_model, rng)
  File "/home/matt/anaconda3/envs/neat4/lib/python3.10/site-packages/neat/read_simulator/utils/read.py", line 261, in apply_variants_for_final_output
    self.apply_mutations(list(quality_model.quality_scores), rng)
  File "/home/matt/anaconda3/envs/neat4/lib/python3.10/site-packages/neat/read_simulator/utils/read.py", line 224, in apply_mutations
    reference_length = variant_to_apply.get_ref_len()
  File "/home/matt/anaconda3/envs/neat4/lib/python3.10/site-packages/neat/variants/unknown_variant.py", line 62, in get_ref_len
    return len(self.metadata['REF'])
KeyError: 'REF'
I have tested whether this same VCF file works with NEAT v3.4, which it does, and it inserts both insertions and deletions. The VCF file will work if the insertion/deletion is removed and all that are left are specific SNPs. I have attached both these VCF files below so you can replicate the issue (please use the same ref file from the original issue).
Problem vcf: variants_ins.txt <https://github.com/user-attachments/files/18375602/variants_ins.txt>
Working vcf: variants_works.txt <https://github.com/user-attachments/files/18375613/variants_works.txt>
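The traceback above ends in a plain dictionary lookup, `self.metadata['REF']`, so the `KeyError` means the variant object reached `get_ref_len` without a `REF` entry in its metadata. The sketch below reproduces that failure mode in isolation and shows one defensive alternative. `SketchVariant`, its `ref_len` fallback, and both method names are hypothetical; this is not NEAT's class and not necessarily how the maintainers will fix it.

```python
from typing import Any, Dict, Optional


class SketchVariant:
    """Hypothetical stand-in for a variant that keeps VCF fields in a dict."""

    def __init__(self, metadata: Dict[str, Any], ref_len: Optional[int] = None):
        self.metadata = metadata
        self.ref_len = ref_len  # assumed fallback, not a NEAT attribute

    def get_ref_len_strict(self) -> int:
        # Same pattern as the failing line: raises KeyError if 'REF' is absent.
        return len(self.metadata["REF"])

    def get_ref_len_safe(self) -> int:
        # Prefer the REF string; fall back to a stored length if one exists.
        ref = self.metadata.get("REF")
        if ref is not None:
            return len(ref)
        if self.ref_len is not None:
            return self.ref_len
        raise ValueError("variant has neither a 'REF' entry nor a stored length")


if __name__ == "__main__":
    deletion = SketchVariant({"REF": "GCC", "ALT": "G"})
    print(deletion.get_ref_len_strict())

    missing_ref = SketchVariant({"ALT": "ATT"}, ref_len=1)
    try:
        missing_ref.get_ref_len_strict()
    except KeyError as err:
        print("KeyError:", err)
    print(missing_ref.get_ref_len_safe())
```

A guard like this would only hide the symptom; the more useful question is why indel records from the input VCF end up without `REF` in their metadata.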
Describe the bug
I can't introduce known variants into the synthetic fastq files using a VCF file. I have also tried using the VCF and other test data in the data folder and get the same error. I can see that it starts to sample reads for a split second, getting to maybe 2-3%, and then the error message appears. I am unsure whether it is to do with my VCF file or my install of NEAT, as I am able to produce synthetic reads without a VCF file input.
To Reproduce
The VCF file variants_bug.txt
The Config file: config_bug.txt
The log file: 1733915460.8927002_NEAT_bug.txt
The ref file: Mycobacterium_tuberculosis_H37Rv_bug.txt
Expected behavior
Two paired-end fastq files with 17 known variants inserted.
Error message
Conda environment: