Memory error outputting vcf #139

Open
giobus75 opened this issue Feb 3, 2025 · 4 comments

Comments


giobus75 commented Feb 3, 2025

Hi,
I was checking fix #138 for issue #126.
I ran the read simulator with NEAT version 4.2.8, using the same command as specified in issue #126. The process generated a 2.4MB VCF file containing data for chr1, but it failed with a MemoryError while writing the output. This occurred even though the machine has 378GB of RAM.

Log Excerpt (Final Lines)

2025-01-24 09:39:07,008:INFO:neat.read_simulator.runner:Generating variants for HLA-DRB1*16:02:01
2025-01-24 09:39:07,015:INFO:neat.read_simulator.utils.generate_variants:Finished generating random mutations in 0.00 minutes
2025-01-24 09:39:07,015:INFO:neat.read_simulator.utils.generate_variants:Added 10 mutations to HLA-DRB1*16:02:01
2025-01-24 09:39:07,015:INFO:neat.read_simulator.utils.generate_reads:Sampling reads...
2025-01-24 09:39:07,497:INFO:neat.read_simulator.utils.generate_reads:Contig fastq(s) written in: 0.01 m
2025-01-24 09:39:07,497:INFO:neat.read_simulator.utils.generate_reads:Finished sampling reads in 0.01 m
2025-01-24 09:39:07,498:INFO:neat.read_simulator.runner:Outputting golden vcf: /home/neat/simulated_stuff_golden.vcf.gz
2025-01-30 01:20:32,838:ERROR:neat:read-simulator failed, see the traceback below
Traceback (most recent call last):
  File "/opt/conda/envs/neat/lib/python3.10/site-packages/neat/cli/cli.py", line 131, in main
    cmd(args)
  File "/opt/conda/envs/neat/lib/python3.10/site-packages/neat/cli/commands/read_simulator.py", line 47, in execute
    read_simulator_runner(arguments.config, arguments.output)
  File "/opt/conda/envs/neat/lib/python3.10/site-packages/neat/read_simulator/runner.py", line 339, in read_simulator_runner
    output_file_writer.write_final_vcf(local_variant_files, reference_index)
  File "/opt/conda/envs/neat/lib/python3.10/site-packages/neat/read_simulator/utils/output_file_writer.py", line 163, in write_final_vcf
    ref, alt = variants.get_ref_alt(variant, reference[contig])
  File "/opt/conda/envs/neat/lib/python3.10/site-packages/Bio/File.py", line 227, in __getitem__
    record = self._proxy.get(self._offsets[key])
  File "/opt/conda/envs/neat/lib/python3.10/site-packages/Bio/SeqIO/_index.py", line 52, in get
    return next(self._iterator(StringIO(self.get_raw(offset).decode())))
MemoryError

Question

Is it expected that writing results consumes so much memory? Could this be a bug or an inefficiency in the output handling?

Thank you!
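
The traceback above ends inside `Bio/SeqIO/_index.py`, reached from `reference[contig]` in `write_final_vcf`. An on-disk index of that kind re-reads and re-parses the whole record on every lookup, so if the lookup happens once per variant (an assumption based only on the traceback, not on the NEAT source), a large contig is decoded over and over. A minimal sketch of the general idea, one lookup per contig instead of one per variant, is below. `CountingIndex` and `iter_variants_with_sequence` are hypothetical names for illustration and are not part of NEAT or Biopython.

```python
from itertools import groupby
from operator import itemgetter


class CountingIndex:
    """Hypothetical stand-in for a dict-like on-disk sequence index."""

    def __init__(self, records):
        self._records = records
        self.loads = 0

    def __getitem__(self, key):
        # A real on-disk index would re-read and re-parse the record here.
        self.loads += 1
        return self._records[key]


def iter_variants_with_sequence(variants, reference):
    """Yield (variant, contig_sequence) pairs.

    `variants` is an iterable of tuples whose first item is the contig name.
    Consecutive variants on the same contig share a single lookup, so the
    input should be sorted (or at least grouped) by contig.
    """
    for contig, group in groupby(variants, key=itemgetter(0)):
        sequence = reference[contig]  # one lookup per run of same-contig variants
        for variant in group:
            yield variant, sequence


reference = CountingIndex({"chr1": "ACGTACGT", "chr2": "TTGGCCAA"})
variants = [("chr1", 1), ("chr1", 5), ("chr2", 2)]
pairs = list(iter_variants_with_sequence(variants, reference))
print(reference.loads)
```

Only one contig sequence is referenced at a time, which may reduce both the repeated parsing and the peak memory, though whether it resolves this particular MemoryError is untested.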

@joshfactorial
Collaborator

joshfactorial commented Feb 3, 2025 via email

@giobus75
Author

giobus75 commented Feb 4, 2025

Ok, I'm gonna try the approach you suggested to get one chromosome per run.
Do you have an estimate of how long it will take to address this memory problem?
Thank you
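
The suggestion being replied to is not shown in this thread, so the following is only one hypothetical way to prepare "one chromosome per run": split the multi-FASTA reference into one file per contig and use each file as the reference for a separate run. The function name and the file-naming scheme are assumptions made for illustration, and the sketch assumes every header line has a name after `>`.

```python
from pathlib import Path


def split_fasta_by_contig(fasta_path, out_dir):
    """Write each record of a multi-FASTA file to its own file; return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    handle = None
    try:
        with open(fasta_path) as source:
            for line in source:
                if line.startswith(">"):
                    if handle is not None:
                        handle.close()
                    # Contig name: first whitespace-delimited token after '>'.
                    name = line[1:].split()[0]
                    # Names such as HLA-DRB1*16:02:01 contain characters that
                    # are awkward in file names, so replace them.
                    safe = "".join(
                        c if c.isalnum() or c in "-_." else "_" for c in name
                    )
                    # The numeric prefix keeps sanitized names from colliding.
                    path = out_dir / f"{len(paths):04d}_{safe}.fa"
                    handle = open(path, "w")
                    paths.append(path)
                if handle is not None:
                    handle.write(line)  # header and sequence lines, unchanged
    finally:
        if handle is not None:
            handle.close()
    return paths
```

Lines are streamed one at a time, so the split itself never holds a whole chromosome in memory.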

@joshfactorial
Collaborator

No current timeframe, as we have no current funding for this project. Getting it to run faster and more efficiently is our top priority, though.

@joshfactorial
Collaborator

Other options: you can try NEAT3, which is closer in structure to the original version, or NEAT2, the original (requires Python 2.x). Both are faster and a little more reliable than NEAT4 is proving to be. Check our release page for the older versions.
