
EDTA_raw.pl run with -type tir is glitchy #135

Closed
sanyalab opened this issue Nov 27, 2020 · 2 comments
@sanyalab

Hi Shujun,

I find that when the EDTA_raw.pl is run with the -type tir, the success of the run is erratic. Recently I ran the script with the following Resource specifications.
1) First Run
Memory: 70000
CPUs: 36
Chromosomes in genome: 20
Genome Size: 2.8G
Result: Success

2) Second Run
Memory: 70000
CPUs: 36
Scaffolds in Genome: 894
Genome Size: 2.8G
Result: Fail, Insufficient memory

It seems to me that assembly quality plays a big role in the success or failure of the TIR module. I do not face this with the LTR or Helitron predictions. Is it possible that the genome gets loaded 36 times, so the memory overshoots? If I ran with 16 processors instead, the runtime would probably increase, but the memory would stay in check.

Thanks
Abhijit


oushujun commented Dec 1, 2020

Hi Abhijit,

Nice benchmark! We have observed before that many small scaffolds significantly increase memory usage, but now there is a comparable benchmark to confirm it.

The three components were written in different programming languages by different authors: LTR in Perl, Helitron in Java, and TIR in Python. I am not very familiar with Python, but I suspect its CPU and memory management is not as efficient as the others'. You may check the code of TIR-Learner and let me know if you pinpoint the issue.

Best,
Shujun

@oushujun

#175 provides some workarounds.
