Does the EDTA cannot process a genome containing too many sequences? #129

AntetokounmJie · 2020-11-03T08:30:14Z

Hi Shujun,
I ran EDTA successfully on the test file and on the Arabidopsis genome.
Then I wanted to run it on a maize genome that contains both chromosome sequences and unanchored sequences, about 700 sequences in total.
The command was "EDTA.pl --genome genome.fa --species Maize --anno 1 --threads 32 &>log &".
Then I got this error message:
########################################################

Extensive de-novo TE Annotator (EDTA) v1.9.4

Shujun Ou ([email protected])

########################################################

Mon Nov 2 08:33:11 CST 2020 Dependency checking:
All passed!

Mon Nov 2 08:33:52 CST 2020 Obtain raw TE libraries using various structure-based programs:
Mon Nov 2 08:33:52 CST 2020 EDTA_raw: Check dependencies, prepare working directories.

Mon Nov 2 08:34:22 CST 2020 Start to find LTR candidates.

Mon Nov 2 08:34:22 CST 2020 Identify LTR retrotransposon candidates from scratch.

Mon Nov 2 22:46:40 CST 2020 Finish finding LTR candidates.

Mon Nov 2 22:46:40 CST 2020 Start to find TIR candidates.

Mon Nov 2 22:46:40 CST 2020 Identify TIR candidates from scratch.

Species: Maize
cat: '10080.fa': No such file or directory
cat: '11680.fa': No such file or directory
cat: '11780.fa': No such file or directory
...
cat: '66780.fa': No such file or directory
cat: '66880.fa': No such file or directory
cat: '66980.fa': No such file or directory
cat: '67080.fa': No such file or directory
cat: '67180.fa': No such file or directory
terminate called after throwing an instance of 'std::system_error'
what(): Resource temporarily unavailable
terminate called after throwing an instance of 'std::system_error'
what(): Resource temporarily unavailable
...
terminate called after throwing an instance of 'std::system_error'
what(): Resource temporarily unavailable

Do you have any idea what is causing this?
Should I run it on the chromosome sequences only? If so, what should I do with the unanchored sequences?
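For context, `Resource temporarily unavailable` is the standard text for the `EAGAIN` error code, and a `std::system_error` carrying it is what `std::thread` throws when a new thread cannot be started for lack of system resources (memory, or the per-user process/thread limit). Which EDTA component threw it is not visible in the log. A minimal Python sketch, assuming a Linux host, that prints that message and the two limits worth checking before a multi-threaded run; `resource_report` is a hypothetical helper, not part of EDTA.

```python
import errno
import os
import resource


def resource_report() -> dict:
    """Collect the error text and limits relevant to thread-creation failures (Linux)."""
    report = {}
    # strerror(EAGAIN): the text printed after "what():" in the log above
    report["eagain_message"] = os.strerror(errno.EAGAIN)
    # On Linux, RLIMIT_NPROC caps the number of processes *and* threads per user
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    report["nproc_soft"] = "unlimited" if soft == resource.RLIM_INFINITY else soft
    report["nproc_hard"] = "unlimited" if hard == resource.RLIM_INFINITY else hard
    # Free physical memory right now (excludes reclaimable page cache), in GiB
    free_bytes = os.sysconf("SC_AVPHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    report["free_mem_gib"] = round(free_bytes / 1024**3, 1)
    return report


if __name__ == "__main__":
    for key, value in resource_report().items():
        print(f"{key}: {value}")
```

If `nproc_soft` is low or free memory is close to zero, either could explain the failure; later in this thread it turned out to be memory.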

Best regards


sanyalab · 2020-11-03T09:03:27Z

Hi Shujun,

I am facing a similar issue when running EDTA_raw.pl on a genome with many sequences.

I was running it on a genome with 128,647 contigs and a total size of 30 Gb. The Helitron and LTR modules ran to completion (in 2 and 6 days, respectively, with EDTA v1.9.0), but the TIR run exited because its memory consumption was too high. Following the "divide and conquer" approach, I had allocated 100 GB of memory to each of the three predictions (LTR, TIR, and Helitron); however, the TIR run consumed more than 250 GB.

I know I can split the genome into 1 GB chunks and run each one independently with EDTA_raw.pl, which will probably work. What I am not sure about is how to consolidate the results so that a subsequent EDTA.pl run does not give me any errors.
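Splitting at sequence boundaries keeps every contig intact, so coordinates in each chunk's output still refer to the original contig names. A minimal Python sketch of the chunking step, assuming a plain (uncompressed) FASTA input; `split_fasta`, the `chunk_NNN.fa` naming, and the 1 Gb default are illustrative choices, not EDTA features. Only one sequence is held in memory at a time.

```python
import os
from typing import Iterator, List, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header, parts = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(parts)
                header, parts = line[1:], []
            elif line:
                parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def split_fasta(path: str, out_dir: str, max_bases: int = 1_000_000_000) -> List[str]:
    """Write whole sequences into chunk files of at most ~max_bases each.

    Sequences are never cut; one longer than max_bases gets a chunk of its own.
    """
    os.makedirs(out_dir, exist_ok=True)
    chunks: List[str] = []
    out, size = None, 0
    for header, seq in read_fasta(path):
        # Start a new chunk if none is open or this sequence would overflow it
        if out is None or (size > 0 and size + len(seq) > max_bases):
            if out is not None:
                out.close()
            name = os.path.join(out_dir, f"chunk_{len(chunks) + 1:03d}.fa")
            chunks.append(name)
            out, size = open(name, "w"), 0
        out.write(f">{header}\n")
        for i in range(0, len(seq), 60):  # wrap sequence lines at 60 bp
            out.write(seq[i:i + 60] + "\n")
        size += len(seq)
    if out is not None:
        out.close()
    return chunks
```

`split_fasta("genome.fa", "chunks")` returns the list of chunk paths; each could then be given to its own EDTA_raw.pl run through the same `--genome` option used with EDTA.pl above.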

Which result files from the TIR, Helitron, and LTR runs of EDTA_raw.pl does EDTA.pl use?
What procedure should I follow to properly concatenate the result files from separate TIR, Helitron, and LTR runs of EDTA_raw.pl?
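Whichever per-chunk files turn out to be the ones EDTA.pl needs (the open question above), combining them is at heart a concatenation. A minimal Python sketch, assuming the per-chunk outputs are FASTA files; `concat_fasta` and the input paths are hypothetical, and nothing here describes what EDTA.pl itself expects. It leaves headers untouched and stops on a repeated sequence ID rather than silently writing two records with the same name.

```python
from typing import Iterable


def concat_fasta(paths: Iterable[str], out_path: str) -> int:
    """Concatenate FASTA files, refusing duplicate IDs; return the record count."""
    seen = set()
    count = 0
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as handle:
                for line in handle:
                    if not line.endswith("\n"):
                        line += "\n"  # keep the next file's header on its own line
                    if line.startswith(">"):
                        tokens = line[1:].split()
                        seq_id = tokens[0] if tokens else ""  # ID = first word
                        if seq_id in seen:
                            raise ValueError(f"duplicate sequence ID {seq_id!r} in {path}")
                        seen.add(seq_id)
                        count += 1
                    out.write(line)
    return count
```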

Thanks
Abhijit

oushujun · 2020-11-03T09:19:41Z

@Zea1nfO #51

oushujun · 2020-11-03T09:21:24Z

@sanyalab #61

AntetokounmJie · 2020-11-03T12:18:20Z

Hi Shujun,

Sorry for duplicating the issue.
I checked all users' processes on my server and found that one of them was taking almost all of the memory.
So I think insufficient memory is the real cause.
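On Linux, the check described above amounts to ranking processes by resident memory (`VmRSS`, reported in kB) from `/proc/<pid>/status`; `top` or `ps` sorted by memory shows the same thing. A minimal Python sketch; the function names are hypothetical, and processes whose `status` file cannot be read (or that exit mid-scan) are skipped.

```python
import os
from typing import List, Optional, Tuple


def vmrss_kib(status_text: str) -> Optional[int]:
    """Return VmRSS (kB) from /proc/<pid>/status text, or None (e.g. kernel threads)."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None


def top_memory_processes(proc_root: str = "/proc", n: int = 5) -> List[Tuple[int, str, int]]:
    """List (pid, name, rss_kb) for the n processes with the largest resident memory."""
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():  # skip non-PID entries such as "self"
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as handle:
                text = handle.read()
        except OSError:
            continue  # process exited or file not readable
        rss = vmrss_kib(text)
        if rss is None:
            continue
        name = next((line.partition(":")[2].strip() for line in text.splitlines()
                     if line.startswith("Name:")), "?")
        rows.append((int(entry), name, rss))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:n]
```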

Best regards

AntetokounmJie changed the title ~~Does the EDTA cannot process a genome containing too many sequence?~~ Does the EDTA cannot process a genome containing too many sequences? Nov 3, 2020

oushujun added the duplicate label Nov 3, 2020

oushujun closed this as completed Nov 3, 2020

oushujun mentioned this issue Nov 3, 2020

How to run EDTA in large genomes (>10Gb)? #61

Closed

oushujun mentioned this issue Dec 6, 2020

System_error: Resource temporarily unavailable #139

Closed
