Can EDTA process a genome containing too many sequences? #129

Closed
AntetokounmJie opened this issue Nov 3, 2020 · 4 comments
Labels
duplicate This issue or pull request already exists

Comments

AntetokounmJie commented Nov 3, 2020

Hi Shujun,
I ran EDTA on the test file and the Arabidopsis genome successfully.
Then I wanted to run it on a maize genome that contains both chromosome sequences and unanchored sequences. The total number of sequences is about 700.
The command was "EDTA.pl --genome genome.fa --species Maize --anno 1 --threads 32 &>log &".
Then I got this error message:
########################################################

Extensive de-novo TE Annotator (EDTA) v1.9.4
Shujun Ou ([email protected])

########################################################

Mon Nov 2 08:33:11 CST 2020 Dependency checking:
All passed!

Mon Nov 2 08:33:52 CST 2020 Obtain raw TE libraries using various structure-based programs:
Mon Nov 2 08:33:52 CST 2020 EDTA_raw: Check dependencies, prepare working directories.

Mon Nov 2 08:34:22 CST 2020 Start to find LTR candidates.

Mon Nov 2 08:34:22 CST 2020 Identify LTR retrotransposon candidates from scratch.

Mon Nov 2 22:46:40 CST 2020 Finish finding LTR candidates.

Mon Nov 2 22:46:40 CST 2020 Start to find TIR candidates.

Mon Nov 2 22:46:40 CST 2020 Identify TIR candidates from scratch.

Species: Maize
cat: '10080.fa': No such file or directory
cat: '11680.fa': No such file or directory
cat: '11780.fa': No such file or directory
...
cat: '66780.fa': No such file or directory
cat: '66880.fa': No such file or directory
cat: '66980.fa': No such file or directory
cat: '67080.fa': No such file or directory
cat: '67180.fa': No such file or directory
terminate called after throwing an instance of 'terminate called after throwing an instance of 'terminate called after throwing an instance of 'terminate called after throwing an instance of 'std::system_errorterminate called after throwing an instance of 'std::system_errorstd::system_errorstd::system_error'
std::system_error'
'
'
'
what(): what(): what(): Resource temporarily unavailable what(): what(): Resource temporarily unavailableResource temporarily unavailableterminate called after throwing an instance of '
Resource temporarily unavailableResource temporarily unavailable

terminate called after throwing an instance of 'terminate called after throwing an instance of 'std::system_error
terminate called after throwing an instance of '
std::system_errorstd::system_error'
std::system_error'
'
'
what(): Resource temporarily unavailable what():
Resource temporarily unavailable
terminate called after throwing an instance of 'std::system_error'
...
terminate called after throwing an instance of 'std::system_error'
...
terminate called after throwing an instance of 'std::system_error'
what(): Resource temporarily unavailable

Do you have any idea what is causing this?
Maybe I should run it on the chromosome sequences only? But what about the unanchored sequences?

best regards
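
For reference, `std::system_error` with the message "Resource temporarily unavailable" (EAGAIN) is what a C++ program typically reports when the operating system refuses to start another thread. Whether that is what happened here is an assumption; the log alone does not say. A minimal Python sketch for checking the per-user process/thread limit on Linux before launching with `--threads 32` (the helper names `process_limit` and `format_limit` are made up for illustration):

```python
import resource


def process_limit() -> tuple:
    """Return the (soft, hard) per-user process/thread limit (RLIMIT_NPROC)."""
    return resource.getrlimit(resource.RLIMIT_NPROC)


def format_limit(value: int) -> str:
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)
```

Calling `process_limit()` and printing both values with `format_limit` shows the same number that `ulimit -u` reports for the soft limit. A low value on a shared server would be one thing to rule out, alongside available memory.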

@AntetokounmJie AntetokounmJie changed the title Does the EDTA cannot process a genome containing too many sequence? Does the EDTA cannot process a genome containing too many sequences? Nov 3, 2020
sanyalab commented Nov 3, 2020

Hi Shujun,

I face a similar issue when EDTA_raw.pl is run on a genome with too many sequences.

I was running it on a genome with 128647 contigs and a total size of 30G. The Helitron and LTR modules ran to completion (2 and 6 days respectively, EDTA v1.9.0), but the TIR run exited because it used too much memory. Following the "divide and conquer" technique, I had allocated 100G of memory to each of the three predictions (LTR, TIR and Helitron). However, the TIR run consumed more than 250G of memory.

I know that I can chunk the genome into 1 GB files and run them independently through EDTA_raw.pl, which will probably be OK. However, I am not sure how to consolidate the results so that a subsequent run of EDTA.pl does not give me any errors.

  1. Which result files of an EDTA_raw.pl run for "TIR", "Helitron" and "LTR" does EDTA.pl use?
  2. What protocol do I follow to properly concatenate the result files from each TIR, Helitron and LTR run of EDTA_raw.pl?
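
The chunking step mentioned above could look roughly like the following Python sketch. It is not part of EDTA; the function names, the `chunk_0001.fa` naming scheme and the `max_bases` parameter are assumptions made for illustration. It keeps every sequence whole and starts a new file once the current one would exceed the size budget. It does not address how to merge the per-chunk results afterwards.

```python
from pathlib import Path
from typing import Iterator, List, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA file."""
    header, parts = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(parts)
                header, parts = line[1:], []
            elif line:
                parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def split_fasta(path: str, out_dir: str, max_bases: int) -> List[str]:
    """Write whole sequences into chunk files of at most max_bases each.

    A sequence longer than max_bases gets a chunk of its own; sequences
    are never cut in the middle. Each sequence is written on one line.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[str] = []
    handle, used = None, 0
    for header, seq in read_fasta(path):
        # Open a new chunk for the first record, or when adding this
        # sequence to a non-empty chunk would exceed the budget.
        if handle is None or (used > 0 and used + len(seq) > max_bases):
            if handle is not None:
                handle.close()
            chunk_path = out / f"chunk_{len(written) + 1:04d}.fa"
            handle = open(chunk_path, "w")
            written.append(str(chunk_path))
            used = 0
        handle.write(f">{header}\n{seq}\n")
        used += len(seq)
    if handle is not None:
        handle.close()
    return written
```

For roughly 1 GB chunks, `split_fasta("genome.fa", "chunks", max_bases=1_000_000_000)` would be a starting point, since one base takes about one byte on disk.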

Thanks
Abhijit

@oushujun oushujun added the duplicate label Nov 3, 2020
Owner

oushujun commented Nov 3, 2020

@Zea1nfO #51

Owner

oushujun commented Nov 3, 2020

@sanyalab #61

Author

AntetokounmJie commented Nov 3, 2020

Hi Shujun,

Sorry about duplicating the issue.
I checked all the users' processes on my server and found that one of them was taking almost all the memory.
So I think insufficient memory is the real cause.

Best regards
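
For anyone hitting the same error on a shared machine, a quick check of available memory before starting a long run may save time. The following is a minimal Python sketch that assumes Linux and the `/proc/meminfo` format; the function names are hypothetical and it is not part of EDTA.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def available_gib(text: str) -> float:
    """Return MemAvailable (reported in kB) converted to GiB."""
    return parse_meminfo(text)["MemAvailable"] / (1024 ** 2)
```

Passing the contents of `/proc/meminfo` to `available_gib` gives the kernel's estimate of memory that can be used without swapping, which is the figure to compare against the expected needs of the run.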
