Split genome into several sequences #142

Closed
AntetokounmJie opened this issue Dec 28, 2020 · 12 comments
Labels
question Further information is requested

Comments

@AntetokounmJie

AntetokounmJie commented Dec 28, 2020

Hi Shujun,
For several reasons, I would like to run the TE annotation this way:
a) Split the large genome into several sequences.
b) Run EDTA_raw.pl on each sequence for each TE type (ltr, helitron, tir).
c) Combine the raw results from all sequences to produce the whole-genome raw results for all three types.
d) Run EDTA.pl to finish the whole-genome TE annotation.
Would this work?
Thanks a lot.
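
Step a) above could be sketched roughly as follows. This is only an illustration of splitting a multi-FASTA file into one file per sequence; it is not part of EDTA. The function names `read_fasta` and `split_fasta`, the output file naming, and the 60-column line width are all assumptions. It reads the whole file into memory, which should be workable for a few hundred Mb but may need a streaming variant for larger genomes.

```python
from pathlib import Path
import re


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples, keeping input order."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_fasta(genome_path, out_dir, width=60):
    """Write one FASTA file per sequence; return the output paths in input order.

    Hypothetical helper: file names are '<index>_<sequence id>.fa' (an assumption).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    records = read_fasta(Path(genome_path).read_text())
    for index, (header, seq) in enumerate(records):
        seq_id = header.split()[0]
        # Replace characters that are awkward in file names.
        safe_id = re.sub(r"[^A-Za-z0-9_.-]", "_", seq_id)
        path = out_dir / f"{index:04d}_{safe_id}.fa"
        lines = [seq[i:i + width] for i in range(0, len(seq), width)]
        path.write_text(f">{header}\n" + "\n".join(lines) + "\n")
        paths.append(path)
    return paths
```

The numeric prefix keeps the files sortable in the original genome order, which can matter if results are combined again later.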

@oushujun
Owner

oushujun commented Dec 30, 2020 via email

@AntetokounmJie
Author

AntetokounmJie commented Dec 31, 2020

Hi Shujun,
I tested the method described above on an SGE system, but ran into an error.
After I submitted the job, EDTA_raw.pl (ltr) ran fine at first.
However, when it reached the LTR_retriever step, the following error appeared:
sh: fork: retry: No child processes
Can't fork, trying again in 5 seconds at ${path to conda}/anaconda3/envs/edta/share/LTR_retriever/bin/align_flanking.pl line 76.
sh: fork: Resource temporarily unavailable

I previously ran EDTA_raw.pl (ltr) on a 10 Mb sequence on the same SGE system without any error. The error only appeared this time, with a sequence of about 200 Mb.

Do you have any idea how to solve it?
Thanks a lot.
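
"fork: Resource temporarily unavailable" is what the shell prints when fork() fails with EAGAIN, which often points to a process or thread limit being reached rather than a lack of memory. A minimal sketch for checking the per-user process limit from inside the job follows. `process_limit` and `suggest_threads` are hypothetical helpers, and the figure of 4 child processes per thread is an arbitrary assumption, not a measured property of LTR_retriever.

```python
try:
    import resource  # available on Unix-like systems only
except ImportError:
    resource = None


def process_limit():
    """Return the soft per-user process limit (like `ulimit -u`), or None if unknown or unlimited."""
    if resource is None or not hasattr(resource, "RLIMIT_NPROC"):
        return None
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return None if soft == resource.RLIM_INFINITY else soft


def suggest_threads(requested, limit, already_running=0, per_thread=4):
    """Cap the thread count so `per_thread` child processes per thread fit under the limit.

    `per_thread` is an assumed safety factor, not a value taken from any tool.
    """
    if limit is None:
        return requested
    headroom = max(limit - already_running, 0)
    return max(1, min(requested, headroom // per_thread))


if __name__ == "__main__":
    limit = process_limit()
    print("soft process limit:", limit)
    print("suggested threads for a request of 36:", suggest_threads(36, limit))
```

Running this inside the submitted job rather than on the login node matters, because a scheduler may apply different limits to batch jobs.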

@oushujun
Owner

oushujun commented Jan 3, 2021 via email

There seems to be a fork issue. Try lowering the CPU number to avoid exhausting system resources. Also, I checked the LTR_retriever code: it requires the order of the input sequences (-genome) to match the order of the candidate sequences (-inharvest), so feeding separate runs of LTRharvest/LTR_FINDER to LTR_retriever may confuse the program and cause errors. You could run LTR_retriever separately on each batch and concatenate the results to mimic EDTA_raw.

Best,
Shujun

@AntetokounmJie
Author

OK, I will try it as you advised.
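
The per-batch suggestion above (run each batch separately, then concatenate) might be sketched like this. Both functions are hypothetical helpers and not part of EDTA or LTR_retriever; which result files to concatenate, and how sequence IDs are extracted from a candidate file, depend on formats that this thread does not describe, so the inputs here are plain lists.

```python
from pathlib import Path


def concatenate_batches(batch_files, combined_path):
    """Append per-batch result files into one file, in the order given."""
    combined_path = Path(combined_path)
    with combined_path.open("w") as out:
        for batch_file in batch_files:
            text = Path(batch_file).read_text()
            out.write(text)
            # Make sure the next batch starts on a fresh line.
            if text and not text.endswith("\n"):
                out.write("\n")
    return combined_path


def follows_genome_order(genome_ids, candidate_ids):
    """Return True if candidate IDs appear in the same relative order as in the genome.

    Repeated IDs are allowed; unknown IDs or IDs that go backwards fail the check.
    """
    rank = {seq_id: i for i, seq_id in enumerate(genome_ids)}
    last = -1
    for seq_id in candidate_ids:
        if seq_id not in rank or rank[seq_id] < last:
            return False
        last = rank[seq_id]
    return True
```

Checking the order before a run is a cheap way to rule out the sequence-order mismatch mentioned above as the cause of an error.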

oushujun added the question label Jan 9, 2021
@AntetokounmJie
Author

Hi Shujun,
I tried it several times, but the same error still appears. I don't know what the cause is. :(
Maybe the SGE system I use has some unusual settings.

@oushujun
Owner

Did you successfully run a normal LTR_retriever job without splitting the genome before? You need to confirm the program runs properly first, then move on to other experiments. Also, I would suggest splitting a small genome (e.g., Arabidopsis) into two and testing on those files first, before using your large genome files.

Best,
Shujun

@AntetokounmJie
Author

AntetokounmJie commented Jan 22, 2021

Hi Shujun,
I tried EDTA_raw.pl with a 10 Mb genome and there was no error. But when I try it with a larger genome (such as 100 Mb), the error appears. I am sure that I gave the job enough memory when I submitted it on the SGE system.
In addition, I ran into two strange errors recently:
a) "Use of uninitialized value $lLTR_length in string ne at ${path to EDTA}/EDTA-1.9.6/util/rename_LTR_skim.pl line 29, line 20330."
I see this error in v1.9.4 and v1.9.6, but it does not seem to stop the process from generating results.
b) "Thread 27 terminated abnormally: substr outside of string at ${path to EDTA}/EDTA-1.9.6/util/cleanup_nested.pl line 190." (new in v1.9.6)
It seems that the new cleanup_nested.pl has some flaws.

Sorry for so many questions; I hope they help make EDTA better.
Thanks a lot.

@oushujun
Owner

Hello,

Thanks for reporting the errors. They seem to be random and quite rare and, as you mentioned, results are still generated despite them. Because these errors occur on a per-sequence basis, they will not have a large impact on the overall annotation unless they occur in all sequences. I will leave them for the moment unless more reports show up.

For your split-genome experiments, can you describe your process in more detail? Also, testing locally on an interactive node may help find the cause.

Best,
Shujun

@oushujun
Owner

Please find more discussions in #175.

@C-grapes

Hi Shujun,
I ran into the same problem as mentioned here. I installed noarch/edta-1.9.6-0.tar.bz2 and edta-1.9.6-hdfd78af_2.tar.bz2 through conda. When I first ran the edta-1.9.6-hdfd78af_2.tar.bz2 package, it went smoothly without any errors. Later, I switched to the 1.9.6.0 conda environment once and then switched back to the 1.9.6.2 environment. Now every run produces this error:
Thread 16 terminated abnormally: substr outside of string at ${path}/test2_edta-1.9.6-hdfd78af_2/share/EDTA/util/cleanup_nested.pl line 190.
Use of uninitialized value $seq_new in substr at ${path}/test2_edta-1.9.6-hdfd78af_2/share/EDTA/util/cleanup_nested.pl line 190.
I don't know where the problem is. Also, is there a big difference between these two packages? Their results do not seem to differ much.

Best wishes!
Putao
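
The two package file names above follow conda's usual `name-version-build` pattern, so both appear to be EDTA 1.9.6 and to differ only in the build string (`0` versus `hdfd78af_2`). A small sketch that splits such a file name is shown below; `parse_conda_filename` is a hypothetical helper, and it assumes the version and build fields contain no hyphens.

```python
def parse_conda_filename(filename):
    """Split 'name-version-build.tar.bz2' into (name, version, build string)."""
    # Drop any channel or subdirectory prefix such as 'noarch/'.
    base = filename.rsplit("/", 1)[-1]
    for suffix in (".tar.bz2", ".conda"):
        if base.endswith(suffix):
            base = base[: -len(suffix)]
            break
    # Package names may contain hyphens, so split from the right.
    name, version, build = base.rsplit("-", 2)
    return name, version, build


for package in ("noarch/edta-1.9.6-0.tar.bz2", "edta-1.9.6-hdfd78af_2.tar.bz2"):
    print(parse_conda_filename(package))
```

Comparing the parsed fields makes it easier to state exactly which build reproduces an error when reporting it.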

@oushujun
Owner

oushujun commented May 21, 2021 via email

Hi Putao,

There aren't big differences in the later updates, or at least you won't see big differences most of the time if the improvements do not apply to your case. Please check the release notes for more details. If you can reproduce an error with the latest version, I can take a look at your case.

Best,
Shujun

@C-grapes

Thank you for your reply. I'll try it as you said.
