Split genome into several sequences #142

AntetokounmJie · 2020-12-28T12:24:42Z

Hi Shujun,
For various reasons, I would like to run TE annotation this way:
a) split the large genome into several sequences
b) run EDTA_raw.pl on each sequence for each type (ltr, helitron, tir)
c) combine the raw results from all sequences to generate the raw results of all three types for the whole genome
d) then run EDTA.pl to finish the whole-genome TE annotation.
Will this work?
Thanks a lot.

oushujun · 2020-12-30T09:46:04Z

Good idea. You may check the EDTA_raw code to find the check points and mock the raw results before the check point, then it will pick them up as if they were generated from one run. Let me know if it works! Best, Shujun
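
For step (a), a minimal sketch of one way to split a genome FASTA into batches is shown below. It is only an illustration, not part of EDTA: the function names, the `batch_001.fa` naming scheme, and the `max_bases` threshold are all assumptions. It keeps every sequence whole and preserves the original sequence order across batches.

```python
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_fasta(path, out_dir, max_bases):
    """Write whole sequences into batch files of roughly max_bases each.

    Sequences are never cut and their order is preserved, so a single
    sequence longer than max_bases ends up in a batch of its own.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    batches, current, size = [], [], 0
    for header, seq in read_fasta(path):
        # Start a new batch when adding this sequence would exceed the budget.
        if current and size + len(seq) > max_bases:
            batches.append(current)
            current, size = [], 0
        current.append((header, seq))
        size += len(seq)
    if current:
        batches.append(current)

    written = []
    for i, batch in enumerate(batches, start=1):
        out_path = out_dir / f"batch_{i:03d}.fa"  # hypothetical naming scheme
        with open(out_path, "w") as out:
            for header, seq in batch:
                # Each sequence is written on a single line for simplicity.
                out.write(f">{header}\n{seq}\n")
        written.append(out_path)
    return written
```

For example, `split_fasta("genome.fa", "batches", 200_000_000)` would aim for batches of about 200 Mb; both the file name and the size are placeholders.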


AntetokounmJie · 2020-12-31T08:37:50Z

Hi Shujun,
I have tested the method mentioned above on an SGE system, but it ran into an error.
After I submitted the job, EDTA_raw.pl (ltr) ran fine at first.
However, when it reached the LTR_retriever step, the following error appeared:
sh: fork: retry: No child processes
Can't fork, trying again in 5 seconds at ${path to conda}/anaconda3/envs/edta/share/LTR_retriever/bin/align_flanking.pl line 76.
sh: fork: Resource temporarily unavailable

Also, I previously ran EDTA_raw.pl (ltr) on a 10 Mb sequence on the same SGE system without any error. This time the sequence is about 200 Mb, and the error appeared.

Do you have any idea how to solve it?
Thanks a lot.

oushujun · 2021-01-03T02:11:57Z

There seems to be a fork issue. Try lowering the CPU number to avoid draining system resources. I also checked the LTR_retriever code: it requires the input sequence order (-genome) to match the candidate sequence order (-inharvest), so providing separate runs of LTRharvest/LTR_FINDER to LTR_retriever may confuse the program and cause errors. You may run LTR_retriever separately for these batches and concatenate their results to mock EDTA_raw. Best, Shujun
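
The concatenation step suggested above could be sketched roughly as follows. This is an assumption-laden illustration: which result files EDTA_raw expects at its checkpoints, and what they are called, has to be read from the EDTA_raw code, so the function name and any file names passed to it are hypothetical. The only point it makes is that per-batch files are joined in the same fixed batch order every time.

```python
def concat_in_order(batch_files, out_path):
    """Concatenate per-batch result files in exactly the order given.

    A newline is added after any file that does not end with one, so the
    last record of one batch never fuses with the first record of the next.
    """
    with open(out_path, "wb") as out:
        for path in batch_files:
            ended_with_newline = True  # an empty file needs no separator
            with open(path, "rb") as src:
                # Copy in 1 MiB chunks to avoid loading large files into memory.
                for chunk in iter(lambda: src.read(1 << 20), b""):
                    out.write(chunk)
                    ended_with_newline = chunk.endswith(b"\n")
            if not ended_with_newline:
                out.write(b"\n")
    return out_path
```

Passing the batch files as an explicit, sorted list (rather than relying on directory listing order) keeps the combined file in the same sequence order as the batches were cut from the genome.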


AntetokounmJie · 2021-01-03T14:23:57Z

OK, I will try it as you advised.

AntetokounmJie · 2021-01-19T01:37:45Z

Hi Shujun,
I tried it several times, but the same error still appears. I don't know what the cause is. :(
Maybe the SGE system I use has some unusual settings.
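
One setting that may be worth checking is the per-user process limit, since "fork: Resource temporarily unavailable" is commonly seen when that limit is reached. Whether it is the cause here is not confirmed. The sketch below uses Python's standard `resource` module to print the limit from inside a job; the helper `safe_thread_count`, its `running` argument, and the `reserve` value of 64 are illustrative assumptions, not anything EDTA or SGE provides.

```python
import os
import resource


def safe_thread_count(requested, soft_limit, running=0, reserve=64):
    """Cap a requested thread count by the remaining process budget.

    soft_limit is the soft RLIMIT_NPROC value, running is the number of
    processes/threads the user already has, and reserve is an arbitrary
    safety margin left for helper processes.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return requested
    available = soft_limit - running - reserve
    return max(1, min(requested, available))


if __name__ == "__main__":
    # Run this inside the SGE job, since limits there may differ from the login node.
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    print("max user processes (soft, hard):", soft, hard)
    print("CPUs visible to this job:", os.cpu_count())
    print("threads for a request of 32:", safe_thread_count(32, soft))
```

Comparing the printed limit on a login node and inside a submitted job shows whether the scheduler imposes a tighter limit than the interactive shell.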

oushujun · 2021-01-19T03:40:55Z

Did you successfully run a normal LTR_retriever job without splitting the genome before? You need to confirm the program is running properly before trying other experiments. Also, I would suggest splitting a small genome (e.g., Arabidopsis) into two and testing on these files first, before using your large genome files.

Best,
Shujun

AntetokounmJie · 2021-01-22T07:37:38Z

Hi Shujun,
I tried EDTA_raw.pl with a 10 Mb genome and there was no error. But with a larger genome (such as 100 Mb), the error appears. I am sure that I requested enough memory when I submitted the job on the SGE system.
Also, I have encountered two strange errors recently:
a) "Use of uninitialized value $lLTR_length in string ne at ${path to EDTA}/EDTA-1.9.6/util/rename_LTR_skim.pl line 29, line 20330."
I see this error in v1.9.4 and v1.9.6, but it does not seem to prevent the pipeline from generating results.
b) "Thread 27 terminated abnormally: substr outside of string at ${path to EDTA}/EDTA-1.9.6/util/cleanup_nested.pl line 190." (in the new v1.9.6)
It seems that the new cleanup_nested.pl has a flaw.

Sorry for so many questions; I hope this helps make EDTA better.
Thanks a lot.

oushujun · 2021-01-27T16:51:13Z

Hello,

Thanks for reporting the errors. They seem to be random and quite rare, and as you mentioned, results are still generated despite them. Because these errors occur on a single-sequence basis, they will not have a large impact on the overall annotation unless they occur in all sequences. I will leave them for the moment unless more reports show up.

For your split genome experiments, can you describe your processes in more detail? Also, testing on an interactive node locally may better help to find the cause.

Best,
Shujun

oushujun · 2021-04-19T08:55:10Z

Please find more discussions in #175.

C-grapes · 2021-05-21T03:39:01Z

Hi Shujun,
I ran into the same problem as mentioned here. I installed noarch/edta-1.9.6-0.tar.bz2 and edta-1.9.6-hdfd78af_2.tar.bz2 through conda. When I first ran the edta-1.9.6-hdfd78af_2.tar.bz2 package, it went smoothly without any errors. Later, I switched to the conda environment of 1.9.6.0 once, then switched back to the conda environment of 1.9.6.2. Now every run produces this error:
Thread 16 terminated abnormally: substr outside of string at ${path}/test2_edta-1.9.6-hdfd78af_2/share/EDTA/util/cleanup_nested.pl line 190.
Use of uninitialized value $seq_new in substr at ${path}/test2_edta-1.9.6-hdfd78af_2/share/EDTA/util/cleanup_nested.pl line 190.
I don't know where the problem is. Also, is there a big difference between these two packages? Their results do not seem to be very different.

Best wishes!
Putao

oushujun · 2021-05-21T03:45:38Z

Hi Putao, There aren't big differences in the later updates, or at least you won't see big differences most of the time if the improvements do not apply to your case. Please check the release notes for more details. If you can reproduce the error with the latest version, I can take a look at your case. Best, Shujun


C-grapes · 2021-05-21T03:53:08Z

Thank you for your reply. I'll try it as you said.

oushujun added the question Further information is requested label Jan 9, 2021

oushujun closed this as completed Apr 19, 2021
