High LTR unknown percentage #537

Open
rajatpruthi4 opened this issue Jan 14, 2025 · 1 comment
Labels
question Further information is requested

Comments


rajatpruthi4 commented Jan 14, 2025

Hi Shujun

Thanks for creating EDTA, and keep up the great work. I wanted to share some observations from my analysis. I have been running EDTA on several red raspberry assemblies and have noticed an unusually high percentage of LTR/unknown repeats. Interestingly, most of these repeats seem to be concentrated in the centromeric regions. This trend is consistent across more than 100 genome assemblies.

Could you please provide your insights on why I am observing this unusually high percentage of LTR unknown repeats? I have attached the .sum output and TE density plot to this issue for your reference.
EDTA.TEanno.density_plots.pdf

Repeat Classes
==============
Total Sequences: 11
Total Length: 287904378 bp

Class                  Count    bpMasked   %masked
==================================================
LINE                      --          --        --
    I                    483      427336     0.15%
    L1                  5010     2376953     0.83%
LTR                       --          --        --
    Copia              11758    11801425     4.10%
    Gypsy              14344    20919798     7.27%
    unknown            86279    67771835    23.54%
SINE                      --          --        --
    tRNA                 382       40106     0.01%
TIR                       --          --        --
    CACTA               9395     4295459     1.49%
    Mutator            19391     5551703     1.93%
    PIF_Harbinger      14354     4224540     1.47%
    Tc1_Mariner          343      118358     0.04%
    hAT                12807     4413702     1.53%
nonLTR                    --          --        --
    pararetrovirus        27       24109     0.01%
nonTIR                    --          --        --
    helitron           34547    14855621     5.16%
repeat_fragment        19072     5125036     1.78%
total interspersed    228192   141945981    49.30%
snRNA                     78        8331     0.00%
Total                 228270   141954312    49.31%

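To put the LTR/unknown row in context, its masked bp can be compared against the other LTR rows and against the total masked bp. The following is a minimal sketch using only the numbers transcribed from the summary above; the variable names and the `share` helper are illustrative and not part of EDTA.

```python
# Values transcribed from the .sum output above: (count, bp masked).
GENOME_BP = 287_904_378          # "Total Length"
TOTAL_MASKED_BP = 141_954_312    # "Total" row

ltr = {
    "Copia": (11_758, 11_801_425),
    "Gypsy": (14_344, 20_919_798),
    "unknown": (86_279, 67_771_835),
}


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


ltr_bp = sum(bp for _, bp in ltr.values())
unknown_bp = ltr["unknown"][1]

print("LTR/unknown as % of genome:    ", share(unknown_bp, GENOME_BP))
print("LTR/unknown as % of all LTR bp:", share(unknown_bp, ltr_bp))
print("LTR/unknown as % of masked bp: ", share(unknown_bp, TOTAL_MASKED_BP))
```

On these numbers, LTR/unknown accounts for roughly two thirds of all LTR bp and just under half of everything masked, which is why the classification of this one category dominates the overall repeat picture.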
@oushujun
Owner

Thanks for sharing your results. One possible reason is that these LTR/unknown entries are centromeric repeats that have been misannotated as LTR elements. You will need to curate them to confirm this, and I highly recommend using TEtrimmer for this purpose. You may read its paper for more information about the tool's usage and interpretation. Another possibility is that they are real LTR elements that could not be classified. Again, curation will tell you more.

Thanks!
Shujun
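
Before curating, it can help to quantify how strongly the LTR/unknown annotations cluster along each sequence, since a sharp peak in a few windows would be consistent with the centromeric misannotation scenario described above. The following is a minimal sketch, not part of EDTA or TEtrimmer. It assumes the TE annotation is a GFF3 file whose ninth column carries a `Classification=LTR/unknown` attribute (an assumption about the attribute naming; check your own file). The function name, window size, and example lines are hypothetical.

```python
from collections import defaultdict


def ltr_unknown_bp_per_window(gff_lines, window=1_000_000,
                              tag="Classification=LTR/unknown"):
    """Sum annotated bp per (seqid, window index) for features carrying `tag`.

    `tag` is an assumed attribute string; adjust it to match column 9 of
    your GFF3. Each feature is counted in full, so nested or overlapping
    features can push totals above the masked bp in the .sum file.
    """
    totals = defaultdict(int)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF3 directives/comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GFF3 feature line
        if tag not in fields[8].split(";"):
            continue  # some other classification
        seqid, start, end = fields[0], int(fields[3]), int(fields[4])
        # Assign the whole feature to the window that contains its start.
        totals[(seqid, (start - 1) // window)] += end - start + 1
    return dict(totals)


# Made-up feature lines for illustration only (not taken from the issue).
example = [
    "##gff-version 3",
    "chr1\tEDTA\trepeat\t1001\t2000\t.\t+\t.\tID=te1;Classification=LTR/unknown",
    "chr1\tEDTA\trepeat\t3000\t4000\t.\t+\t.\tID=te2;Classification=LTR/Gypsy",
    "chr1\tEDTA\trepeat\t15001\t15500\t.\t-\t.\tID=te3;Classification=LTR/unknown",
]
density = ltr_unknown_bp_per_window(example, window=10_000)
print(density)
```

Windows with the highest totals are natural candidates to inspect first when deciding whether the underlying sequences look like tandem centromeric repeats or like genuine LTR elements.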

oushujun added the question label on Feb 13, 2025