Util folder missing (prepare curated library) #530

Open
weilu1998 opened this issue Dec 26, 2024 · 2 comments

Comments

@weilu1998

Hi,

Thanks for developing this pipeline! I am trying to prepare a curated library for an EDTA run based on Dfam curated families. However, I am a bit confused about the class/superfamily naming requirement: the file "EDTA/util/TE_Sequence_Ontology.txt" referenced in the wiki no longer exists. Where can I find the file listing the class and superfamily names that EDTA supports?

Also, would you recommend using Dfam genus-specific families (non-model organisms) as a curated library for EDTA?

Thanks,
Wei
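
For readers preparing a similar library, the naming check can be sketched in Python. This is a minimal sketch, assuming the curated library uses the RepeatMasker-style header `>name#Class/Superfamily`; the function names and the example labels are hypothetical, and the set of accepted labels should be taken from the TE_Sequence_Ontology.txt file shipped with EDTA rather than from this example.

```python
import re

# Assumed header convention: >name#Class/Superfamily (superfamily optional).
HEADER_RE = re.compile(
    r"^>(?P<name>[^#\s]+)#(?P<cls>[^/\s]+)(?:/(?P<superfamily>\S+))?"
)


def parse_header(line):
    """Return (name, class, superfamily) or None if the header does not match."""
    m = HEADER_RE.match(line)
    if m is None:
        return None
    return m.group("name"), m.group("cls"), m.group("superfamily")


def find_bad_headers(fasta_lines, allowed=None):
    """List headers that lack a '#Class/Superfamily' tag or use a label
    outside `allowed` (a user-supplied set such as {"LTR/Gypsy"})."""
    bad = []
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence lines are ignored
        parsed = parse_header(line)
        if parsed is None:
            bad.append(line.strip())
            continue
        _, cls, superfamily = parsed
        label = cls if superfamily is None else f"{cls}/{superfamily}"
        if allowed is not None and label not in allowed:
            bad.append(line.strip())
    return bad
```

Calling `find_bad_headers(open("dfam_lib.fa"), allowed=my_labels)` (both names are placeholders) would list the headers that need renaming before the library is passed to EDTA.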

@oushujun
Owner

oushujun commented Dec 26, 2024 via email

Hi Wei,

The util folder has been migrated to bin; sorry for the confusion. You may use the Dfam library as input for EDTA, but it may not be very helpful if the sequences are highly divergent.

Shujun

@weilu1998
Author

Thank you, Shujun, for the quick reply! If I understand the 80-80-60 merging rule correctly, a slightly more divergent Dfam library shouldn't be detrimental. Is that correct? The worst-case scenario would be that many of the provided Dfam curations are simply not used. In this case, I am working on the repeat annotation of a butterfly. Can I use the whole Lepidoptera (~600) + Drosophila (~200) Dfam curated libraries?

I also have an unrelated question. Can I filter CDS regions in an ad hoc way (start the EDTA run without providing cds.fa)? I think the gene annotation might be a bit problematic for my species; for example, some CDS annotations are actually rDNA. Any suggestions on how to look for potentially misidentified CDS in the repeat library?

Thanks,
Wei
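
One way to handle a CDS file that contains suspect entries is to remove them before the run. The following is a minimal sketch, not an EDTA feature: it assumes a set of sequence IDs has already been flagged as likely rDNA by some other means (for example, a similarity search against known rRNA sequences), and the function names are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line.strip())
    if header is not None:
        yield header, "".join(seq)


def drop_flagged(lines, flagged_ids):
    """Keep only records whose ID (first word of the header) is not flagged."""
    kept = []
    for header, seq in read_fasta(lines):
        parts = header.split()
        seq_id = parts[0] if parts else ""
        if seq_id not in flagged_ids:
            kept.append((header, seq))
    return kept
```

The kept records can then be written to a new FASTA file and supplied as the CDS input in place of the original cds.fa.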
