can't find LTR and NO SINE on Split Genome #510

Open
rr9002 opened this issue Oct 11, 2024 · 2 comments
Labels
question Further information is requested

Comments

@rr9002

rr9002 commented Oct 11, 2024

First of all, thank you for providing such an excellent tool for TE annotation. I’m currently using EDTA v2.2.1 to annotate transposable elements for a large genome, GCA_014155895.2 (~16G). Due to its size, I’ve split the genome by chromosomes into 9 parts, each processed separately with EDTA. Below is the command I’m using for each split:

perl ../EDTA/EDTA.pl --genome new.part_002.fasta --species others --step all --overwrite 0 --sensitive 0 --anno 0 --evaluate 0 --u 1.3e-8 --threads 32 --force 1
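For readers reproducing this setup, the per-chromosome split described above could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the script actually used here: the function name `split_fasta_by_record` is hypothetical, it assumes one FASTA record per output file, and the `new.part_NNN.fasta` naming only mirrors the file names seen in the command above.

```python
from pathlib import Path


def split_fasta_by_record(fasta_path, out_dir, prefix="new.part_"):
    """Write each FASTA record to its own file and return the paths written.

    Hypothetical helper: the naming scheme (prefix + 3-digit index + .fasta)
    is an assumption modelled on the file names in this thread.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    handle = None
    try:
        with open(fasta_path) as src:
            for line in src:
                if line.startswith(">"):
                    # A header line starts a new record: close the previous
                    # part and open the next numbered file.
                    if handle is not None:
                        handle.close()
                    part = out_dir / f"{prefix}{len(written) + 1:03d}.fasta"
                    handle = open(part, "w")
                    written.append(part)
                # Lines before the first header (if any) are skipped.
                if handle is not None:
                    handle.write(line)
    finally:
        if handle is not None:
            handle.close()
    return written

# Example call (paths are placeholders):
# parts = split_fasta_by_record("genome.fasta", "parts")
```

Streaming line by line keeps memory use flat regardless of chromosome size, which matters for a genome of this scale.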

I encountered the following issues:

  1. Issue with LTR detection: When running EDTA on chromosome 2 (new.part_002.fasta, 2.0G), the LTR step printed "Out of memory!" twice and no LTR elements were detected (the log warns that the LTR result file has 0 bp). Could you please advise on why this may be happening for this specific chromosome?
The start time is: 2024-10-09 21:29:56 
My job ID is: 15283037 
The total cores is: 64 
The hosts is: 
i05r3n18:64

#########################################################
##### Extensive de-novo TE Annotator (EDTA) v2.2.1  #####
##### Shujun Ou ([email protected])             #####
#########################################################

Parameters: --genome new.part_002.fasta --species others --step all --overwrite 0 --sensitive 0 --anno 0 --evaluate 0 --u 1.3e-8 --threads 32 --force 1


Wed Oct  9 21:30:14 CST 2024	Dependency checking:
				All passed!

Wed Oct  9 21:31:50 CST 2024	The longest sequence ID in the genome contains 87 characters, which is longer than the limit (13)
				Trying to reformat seq IDs...
				Attempt 1...
Wed Oct  9 21:32:24 CST 2024	Seq ID conversion successful!

Wed Oct  9 21:32:24 CST 2024	Obtain raw TE libraries using various structure-based programs: 

Wed Oct  9 21:32:24 CST 2024	EDTA_raw: Check dependencies, prepare working directories.

Wed Oct  9 21:32:49 CST 2024	Start to find LTR candidates.

Wed Oct  9 21:32:49 CST 2024	Identify LTR retrotransposon candidates from scratch.

Out of memory!
Out of memory!
cat: new.part_002.fasta.mod.harvest.combine.scn: No such file or directory
cat: new.part_002.fasta.mod.finder.combine.scn: No such file or directory
grep: new.part_002.fasta.mod.retriever.scn: No such file or directory
Argument "" isn't numeric in numeric gt (>) at /work/home/acbirxa1yd/miniconda3/envs/EDTA2/share/LTR_retriever/LTR_retriever line 380.

ERROR: No candidate is found in the file(s) you specified.

awk: fatal: cannot open file `new.part_002.fasta.mod.pass.list' for reading: No such file or directory
Warning: LOC list - is empty.

	perl rename_LTR_skim.pl target_sequence.fa LTR_retriever.defalse


Error: Error while loading sequence
Filter sequence based on TEsorter classifications. Unclassified sequences will also be output to the clean file.
	Usage: perl cleanup_misclas.pl sequence.fa.rexdb.cls.tsv
	Author: Shujun Ou ([email protected]) 10/11/2019
	
mv: cannot stat 'new.part_002.fasta.mod.LTR.intact.fa.ori.dusted.cln.cln': No such file or directory
mv: cannot stat 'new.part_002.fasta.mod.LTR.intact.fa.ori.dusted.cln.cln.list': No such file or directory
cp: cannot stat 'new.part_002.fasta.mod.LTR.intact.raw.fa.anno.list': No such file or directory
ERROR: No such file or directory at /work/home/acbirxa1yd/renhongbin/EDTA/util/output_by_list.pl line 39.

	perl filter_gff3.pl file.gff3 file.list > new.gff3

Wed Oct  9 21:35:32 CST 2024	Warning: The LTR result file has 0 bp!

Wed Oct  9 21:35:32 CST 2024	Start to find SINE candidates.

Thu Oct 10 03:26:20 CST 2024	Finish finding SINE candidates.

Thu Oct 10 03:26:20 CST 2024	Start to find LINE candidates.

Thu Oct 10 03:26:20 CST 2024	Existing result file new.part_002.fasta.mod-families.fa found!
				Will keep this file without rerunning this module.
				Please specify --overwrite 1 if you want to rerun this module.

Thu Oct 10 03:26:30 CST 2024	Finish finding LINE candidates.

Thu Oct 10 03:26:30 CST 2024	Start to find TIR candidates.

Thu Oct 10 03:26:30 CST 2024	Identify TIR candidates from scratch.

Species: others
Thu Oct 10 16:34:43 CST 2024	Finish finding TIR candidates.

Thu Oct 10 16:34:43 CST 2024	Start to find Helitron candidates.

Thu Oct 10 16:34:43 CST 2024	Existing result file new.part_002.fasta.mod.Helitron.intact.raw.fa found!
				Will keep this file without rerunning this module.
				Please specify --overwrite 1 if you want to rerun this module.

Thu Oct 10 16:34:43 CST 2024	Finish finding Helitron candidates.

Thu Oct 10 16:34:43 CST 2024	Execution of EDTA_raw.pl is finished!

Thu Oct 10 16:34:43 CST 2024	Obtain raw TE libraries finished.
				All intact TEs found by EDTA: 
					new.part_002.fasta.mod.EDTA.intact.raw.fa 
					new.part_002.fasta.mod.EDTA.intact.raw.gff3

Thu Oct 10 16:34:43 CST 2024	Perform EDTA advance filtering for raw TE candidates and generate the stage 1 library: 

Thu Oct 10 16:35:50 CST 2024	EDTA advance filtering finished.

Thu Oct 10 16:35:50 CST 2024	Perform EDTA final steps to generate a non-redundant comprehensive TE library.

				Skipping the RepeatModeler results (--sensitive 0).
				Run EDTA.pl --step final --sensitive 1 if you want to add RepeatModeler results.

				Skipping the CDS cleaning step (--cds [File]) since no CDS file is provided or it's empty.

Thu Oct 10 16:37:06 CST 2024	EDTA final stage finished! You may check out:
				The final EDTA TE library: new.part_002.fasta.mod.EDTA.TElib.fa
The end time is: 2024-10-10 16:37:06

Warning: No sequences were masked
  2. Issue with SINE detection: For other chromosome parts (for example new.part_006.fasta), LTR elements were detected, but the SINE step ended with "Error: SINE results not found!" and no SINE elements were reported. Is there something that could be affecting SINE detection across these chromosomes?
The start time is: 2024-09-25 16:01:12 
My job ID is: 14944128 
The total cores is: 32 
The hosts is: 
g06r4n15:32

#########################################################
##### Extensive de-novo TE Annotator (EDTA) v2.2.1  #####
##### Shujun Ou ([email protected])             #####
#########################################################

Parameters: --genome new.part_006.fasta --species others --step all --overwrite 0 --sensitive 0 --anno 0 --evaluate 0 --u 1.3e-8 --threads 32 --force 1

Wed Sep 25 16:01:14 CST 2024	Dependency checking:
				All passed!

Wed Sep 25 16:02:34 CST 2024	The longest sequence ID in the genome contains 61 characters, which is longer than the limit (13)
				Trying to reformat seq IDs...
				Attempt 1...
Wed Sep 25 16:03:01 CST 2024	Seq ID conversion successful!

Wed Sep 25 16:03:01 CST 2024	Obtain raw TE libraries using various structure-based programs: 

Wed Sep 25 16:03:01 CST 2024	EDTA_raw: Check dependencies, prepare working directories.

Wed Sep 25 16:03:22 CST 2024	Start to find LTR candidates.

Wed Sep 25 16:03:22 CST 2024	Identify LTR retrotransposon candidates from scratch.

Thu Sep 26 09:26:36 CST 2024	Finish finding LTR candidates.

Thu Sep 26 09:26:36 CST 2024	Start to find SINE candidates.

cp: cannot stat 'new.part_006.fasta.mod.SINE.raw.fa': No such file or directory
Error: SINE results not found!

cat: new.part_006.fasta.mod.TIR.intact.raw.bed: No such file or directory
cat: new.part_006.fasta.mod.Helitron.intact.raw.bed: No such file or directory
cp: cannot stat '../new.part_006.fasta.mod.EDTA.raw/new.part_006.fasta.mod.RM2.fa': No such file or directory

Thu Sep 26 09:26:37 CST 2024	Obtain raw TE libraries finished.
				All intact TEs found by EDTA: 
					new.part_006.fasta.mod.EDTA.intact.raw.fa 
					new.part_006.fasta.mod.EDTA.intact.raw.gff3

Thu Sep 26 09:26:37 CST 2024	Perform EDTA advance filtering for raw TE candidates and generate the stage 1 library: 

Thu Sep 26 09:34:02 CST 2024	EDTA advance filtering finished.

Thu Sep 26 09:34:02 CST 2024	Perform EDTA final steps to generate a non-redundant comprehensive TE library.

				Skipping the RepeatModeler results (--sensitive 0).
				Run EDTA.pl --step final --sensitive 1 if you want to add RepeatModeler results.

				Skipping the CDS cleaning step (--cds [File]) since no CDS file is provided or it's empty.

Thu Sep 26 10:28:10 CST 2024	EDTA final stage finished! You may check out:
				The final EDTA TE library: new.part_006.fasta.mod.EDTA.TElib.fa
The end time is: 2024-09-26 10:28:10
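The failure messages in the two logs above can be tallied with a short script, which makes it easier to compare the nine per-chromosome runs side by side. The sketch below is only an illustration: `scan_edta_log` and the label names are hypothetical, and the substrings are copied from the messages in these logs rather than from any documented list of EDTA messages.

```python
from collections import Counter

# Substrings copied from the two logs in this thread, mapped to short labels.
# The labels are made up for illustration; EDTA does not define them.
SIGNATURES = {
    "Out of memory!": "out_of_memory",
    "No such file or directory": "missing_file",
    "Existing result file": "reused_result",
    "results not found": "module_failed",
    "has 0 bp": "empty_result",
}


def scan_edta_log(lines):
    """Count how many log lines contain each known message substring."""
    counts = Counter()
    for line in lines:
        for needle, label in SIGNATURES.items():
            if needle in line:
                counts[label] += 1
    return counts

# Example call (the log file name is a placeholder):
# with open("edta.part_002.log") as fh:
#     print(scan_edta_log(fh))
```

A non-zero `reused_result` count is worth a look with `--overwrite 0`, since the first log shows the LINE and Helitron modules keeping result files from an earlier run instead of recomputing them.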

If you need further information or logs, I’d be happy to provide them. I appreciate your time and help with these issues.

Thank you again for your continued support and for developing such a valuable tool!

Best regards,
rr

@oushujun
Owner

oushujun commented Oct 17, 2024 via email

@oushujun
Owner

Any luck?

@oushujun oushujun added the question Further information is requested label Nov 26, 2024