Raw SINE results not found #540

Open
bbista opened this issue Jan 27, 2025 · 15 comments

@bbista

bbista commented Jan 27, 2025

Hello,
I was running EDTA on a housefly genome and got this error.
I installed EDTA using conda.

#########################################################

Extensive de-novo TE Annotator (EDTA) v2.2.2
Shujun Ou ([email protected])

#########################################################

Parameters: --genome /project/meisel/users/bbista/genomes/USDA_David/3MV3/Mdom_3M-v3b_clean.fasta -t 36

Fri Jan 17 11:11:59 AM CST 2025 Dependency checking:
All passed!

Fri Jan 17 11:12:16 AM CST 2025 Obtain raw TE libraries using various structure-based programs:
Fri Jan 17 11:12:16 AM CST 2025 EDTA_raw: Check dependencies, prepare working directories.

Fri Jan 17 11:12:26 AM CST 2025 Start to find LTR candidates.

Fri Jan 17 11:12:26 AM CST 2025 Identify LTR retrotransposon candidates from scratch.

Fri Jan 17 01:00:22 PM CST 2025 Finish finding LTR candidates.

Fri Jan 17 01:00:22 PM CST 2025 Start to find SINE candidates.

cp: cannot stat 'Mdom_3M-v3b_clean.fasta.mod.SINE.raw.fa': No such file or directory
Error: SINE results not found!

ERROR: Raw SINE results not found in Mdom_3M-v3b_clean.fasta.mod.EDTA.raw/Mdom_3M-v3b_clean.fasta.mod.SINE.raw.fa
If you believe the program is working properly, this may be caused by the lack of SINEs in your genome.
slurmstepd: error: Detected 1 oom_kill event in StepId=4398198.batch. Some of the step tasks have been OOM Killed.
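
The last line of this log reports an OOM kill, so one possible reading (an assumption, not confirmed anywhere in this thread) is that the SINE step was killed for exceeding its memory allocation rather than finding no SINEs. A minimal Python sketch for scanning a saved log for such lines follows. The function name `find_oom_lines` and the pattern list are illustrative and are not part of EDTA or SLURM.

```python
import re
from typing import List

# Substrings that commonly indicate an out-of-memory kill in scheduler or
# kernel output. This list is an illustrative assumption, not exhaustive.
OOM_PATTERN = re.compile(r"oom[_-]kill|OOM Killed|out of memory", re.IGNORECASE)


def find_oom_lines(log_text: str) -> List[str]:
    """Return the log lines that mention an out-of-memory kill."""
    return [line for line in log_text.splitlines() if OOM_PATTERN.search(line)]


if __name__ == "__main__":
    # Two lines copied from the log above.
    sample = (
        "Error: SINE results not found!\n"
        "slurmstepd: error: Detected 1 oom_kill event in StepId=4398198.batch. "
        "Some of the step tasks have been OOM Killed.\n"
    )
    for line in find_oom_lines(sample):
        print(line)
```

If a line like this shows up, rerunning with a larger memory request would be a reasonable first check before concluding that the genome lacks SINEs.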

@Chriswinefield

Hi Shujun,

It appears I am having the same issue (same errors). It appears that the .fasta.mod.SINE.raw and .fa directory/files are missing. I have tried this with both the full pipeline and the SINE-only pipeline. In both cases I can't find the fasta.mod or the fasta.mod.EDTA.raw directory being written to the working directory.

I am taking a divide-and-conquer approach to check the other parts of the pipeline. The LTR discovery has worked (as for @bbista).
Regards
Chris

@Chriswinefield

Hi Again,

I have tried the TIR module and the same issue appears. See the error log below:

Thu Jan 30 02:25:03 UTC 2025 EDTA_raw: Check dependencies, prepare working directories.

Thu Jan 30 02:25:36 UTC 2025 Start to find TIR candidates.

Thu Jan 30 02:25:36 UTC 2025 Identify TIR candidates from scratch.

Species: others
Traceback (most recent call last):
  File "/scale_wlg_nobackup/filesets/nobackup/lincoln03032/EDTA/bin/TIR-Learner3.0/TIR-Learner3.0.py", line 14, in <module>
    from bin.main import TIRLearner
  File "/scale_wlg_nobackup/filesets/nobackup/lincoln03032/EDTA/bin/TIR-Learner3.0/bin/main.py", line 22, in <module>
    from prog_const import *
  File "/scale_wlg_nobackup/filesets/nobackup/lincoln03032/EDTA/bin/TIR-Learner3.0/bin/prog_const.py", line 15, in <module>
    import regex as re
ModuleNotFoundError: No module named 'regex'
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.fa: No such file or directory at /scale_wlg_nobackup/filesets/nobackup/lincoln03032/EDTA/util/rename_tirlearner.pl line 19.
Warning: LOC list NbLx03.shortheader.fasta.mod.TIR.ext30.list is empty.

Error: Error while loading sequence
Filter sequence based on TEsorter classifications. Unclassified sequences will also be output to the clean file.
Usage: perl cleanup_misclas.pl sequence.fa.rexdb.cls.tsv
Author: Shujun Ou ([email protected]) 10/11/2019

mv: cannot stat 'NbLx03.shortheader.fasta.mod.TIR.ext30.fa.pass.fa.dusted.cln.cln': No such file or directory
cp: cannot stat 'NbLx03.shortheader.fasta.mod.TIR.ext30.fa.pass.fa.dusted.cln.cln.list': No such file or directory
cp: cannot stat 'NbLx03.shortheader.fasta.mod.TIR.intact.raw.fa.anno.list': No such file or directory
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.gff3: No such file or directory.
ERROR: No such file or directory at /scale_wlg_nobackup/filesets/nobackup/lincoln03032/EDTA/util/output_by_list.pl line 39.
Error: TIR results not found!

It appears that the initial raw data files/directories for modules other than LTR are not being written.
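
The traceback above stops at `import regex`, which suggests the Python environment used by TIR-Learner is missing that package. A small sketch for checking this before a long run is shown below. The helper name `missing_modules` is hypothetical, and `regex` is the only module name taken from the log; any other name passed in would be the reader's own guess. It should be run with the same interpreter that EDTA invokes.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "regex" is the module named in the ModuleNotFoundError above.
    missing = missing_modules(["regex"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

An empty result only shows that the module can be located, not that the rest of the pipeline is configured correctly.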

@B10inform

Hi, this error still persists.

Running TIR-Learner.py directly also gives the same problem.

(EDTA) @farm:~$ ./TIR-Learner.py -h
Traceback (most recent call last):
  File "/home/.conda/envs/EDTA/share/TIR-Learner3/TIR-Learner.py", line 15, in <module>
    from bin.main import TIRLearner
  File "/home/.conda/envs/EDTA/share/TIR-Learner3/bin/main.py", line 36, in <module>
    class TIRLearner:
  File "/home/.conda/envs/EDTA/share/TIR-Learner3/bin/main.py", line 383, in TIRLearner
    def progress_check(self, progress_or_module: int | list, step: int = None) -> bool:
TypeError: unsupported operand type(s) for |: 'type' and 'type'

Any update would be great.
Thanks
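
For context on this traceback: the `X | Y` union syntax for types (PEP 604) was added in Python 3.10, and an older interpreter raises exactly this `TypeError` when it evaluates `int | list` in an annotation. Whether an older Python in the environment is the cause here is an assumption, not something confirmed in this thread. The sketch below probes the running interpreter; the function name `supports_pep604_unions` is illustrative.

```python
import sys


def supports_pep604_unions() -> bool:
    """Return True if `int | list` can be evaluated as a type union at runtime."""
    try:
        # On Python < 3.10 this raises:
        # TypeError: unsupported operand type(s) for |: 'type' and 'type'
        eval("int | list")
    except TypeError:
        return False
    return True


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    print("PEP 604 unions supported:", supports_pep604_unions())
```

Running this with the interpreter inside the activated environment shows which Python version TIR-Learner would actually see.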

@oushujun
Owner

I received several of these error reports, but could not reproduce them on my end. Can anyone provide a reproducible case? Thanks!

@Chriswinefield

Hi Shujun,

I have repeatedly had these issues. I am wondering (from your comment), whether the issue may be a bad install.

I will clean the current instance off my account, reinstall, and get back to you ASAP to see if the issue persists. Since you aren't seeing any issues on your end with the current versions, I am wondering if this may be the cause.

Another thing that pops into mind is whether the raw data needs to be within the EDTA folder. In my case, I call EDTA from my working folder where the raw genome file is housed. All the EDTA folders and associated mod.raw files are generated within the EDTA folder (i.e. where the EDTA.pl files are installed). This doesn't appear to be a problem for LTR-finder but might be causing a problem for the other parts of the pipeline?

Regards
Chris

@B10inform

Hi Chris,

The issue is with the TIR-Learner not the LTR-finder.

Thanks

@Chriswinefield

Yes, you are right, but initiating the SINE/LINE and Helitron pipelines independently via EDTA also causes the same crash (for each of the respective sub-pipelines). As far as I can see, the folders and mod.raw files for these individual packages are not being written; the programmes start and then fail because they cannot find the correct input files and folders.

@oushujun
Owner

oushujun commented Feb 11, 2025 via email

@Chriswinefield

Interesting - I have been using the Conda install rather than the apptainer. I'll ask our HPC bods if we can get an apptainer instance installed. Also I'll see if we are installing the CPU version of tensorflow as you suggest.

I think you are correct in that there may be an interaction with the HPC/server setup causing the issues. I'll keep the thread informed as we work through the issues with our setup.

Thanks for the help
Chris

@B10inform

> Another recent issue indicates that you may need to install the cpu version of tensorflow. Not sure if it’s related.
>
> EDTA does not require input being in the program folder.
>
> You may need to test with different servers/platforms to rule out corner cases. Are you using conda or apptainer?
>
> Shujun

Hi Shujun,

Do you think it is a problem with TIR-Learner itself?

Installed with conda (including the beta version).
conda create -n TIR-Learner
conda activate TIR-Learner
mamba install -c conda-forge -c bioconda tir-learner

TIR-Learner -h

Traceback (most recent call last):
  File "/home/.conda/envs/TIR-Learner/share/TIR-Learner3/TIR-Learner.py", line 15, in <module>
    from bin.main import TIRLearner
  File "/home/.conda/envs/TIR-Learner/share/TIR-Learner3/bin/main.py", line 36, in <module>
    class TIRLearner:
  File "/home/.conda/envs/TIR-Learner/share/TIR-Learner3/bin/main.py", line 383, in TIRLearner
    def progress_check(self, progress_or_module: int | list, step: int = None) -> bool:
TypeError: unsupported operand type(s) for |: 'type' and 'type'

Thanks

@B10inform

B10inform commented Feb 12, 2025

> Interesting - I have been using the Conda install rather than the apptainer. I'll ask our HPC bods if we can get an apptainer instance installed. Also I'll see if we are installing the CPU version of tensorflow as you suggest.
>
> I think you are correct in that there may be an interaction with the HPC/server setup causing the issues. I'll keep the thread informed as we work through the issues with our setup.
>
> Thanks for the help
> Chris

Hi Chris,

I installed EDTA locally on my computer and still get the same error, so it may not be due to the HPC/server setup.

EDTA.pl --genome genome.fa --cds genome.cds.fa --curatedlib rice7.0.0.liban --exclude genome.exclude.bed --overwrite 1 --sensitive 1 --anno 1 --threads 10 > EDTA.test

Wed Feb 12 16:35:13 EST 2025 EDTA_raw: Check dependencies, prepare working directories.
Wed Feb 12 16:35:32 EST 2025 Start to find LTR candidates.
Wed Feb 12 16:35:32 EST 2025 Identify LTR retrotransposon candidates from scratch.
Warning: LOC list genome.fa.mod.ltrTE.veryfalse is empty.
Wed Feb 12 16:36:43 EST 2025 Finish finding LTR candidates.
Wed Feb 12 16:36:43 EST 2025 Start to find SINE candidates.
Wed Feb 12 16:38:20 EST 2025 Warning: The SINE result file has 0 bp!
Wed Feb 12 16:38:20 EST 2025 Start to find LINE candidates.
Wed Feb 12 16:38:20 EST 2025 Identify LINE retrotransposon candidates from scratch.
Wed Feb 12 16:42:15 EST 2025 Warning: The LINE result file has 0 bp!
Wed Feb 12 16:42:15 EST 2025 Start to find TIR candidates.
Wed Feb 12 16:42:15 EST 2025 Identify TIR candidates from scratch.

Species: others
Traceback (most recent call last):
  File "/home/anaconda3/envs/EDTA/share/TIR-Learner3/TIR-Learner.py", line 15, in <module>
    from bin.main import TIRLearner
  File "/home/anaconda3/envs/EDTA/share/TIR-Learner3/bin/main.py", line 36, in <module>
    class TIRLearner:
  File "/home/anaconda3/envs/EDTA/share/TIR-Learner3/bin/main.py", line 383, in TIRLearner
    def progress_check(self, progress_or_module: int | list, step: int = None) -> bool:
TypeError: unsupported operand type(s) for |: 'type' and 'type'
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.fa: No such file or directory at /home/anaconda3/envs/EDTA/share/EDTA/bin/rename_tirlearner.pl line 19.
Warning: LOC list genome.fa.mod.TIR.ext30.list is empty.

Error: Error while loading sequence
Filter sequence based on TEsorter classifications. Unclassified sequences will also be output to the clean file.
Usage: perl cleanup_misclas.pl sequence.fa.rexdb.cls.tsv
Author: Shujun Ou ([email protected]) 10/11/2019

mv: cannot stat 'genome.fa.mod.TIR.ext30.fa.pass.fa.dusted.cln.cln': No such file or directory
cp: cannot stat 'genome.fa.mod.TIR.ext30.fa.pass.fa.dusted.cln.cln.list': No such file or directory
cp: cannot stat 'genome.fa.mod.TIR.intact.raw.fa.anno.list': No such file or directory
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.gff3: No such file or directory.
ERROR: No such file or directory at /home/anaconda3/envs/EDTA/share/EDTA/bin/output_by_list.pl line 39.
Error: TIR results not found!

ERROR: Raw TIR results not found in genome.fa.mod.EDTA.raw/genome.fa.mod.TIR.intact.raw.fa
If you believe the program is working properly, this may be caused by the lack of intact TIRs in your genome. Consider to use the --force 1 parameter to overwrite this check

@Chriswinefield

I will see if I can get the Apptainer install working, as this might relate to something that has broken in the generation of the Conda environment.

@oushujun
Owner

@B10inform The current TIR-Learner recipe uses pytorch-cuda, so if your machine does not have an NVIDIA GPU, it will run into an error. Please try the updated yml file for installing EDTA, which uses pytorch-cpu.

Thanks!
Shujun
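
To see which PyTorch build an environment actually contains, a short check along the following lines may help. This is a sketch under stated assumptions: the function name `torch_build_info` is hypothetical, it relies on `torch.version.cuda` being `None` for CPU-only builds, and it returns `None` when PyTorch is not installed at all.

```python
import importlib.util
from typing import Optional


def torch_build_info() -> Optional[dict]:
    """Describe the installed PyTorch build, or return None if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch  # imported lazily so the check also works without PyTorch

    return {
        "version": torch.__version__,
        # CUDA toolkit version the build was compiled against; None for CPU builds.
        "cuda_build": torch.version.cuda,
        # True only if a usable GPU and driver are visible at runtime.
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    info = torch_build_info()
    print("PyTorch not installed" if info is None else info)
```

A CUDA build reporting `cuda_available` as False would be consistent with the situation described above, though it does not by itself prove that this is the source of the error.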

@B10inform

Hi Shujun,

I am still getting the same problem.
EDTA.pl --genome genome.fa --cds genome.cds.fa --curatedlib rice7.0.0.liban --exclude genome.exclude.bed --overwrite 1 --sensitive 1 --anno 1 --threads 10 > EDTA.test

Species: others
Traceback (most recent call last):
  File "/home/.conda/envs/EDTA/share/TIR-Learner3/TIR-Learner.py", line 15, in <module>
    from bin.main import TIRLearner
  File "/home/.conda/envs/EDTA/share/TIR-Learner3/bin/main.py", line 36, in <module>
    class TIRLearner:
  File "/home/.conda/envs/EDTA/share/TIR-Learner3/bin/main.py", line 383, in TIRLearner
    def progress_check(self, progress_or_module: int | list, step: int = None) -> bool:
TypeError: unsupported operand type(s) for |: 'type' and 'type'
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.fa: No such file or directory at /home/.conda/envs/EDTA/share/EDTA/bin/rename_tirlearner.pl line 19.
Warning: LOC list genome.fa.mod.TIR.ext30.list is empty.

conda installation:
conda create -n EDTA
conda activate EDTA
mamba install -c conda-forge -c bioconda edta

Linking annosine2-2.0.8-pyh7e72e81_0
Linking ltr_retriever-3.0.1-hdfd78af_1
Linking repeatmodeler-2.0.6-pl5321hdfd78af_0
Linking r-tibble-3.2.1-r44hdb488b9_3
Linking pytorch-2.6.0-cpu_generic_py39_h57ffae5_0
Linking r-dplyr-1.1.4-r44h0d4f4ea_1
Linking r-tidyr-1.3.1-r44h0d4f4ea_1
Linking r-ggplot2-3.5.1-r44hc72bb7e_1
Linking tir-learner-3.0.5-hdfd78af_0
Linking edta-2.2.2-hdfd78af_1

git installation:
git clone https://github.com/oushujun/EDTA.git

perl ./EDTA/EDTA.pl --check_dependencies
Error: AnnoSINE is not found in the AnnoSINE path ./!

Thanks

@B10inform

B10inform commented Feb 20, 2025

Hi Shujun,

It is working now. I had to completely remove the conda environment (including all traces of the configuration files).

Thanks


4 participants