ERROR Identifying TIRs - swifter #551

Open
diriano opened this issue Mar 14, 2025 · 0 comments

Hi,
We are running EDTA 2.2.2 on some Citrus genomes, and got the following error during the TIR identification:

Traceback (most recent call last):
  File "/home/user/miniconda3/envs/EDTA2.2/lib/python3.12/site-packages/swifter/swifter.py", line 419, in apply
    tmp_df = func(sample, *args, **kwds)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/envs/EDTA2.2/share/TIR-Learner3/bin/get_fasta_sequence.py", line 17, in <lambda>
    df["end"] = df.swifter.progress_bar(flag_verbose).apply(lambda x: min(x["end"], fasta_len_dict[x["seqid"]]), axis=1)

EDTA continues, but the TIR directory does not contain all of the result files, and the pipeline complains about missing files under the TIR directory:

KeyError: 'CM039161.1_split_1of5'
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.fa: No such file or directory at /home/user/programs/EDTA/bin/rename_tirlearner.pl line 19.
Warning: LOC list GCA_022201045.1_DVS_A1.0_genomic.fna.mod.TIR.ext30.list is empty.

Error: Error while loading sequence
Filter sequence based on TEsorter classifications. Unclassified sequences will also be output to the clean file.
	Usage: perl cleanup_misclas.pl sequence.fa.rexdb.cls.tsv
	Author: Shujun Ou ([email protected]) 10/11/2019
	
mv: cannot stat 'GCA_022201045.1_DVS_A1.0_genomic.fna.mod.TIR.ext30.fa.pass.fa.dusted.cln.cln': No such file or directory
cp: cannot stat 'GCA_022201045.1_DVS_A1.0_genomic.fna.mod.TIR.ext30.fa.pass.fa.dusted.cln.cln.list': No such file or directory
cp: cannot stat 'GCA_022201045.1_DVS_A1.0_genomic.fna.mod.TIR.intact.raw.fa.anno.list': No such file or directory
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.gff3: No such file or directory.
ERROR: No such file or directory at /home/user/programs/EDTA/bin/output_by_list.pl line 39.
Error: TIR results not found!

cat: GCA_022201045.1_DVS_A1.0_genomic.fna.mod.Helitron.intact.raw.bed: No such file or directory

Here are the versions of some of the Python (3.12.8) modules that I think may be relevant:

>>> import swifter
>>> print(swifter.__version__)
1.4.0
>>> import numpy
>>> print(numpy.__version__)
2.2.3
>>> import pandas
>>> print(pandas.__version__)
2.2.2
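
The same information can be collected in one pass instead of importing each module by hand. The snippet below is only a sketch: the helper name `collect_versions` is made up for illustration, and it assumes the packages are registered under the distribution names `swifter`, `numpy`, and `pandas` in the active environment.

```python
from importlib.metadata import PackageNotFoundError, version


def collect_versions(packages):
    """Return {package: version string}, or 'not installed' when the lookup fails."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not present in the active environment.
            found[name] = "not installed"
    return found


for name, ver in collect_versions(["swifter", "numpy", "pandas"]).items():
    print(f"{name}: {ver}")
```

Running it inside the activated EDTA conda environment should report the versions that the pipeline itself sees.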

EDTA was installed using the YML file: https://raw.githubusercontent.com/oushujun/EDTA/refs/heads/master/EDTA_2.2.x.yml and there were no problems during installation.

I replaced line 17 of file share/TIR-Learner3/bin/get_fasta_sequence.py with the following line:

df["end"] = df.swifter.progress_bar(flag_verbose).apply(lambda x: min(x["end"], fasta_len_dict.get(x["seqid"], float('inf'))), axis=1)

With that change, EDTA completed without problems and produced TIR results.
I would appreciate your advice on whether this is a good approach or if you have any alternative suggestions.
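
For reference, the difference between the original lookup and the patched one can be sketched without pandas or swifter. This is a minimal illustration, not TIR-Learner's code: the rows, the lengths in `fasta_len_dict`, the `chr_example` name, and the helpers `clamp_strict`, `clamp_lenient`, and `missing_seqids` are all made up for the example. It also shows one possible caveat of the `float('inf')` fallback: rows whose `seqid` is absent from the dictionary keep their `end` unclamped instead of being reported, so a naming mismatch between the split sequence IDs and the length dictionary could go unnoticed.

```python
# Hypothetical data: only the split seqid name is taken from the KeyError above.
fasta_len_dict = {"chr_example": 1000}
rows = [
    {"seqid": "chr_example", "end": 1500},            # known sequence, end past its length
    {"seqid": "CM039161.1_split_1of5", "end": 800},   # seqid missing from the dict
]


def clamp_strict(row, lengths):
    """Original behaviour: raises KeyError when the seqid is unknown."""
    return min(row["end"], lengths[row["seqid"]])


def clamp_lenient(row, lengths):
    """Patched behaviour: an unknown seqid keeps its end coordinate unchanged."""
    return min(row["end"], lengths.get(row["seqid"], float("inf")))


def missing_seqids(rows, lengths):
    """List seqids that have no length entry, so the mismatch stays visible."""
    return sorted({row["seqid"] for row in rows if row["seqid"] not in lengths})


try:
    [clamp_strict(row, fasta_len_dict) for row in rows]
except KeyError as err:
    print(f"strict lookup failed on seqid {err}")

lenient_ends = [clamp_lenient(row, fasta_len_dict) for row in rows]
print(lenient_ends)                          # [1000, 800]
print(missing_seqids(rows, fasta_len_dict))  # ['CM039161.1_split_1of5']
```

Logging the output of something like `missing_seqids` alongside the patch would at least show which sequence IDs are not being clamped.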
Thanks in advance, and I truly appreciate the great software!

Best,
Diego
