Releases · AuReMe/esmecata

Change the intermediary files of clustering and annotation to reduce the disk space used by EsMeCaTa and the number of operations performed by its methods. Instead of assigning one file per observation name, this new version assigns one file per taxon used by EsMeCaTa. This removes a lot of redundant work that slowed EsMeCaTa and could lead to issues.
Annotation with eggnog-mapper is now the default workflow method of EsMeCaTa. The previous annotation method with UniProt has been moved to annotation_uniprot and workflow_uniprot.

Add

Add sub-commands annotation_uniprot and workflow_uniprot to use the old method of protein annotation.
Add check sub-command that performs the first step of EsMeCaTa without downloading the proteomes. This is helpful when you want to get a glimpse of the available knowledge for your dataset.
Add an error message when an input file with an incorrect extension is given to esmecata.
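
The extension check mentioned above can be pictured with a short sketch. This is not EsMeCaTa's code: the function name check_input_extension and the set of accepted extensions are assumptions made for illustration only.

```python
from pathlib import Path

# Assumed set of accepted extensions, for illustration only.
# The extensions actually accepted by esmecata may differ.
ALLOWED_EXTENSIONS = {".tsv", ".csv"}


def check_input_extension(input_path):
    """Return the file extension, or raise a clear error if it is not accepted."""
    suffix = Path(input_path).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        allowed = ", ".join(sorted(ALLOWED_EXTENSIONS))
        raise ValueError(
            f"Unsupported input extension '{suffix}' for {input_path}; "
            f"expected one of: {allowed}"
        )
    return suffix
```

Failing early with an explicit message avoids a less readable parsing error later in the run.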

Fix

Fix missing import in proteomes.
Fix GitHub Actions.

Modify

Modify intermediary files to associate them with the taxon name selected by EsMeCaTa instead of the observation name (based on an idea of @PaulineGHG). The tsv files that were previously created for each observation name are now created for each taxon, so observation names sharing the same taxon are associated with the same file. This reduces file redundancy and decreases the number of operations performed by EsMeCaTa.
Modify how the log json files are created so that, if a run fails, a new log json file is created instead of overwriting the previous ones.
Move from setup.py and setup.cfg to pyproject.toml.
Update readme and tutorial.
Update license year.
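
The move from one intermediary file per observation name to one file per taxon can be sketched as follows. This is only an illustration under stated assumptions: the observation names, taxon names and file names are hypothetical, and the functions are not part of EsMeCaTa.

```python
from collections import defaultdict

# Hypothetical mapping of observation name -> taxon selected for it.
# These names are illustrative and do not come from EsMeCaTa.
selected_taxon = {
    "obs_1": "Escherichia",
    "obs_2": "Escherichia",
    "obs_3": "Bacillus",
}


def files_per_observation(selected):
    """Old layout: one tsv file per observation name."""
    return {obs: f"{obs}.tsv" for obs in selected}


def files_per_taxon(selected):
    """New layout: one tsv file per taxon, shared by its observation names."""
    observations_by_taxon = defaultdict(list)
    for obs, taxon in selected.items():
        observations_by_taxon[taxon].append(obs)
    return {
        f"{taxon}.tsv": sorted(obs_names)
        for taxon, obs_names in observations_by_taxon.items()
    }


old_layout = files_per_observation(selected_taxon)
new_layout = files_per_taxon(selected_taxon)
print(len(old_layout), "files before,", len(new_layout), "files after")
```

In this toy case the two observations assigned to the same taxon share a single file, so the work for that taxon is done once instead of twice.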

Remove

Remove sub-commands annotation_eggnog and workflow_eggnog which are now the default sub-commands annotation and workflow.

Contributors

PaulineGHG

10 Dec 17:56

ArnaudBelcour

0.3.0

d37b91e

esmecata 0.3.0

Add a new way to annotate protein clusters using eggnog-mapper. In tests on metagenomics data, it is more accurate than the method using UniProt.
Also modify the default options of EsMeCaTa to values that gave better results on the tested data (minimal number of proteomes from 1 to 5 and clustering threshold from 0.95 to 0.5).

Add

Add a new method to annotate protein clusters using eggnog-mapper: new script eggnog.py, new commands annotation_eggnog and workflow_eggnog.
Add option to query uniprot dat files during annotation (--annotation-files, needs biopython>=1.81).
Add an option to use bioservices for annotation queries (--bioservices, requires bioservices>=1.11.2).
Add more tests for proteomes selection.
Add an option to update taxonomic affiliations (--update-affiliations).
Show the failedIDs during mapping for annotation.
Add an option to specify the eggnog-mapper tmp folder (--eggnog-tmp). By default, it is in the esmecata output folder.
Add KEGG reaction in annotation_reference file when using eggnog-mapper.
Add a function to compare input taxa information with esmecata taxa information (taxon name, taxon ID, taxon rank) and to specify the associated OTUs. Thanks to @PaulineGHG.

Fix

Do not use already annotated proteins when using annotation files.
Fix issue in esmecata proteomes, not using non-reference proteome.
Fix issue with missing reference proteome when parsing SPARQL results.
Fix issue in main with cli.
Fix an issue with minimal-nb-proteomes and non-reference proteomes.

Modify

Modify default options: --minimal-nb-proteomes is now 5 (1 in the previous version) and the clustering threshold -t is now 0.5 (previously 0.95).
Modify rank_limit option to make it more understandable. Only taxon ranks lower than or equal to the one given will be kept.
Remove several output folders (proteomes result, clustering fasta_consensus and clustering fasta_representative) to reduce size of EsMeCaTa results.
Rename tmp_proteome into proteomes.
Remove FROM in SPARQL queries, which could speed up the queries.
Change header for annotation files, especially: 'gos', 'ecs', 'interpros', 'rhea_ids' into 'GO', 'EC', 'InterPro', 'Rhea'.
Add column cluster_members in annotation reference file and rename column protein into protein_cluster.
Do not create fasta file when there are no protein clusters.
Update license year.
Update esmecata workflow picture.
Update the doc of esmecata.
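
For scripts that still expect the old annotation headers, the renaming listed above ('gos', 'ecs', 'interpros', 'rhea_ids' into 'GO', 'EC', 'InterPro', 'Rhea') can be handled with a small conversion step. The sketch below is an assumption-laden illustration: the function rename_headers and the sample table are hypothetical, and only the four header names come from these release notes.

```python
import csv
import io

# Old header -> new header, as listed in the 0.3.0 release notes.
HEADER_RENAMES = {
    "gos": "GO",
    "ecs": "EC",
    "interpros": "InterPro",
    "rhea_ids": "Rhea",
}


def rename_headers(tsv_text):
    """Return the tsv text with old annotation headers replaced by the new ones."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    if not rows:
        return tsv_text
    # Only the first row (the header) is changed; other columns keep their name.
    rows[0] = [HEADER_RENAMES.get(name, name) for name in rows[0]]
    output = io.StringIO()
    csv.writer(output, delimiter="\t", lineterminator="\n").writerows(rows)
    return output.getvalue()


# Hypothetical sample table, for illustration only.
old_table = "protein\tgos\tecs\nP1\tGO:0008150\t1.1.1.1\n"
new_table = rename_headers(old_table)
print(new_table)
```

Columns that are not in the mapping are left untouched, so the same step can be applied to files that already use the new headers.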