Releases: AuReMe/esmecata

esmecata 0.6.4

07 Mar 17:03
7dfdf1c

Add

  • Function get_domain_or_superkingdom_from_ncbi_tax_database in esmecata.utils to check whether the top-level rank is named domain or superkingdom in the NCBI Taxonomy database.
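
The kind of check described above can be pictured with a minimal sketch. It is not the implementation of get_domain_or_superkingdom_from_ncbi_tax_database: it assumes the ranks of a lineage are already available as a taxid-to-rank mapping (such as the dictionary returned by ete's NCBITaxa.get_rank), and top_rank_name is a hypothetical name.

```python
from typing import Mapping


def top_rank_name(rank_by_taxid: Mapping[int, str]) -> str:
    """Return which label ('domain' or 'superkingdom') the taxonomy uses for the top rank.

    rank_by_taxid maps taxon IDs to rank names, for example the dictionary
    returned by ete's NCBITaxa.get_rank (assumption: ranks are plain strings).
    """
    ranks = set(rank_by_taxid.values())
    # Recent NCBI Taxonomy dumps use "domain"; older ones use "superkingdom".
    if "domain" in ranks:
        return "domain"
    if "superkingdom" in ranks:
        return "superkingdom"
    raise ValueError("neither 'domain' nor 'superkingdom' found among the ranks")
```

Resolving the label once and reusing it lets the rest of a pipeline query the top rank without hard-coding either name.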

Fix

  • Taxonomic rank superkingdom has been renamed to domain in recent versions of the NCBI Taxonomy database. Fix several related issues in different parts of EsMeCaTa.
  • esmecata_gseapy did not work with results from esmecata precomputed because proteome_tax_id.tsv was missing from the 2_annotation folder.

esmecata 0.6.3

07 Mar 14:59
5ba9130

Fix

  • KeyError when using the precomputed database from the EsMeCaTa article (these databases have no KEGG_reaction column).
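
A missing column such as KEGG_reaction raises a KeyError when rows are indexed directly. One defensive pattern, sketched here under the assumption of a tab-separated table with a header line (read_annotation_table and the column names protein_cluster and EC used in testing are hypothetical), is to fill optional columns with an empty value while parsing:

```python
import csv
from typing import Dict, Iterable, List, Sequence


def read_annotation_table(
    lines: Iterable[str], optional_columns: Sequence[str] = ("KEGG_reaction",)
) -> List[Dict[str, str]]:
    """Parse a TSV annotation table, adding '' for optional columns absent from the header."""
    rows = []
    for row in csv.DictReader(lines, delimiter="\t"):
        for column in optional_columns:
            # Older databases may lack the column entirely; default it instead of failing later.
            row.setdefault(column, "")
        rows.append(row)
    return rows
```

Any open text file works as lines, since csv.DictReader accepts an iterable of strings.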

esmecata 0.6.2

07 Mar 14:23
ca6bb4c

Modify

  • Replace datapane by arakawa for HTML report creation in esmecata_report (issue #20). HTML reports are now standalone (they are much larger but do not require an internet connection to display).
  • Replace ete3 by ete4 as ete3 is no longer maintained (issue #18).

esmecata 0.6.1

03 Mar 14:35
76642bf

Fix

  • Issue with taxon synonyms at the same taxonomic rank when filtering ranks in proteomes.

Modify

  • Use the taxon ID to check whether a taxon is present in the precomputed database.
  • Use Trusted Publishing to publish on PyPI.
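
Matching on taxon IDs rather than names sidesteps homonyms and synonyms between taxon names. A minimal sketch of such a lookup, assuming the input lineage is a list of NCBI taxon IDs ordered from the most specific taxon to the root and the database content is a set of taxon IDs (first_taxid_in_database is a hypothetical helper, not part of EsMeCaTa):

```python
from typing import Optional, Sequence, Set


def first_taxid_in_database(
    lineage_taxids: Sequence[int], database_taxids: Set[int]
) -> Optional[int]:
    """Return the most specific taxon ID of the lineage found in the database, or None."""
    for taxid in lineage_taxids:
        # Integer IDs are unique in NCBI Taxonomy, unlike taxon names.
        if taxid in database_taxids:
            return taxid
    return None
```

For example, with the abridged Escherichia coli lineage [562, 561, 543, 2] and a database containing only 561 and 2, the function returns 561 (the genus Escherichia).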

esmecata 0.6.0

27 Jan 15:11
acf38e8

Add

  • New command esmecata_create_db to create a database from several esmecata output folders (from_runs).
  • Full release of esmecata precomputed associated with the first version of the esmecata precomputed database.
  • Threshold option (-t) for esmecata precomputed.
  • --gseapyCutOff option for gseapy_enrichr.
  • A check after database creation to detect taxa with few predicted proteins compared to the higher-rank taxon they are affiliated with.
  • Check that the gzip file is correctly formatted.
  • KEGG_reaction header in annotation_reference from annotation_uniprot to avoid issues with esmecata_create_db.
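
The gzip check and the low-protein check listed above could look roughly like this sketch. The gzip check fully decompresses the file, because a truncated or corrupted archive often fails only near its end. The low-protein check compares each taxon with its parent taxon; the function names, the parent mapping and the 0.5 ratio are assumptions for illustration, not EsMeCaTa's actual criterion.

```python
import gzip
import io
import zlib
from typing import BinaryIO, Dict, List, Union


def is_valid_gzip(source: Union[str, BinaryIO]) -> bool:
    """Return True if source (a path or binary file object) decompresses completely.

    A missing file also returns False, since FileNotFoundError is an OSError.
    """
    try:
        with gzip.open(source, "rb") as handle:
            # Read in 1 MiB chunks so large proteome files are not loaded at once.
            while handle.read(1 << 20):
                pass
    except (OSError, EOFError, zlib.error):
        # BadGzipFile (a subclass of OSError) for non-gzip data or CRC errors,
        # EOFError for truncated archives, zlib.error for corrupted streams.
        return False
    return True


def is_valid_gzip_bytes(data: bytes) -> bool:
    """Convenience wrapper for gzip data already held in memory."""
    return is_valid_gzip(io.BytesIO(data))


def flag_low_protein_taxa(
    protein_counts: Dict[str, int],
    parent_of: Dict[str, str],
    min_ratio: float = 0.5,
) -> List[str]:
    """Return taxa whose protein count is below min_ratio times their parent's count."""
    flagged = []
    for taxon, count in protein_counts.items():
        parent = parent_of.get(taxon)
        parent_count = protein_counts.get(parent, 0) if parent is not None else 0
        if parent_count > 0 and count / parent_count < min_ratio:
            flagged.append(taxon)
    return flagged
```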

Fix

  • Issue with protein IDs from UniParc during annotation (incorrect split on '|').
  • Issue in the get_taxon_obs_name function.
  • Issues in tests.
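
Splitting on '|' only works for UniProtKB-style headers. A minimal sketch of an ID parser that also accepts UniParc headers, assuming headers of the form '>sp|ACCESSION|ENTRY_NAME description' for UniProtKB and '>UPI... status=active' for UniParc (protein_id_from_header is a hypothetical helper, not the fix applied in EsMeCaTa):

```python
def protein_id_from_header(header: str) -> str:
    """Extract the protein accession from a UniProtKB or UniParc FASTA header line."""
    tokens = header.lstrip(">").split()
    if not tokens:
        raise ValueError("empty FASTA header")
    fields = tokens[0].split("|")
    # UniProtKB: 'sp|ACCESSION|ENTRY_NAME' -> accession is the second field.
    if len(fields) >= 3:
        return fields[1]
    # UniParc: no '|' in the first token, which is the UPI identifier itself.
    return fields[0]
```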

Modify

  • Add the database version to the log.
  • Rename test_workflow.py to test_workflow_uniprot.py to better reflect what it tests.
  • Update workflow figure.
  • Update readme.
  • Update article_data folder and the associated readme.

esmecata 0.5.4

06 Nov 14:32

Fix

  • Issue with proteomes from UniParc during clustering (incorrect split on '|').
  • Issue in tests caused by an updated taxonomic group.

esmecata 0.5.3

31 Oct 14:42

Fix

  • Handle an issue with requests.exceptions.ChunkedEncodingError.
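
requests raises requests.exceptions.ChunkedEncodingError when a chunked response is malformed or the connection breaks while the body is being read. A generic retry wrapper is one way to handle it; this is a sketch, and call_with_retries, the attempt count and the linear back-off are assumptions rather than EsMeCaTa's actual handling.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 3,
    delay_seconds: float = 1.0,
) -> T:
    """Call func, retrying when one of the retry_on exceptions is raised."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            time.sleep(delay_seconds * attempt)  # linear back-off between attempts
    raise ValueError("max_attempts must be at least 1")
```

With requests, the call would look like call_with_retries(lambda: requests.get(url, timeout=60).text, (requests.exceptions.ChunkedEncodingError,)), where url is the query to retry. Without stream=True, requests reads the body inside requests.get, so the error surfaces within the retried callable.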

Modify

  • Remove an unused header from the gseapy output file.

esmecata 0.5.2

21 Oct 10:38

Add

  • New way to search proteomes using UniParc. Some proteomes are empty when downloaded directly from UniProt. A solution is to search for them in UniParc and retrieve the associated protein sequences.
  • New plot in the report showing proteomes according to tax_rank.
  • Database version number when creating the precomputed database.
  • Option to provide a file containing manually selected groups of observation names for esmecata_gseapy gseapy_enrichr.
  • Tests for esmecata_gseapy gseapy_enrichr.
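
The UniParc fallback described above amounts to trying UniProt first and switching source only when the download comes back empty. In this sketch, get_proteome_sequences is hypothetical, and fetch_uniprot and fetch_uniparc are hypothetical callables standing in for the real download code, each returning the list of sequences for a proteome ID.

```python
from typing import Callable, List, Tuple

# Hypothetical signature: proteome ID -> list of protein sequences.
Fetcher = Callable[[str], List[str]]


def get_proteome_sequences(
    proteome_id: str, fetch_uniprot: Fetcher, fetch_uniparc: Fetcher
) -> Tuple[List[str], str]:
    """Return the proteome's sequences and the name of the source they came from."""
    sequences = fetch_uniprot(proteome_id)
    if sequences:
        return sequences, "uniprot"
    # The UniProt download was empty: retrieve the same proteome through UniParc.
    return fetch_uniparc(proteome_id), "uniparc"
```

Passing the fetchers as arguments keeps the fallback rule testable without network access.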

Fix

  • Issue when creating the proteome heatmap (missing taxon rank) during report creation.
  • Issue when creating the database: a taxon without consensus proteomes and associated annotations could be kept.

Modify

  • Update the description of the SPARQL option to indicate the value to use for querying the UniProt SPARQL endpoint.
  • Rename esmecata_gseapy gseapy_taxon to esmecata_gseapy gseapy_enrichr to reflect the changes in the command.
  • Modify how esmecata_gseapy gseapy_enrichr works by adding a grouping parameter to choose either groups based on taxon_rank or groups from a user-created file containing manually selected groups of observation names.
  • Update readme according to the different changes made in this release.
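
Both grouping modes of gseapy_enrichr can be thought of as producing a mapping from group name to observation names. A sketch of building that mapping, assuming the user file is tab-separated with the group name in the first column followed by observation names (this layout and both function names are hypothetical):

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def groups_from_taxon_rank(rank_by_observation: Mapping[str, str]) -> Dict[str, List[str]]:
    """Group observation names by the taxon assigned to them at the chosen rank."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for observation, taxon in rank_by_observation.items():
        groups[taxon].append(observation)
    return dict(groups)


def groups_from_file(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Read user-defined groups: group name, then observation names (assumed TSV layout)."""
    groups: Dict[str, List[str]] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if row:
            groups[row[0]] = [name for name in row[1:] if name]
    return groups
```

Because both paths return the same structure, the enrichment step downstream does not need to know which grouping mode was chosen.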

TODO

  • Investigate and fix the memory leak when mapping UniParc IDs to UniProt with bioservices.
  • Add handling of UniParc IDs with SPARQL queries.

esmecata 0.5.1

04 Oct 10:10

Add

  • Metadata file for esmecata_gseapy gseapy_taxon.

Fix

  • Several issues in esmecata_gseapy gseapy_taxon.
  • Issues in tests due to UniProt updates.

Modify

  • Update metadata files for annotation and eggnog by adding missing dependencies.

esmecata 0.5.0

02 Oct 10:03
d3ccee2

WARNING: The structure of the EsMeCaTa Python package has changed.
If you import the package in Python, you will need to update your imports.

Add

  • New command esmecata_report to create a report from the output folder of EsMeCaTa. The esmecata_report scripts create HTML, PDF and TSV reports from EsMeCaTa outputs (work of @alimatai and @PaulineGHG). This command has several subcommands:
    • (1) create_report to create a report from the output folder of the esmecata workflow subcommand (the only way to get the complete HTML report).
    • (2) create_report_proteomes to create report files from the output of the esmecata proteomes subcommand.
    • (3) create_report_clustering to create report files from the output of the esmecata clustering subcommand.
    • (4) create_report_annotation to create report files from the output of the esmecata annotation subcommand.
  • New command esmecata_gseapy to perform enrichment analysis of the functions predicted by EsMeCaTa according to taxon rank.
  • New optional dependencies required for esmecata_report: datapane, plotly, kaleido, ontosunburst. As datapane is no longer maintained, an alternative based on panel is under development.
  • New optional dependencies required for esmecata_gseapy: gseapy and orsum.
  • New file indicating the EC numbers and GO Terms for each observation name of the dataset (file function_table.tsv).
  • New subcommand esmecata precomputed. This subcommand uses a precomputed database to make predictions from the input file (using EsMeCaTa default parameters). It was added to avoid recomputing the same predictions on every run and to provide a fast way to make predictions with EsMeCaTa. The precomputed database must be downloaded before use. At the time of this release, the database is not available; these scripts are present for testing purposes.
  • Prototype for precomputed database creation: several scripts are added to the esmecata/precomputed folder to create the input and the precomputed database.
  • Check that the proteome files are not completely empty, which could cause problems with mmseqs2.
  • Tests for precomputed database, report creation, database creation and eggnog annotation. Mocks are added on several functions to perform the tests (requires pytest-mock).
  • Add readme in test folder.
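
The empty-proteome check can be sketched as a scan for at least one header followed by residues. The parsing is kept separate from file opening so it works on any iterable of lines; both function names are hypothetical and the .gz handling is an assumption.

```python
import gzip
from pathlib import Path
from typing import Iterable


def fasta_lines_have_sequence(lines: Iterable[str]) -> bool:
    """Return True once a header line ('>') is followed by a non-empty residue line."""
    seen_header = False
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            seen_header = True
        elif seen_header:
            return True
    return False


def fasta_has_sequence(path: Path) -> bool:
    """Open a plain or gzip-compressed (assumed .gz suffix) FASTA file and check it."""
    if path.suffix == ".gz":
        with gzip.open(path, "rt") as handle:
            return fasta_lines_have_sequence(handle)
    with path.open("r") as handle:
        return fasta_lines_have_sequence(handle)
```

Skipping files for which fasta_has_sequence returns False before clustering keeps empty inputs away from mmseqs2.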

Fix

  • Issue in proteomes SPARQL query (missing PREFIX).

Modify

  • Modification of the structure of the EsMeCaTa package, now divided into 4 main folders: (1) esmecata/core (scripts previously contained in the EsMeCaTa folder, used for the workflow), (2) esmecata/report to generate a report from the esmecata output folder, (3) esmecata/gseapy to perform enrichment analysis on the esmecata output, and (4) esmecata/precomputed to create the precomputed database (in development).
  • Change the name of intermediary files in clustering and annotation to avoid issues with ambiguous taxon names.
  • Modify tests according to changes in the package structure.
  • Modify the behaviour of annotation with eggnog-mapper: protein sequences from clustering are now merged into larger files (one per superkingdom), which improves the performance of eggnog-mapper. Modification made with @megyl. Use --tax_scope with eggnog-mapper.
  • Update article_data folder.
  • Update CI tests of the GitHub workflow according to the new tests and dependencies.
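
The eggnog-mapper change can be sketched as pooling the clustered protein sequences of every taxon into one record list per superkingdom before annotation. This is a hedged sketch: the in-memory (ID, sequence) representation, the helper names and the removal of duplicate protein IDs (so a protein shared by several taxa is annotated only once) are assumptions, not a description of EsMeCaTa's code.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, Iterator, List, Mapping, Set, Tuple

Record = Tuple[str, str]  # (protein ID, sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (ID, sequence) pairs; the ID is the first token of the header line."""
    protein_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if protein_id is not None:
                yield protein_id, "".join(chunks)
            protein_id, chunks = (line[1:].split() or [""])[0], []
        elif line and protein_id is not None:
            chunks.append(line)
    if protein_id is not None:
        yield protein_id, "".join(chunks)


def merge_by_superkingdom(
    records_by_taxon: Mapping[str, Iterable[Record]],
    superkingdom_by_taxon: Mapping[str, str],
) -> Dict[str, List[Record]]:
    """Pool records per superkingdom, keeping each protein ID only once."""
    merged: Dict[str, List[Record]] = defaultdict(list)
    seen: Dict[str, Set[str]] = defaultdict(set)
    for taxon, records in records_by_taxon.items():
        superkingdom = superkingdom_by_taxon[taxon]
        for protein_id, sequence in records:
            if protein_id not in seen[superkingdom]:
                seen[superkingdom].add(protein_id)
                merged[superkingdom].append((protein_id, sequence))
    return dict(merged)


def write_fasta(records: Iterable[Record], path: Path) -> None:
    """Write records as FASTA, one sequence line per record."""
    with path.open("w") as handle:
        for protein_id, sequence in records:
            handle.write(f">{protein_id}\n{sequence}\n")
```

Each merged list can then be written with write_fasta and given to eggnog-mapper as a single input file, so the tool starts once per superkingdom instead of once per taxon.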

Remove

  • Remove esmecata analysis subcommand as it was not used and not very useful.