Releases: AuReMe/esmecata
esmecata 0.6.4
Add
- Function get_domain_or_superkingdom_from_ncbi_tax_database in esmecata.utils to check whether the top-level rank is called domain or superkingdom in the NCBI Taxonomy database.
Fix
- Taxonomic rank superkingdom has been renamed to domain in recent versions of the NCBI Taxonomy database. Fix several related issues in different parts of EsMeCaTa.
- esmecata_gseapy was not working with results from esmecata precomputed due to missing proteome_tax_id.tsv in the 2_annotation folder.
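
The rank rename can be handled by detecting which name is present in a lineage. The following is a minimal sketch of that idea only: `top_level_rank_name` and the example rank lists are hypothetical and do not reproduce the implementation of get_domain_or_superkingdom_from_ncbi_tax_database.

```python
from typing import Iterable


def top_level_rank_name(ranks: Iterable[str]) -> str:
    """Return 'domain' or 'superkingdom', whichever appears in the given ranks.

    Hypothetical helper for illustration; not part of EsMeCaTa.
    """
    rank_set = set(ranks)
    if "domain" in rank_set:
        return "domain"
    if "superkingdom" in rank_set:
        return "superkingdom"
    raise ValueError("neither 'domain' nor 'superkingdom' found in lineage ranks")


# Illustrative lineage ranks, as an older and a newer taxonomy dump might report them.
old_ranks = ["superkingdom", "phylum", "class", "order", "family", "genus"]
new_ranks = ["domain", "phylum", "class", "order", "family", "genus"]

old_name = top_level_rank_name(old_ranks)
new_name = top_level_rank_name(new_ranks)
```

Code that filters or groups by the top-level rank can then use the returned name instead of a hard-coded string.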
esmecata 0.6.3
Fix
- KeyError when using precomputed database from EsMeCaTa article (as there is no KEGG_reaction in these databases).
esmecata 0.6.2
esmecata 0.6.1
Fix
- Issue with tax synonyms on the same tax rank when filtering rank in proteomes.
Modify
- Use taxon ID to search for taxon presence in precomputed database.
- Use Trusted publisher to publish on PyPI.
esmecata 0.6.0
Add
- New command esmecata_create_db to create database from different output folders of esmecata (from_runs).
- Full release of esmecata precomputed associated with the first version of esmecata precomputed database.
- Option threshold (-t) to precomputed.
- Add --gseapyCutOff option to gseapy_enrichr.
- A check after database creation to detect taxon with few predicted proteins compared to higher affiliated taxon.
- Check that the gzip file is correctly formatted.
- Header KEGG_reaction in annotation_reference from annotation_uniprot to avoid issues with esmecata_create_db.
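
One way to check that a gzip file is well formed is to decompress it fully and treat any decoding error as a failure. The sketch below uses only the Python standard library. The function name `is_valid_gzip` is an assumption for illustration, and the check performed by EsMeCaTa may differ.

```python
import gzip
import zlib


def is_valid_gzip(path: str, chunk_size: int = 1024 * 1024) -> bool:
    """Return True if the file at `path` can be decompressed to the end.

    Hypothetical helper for illustration; not part of EsMeCaTa.
    """
    try:
        with gzip.open(path, "rb") as handle:
            # Read in chunks so that large archives are not loaded in memory.
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # gzip.BadGzipFile is a subclass of OSError (bad header or checksum),
        # EOFError signals a truncated file.
        return False
    return True
```

Reading the whole stream is slower than inspecting the header only, but it also catches truncated downloads.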
Fix
- Issue with protein IDs from UniParc during annotation (incorrect split on '|').
- Fix issue in get_taxon_obs_name function.
- Issues in test.
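
To illustrate the kind of problem a split on '|' can cause: UniProtKB FASTA headers carry the accession between pipes (for example sp|P12345|AATM_RABIT), whereas UniParc identifiers such as UPI0000000001 contain no pipe at all, so indexing the second field fails. The sketch below shows one defensive way to extract an identifier. The helper name and the header examples are assumptions, and the actual fix in EsMeCaTa is not shown in this release note.

```python
def protein_id_from_header(header: str) -> str:
    """Extract a protein identifier from a FASTA header line.

    Hypothetical helper for illustration; not part of EsMeCaTa.
    """
    tokens = header.lstrip(">").split()
    if not tokens:
        raise ValueError("empty FASTA header")
    first_token = tokens[0]
    fields = first_token.split("|")
    if len(fields) >= 3:
        # UniProtKB style: db|accession|entry_name
        return fields[1]
    # No pipe-delimited structure (e.g. a UniParc ID): keep the token as is.
    return first_token


uniprot_id = protein_id_from_header(">sp|P12345|AATM_RABIT Aspartate aminotransferase")
uniparc_id = protein_id_from_header(">UPI0000000001 status=active")
```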
Modify
- Add database version in log.
- Rename test_workflow.py into test_workflow_uniprot.py, to better reflect what is done.
- Update workflow figure.
- Update readme.
- Update article_data folder and the associated readme.
esmecata 0.5.4
Fix
- Issue with proteomes from UniParc during clustering (incorrect split on '|').
- Issue in test with updated taxonomic group.
esmecata 0.5.3
Fix
- Handle an issue with requests.exceptions.ChunkedEncodingError.
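
A common way to handle a transient network error of this kind is to retry the call a limited number of times. The sketch below is a generic retry helper built on the standard library only. `call_with_retries` and its parameters are hypothetical, and the release note does not describe how EsMeCaTa handles the error.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 3,
    wait_seconds: float = 1.0,
) -> T:
    """Call `func`, retrying when one of the `retry_on` exceptions is raised.

    Hypothetical helper for illustration; not part of EsMeCaTa.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                # Out of attempts: let the last error propagate.
                raise
            time.sleep(wait_seconds)
    raise ValueError("max_attempts must be at least 1")
```

With requests installed, such a helper could be called as `call_with_retries(lambda: requests.get(url, timeout=30), (requests.exceptions.ChunkedEncodingError,))`, where `url` is a placeholder.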
Modify
- Remove unused header in output file of gseapy.
esmecata 0.5.2
Add
- New way to search proteomes by using UniParc. Some proteomes, when downloaded directly from UniProt, are empty. A solution is to search for them in UniParc and retrieve the associated protein sequences.
- New plot in report showing proteomes according to tax_rank.
- Database version number when creating precomputed database.
- The possibility to give a file containing manually selected groups of observation names for esmecata_gseapy gseapy_enrichr.
- Tests for esmecata_gseapy gseapy_enrichr.
Fix
- Issue in creating heatmap of proteomes (missing taxon rank) in report creation.
- Issue when creating database: there was a possibility that a taxon without consensus proteomes and associated annotations was kept.
Modify
- Update parameter description for SPARQL option to indicate the value needed to query the UniProt SPARQL endpoint.
- Rename esmecata_gseapy gseapy_taxon into esmecata_gseapy gseapy_enrichr to reflect the changes in the command.
- Modify how esmecata_gseapy gseapy_enrichr works by adding a grouping parameter that allows choosing groups either according to taxon_rank or from a user-created file containing manually selected groups of observation names.
- Update readme according to the different changes made in this release.
TODO
- Investigate and solve memory leak when mapping UniParc IDs to UniProt with bioservices.
- Add handling of UniParc IDs with SPARQL queries.
esmecata 0.5.1
Add
- Metadata file for esmecata_gseapy gseapy_taxon.
Fix
- Several issues in esmecata_gseapy gseapy_taxon.
- Issues in tests due to UniProt updates.
Modify
- Update metadata files for annotation and eggnog by adding missing dependencies.
esmecata 0.5.0
WARNING: Changes in the structure of the Python package of EsMeCaTa.
If you have been importing the package in Python, you will need to modify your imports.
Add
- New command esmecata_report to create a report from the output folder of EsMeCaTa. Scripts of esmecata_report allow creating html, pdf and tsv reports from EsMeCaTa (work of @alimatai and @PaulineGHG). This command has several subcommands:
  - (1) create_report to create a report from the output folder of the esmecata workflow subcommand (only way to have the complete HTML report).
  - (2) create_report_proteomes to create report files from output of esmecata proteomes subcommand.
  - (3) create_report_clustering to create report files from output of esmecata clustering subcommand.
  - (4) create_report_annotation to create report files from output of esmecata annotation subcommand.
- New command esmecata_gseapy to create enrichment analysis of functions predicted by EsMeCaTa according to taxon rank.
- New optional dependencies required for esmecata_report: datapane, plotly, kaleido, ontosunburst. As datapane is no longer maintained, an alternative with panel is currently being developed.
- New optional dependencies required for esmecata_gseapy: gseapy and orsum.
- New file indicating the EC numbers and GO Terms for the different observation names of the dataset (file function_table.tsv).
- New subcommand esmecata precomputed. This subcommand uses a precomputed database to make predictions from the input file (using EsMeCaTa default parameters). It has been added to avoid creating the same prediction every run and to have a fast way to make predictions with EsMeCaTa. It is necessary to download the precomputed database before using it. At the time of this release, the database is not available; these scripts are present for testing purposes.
- Prototype for precomputed database creation: several scripts are added in the esmecata/precomputed folder to create the input and the precomputed database.
- Check that the proteome files are not completely empty, which could cause problems with mmseqs2.
- Tests for precomputed database, report creation, database creation and eggnog annotation. Add mocks on several functions to perform the tests. Requires pytest-mock.
- Add readme in test folder.
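
As an illustration of the empty-proteome check listed above, the sketch below counts FASTA records in a gzip-compressed proteome file and flags files with none. It assumes gzip-compressed FASTA input. The function names are hypothetical and the check performed by EsMeCaTa may be implemented differently.

```python
import gzip


def count_fasta_records(path: str) -> int:
    """Count sequences in a gzip-compressed FASTA file (lines starting with '>')."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))


def is_empty_proteome(path: str) -> bool:
    """Return True if the proteome file contains no sequence at all.

    Hypothetical helper for illustration; not part of EsMeCaTa.
    """
    return count_fasta_records(path) == 0
```

Empty files can then be skipped or reported before they are passed to the clustering step.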
Fix
- Issue in proteomes SPARQL query (missing PREFIX).
Modify
- Modification of the structure of the EsMeCaTa package, now divided into 4 main folders: (1) esmecata/core (for the scripts previously contained in the EsMeCaTa folder) and used for the workflow, (2) esmecata/report to generate a report from the esmecata output folder, (3) esmecata/gseapy to perform enrichment analysis on the esmecata output, and (4) esmecata/precomputed to create precomputed database (in development).
- Change the name of intermediary files in clustering and annotation to avoid issues with ambiguous taxon names.
- Modify test according to changes of packaging structure.
- Modify the behaviour of annotation by eggnog-mapper. Now it merges protein sequences from clustering into bigger files (associated with superkingdom). This increases the performance of eggnog-mapper. Modification made with @megyl. Use --tax_scope with eggnog-mapper.
- Update article_data folder.
- Update CI tests of github workflow according to the new tests and the new dependencies.
Remove
- Remove esmecata analysis subcommand as it was not used and not very useful.