missing reaction names #181

Open
3 tasks done
pecholleyc opened this issue Jun 17, 2020 · 19 comments

Comments

@pecholleyc
Contributor

Description of the issue:

A large number of reactions in the model do not have a descriptive name.

Expected feature/value/output:

More reactions with descriptive names in the model.

Current feature/value/output:

8200+ of 13400+ reactions have no name.

Reproducing these results:

Search for `- name: ""\n - metabolites` in the .yml file.
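A minimal sketch of that search, assuming the model file stores each reaction as a list of `- key: value` entries in which `name` is immediately followed by `metabolites`. The sample text and the `MAR...` identifiers are made up for illustration, and the function name is hypothetical.

```python
import re

# An empty `name` entry directly followed by a `metabolites` entry.
# Requiring `metabolites` on the next line keeps the match to reactions,
# so empty names on other record types are not counted.
EMPTY_REACTION_NAME = re.compile(
    r"-\s+name:\s*(?:\"\"|'')[ \t]*\r?\n\s*-\s+metabolites\b"
)


def count_unnamed_reactions(yaml_text: str) -> int:
    """Count reactions whose name is an empty string."""
    return len(EMPTY_REACTION_NAME.findall(yaml_text))


# Illustrative sample only, not taken from the real model file.
SAMPLE = '''\
- reactions:
  - !!omap
    - id: "MAR00001"
    - name: ""
    - metabolites: !!omap
      - MAM00001c: -1
  - !!omap
    - id: "MAR00002"
    - name: "hexokinase"
    - metabolites: !!omap
      - MAM00002c: -1
'''

if __name__ == "__main__":
    print(count_unnamed_reactions(SAMPLE))
```

To run it on the real model, read the .yml file into a string and pass that string to `count_unnamed_reactions`.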

Most of the current reaction names in the model are identical to the BiGG or Recon3D annotation. But using the BiGG / Recon3D, KEGG and Reactome external identifiers I estimate that 3500+ additional reaction names could be imported in the model (based on v1.3).

I think names could then also be curated or auto-generated by considering the equation and/or the EC codes of the enzymes associated with the reactions.

I hereby confirm that I have:

  • Tested my code on my own computer for running the model
  • Done this analysis in the master branch of the repository
  • Checked that a similar issue does not exist already
@haowang-bioinfo
Member

haowang-bioinfo commented Jun 17, 2020

@pecholleyc nice to have this issue.

This should be a long-term thing and it will take some time to fully resolve the reaction names. Might be good to begin with 1-2 external id groups for importing the names.

@mihai-sysbio
Member

While I fully support the idea behind this issue, I don't have a straightforward suggestion here. It feels like there is no "ground truth" database to be used for the reaction names in a way that would resolve a majority of the empty names.

My opinion is that, if possible, this should be scripted in a way that it can be run repeatedly.

@haowang-bioinfo
Member

Can't agree more

@mihai-sysbio
Member

mihai-sysbio commented Dec 10, 2021

I'm at the point where I think any names would be better than the 8000+ reactions with no names.

One way to do this would be to fetch the names in KEGG (example).
Alternatively, the names can be fetched based on the E.C. code (example). That sounds trickier, since over 7600 reactions have an empty eccode field and other entries have multiple E.C. codes.

Any thoughts?

@mihai-sysbio
Member

I hope it's okay to ping @haowang-bioinfo and @JonathanRob to discuss the idea mentioned above: reactions in KEGG have names. Would it make sense to programmatically use KEGG as a source for reaction names?
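To make the KEGG idea concrete, here is a rough sketch of turning a KEGG reaction listing into an id-to-name lookup. It assumes the listing (for example the text returned by the KEGG REST `list/reaction` operation) has one reaction per line in the form `ID<TAB>name; equation`. That format is an assumption to check against the actual response. The sample lines and their `R9999x` identifiers are invented, and `parse_kegg_reaction_list` is a hypothetical helper name.

```python
from typing import Dict


def parse_kegg_reaction_list(text: str) -> Dict[str, str]:
    """Map KEGG reaction ids to names from 'ID<TAB>name; equation' lines."""
    names: Dict[str, str] = {}
    for line in text.splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        kegg_id, description = line.split("\t", 1)
        kegg_id = kegg_id.strip()
        if kegg_id.startswith("rn:"):
            kegg_id = kegg_id[3:]  # tolerate an optional database prefix
        # Conservative assumption: without a ';' the entry carries only an
        # equation and no separate name, so it is skipped.
        if ";" not in description:
            continue
        name = description.split(";", 1)[0].strip()
        if kegg_id and name:
            names[kegg_id] = name
    return names


# Invented sample lines, for illustration only.
SAMPLE_LISTING = (
    "R99991\texample hydrolase; A + H2O <=> B + C\n"
    "rn:R99992\texample kinase; ATP + D <=> ADP + E\n"
    "R99993\tF + G <=> H\n"
)

if __name__ == "__main__":
    print(parse_kegg_reaction_list(SAMPLE_LISTING))
```

The resulting dictionary could then be joined against the KEGG id column of the model's reaction annotation to fill only the names that are currently empty.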

@haowang-bioinfo
Member

@mihai-sysbio do you have other suggested sources besides KEGG?

@mihai-sysbio
Member

> @mihai-sysbio do you have other suggested sources besides KEGG?

If we were to use the E.C., there should definitely be other sources (above, I linked to BRENDA). Personally I like the E.C.-based names more since they are more generic in a way (shorter, thus easier to read). However, I believe this should follow only after a curation of the E.C. codes. Moreover, over half of the reactions do not have such codes, and some have multiple. Because of this, I think the approach taken in #367 by using KEGG-provided names is the most reasonable solution we can adopt at the moment.

@JonathanRob
Collaborator

@mihai-sysbio I'm hesitant about using an E.C.-based approach, since the E.C. number does not necessarily specify the reaction substrates. So in many cases you can have an E.C. that represents a type of reaction, in which many different substrates can participate. If an E.C.-based naming approach was applied to the model, my guess is that it would result in many reactions being assigned similar names.

@mihai-sysbio
Member

> many reactions being assigned similar names

Interesting - do reaction names really need to be unique? I was counting on the uniqueness of the identifiers for that, and the names would be just a more readable/user-friendly string.

@JonathanRob
Collaborator

They do not need to be unique, but they also should not be super general (to the point where hundreds of reactions have the same name - I'm thinking this is something that may happen with cholesterol or lipid metabolism, for example). But then again, maybe many identical reaction names is still better than no name at all?

@haowang-bioinfo
Member

haowang-bioinfo commented Jul 28, 2022

> using KEGG-provided names is the most reasonable solution we can adopt at the moment

Agree, and I have the same feeling that many identical reaction names are better than no name at all. Besides, KEGG reaction names are not very general.

Another advantage is that this can be implemented programmatically.

@mihai-sysbio
Member

mihai-sysbio commented Jul 28, 2022

There are only 2423 KEGG ids in reactions.tsv - perhaps it would make more sense to extend the coverage via the MNX ids before mapping the names?

edit: with an updated KEGG mapping it might be more tempting to retrieve updated EC codes in addition to reaction names also via KEGG, thus dealing with #366

@haowang-bioinfo
Member

I came up with an idea to move this long-term goal one step further:

The plan is to first locate reactions that are catalyzed by only one gene (single-gene reactions), then go through these reactions and fill in empty reaction names using the gene names extracted from the genes.tsv file, which is based on Ensembl annotation.
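A rough sketch of that plan, assuming each reaction carries a gene rule string built from Ensembl gene ids (`ENSG...`) and that a gene-id-to-name lookup has already been loaded from genes.tsv. The dictionary keys `name` and `gene_rule`, the helper names, and all sample ids are hypothetical.

```python
import re
from typing import Dict, List, Optional

GENE_ID = re.compile(r"ENSG\d+")


def single_gene(gene_rule: str) -> Optional[str]:
    """Return the gene id if the rule mentions exactly one distinct gene."""
    genes = set(GENE_ID.findall(gene_rule or ""))
    return next(iter(genes)) if len(genes) == 1 else None


def fill_names_from_genes(reactions: List[dict], gene_names: Dict[str, str]) -> int:
    """Fill empty reaction names in place; return how many were filled."""
    filled = 0
    for rxn in reactions:
        if rxn.get("name"):
            continue  # never overwrite an existing name
        gene = single_gene(rxn.get("gene_rule", ""))
        if gene and gene_names.get(gene):
            rxn["name"] = gene_names[gene]
            filled += 1
    return filled


# Invented sample data, for illustration only.
REACTIONS = [
    {"id": "MAR00001", "name": "", "gene_rule": "ENSG00000000001"},
    {"id": "MAR00002", "name": "", "gene_rule": "ENSG00000000001 or ENSG00000000002"},
    {"id": "MAR00003", "name": "kept name", "gene_rule": "ENSG00000000002"},
    {"id": "MAR00004", "name": "", "gene_rule": ""},
]
GENE_NAMES = {"ENSG00000000001": "GENE1", "ENSG00000000002": "GENE2"}

if __name__ == "__main__":
    print(fill_names_from_genes(REACTIONS, GENE_NAMES))
    print([r["name"] for r in REACTIONS])
```

Reactions with several genes, or with no gene at all, are left untouched, so they can be handled by another naming source later.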

@feiranl
Collaborator

feiranl commented Nov 11, 2022

So where do those reactions come from? Is there no reaction name in their original source?

@haowang-bioinfo
Member

> So where do those reactions come from? Is there no reaction name in their original source?

They were inherited from HMR2, where reactions originally had no names.

@JonathanRob
Collaborator

Earlier it was suggested that we should have some scripted way to do this so that it could be run repeatedly. I've thought about it and don't think that this is necessary. The name of a reaction is not really something that needs to be updated very often, if at all. So even a one-shot, fairly manual approach to filling in the reaction names should be sufficient.

@feiranl
Collaborator

feiranl commented Nov 11, 2022

We can try to map all external IDs to get as many reaction names as possible. For exchange and pseudo reactions, we can just assign reaction names such as "Exchange glucose", "transport glucose from c to m", or "pseudo reaction". May I know how many reactions have at least one external database ID such as KEGG/MetaNetX?
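The naming pattern for exchange and transport reactions could be sketched as follows. It assumes a reaction is given as a mapping from metabolite id to stoichiometric coefficient, plus a lookup from metabolite id to (name, compartment). All ids and the `auto_name` helper are hypothetical. Treating every single-metabolite reaction as an exchange is a simplification that may not suit all boundary reactions in the model.

```python
from typing import Dict, Optional, Tuple

# metabolite id -> (metabolite name, compartment code)
MetInfo = Dict[str, Tuple[str, str]]


def auto_name(stoich: Dict[str, float], mets: MetInfo) -> Optional[str]:
    """Suggest a name for simple exchange or transport reactions."""
    if len(stoich) == 1:
        (met_id,) = stoich
        return f"Exchange {mets[met_id][0]}"
    if len(stoich) == 2:
        (a, coef_a), (b, coef_b) = stoich.items()
        name_a, comp_a = mets[a]
        name_b, comp_b = mets[b]
        # Same metabolite, different compartments, one consumed and one produced.
        if name_a == name_b and comp_a != comp_b and coef_a * coef_b < 0:
            src, dst = (comp_a, comp_b) if coef_a < 0 else (comp_b, comp_a)
            return f"transport {name_a} from {src} to {dst}"
    return None  # anything else needs another naming source


# Invented sample metabolites, for illustration only.
METS: MetInfo = {
    "glc_e": ("glucose", "e"),
    "glc_c": ("glucose", "c"),
    "glc_m": ("glucose", "m"),
    "atp_c": ("ATP", "c"),
}

if __name__ == "__main__":
    print(auto_name({"glc_e": -1}, METS))
    print(auto_name({"glc_c": -1, "glc_m": 1}, METS))
```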

@mihai-sysbio
Member

I guess that's a quick pandas/Excel question - 5885 reactions have no KEGG/MetaNetX/Rhea id mapped in reactions.tsv.
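For anyone wanting to reproduce that count without pandas or Excel, a standard-library sketch follows. The column names `rxns`, `rxnKEGGID`, `rxnMetaNetXID` and `rxnRheaID` are assumptions about the reactions.tsv header and should be checked against the file. The sample rows and the helper name are invented.

```python
import csv
import io
from typing import Iterable, List, Sequence

# Assumed column names; adjust to the actual reactions.tsv header.
ID_COLUMNS = ("rxnKEGGID", "rxnMetaNetXID", "rxnRheaID")


def reactions_without_ids(
    tsv_lines: Iterable[str], id_columns: Sequence[str] = ID_COLUMNS
) -> List[str]:
    """Return ids of reactions with none of the given external ids."""
    reader = csv.DictReader(tsv_lines, delimiter="\t")
    return [
        row["rxns"]
        for row in reader
        if not any((row.get(col) or "").strip() for col in id_columns)
    ]


# Invented sample rows, for illustration only.
SAMPLE_TSV = (
    "rxns\trxnKEGGID\trxnMetaNetXID\trxnRheaID\n"
    "MAR00001\tR99991\t\t\n"
    "MAR00002\t\t\t\n"
    "MAR00003\t\tMNXR00001\t\n"
)

if __name__ == "__main__":
    unmapped = reactions_without_ids(io.StringIO(SAMPLE_TSV))
    print(len(unmapped), unmapped)
```

With the real file, pass an open file handle (opened with `newline=""`) instead of the `io.StringIO` sample.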

@haowang-bioinfo
Member

> 5885 reactions have no KEGG/MetaNetX/Rhea id mapped in reactions.tsv.

Among these 5800+ reactions, 1700+ are single-gene reactions, so their names could be assigned via their gene names.

migp11 added a commit to bsc-life/Human-GEM that referenced this issue May 10, 2024
@JHL-452b JHL-452b mentioned this issue May 30, 2024
5 participants