missing reaction names #181
Comments
@pecholleyc nice to have this issue. This should be a long-term thing and it will take some time to fully resolve the reaction names. Might be good to begin with 1-2 external id groups for importing the names.
While I fully support the idea behind this issue, I don't have a straightforward suggestion here. It feels like there is no "ground truth" database to be used for the reaction names in a way that would resolve a majority of the empty names. My opinion is that, if possible, this should be scripted in a way that it can be run repeatedly.
Can't agree more
I'm at the point where I think any names would be better than the 8000+ reactions with no names. One way to do this would be to fetch the names in KEGG (example). Any thoughts?
I hope it's okay to ping @haowang-bioinfo and @JonathanRob to discuss the idea mentioned above: reactions in KEGG have names. Would it make sense to programmatically use KEGG as a source for reaction names?
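To make the KEGG idea a bit more concrete, here is a minimal sketch of pulling the `NAME` field out of a KEGG flat-file record. It assumes the public KEGG REST endpoint `https://rest.kegg.jp/get/rn:<id>` and the usual flat-file layout (field key at the start of the line, continuation lines indented). The helper names `parse_kegg_names` and `fetch_kegg_names` are made up for illustration, and the sample record is a placeholder, not real KEGG data.

```python
from urllib.request import urlopen

# Assumed KEGG REST URL pattern for a single reaction entry.
KEGG_GET_URL = "https://rest.kegg.jp/get/rn:{rid}"


def parse_kegg_names(flat_text):
    """Return the names listed under the NAME field of a KEGG flat-file record."""
    names = []
    in_name = False
    for line in flat_text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace():
            # Indented line: continuation of the current field.
            value = line.strip()
        else:
            # New field: the key is the first whitespace-delimited token.
            key, _, value = line.partition(" ")
            in_name = key == "NAME"
            value = value.strip()
        if in_name and value:
            # Multiple names are separated by a trailing ";".
            names.append(value.rstrip(";").strip())
    return names


def fetch_kegg_names(rid):
    """Download one reaction record (network access needed) and parse its names."""
    with urlopen(KEGG_GET_URL.format(rid=rid)) as response:
        return parse_kegg_names(response.read().decode("utf-8"))


# Placeholder record in the assumed layout (not real KEGG content).
sample = (
    "ENTRY       R_EXAMPLE                   Reaction\n"
    "NAME        example substrate hydrolase;\n"
    "            alternative example name\n"
    "DEFINITION  A + B <=> C\n"
    "///\n"
)
print(parse_kegg_names(sample))
```

Only the parser runs on the sample; `fetch_kegg_names` would be called once per KEGG id already mapped in the model, ideally with some rate limiting.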
@mihai-sysbio do you have other suggested sources besides KEGG?
If we were to use the E.C., there should definitely be other sources (above, I linked to BRENDA). Personally I like the E.C.-based names more since they are more generic in a way (shorter, thus easier to read). However, I believe this should follow only after a curation of the E.C. codes. Moreover, over half of the reactions do not have such codes, and some have multiple. Because of this, I think the approach taken in #367 by using KEGG-provided names is the most reasonable solution we can adopt at the moment.
@mihai-sysbio I'm hesitant about using an E.C.-based approach, since the E.C. number does not necessarily specify the reaction substrates. So in many cases you can have an E.C. that represents a type of reaction, in which many different substrates can participate. If an E.C.-based naming approach were applied to the model, my guess is that it would result in many reactions being assigned similar names.
Interesting - do reaction names really need to be unique? I was counting on the uniqueness of the identifiers for that, and the names would be just a more readable/user-friendly string.
They do not need to be unique, but they also should not be super general (to the point where hundreds of reactions have the same name - I'm thinking this is something that may happen with cholesterol or lipid metabolism, for example). But then again, maybe many identical reaction names are still better than no name at all?
Agreed, I have the same feeling. Another advantage is that this can be implemented programmatically.
There are only 2423 KEGG ids in edit: with an updated KEGG mapping it might be more tempting to retrieve updated EC codes in addition to reaction names also via KEGG, thus dealing with #366
I came up with an idea to move this long-term goal one step further. The plan is to first locate reactions that are catalyzed by only one gene (single-gene reactions), then go through these reactions and fill in empty reaction names by using the gene names extracted from
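A rough sketch of the single-gene idea, assuming gene rules are plain strings that combine gene identifiers with `and`/`or` and parentheses. The function names, the reaction ids, and the gene-name lookup are all hypothetical; where the gene names would actually come from is left open here.

```python
import re


def genes_in_rule(rule):
    """Return the set of gene identifiers mentioned in a gene rule string."""
    tokens = re.findall(r"[^\s()]+", rule)
    return {t for t in tokens if t.lower() not in ("and", "or")}


def propose_names(reactions, gene_names):
    """Propose a name for each unnamed reaction that has exactly one gene.

    reactions:  dict of reaction id -> (current name, gene rule string)
    gene_names: dict of gene id -> descriptive gene name (hypothetical lookup)
    """
    proposals = {}
    for rxn_id, (name, rule) in reactions.items():
        genes = genes_in_rule(rule)
        if name or len(genes) != 1:
            continue  # already named, or not a single-gene reaction
        gene = next(iter(genes))
        if gene in gene_names:
            proposals[rxn_id] = gene_names[gene]
    return proposals


# Illustrative data only; none of these ids or names come from the model.
reactions = {
    "rxn_A": ("", "gene1"),
    "rxn_B": ("", "gene1 or gene2"),
    "rxn_C": ("existing name", "gene2"),
    "rxn_D": ("", ""),
}
gene_names = {"gene1": "example enzyme 1", "gene2": "example enzyme 2"}
print(propose_names(reactions, gene_names))
```

Reactions that already have a name are left untouched, so a pass like this could be re-run safely after other name sources have been imported.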
So where do those reactions come from? Did they have no names in their original source?
They were inherited from HMR2, where reactions originally had no names.
Earlier it was suggested that we should have some scripted way to do this so that it could be run repeatedly. I've thought about it and don't think that this is necessary. The name of a reaction is not really something that needs to be updated very often, if at all. So even a one-shot, fairly manual approach to filling in the reaction names should be sufficient.
We can try to map all external IDs to get as many reaction names as possible. For exchange and pseudo reactions, we can just assign a reaction name such as
I guess that's a quick pandas/Excel question - 5885 reactions have no KEGG/MetaNetX/Rhea id mapped in
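For reference, a count like this can be reproduced along the following lines. This is only a sketch using the standard `csv` module (pandas would do equally well); the column names `rxn_id`, `kegg_id`, `metanetx_id` and `rhea_id` and the sample rows are assumptions, not the actual annotation file layout.

```python
import csv
import io

# Hypothetical column names for the external id annotations.
ID_COLUMNS = ("kegg_id", "metanetx_id", "rhea_id")


def reactions_without_ids(rows, id_columns=ID_COLUMNS):
    """Return ids of reactions where every external id column is empty."""
    return [
        row["rxn_id"]
        for row in rows
        if not any((row.get(col) or "").strip() for col in id_columns)
    ]


# Illustrative tab-separated table; ids are placeholders.
sample_tsv = (
    "rxn_id\tkegg_id\tmetanetx_id\trhea_id\n"
    "rxn_A\tR_EXAMPLE\t\t\n"
    "rxn_B\t\t\t\n"
    "rxn_C\t\tMNXR_EXAMPLE\t\n"
)
rows = list(csv.DictReader(io.StringIO(sample_tsv), delimiter="\t"))
unmapped = reactions_without_ids(rows)
print(len(unmapped), unmapped)
```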
Among these 5800+ reactions, 1700+ are single-gene reactions, so their names could be assigned via their gene names.
Description of the issue:
A large number of reactions in the model do not have a descriptive name.
Expected feature/value/output:
More reactions with descriptive names in the model.
Current feature/value/output:
8200+ of 13400+ reactions have no name.
Reproducing these results:
search for
- name: ""\n - metabolites
in the .yml
Most of the current reaction names in the model are identical to the BiGG or Recon3D annotation. But using the BiGG / Recon3D, KEGG and Reactome external identifiers, I estimate that 3500+ additional reaction names could be imported in the model (based on v1.3).
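The search under "Reproducing these results" can also be scripted. The sketch below counts occurrences of an empty `name` entry directly followed by a `metabolites` entry, which is the pattern given above. The sample text only imitates the assumed layout of the model file, and `count_unnamed_reactions` is a hypothetical helper.

```python
import re

# An empty name immediately followed by the "metabolites" key; the second
# part keeps the match specific to reaction entries.
EMPTY_RXN_NAME = re.compile(r'- name: ""[ \t]*\r?\n[ \t]*- metabolites')


def count_unnamed_reactions(yaml_text):
    """Count reaction entries whose name field is an empty string."""
    return len(EMPTY_RXN_NAME.findall(yaml_text))


# Illustrative snippet in the assumed layout; ids and names are placeholders.
sample = '''\
- reactions:
  - !!omap
    - id: "rxn_A"
    - name: ""
    - metabolites: !!omap
      - m1: -1
  - !!omap
    - id: "rxn_B"
    - name: "example name"
    - metabolites: !!omap
      - m1: 1
'''
print(count_unnamed_reactions(sample))
```

To run it on the real model, read the .yml file into a string and pass that to `count_unnamed_reactions`.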
I think names could then also be curated or auto-generated by considering the equation and/or EC code of the enzymes associated with the reactions.
I hereby confirm that I have:
master
branch of the repository