missing reaction names #181

pecholleyc · 2020-06-17T12:55:30Z

Description of the issue:

A large number of reactions in the model do not have a descriptive name.

Expected feature/value/output:

More reactions with descriptive names in the model.

Current feature/value/output:

8200+ of the 13400+ reactions have no name.

Reproducing these results:

Search the .yml file for the pattern: - name: ""\n - metabolites
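The search above can be scripted. This is a minimal sketch, assuming the model .yml lists each reaction as consecutive `- id`, `- name`, `- metabolites` entries; the function name and the sample text are illustrative and not taken from the repository.

```python
import re

# An empty name entry directly followed by the metabolites entry,
# i.e. the search pattern above, tolerant of indentation and line endings.
EMPTY_NAME = re.compile(r'- name: ""[ \t]*\r?\n[ \t]*- metabolites')


def count_unnamed_reactions(yaml_text: str) -> int:
    """Count reactions whose name field is an empty string."""
    return len(EMPTY_NAME.findall(yaml_text))


# Made-up two-reaction excerpt in the assumed layout.
sample = '''\
- reactions:
  - !!omap
    - id: "RXN_A"
    - name: ""
    - metabolites: !!omap
      - m1: -1
  - !!omap
    - id: "RXN_B"
    - name: "glucose exchange"
    - metabolites: !!omap
      - m2: -1
'''

if __name__ == "__main__":
    print(count_unnamed_reactions(sample))
```

Requiring `- metabolites` on the next line keeps the count limited to reactions, since other entries with an empty name are followed by different fields.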

Most of the current reaction names in the model are identical to the BiGG or Recon3D annotation. But using the BiGG / Recon3D, KEGG and Reactome external identifiers, I estimate that 3500+ additional reaction names could be imported into the model (based on v1.3).

I think names could then also be curated or auto-generated by considering the equation and/or the EC codes of the enzymes associated with the reactions.

I hereby confirm that I have:

Tested my code on my own computer for running the model
Done this analysis in the master branch of the repository
Checked that a similar issue does not exist already

haowang-bioinfo · 2020-06-17T13:36:44Z

@pecholleyc nice to have this issue.

This should be a long-term thing and it will take some time to fully resolve the reaction names. Might be good to begin with 1-2 external id groups for importing the names.

mihai-sysbio · 2021-11-08T14:51:02Z

While I fully support the idea behind this issue, I don't have a straightforward suggestion here. It feels like there is no "ground truth" database to be used for the reaction names in a way that would resolve a majority of the empty names.

My opinion is that, if possible, this should be scripted in a way that it can be run repeatedly.

haowang-bioinfo · 2021-11-08T14:52:23Z

Can't agree more

mihai-sysbio · 2021-12-10T15:15:47Z

I'm at the point where I think any names would be better than the 8000+ reactions with no names.

One way to do this would be to fetch the names in KEGG (example).
Alternatively, the names can be fetched based on the E.C. code (example). That sounds more tricky since there are over 7600 empty eccode, and other entries with multiple E.C. codes.

Any thoughts?
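To make the KEGG option concrete, here is a hedged sketch of the lookup step. It assumes a KEGG reaction listing (for example the text returned by KEGG's REST `list/reaction` operation) with one tab-separated entry per line, where the first `;`-separated segment of the description is the name. The function names, the reaction ids and the sample entries are made up for illustration, and no network call is made.

```python
def parse_kegg_reaction_list(text):
    """Map KEGG reaction ids to the first name in each listing line.

    Assumes tab-separated lines of the form: id, then 'name; more text'.
    """
    names = {}
    for line in text.splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        rxn_id, description = line.split("\t", 1)
        rxn_id = rxn_id.removeprefix("rn:")  # tolerate an optional 'rn:' prefix
        name = description.split(";", 1)[0].strip()
        if name:
            names[rxn_id] = name
    return names


def fill_empty_names(model_names, kegg_ids, kegg_names):
    """Return a copy of model_names with empty names filled from KEGG.

    model_names: {model reaction id: current name}
    kegg_ids:    {model reaction id: KEGG reaction id}
    kegg_names:  {KEGG reaction id: name}
    """
    filled = dict(model_names)
    for rxn, name in model_names.items():
        kegg_name = kegg_names.get(kegg_ids.get(rxn, ""))
        if not name and kegg_name:
            filled[rxn] = kegg_name  # existing names are never overwritten
    return filled


if __name__ == "__main__":
    # Made-up listing; real ids and names would come from KEGG.
    listing = "R99901\texample hydrolase; A + H2O <=> B\n"
    kegg_names = parse_kegg_reaction_list(listing)
    print(fill_empty_names({"RXN_A": ""}, {"RXN_A": "R99901"}, kegg_names))
```

Keeping the parsing separate from any download step means the same listing can be cached and the mapping re-run without querying KEGG again.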

mihai-sysbio · 2022-01-26T08:31:47Z

I hope it's okay to ping @haowang-bioinfo and @JonathanRob to discuss the idea mentioned above: reactions in KEGG have names. Would it make sense to programmatically use KEGG as a source for reaction names?

haowang-bioinfo · 2022-03-19T07:05:50Z

@mihai-sysbio do you have other suggested sources besides KEGG?

mihai-sysbio · 2022-03-19T09:47:53Z

> @mihai-sysbio do you have other suggested sources besides KEGG?

If we were to use the E.C., there should definitely be other sources (above, I linked to BRENDA). Personally I like the E.C.-based names more since they are more generic in a way (shorter, thus easier to read). However, I believe this should follow only after a curation of the E.C. codes. Moreover, over half of the reactions do not have such codes, and some have multiple. Because of this, I think the approach taken in #367 by using KEGG-provided names is the most reasonable solution we can adopt at the moment.

JonathanRob · 2022-03-21T07:24:41Z

@mihai-sysbio I'm hesitant about using an E.C.-based approach, since the E.C. number does not necessarily specify the reaction substrates. So in many cases you can have an E.C. that represents a type of reaction, in which many different substrates can participate. If an E.C.-based naming approach was applied to the model, my guess is that it would result in many reactions being assigned similar names.

mihai-sysbio · 2022-07-28T14:07:40Z

> many reactions being assigned similar names

Interesting - do reaction names really need to be unique? I was counting on the uniqueness of the identifiers for that, and the names would be just a more readable/user-friendly string.

JonathanRob · 2022-07-28T14:39:56Z

They do not need to be unique, but they also should not be super general (to the point where hundreds of reactions have the same name - I'm thinking this is something that may happen with cholesterol or lipid metabolism, for example). But then again, maybe many identical reaction names is still better than no name at all?
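One way to judge whether a naming source is too general is to count how often each candidate name would be reused before committing it. This is a minimal sketch; the function name, the reaction ids and the candidate names are hypothetical.

```python
from collections import Counter


def name_reuse(candidate_names, threshold=2):
    """Return {name: count} for names given to at least `threshold` reactions.

    candidate_names: {reaction id: proposed name}; empty names are ignored.
    """
    counts = Counter(name for name in candidate_names.values() if name)
    return {name: n for name, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    # Made-up proposals, e.g. from an E.C.-based lookup.
    proposals = {
        "RXN_A": "acyltransferase",
        "RXN_B": "acyltransferase",
        "RXN_C": "acyltransferase",
        "RXN_D": "hexokinase",
        "RXN_E": "",
    }
    print(name_reuse(proposals, threshold=3))
```

Raising `threshold` to something like 100 would surface only the names shared by hundreds of reactions, which is the case of concern here.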

haowang-bioinfo · 2022-07-28T14:49:06Z

> using KEGG-provided names is the most reasonable solution we can adopt at the moment

Agreed, and I have the same feeling that many identical reaction names are better than no name at all. KEGG reaction names are not very general.

Another advantage is that this can be implemented programmatically.

mihai-sysbio · 2022-07-28T21:15:42Z

There are only 2423 KEGG ids in reactions.tsv - perhaps it would make more sense to extend the coverage via the MNX ids before mapping the names?

edit: with an updated KEGG mapping it might be more tempting to retrieve updated EC codes in addition to reaction names also via KEGG, thus dealing with #366

haowang-bioinfo · 2022-11-11T11:21:24Z

I came up with an idea to move this long-term goal one step further:

The plan is to first locate reactions that are catalyzed by only one gene (single-gene reactions), then go through these reactions and fill in empty reaction names using the gene names extracted from the genes.tsv file, which is based on Ensembl annotation.
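The plan above could be sketched roughly as follows. The sketch assumes each reaction carries a gene rule string containing Ensembl gene ids (such as `ENSG...`) and that gene symbols are available as a lookup built from genes.tsv. The field names `id`, `name` and `gene_rule`, the function name and the sample values are assumptions made for illustration.

```python
import re

# Ensembl human gene ids as they are assumed to appear in the gene rules.
GENE_ID = re.compile(r"ENSG\d+")


def name_single_gene_reactions(reactions, gene_symbols):
    """Propose gene-based names for unnamed single-gene reactions.

    reactions:    iterable of dicts with keys 'id', 'name', 'gene_rule'
    gene_symbols: {Ensembl gene id: gene name}
    Returns {reaction id: proposed name}.
    """
    proposed = {}
    for rxn in reactions:
        if rxn["name"]:
            continue  # keep existing names untouched
        genes = set(GENE_ID.findall(rxn["gene_rule"]))
        if len(genes) != 1:
            continue  # no gene, or more than one gene involved
        symbol = gene_symbols.get(genes.pop())
        if symbol:
            proposed[rxn["id"]] = symbol
    return proposed


if __name__ == "__main__":
    # Made-up reactions and gene lookup.
    reactions = [
        {"id": "RXN_A", "name": "", "gene_rule": "ENSG00000000001"},
        {"id": "RXN_B", "name": "", "gene_rule": "ENSG00000000001 or ENSG00000000002"},
        {"id": "RXN_C", "name": "already named", "gene_rule": "ENSG00000000002"},
    ]
    symbols = {"ENSG00000000001": "GENE1", "ENSG00000000002": "GENE2"}
    print(name_single_gene_reactions(reactions, symbols))
```

Returning proposals instead of editing the reactions in place leaves room for a manual review before anything is written back to the model.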

feiranl · 2022-11-11T11:34:07Z

So where do those reactions come from, there is no reaction name in their origin?

haowang-bioinfo · 2022-11-11T11:41:42Z

> So where do those reactions come from, there is no reaction name in their origin?

They were inherited from HMR2, where reactions originally had no names.

JonathanRob · 2022-11-11T11:44:22Z

Earlier it was suggested that we should have some scripted way to do this so that it could be run repeatedly. I've thought about it and don't think that this is necessary. The name of a reaction is not really something that needs to be updated very often, if at all. So even a one-shot, fairly manual approach to filling in the reaction names should be sufficient.

feiranl · 2022-11-11T12:27:51Z

We can try to map all external IDs to get as many reaction names as possible. For exchange and pseudo reactions, we can just assign reaction names such as "Exchange glucose", "transport glucose from c to m", or "pseudo reaction". May I know the coverage of reactions with at least one external database ID such as KEGG/MetaNetX?
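The template names for exchange and transport reactions could be generated from the stoichiometry alone. This is a rough sketch under simplifying assumptions: an exchange reaction has a single participant, and a transport reaction moves one metabolite between two compartments with coefficients of opposite sign. The function name and the input layout are hypothetical.

```python
def template_name(stoich):
    """Return a template name for simple exchange/transport reactions.

    stoich: {(metabolite name, compartment): coefficient}
    Returns None when the reaction fits neither template.
    """
    entries = list(stoich.items())
    if len(entries) == 1:
        (met, _comp), _coef = entries[0]
        return f"Exchange {met}"
    if len(entries) == 2:
        (met_a, comp_a), coef_a = entries[0]
        (met_b, comp_b), coef_b = entries[1]
        same_metabolite = met_a == met_b and comp_a != comp_b
        if same_metabolite and coef_a * coef_b < 0:
            # The consumed side (negative coefficient) is the source.
            src, dst = (comp_a, comp_b) if coef_a < 0 else (comp_b, comp_a)
            return f"Transport {met_a} from {src} to {dst}"
    return None


if __name__ == "__main__":
    print(template_name({("glucose", "e"): -1}))
    print(template_name({("glucose", "c"): -1, ("glucose", "m"): 1}))
```

Transport reactions with co-substrates (symport, antiport, ATP-driven) fall outside these two templates and return `None`, so they would still need names from another source.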

mihai-sysbio · 2022-11-11T18:53:48Z

I guess that's a quick pandas/Excel question - 5885 reactions have no KEGG/MetaNetX/Rhea id mapped in reactions.tsv.
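For reference, the same count can be reproduced without pandas. This is a sketch only: the column names `rxns`, `rxnKEGGID`, `rxnMetaNetXID` and `rxnRheaID` are assumptions about the reactions.tsv header, and the sample rows are made up.

```python
import csv
import io

# Assumed names of the external-id columns in reactions.tsv.
ID_COLUMNS = ("rxnKEGGID", "rxnMetaNetXID", "rxnRheaID")


def reactions_without_ids(tsv_text, id_columns=ID_COLUMNS):
    """List reaction ids that have none of the given external ids mapped."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row["rxns"]
        for row in reader
        # Missing or empty cells both count as "no id mapped".
        if not any((row.get(col) or "").strip() for col in id_columns)
    ]


if __name__ == "__main__":
    rows = [
        ("rxns", "rxnKEGGID", "rxnMetaNetXID", "rxnRheaID"),
        ("RXN_A", "R99901", "", ""),
        ("RXN_B", "", "", ""),
        ("RXN_C", "", "MNXR_EXAMPLE", ""),
    ]
    text = "\n".join("\t".join(row) for row in rows)
    unmapped = reactions_without_ids(text)
    print(len(unmapped), unmapped)
```

For the real file, the text would be read from reactions.tsv and `len(unmapped)` compared against the figure quoted above.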

haowang-bioinfo · 2022-11-12T11:30:56Z

> 5885 reactions have no KEGG/MetaNetX/Rhea id mapped in reactions.tsv.

Among these 5800+ reactions, 1700+ are single-gene reactions, so their names could be assigned via their gene names.

pecholleyc added the enhancement label Jun 17, 2020

haowang-bioinfo added feature discussion labels Jun 17, 2020

haowang-bioinfo assigned pecholleyc Jul 29, 2020

haowang-bioinfo mentioned this issue Nov 8, 2021

Recon ids as reaction names #333

Closed

2 tasks

haowang-bioinfo mentioned this issue Feb 5, 2022

Human 1.11 #359

Merged

mihai-sysbio added a commit that referenced this issue Feb 18, 2022

feat: add reaction names from KEGG resolves #181

caccb12

mihai-sysbio mentioned this issue Feb 18, 2022

feat: add reaction names from KEGG #367

Merged

2 tasks

mihai-sysbio mentioned this issue Apr 21, 2022

Missing reaction names in Human-GEM #385

Closed

3 tasks

haowang-bioinfo mentioned this issue Jun 19, 2022

Human 1.12 #396

Merged

edkerk mentioned this issue Mar 13, 2024

fix: standardize all names of exchange reactions #811

Merged

3 tasks

JHL-452b mentioned this issue May 8, 2024

Human 1.19 deprecated since this does not follow the guidelines #819

Closed

3 tasks

migp11 added a commit to bsc-life/Human-GEM that referenced this issue May 10, 2024

Fixing task SysBioChalmers#181

3083551

JHL-452b mentioned this issue May 30, 2024

Human 1.19 #823

Merged
