
feat: add reaction names from KEGG #367

Merged
merged 1 commit into from
Mar 21, 2022


mihai-sysbio
Member

@mihai-sysbio mihai-sysbio commented Feb 18, 2022

Main improvements in this PR:

This PR resolves #181 by adopting 2000 reaction names from KEGG (taking the first name supplied for each reaction). The work was fully scripted in Google Colab, and the script has also been moved to the /code folder.

I hereby confirm that I have:

  • Tested my code on my own computer for running the model
  • Selected develop as a target branch

@JonathanRob
Collaborator

> it seems that only 2 of these reactions had names before

I compared the rxnNames fields of the model on develop and on this branch, and found more than 2 reactions whose names had been overwritten, including one case where the name was deleted:

| rxn | rxnName (original) | rxnName (new) |
|---|---|---|
| MAR04606 | Formation of homocarnosine | L-histidine:4-aminobutanoate ligase (ADP-forming) |
| MAR04607 | Hydrolysis of homocarnosine | alpha-Aminobutyryl histidine hydrolase |
| MAR05387 | Acireductone Synthase | 5-(methylthio)-2,3-dioxopentyl-phosphate phosphohydrolase (isomerizing) |
| MAR05389 | Acireductone Dixoygenase 1 | 1,2-dihydroxy-5-(methylthio)pent-1-en-3-one:oxygen oxidoreductase (formate-forming) |
| MAR01940 | Formation of 21-hydroxypregnenolone | pregnenolone,NADPH-hemoprotein reductase:oxygen oxidoreductase (21-hydroxylating) |
| MAR01941 | Formation of 11-deoxycorticosterone | 21-Hydroxypregnenolone:NAD+ 3-oxidoreductase |
| MAR02041 | 17-beta-estradiol 17-dehydrogenase | Estradiol-17beta:NADP+ 17-oxidoreductase |
| MAR01317 | NAD Dependent 11-Hydroxythromboxane B2 Dehydrogenase | |
| MAR09798 | Formation of Alanylalanine | D-alanine:D-alanine ligase (ADP-forming) |
| MAR01044 | Ferrochelatase, Mitochondrial | protoheme ferro-lyase (protoporphyrin-forming) |

It would be good to double-check these to make sure they make sense, and also to spot-check several other reactions with newly assigned names to confirm they're OK.
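A comparison like the one described above can be sketched as follows. This is a minimal illustration, not the reviewer's actual method: it assumes the rxnNames fields from develop and from this branch have already been loaded into plain Python dicts keyed by reaction ID (loading them from the model files is left out). The hypothetical `compare_names` helper separates names that were replaced by a different name from names that were replaced by an empty string, and ignores reactions that had no name to begin with.

```python
def compare_names(old: dict[str, str], new: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (overwritten, deleted) reaction IDs between two rxnName maps."""
    overwritten: list[str] = []
    deleted: list[str] = []
    for rxn_id, old_name in old.items():
        if not old_name:
            continue  # no previous name, so a new one is an addition, not a conflict
        new_name = new.get(rxn_id, "")
        if new_name == old_name:
            continue  # name unchanged
        if new_name:
            overwritten.append(rxn_id)  # replaced by a different name
        else:
            deleted.append(rxn_id)  # replaced by an empty name
    return overwritten, deleted


# Two rows from the table above, used as example input
develop = {
    "MAR04606": "Formation of homocarnosine",
    "MAR01317": "NAD Dependent 11-Hydroxythromboxane B2 Dehydrogenase",
}
branch = {
    "MAR04606": "L-histidine:4-aminobutanoate ligase (ADP-forming)",
    "MAR01317": "",
}
overwritten, deleted = compare_names(develop, branch)
print("Overwritten:", overwritten)
print("Deleted:", deleted)
```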

@mihai-sysbio
Member Author

mihai-sysbio commented Feb 23, 2022

Interesting that you found more conflicts, @JonathanRob; I'm not sure how I missed them.

This conflict between curated data and database-fetched data is a recurring problem. My stance here is not to track manual curations made on top of database-fetched data, so I'm going to propose a different approach.

Instead of overwriting reaction names, I can edit the script to only fill in reaction names where none existed, and push a new commit. With this approach we avoid the need to double-check the conflicts; only the spot-checking would need to be done.
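The fill-only-blanks approach proposed above can be sketched roughly as below. This is a hypothetical illustration, not the PR's actual script: it assumes the model's reaction names, the reaction-to-KEGG-ID mapping, and the KEGG names (e.g. read from reactions.tsv) are available as plain dicts. The reaction IDs and KEGG IDs in the toy data are placeholders.

```python
def fill_blank_names(
    names: dict[str, str],
    rxn_to_kegg: dict[str, str],
    kegg_names: dict[str, str],
) -> dict[str, str]:
    """Fill only empty reaction names from KEGG; never overwrite existing ones."""
    filled = dict(names)  # work on a copy so the input is left unchanged
    for rxn_id, name in names.items():
        if name:
            continue  # existing (possibly manually curated) name: keep it
        kegg_id = rxn_to_kegg.get(rxn_id)
        if kegg_id is None:
            continue  # no KEGG cross-reference, nothing to look up
        kegg_name = kegg_names.get(kegg_id, "")
        if kegg_name:
            filled[rxn_id] = kegg_name
    return filled


# Hypothetical toy data, not taken from Human-GEM or KEGG
names = {"rxn_curated": "Curated name", "rxn_blank": ""}
rxn_to_kegg = {"rxn_curated": "R_A", "rxn_blank": "R_B"}
kegg_names = {"R_A": "KEGG name A", "R_B": "KEGG name B"}
result = fill_blank_names(names, rxn_to_kegg, kegg_names)
print(result)
```

Because existing names are skipped, conflicts with curated names cannot arise, which is why only spot-checking of the newly filled names remains.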

@haowang-bioinfo
Member

> Instead of overwriting reaction names, I can edit the script to only fill in reaction names where none existed, and push a new commit. With this approach we avoid the need to double-check the conflicts; only the spot-checking would need to be done.

This seems like a decent solution. It would be good to upload the code as well.

@mihai-sysbio
Member Author

The force-pushed commit c5ec9fe overwrites the previous work. Now, only empty reaction names are filled in. The Python script is also attached. Given the changes, I'm re-requesting reviews.

@mihai-sysbio mihai-sysbio mentioned this pull request Mar 19, 2022
@haowang-bioinfo
Member

@mihai-sysbio thanks for providing the script, whose output curated-Human-GEM.yml has exactly the same content as Human-GEM.yml after running.

However, the code still reports "Modified reactions: 385". I'm not sure why, since there should be no further modifications. Could you please check this, and perhaps add more comments to the code to make it easier to understand and review?

@mihai-sysbio
Member Author

The "Modified reactions" counter is incremented on every attempt to replace an empty reaction name ("") with the name of the associated KEGG ID from reactions.tsv, even when the KEGG name is itself empty. 385 indicates there are still quite a few cases with reactions that don't have a name even in KEGG. I guess I could change the output so as to more clearly indicate this, but as you note, @haowang-bioinfo, this wouldn't have any practical implications, since:

> output curated-Human-GEM.yml has exactly the same content as Human-GEM.yml after running

My approach has been a one-time curation, so I would rather not over-engineer the script. If we aim to turn this into a scripted, recurring curation, I would prefer to track that in a separate issue, since the parsing of the YAML file is a bit clunky anyway.

@haowang-bioinfo
Member

haowang-bioinfo commented Mar 20, 2022

> 385 indicates there are still quite a few cases with reactions that don't have a name even in KEGG. I guess I could change the output so as to more clearly indicate this

That would be good. Agree to not over-engineer

@mihai-sysbio
Member Author

> That would be good. Agree to not over-engineer

@haowang-bioinfo I've updated the script that now outputs (on the second run):

Reaction names adopted from KEGG: 0
Blank names also blank in KEGG: 385
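The split into two counters can be illustrated with a small sketch. This is hypothetical and not the PR's actual script: it assumes names, the reaction-to-KEGG mapping, and the KEGG names are plain dicts, and it counts a blank name as "also blank in KEGG" only when the reaction has a KEGG ID whose name is empty. Running it twice on toy data shows why the second run adopts nothing while the blank-in-KEGG count stays the same.

```python
def fill_and_count(
    names: dict[str, str],
    rxn_to_kegg: dict[str, str],
    kegg_names: dict[str, str],
) -> tuple[dict[str, str], int, int]:
    """Fill blank names from KEGG; return (filled, adopted, blank_in_kegg)."""
    filled = dict(names)
    adopted = 0  # blank names that received a KEGG name
    blank_in_kegg = 0  # blank names whose KEGG name is also blank
    for rxn_id, name in names.items():
        if name:
            continue  # never overwrite an existing name
        kegg_id = rxn_to_kegg.get(rxn_id)
        if kegg_id is None:
            continue  # no KEGG ID, so nothing to count
        kegg_name = kegg_names.get(kegg_id, "")
        if kegg_name:
            filled[rxn_id] = kegg_name
            adopted += 1
        else:
            blank_in_kegg += 1
    return filled, adopted, blank_in_kegg


# Hypothetical toy data with placeholder IDs
names = {"rxn_a": "", "rxn_b": "", "rxn_c": "Curated name"}
rxn_to_kegg = {"rxn_a": "R_A", "rxn_b": "R_B", "rxn_c": "R_C"}
kegg_names = {"R_A": "KEGG name A", "R_B": "", "R_C": "KEGG name C"}

first, adopted1, blank1 = fill_and_count(names, rxn_to_kegg, kegg_names)
second, adopted2, blank2 = fill_and_count(first, rxn_to_kegg, kegg_names)
print(f"Reaction names adopted from KEGG: {adopted2}")
print(f"Blank names also blank in KEGG: {blank2}")
```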

The latest commit bd111e8 was force-pushed because it is now rebased on develop, which allows an easy merge without having to resolve conflicts.

Member

@haowang-bioinfo haowang-bioinfo left a comment


> I've updated the script that now outputs (on the second run)

Thanks.

@mihai-sysbio mihai-sysbio merged commit 8a569fd into develop Mar 21, 2022
@mihai-sysbio mihai-sysbio deleted the feat/reaction-names branch March 21, 2022 08:45
@mihai-sysbio mihai-sysbio mentioned this pull request Jun 17, 2022

3 participants