Human 1.12 #396

Merged
merged 59 commits into from
Jun 21, 2022
Commits
6d1a33b
changing chebi xref of MAM00097s
ANiknejad Jan 21, 2022
8d54fb4
removing xref hmdb:HMDB0006248 from MAM00097s
ANiknejad Jan 21, 2022
3e7ae41
removing xref 'gluala' from MAM00097s
ANiknejad Jan 21, 2022
7663d97
reverting to previous state of the metabolites.tsv file without modif…
ANiknejad Jan 26, 2022
1145a72
removing xref C03740 from MAM00097, will then avoid conflict during c…
ANiknejad Jan 26, 2022
3e6380c
MAM01656 dehydrodolichol mapped to CHEBI:136960
ANiknejad Feb 3, 2022
ad00c16
correction, MAM01657 dehydrodolichol_diphosphate mapped to CHEBI:1369…
ANiknejad Feb 3, 2022
246cbc8
MAM00813p (3-ketopristanoyl-CoA) remapped to CHEBI:57291 (was mapped …
ANiknejad Feb 10, 2022
f0cf63e
fix: remove gpr for MAR04840 resolves #342
mihai-sysbio Feb 17, 2022
caa95b7
fix: mark MAR04840 as spontaneous
mihai-sysbio Feb 17, 2022
7036fc5
Merge pull request #362 from SysBioChalmers/fix/remove-gpr-MAR04840
mihai-sysbio Feb 17, 2022
b7cd3e6
refactor: DNA production/consumption in cytoplasm (#352)
PkiwiBird Feb 17, 2022
997d2e1
refactor: merge MAR08973 into MAR04474 resolves #346
mihai-sysbio Feb 18, 2022
441bc5b
refactor: RNA formation, issue #354 (#364)
PkiwiBird Feb 24, 2022
5304f56
Update metabolites.tsv
ANiknejad Feb 25, 2022
3de63a0
Update metabolites.tsv
ANiknejad Feb 25, 2022
041ad02
Update metabolites.tsv
ANiknejad Feb 25, 2022
14d2bb0
Merge branch 'develop' into develop
mihai-sysbio Feb 28, 2022
11186ac
Merge pull request #358 from ANiknejad/develop
JonathanRob Mar 1, 2022
b45e758
fix: reconcile conflict with deprecatedReactions on develop branch
JonathanRob Mar 4, 2022
46c7c9f
fix: add empty line at end of file for consistency
JonathanRob Mar 4, 2022
c1118c3
Merge branch 'develop' into refactor/merge-MAR08973-into-MAR04474
JonathanRob Mar 4, 2022
946ed63
Merge pull request #365 from SysBioChalmers/refactor/merge-MAR08973-i…
JonathanRob Mar 4, 2022
67494b9
doc: reference exchange metabolites as boundary metabolites
JonathanRob Mar 15, 2022
5b0ae08
Merge pull request #374 from SysBioChalmers/doc/tINIT_messages
haowang-bioinfo Mar 18, 2022
bd111e8
feat: replace empty reaction names with those from KEGG
mihai-sysbio Mar 20, 2022
8a569fd
Merge pull request #367 from SysBioChalmers/feat/reaction-names
mihai-sysbio Mar 21, 2022
6d26d17
fix: change to txt format metabolic task file
haowang-bioinfo Mar 22, 2022
a57cd8a
fix: flag rxns MAR05127, MAR08749, MAR08750 as spontaneous
JonathanRob Mar 25, 2022
7ec91f9
Merge pull request #379 from SysBioChalmers/fix/issue349
JonathanRob Mar 25, 2022
51f86ca
fix: Inactivate human biomass reaction (MAR13082) by default
haowang-bioinfo Mar 28, 2022
7f5bc44
fix: add proper LB and UB to generated draft model
haowang-bioinfo Mar 28, 2022
c56b4eb
feat: turn into a more generic function by adding `annPath` argument
haowang-bioinfo Mar 30, 2022
2708eea
fix: minor bugs
haowang-bioinfo Mar 30, 2022
ff109f2
Merge pull request #382 from SysBioChalmers/fix/animalGEM-code
haowang-bioinfo Apr 3, 2022
21d898e
refactor: Moved removeLowScoreGenes.m and scoreComplexModel.m from Hu…
johan-gson May 5, 2022
b0f334a
style: commented getINITModel2 as deprecated
johan-gson May 5, 2022
25eddbc
feat: Added Human-GEM related files for ftINIT. Also adapted the prev…
johan-gson May 13, 2022
b4cc6ab
fix: Changed back the getINITModel - it no longer calls something nam…
johan-gson May 13, 2022
22c6008
style: Removed the word "deprecated" from the description of the prev…
johan-gson May 21, 2022
f8b0269
fix: skip adding metabolites step if none are to be added
JonathanRob May 26, 2022
2c73fff
refactor: read subsystem in a more generic way
haowang-bioinfo Jun 3, 2022
fa7d1ec
fix: remove redundant code
haowang-bioinfo Jun 3, 2022
7bf767c
refactor: enable the option of retaining genes in the original order
haowang-bioinfo Jun 3, 2022
706db86
refactor: stop sorting the original gene list
haowang-bioinfo Jun 3, 2022
30c401f
style: Removed some commented out code
johan-gson Jun 6, 2022
f47299c
Merge branch 'ftINITHelpers' of https://github.com/SysBioChalmers/Hum…
johan-gson Jun 6, 2022
cf02955
Merge pull request #388 from SysBioChalmers/ftINITHelpers
haowang-bioinfo Jun 14, 2022
9faa6a5
refactor: speed up writing model fields
edkerk Jun 14, 2022
90c54d8
refactor: speed-up other model fields
edkerk Jun 14, 2022
922a46b
chore: yml file after importYaml & exportYaml
edkerk Jun 14, 2022
0bd19d2
refactor: another ~6 sec saved by faster concat
edkerk Jun 14, 2022
de3c7f4
refactor: avoid conversion str->num->str
edkerk Jun 14, 2022
9aea8ae
refactor: importYaml first reads whole file
edkerk Jun 14, 2022
26f095e
Update code/io/importYaml.m
edkerk Jun 15, 2022
3ff015a
Update code/io/importYaml.m
edkerk Jun 15, 2022
f8f2927
Merge pull request #392 from SysBioChalmers/refactor/importYaml
haowang-bioinfo Jun 15, 2022
f2d5352
Merge remote-tracking branch 'origin/develop' into refactor/importYam…
edkerk Jun 15, 2022
287d130
Merge pull request #393 from SysBioChalmers/refactor/importYamlSpeed
mihai-sysbio Jun 17, 2022
18 changes: 16 additions & 2 deletions code/GPRs/getGenesFromGrRules.m
@@ -1,9 +1,9 @@
function [genes,rxnGeneMat] = getGenesFromGrRules(grRules)
function [genes,rxnGeneMat] = getGenesFromGrRules(grRules, originalGenes)
%getGenesFromGrRules Extract gene list and rxnGeneMat from grRules array.
%
% USAGE:
%
% [genes,rxnGeneMat] = getGenesFromGrRules(grRules);
% [genes,rxnGeneMat] = getGenesFromGrRules(grRules, originalGenes);
%
% INPUTS:
%
@@ -12,6 +12,7 @@
% NOTE: Boolean operators can be text ("and", "or") or
% symbolic ("&", "|"), but there must be a space
% between operators and gene names/IDs.
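% originalGenes (Optional) Original gene list from the model, used as a
% reference: the genes found in grRules must match it, and
% its order is retained in the output.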
%
% OUTPUTS:
%
@@ -24,6 +25,11 @@
%


% handle input arguments
if nargin < 2
originalGenes = [];
end

% check if the grRules use written or symbolic boolean operators
if any(contains(grRules,{'&','|'}))
% fix some potential missing spaces between parentheses and &/|
@@ -50,6 +56,14 @@
nonEmpty = ~cellfun(@isempty,rxnGenes);
genes = unique([rxnGenes{nonEmpty}]');

if ~isempty(originalGenes)
if ~isequal(sort(originalGenes), sort(genes))
error('The grRules and original gene list are inconsistent!');
else
genes = originalGenes;
end
end

% construct new rxnGeneMat (if requested)
if nargout > 1
rxnGeneCell = cellfun(@(rg) ismember(genes,rg),rxnGenes,'UniformOutput',false);
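
The `originalGenes` check in this file can be read as three steps: collect the unique genes from the rules, confirm they form the same set as the reference list, then return the reference list so that its order is kept. A minimal Python sketch of that logic follows, assuming each rule has already been split into a list of genes. The name `reconcile_genes` is hypothetical and not part of Human-GEM.

```python
def reconcile_genes(rxn_genes, original_genes=None):
    """Return the model gene list, optionally keeping a reference order.

    rxn_genes: list of lists, the genes found in each reaction's rule.
    original_genes: optional reference list whose order should be kept.
    """
    # unique genes across all rules, sorted
    genes = sorted({g for rule_genes in rxn_genes for g in rule_genes})
    if original_genes:
        # both lists must contain exactly the same genes
        if sorted(original_genes) != genes:
            raise ValueError("The grRules and original gene list are inconsistent!")
        # keep the caller's order instead of the sorted one
        genes = list(original_genes)
    return genes


rules = [["ENSG2", "ENSG1"], [], ["ENSG3", "ENSG1"]]
print(reconcile_genes(rules))
print(reconcile_genes(rules, ["ENSG3", "ENSG1", "ENSG2"]))
```

Raising on a mismatch, rather than silently reordering, means a stale gene list is caught before a gene-reaction matrix is built from it.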
10 changes: 8 additions & 2 deletions code/addBoundaryMets.m
@@ -104,9 +104,15 @@
% add new boundary mets to the model
metsToAdd.mets = add_bound_met_IDs;
metsToAdd.metNames = add_bound_mets;
metsToAdd.compartments = 'b';
metsToAdd.compartments = repmat({'b'}, size(add_bound_mets));
metsToAdd.unconstrained = ones(size(add_bound_mets));
new_model = addMets(model,metsToAdd);
if ~isempty(add_bound_met_IDs)
new_model = addMets(model,metsToAdd);
else
fprintf('No Boundary metabolites were added to the model!\n');
new_model = model;
return
end

% now add the boundary mets to the model S-matrix
S = new_model.S;
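
The two changes in this hunk are giving every new boundary metabolite its own compartment entry and returning early when there is nothing to add. They can be sketched in Python as below. `build_boundary_mets` is a hypothetical helper and the dictionary keys simply echo the MATLAB field names; it is not a port of `addMets`.

```python
def build_boundary_mets(ids, names):
    """Assemble the fields for new boundary metabolites, or None if there are none."""
    if not ids:
        # nothing to add: report it and let the caller keep the model unchanged
        print("No boundary metabolites were added to the model!")
        return None
    return {
        "mets": list(ids),
        "metNames": list(names),
        # one compartment entry per metabolite instead of a single shared 'b'
        "compartments": ["b"] * len(names),
        "unconstrained": [1] * len(names),
    }


print(build_boundary_mets(["MAM00001x", "MAM00002x"], ["metA", "metB"]))
```

The metabolite IDs and names in the usage line are made up for illustration.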
47 changes: 29 additions & 18 deletions code/annotateGEM.m
Original file line number Diff line number Diff line change
@@ -1,10 +1,13 @@
function annModel = annotateGEM(model,annType,addMiriams,addFields,overwrite)
function annModel = annotateGEM(model,annPath,annType,addMiriams,addFields,overwrite)
% Add reaction, metabolite, and/or gene annotation to a model.
%
% Input:
%
% model Model structure.
%
% annPath Path to the directory containing the annotation files, which
% are expected to be named 'reactions.tsv', 'metabolites.tsv',
% and 'genes.tsv'.
% (Optional, default = '../model' relative to this function)
%
% annType String or cell array of strings specifying the type(s) of
% annotation data to add: 'rxn', 'met', and/or 'gene'. To
% add all annotation types, use 'all'.
@@ -35,27 +38,32 @@
%
% Usage:
%
% annModel = annotateGEM(model,annType,addMiriams,addFields,overwrite);
% annModel = annotateGEM(model,annPath,annType,addMiriams,addFields,overwrite);
%


%% Inputs and setup

if nargin < 2 || isempty(annType) || strcmpi(annType,'all')
if nargin < 2
[ST, I] = dbstack('-completenames');
annPath = strcat(fileparts(ST(I).file),'/../model');
end

if nargin < 3 || isempty(annType) || isequal(annType,'all')
annType = {'rxn','met','gene'};
elseif ~all(ismember(annType,{'rxn','met','gene','reaction','metabolite'}))
error('annType input(s) not recognized. Valid options are "rxn", "met", and/or "gene", or "all"');
end

if nargin < 3 || isempty(addMiriams)
if nargin < 4 || isempty(addMiriams)
addMiriams = true;
end

if nargin < 4 || isempty(addFields)
if nargin < 5 || isempty(addFields)
addFields = true;
end

if nargin < 5
if nargin < 6
overwrite = true;
end

@@ -96,9 +104,7 @@

% load reaction annotation data
if any(ismember({'rxn','reaction'},lower(annType)))
[ST, I] = dbstack('-completenames');
path = fileparts(ST(I).file);
tmpfile = fullfile(path,'../model','reactions.tsv');
tmpfile = fullfile(annPath,'reactions.tsv');
rxnAssoc = importTsvFile(tmpfile);

% strip "RHEA:" prefix from Rhea IDs since it should not be included in
@@ -119,9 +125,7 @@

% load metabolite annotation data
if any(ismember({'met','metabolite'},lower(annType)))
[ST, I] = dbstack('-completenames');
path = fileparts(ST(I).file);
tmpfile = fullfile(path,'../model','metabolites.tsv');
tmpfile = fullfile(annPath,'metabolites.tsv');
metAssoc = importTsvFile(tmpfile);

% ChEBI IDs should be of the form "CHEBI:#####"
@@ -140,9 +144,7 @@

% load and organize gene annotation data
if ismember('gene',lower(annType))
[ST, I] = dbstack('-completenames');
path = fileparts(ST(I).file);
tmpfile = fullfile(path,'../model','genes.tsv');
tmpfile = fullfile(annPath,'genes.tsv');
geneAssoc = importTsvFile(tmpfile);

% add geneEnsemblID field if missing
@@ -275,9 +277,18 @@

% get fields and their types
f = fieldnames(allAssoc);
fieldType = repmat({'rxn'}, numel(f), 1);
fieldType(ismember(f, fieldnames(metAssoc))) = {'met'};
fieldType(ismember(f, fieldnames(geneAssoc))) = {'gene'};

if ~isempty(rxnAssoc)
fieldType = repmat({'rxn'}, numel(f), 1);
end

if ~isempty(metAssoc)
fieldType(ismember(f, fieldnames(metAssoc))) = {'met'};
end

if ~isempty(geneAssoc)
fieldType(ismember(f, fieldnames(geneAssoc))) = {'gene'};
end

% add individual ID fields to the model
for i = 1:numel(f)
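
Turning the annotation directory into an argument with a fallback, as `annPath` does in this file, could look like the following in Python. This is only a sketch: `annotation_files`, `FILE_NAMES`, and the `code_dir` parameter are hypothetical names, and the fallback mirrors the `../model` default relative to the code directory.

```python
from pathlib import Path

# file name expected for each annotation type (as listed in the help text)
FILE_NAMES = {"rxn": "reactions.tsv", "met": "metabolites.tsv", "gene": "genes.tsv"}


def annotation_files(ann_path=None, ann_types=("rxn", "met", "gene"), code_dir="."):
    """Return the annotation file path for each requested annotation type."""
    if ann_path is None:
        # fall back to the 'model' directory next to the code directory
        ann_path = Path(code_dir) / ".." / "model"
    return {t: Path(ann_path) / FILE_NAMES[t] for t in ann_types}


print(annotation_files("/data/otherGEM/model", ["met", "gene"]))
```

Resolving the directory once at the top, rather than in each loading branch, is what lets the same function annotate a model whose TSV files live outside this repository.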
68 changes: 68 additions & 0 deletions code/curateReactionNames.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@
"""Fetch Human-GEM reaction names from KEGG
Original file is located at
https://colab.research.google.com/drive/17X0Qx0H4pwjZjLLWHnpp5ac2daH9hOxs
"""

import requests
import re
import yaml
import pandas

"""Get all the KEGG reactions via their API, and save the result to a file."""

KEGG_REACTIONS = 'kegg_reactions.txt'
HG_YAML = '../model/Human-GEM.yml'
F_YAML = '../model/curated-Human-GEM.yml'

with open(KEGG_REACTIONS,'w') as f:
    r = requests.get('http://rest.kegg.jp/list/reaction/')
    f.write(r.text)

"""Extract the KEGG reactions as key-value pairs."""

raw_reactions = open(KEGG_REACTIONS, 'r')
raw_reaction_lines = raw_reactions.readlines()

reaction_id = re.compile('(?:^rn\:)(R\d+)')
reaction_name = re.compile('(?:\t)([^;]+)(?:;)')
kegg_reactions = {}
for line in raw_reaction_lines:
    try:
        kegg_reactions[reaction_id.search(line).group(1)] = reaction_name.search(line).group(1)
    except:
        kegg_reactions[reaction_id.search(line).group(1)] = ''
# print(kegg_reactions[])
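
The extraction step above can be isolated into a small function that is easy to check without calling the KEGG API. The sketch below uses the same two patterns (an `rn:`-prefixed ID, and a name that runs from the tab to the first semicolon) but, unlike the script, skips lines that carry no reaction ID instead of raising. `parse_kegg_list` and the sample text are hypothetical, and the `rn:` prefix is taken from the script rather than verified against the current KEGG output.

```python
import re

REACTION_ID = re.compile(r"^rn:(R\d+)")
REACTION_NAME = re.compile(r"\t([^;]+);")


def parse_kegg_list(text):
    """Map KEGG reaction IDs to the first name on each line ('' if none is found)."""
    names = {}
    for line in text.splitlines():
        id_match = REACTION_ID.search(line)
        if id_match is None:
            continue  # not a reaction line
        name_match = REACTION_NAME.search(line)
        names[id_match.group(1)] = name_match.group(1) if name_match else ""
    return names


# made-up sample in the layout the script expects
sample = (
    "rn:R00001\tfirst example name; A + B <=> C\n"
    "rn:R00002\tA + B <=> D\n"
)
print(parse_kegg_list(sample))
```

A line without a semicolon yields an empty name, which is the case the script later counts as blank in KEGG.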

"""Fetch Human-GEM reactions from the TSV annotation."""

hg_annotation = pandas.read_csv('../model/reactions.tsv', sep='\t', index_col=0)

""" Traverse the YAML, and for each line that looks like a reaction definition, extract the reaction identifier, and get the matching KEGG id. Then, change the next line that contains the reaction name to the name provided by KEGG."""

with open(HG_YAML, 'r') as inputf:
    with open(F_YAML, 'w') as outputf:
        count = 0
        count_blank = 0
        while True:
            reaction_id = re.compile('(?:^ - id: ")(MAR\d+)')
            reaction_name = re.compile('(?:^ - name: ")()("$)')
            try:
                line = inputf.readline()
                r_id = reaction_id.search(line).group(1)
                outputf.write(line)
                line = inputf.readline()
                r_name = reaction_name.search(line).group(1)
                kegg_id = hg_annotation.loc[r_id]['rxnKEGGID']
                if kegg_id and r_name == "":
                    if kegg_reactions[kegg_id] == "":
                        count_blank = count_blank + 1
                    else:
                        line = ' - name: "' + kegg_reactions[kegg_id] + '"\n'
                        count = count + 1
            except:
                None
            outputf.write(line)
            if not line:
                break
print('Reaction names adopted from KEGG: ' + str(count))
print('Blank names also blank in KEGG: ' + str(count_blank))
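
The YAML pass in this script can also be expressed as a pure function over a list of lines, which makes the two counters straightforward to verify. The version below is a sketch under assumptions: `fill_empty_names` is a hypothetical name, the ID-to-KEGG mapping is passed as a plain dict instead of the pandas table, and the patterns accept any leading whitespace because the exact indentation of the YAML is not shown here.

```python
import re

RXN_ID = re.compile(r'^\s*- id: "(MAR\d+)"')
EMPTY_NAME = re.compile(r'^(\s*- name: )""\s*$')


def fill_empty_names(lines, kegg_ids, kegg_names):
    """Fill empty reaction names from KEGG.

    Returns (new_lines, adopted, still_blank).
    """
    out, adopted, still_blank = [], 0, 0
    current = None  # reaction ID found on the previous line, if any
    for line in lines:
        name_match = EMPTY_NAME.match(line) if current else None
        if name_match:
            kegg_id = kegg_ids.get(current)
            if kegg_id in kegg_names:
                if kegg_names[kegg_id]:
                    # keep the original indentation, swap in the KEGG name
                    line = f'{name_match.group(1)}"{kegg_names[kegg_id]}"\n'
                    adopted += 1
                else:
                    still_blank += 1
        id_match = RXN_ID.match(line)
        current = id_match.group(1) if id_match else None
        out.append(line)
    return out, adopted, still_blank


yaml_lines = ['  - id: "MAR00001"\n', '  - name: ""\n']
print(fill_empty_names(yaml_lines, {"MAR00001": "R00001"}, {"R00001": "example name"}))
```

Only a name line that directly follows a reaction ID line is considered, which matches the two consecutive `readline()` calls in the script.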
10 changes: 5 additions & 5 deletions code/gapfill4EssentialTasks.m
@@ -57,7 +57,7 @@
% load metabolic task for growth under Ham's media
[ST, I] = dbstack('-completenames');
path = fileparts(ST(I).file);
essentialTasks = fullfile(path,'../data/metabolicTasks','metabolicTasks_Essential.xlsx');
essentialTasks = fullfile(path,'../data/metabolicTasks','metabolicTasks_Essential.txt');
taskStruct = parseTaskList(essentialTasks);
%taskStruct = taskStruct(end);

@@ -120,10 +120,10 @@

outputModel = inputModel;

% block all biomass equations
%ind = find(startsWith(outputModel.rxns,'biomass'));
%outputModel.ub(ind) = 0;
%outputModel.lb(ind) = 0;
% block the human biomass reaction (MAR13082)
ind = find(strcmp(outputModel.rxns,'MAR13082'));
outputModel.ub(ind) = 0;
outputModel.lb(ind) = 0;
outputModel.c(:) = 0;
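
Inactivating a single reaction by ID, as done for `MAR13082` here, amounts to zeroing both of its bounds and then clearing the objective vector. A small Python sketch with a dictionary-based model follows; `block_reaction` and the model layout are assumptions for illustration and do not represent the RAVEN model structure.

```python
def block_reaction(model, rxn_id):
    """Zero both bounds of the named reaction and clear the objective vector."""
    for idx, rid in enumerate(model["rxns"]):
        if rid == rxn_id:
            model["lb"][idx] = 0
            model["ub"][idx] = 0
    # clear all objective coefficients, as outputModel.c(:) = 0 does
    model["c"] = [0] * len(model["c"])
    return model


toy = {
    "rxns": ["MAR00001", "MAR13082"],
    "lb": [-1000, 0],
    "ub": [1000, 1000],
    "c": [0, 1],
}
print(block_reaction(toy, "MAR13082"))
```

As with `find(strcmp(...))` in the MATLAB code, an ID that is absent from the model leaves the bounds untouched.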

% reset objective function to "biomass_components"
3 changes: 2 additions & 1 deletion code/getModelFromOrthology.m
@@ -43,7 +43,8 @@
templateModel.description = '';
templateModel.version = '';
templateModel.annotation = structfun(@(x) '',templateModel.annotation,'UniformOutput',0);

templateModel.annotation.defaultLB = -1000;
templateModel.annotation.defaultUB = 1000;

% find the index of non-empty grRules before replacing genes
preNonEmptyRuleInd = find(~cellfun(@isempty, templateModel.grRules));