Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GECKO 3.0.2 #293

Merged
merged 44 commits into from
Mar 21, 2023
Merged
Show file tree
Hide file tree
Changes from 41 commits
Commits
Show all changes
44 commits
Select commit Hold shift + click to select a range
49aba84
fix: correct update prot pool
ae-tafur Mar 8, 2023
a20eae3
feat: if not ptot defined, display value used
ae-tafur Mar 8, 2023
bf61e86
fix: applyKcatConstraints if kcat = NaN, cost = 0
edkerk Mar 9, 2023
0b6b3cf
fix: readDLKcatOutput deal with non-numeric kcat
edkerk Mar 9, 2023
45192bd
fix: protocol.m uncomment runDLKcat
edkerk Mar 9, 2023
88c2636
fix: getECfromDatabase only change ecRxns entries
edkerk Mar 15, 2023
9fcccfd
fix: protocol.m applyComplexData without complexInfo
edkerk Mar 15, 2023
c118232
feat: getECfromGEM improper EC number output
edkerk Mar 15, 2023
99676c6
doc: explain repeated applyKcatConstraints runs
edkerk Mar 15, 2023
51342fc
fix: remove duplicate lines
ae-tafur Mar 15, 2023
b57e234
feat: return total protein in proteomics.tsv
ae-tafur Mar 15, 2023
36286e5
fix: protein pool in proteomics integration
ae-tafur Mar 15, 2023
6e4e684
style: remove unused variable
ae-tafur Mar 15, 2023
70b93d3
Merge pull request #282 from SysBioChalmers/fix/getEC
edkerk Mar 17, 2023
9fa23b3
fix: mention correct proteomics units
edkerk Mar 17, 2023
95a5f42
feat: updateProtPool explanation of in- and output
edkerk Mar 17, 2023
44f3b61
fix: updateProtPool correct unmeasured enzyme cont
edkerk Mar 17, 2023
1e71a4f
feat: updateProtPool try relaxing sigma factor
edkerk Mar 17, 2023
b2e366d
fix: findMetSmiles also store failed queries
edkerk Mar 17, 2023
227b2be
doc: protocol.m f-factor and setProtPoolSize
edkerk Mar 17, 2023
34ec6e7
Apply suggestions from code review
edkerk Mar 18, 2023
5452e31
fix: sensitivityTuning check if model can solve
edkerk Mar 18, 2023
9366bf2
doc: correct protein pool unit in protocol.m
edkerk Mar 18, 2023
ef8d282
chore: rerun protocol.m
edkerk Mar 18, 2023
05df810
fix: protocol.m minimize protein pool usage
edkerk Mar 20, 2023
42b2ed3
Merge pull request #284 from SysBioChalmers/fix/protocol
edkerk Mar 20, 2023
5ad2770
Apply suggestions from code review
edkerk Mar 20, 2023
c0a60ca
Merge pull request #283 from SysBioChalmers/fix/misc
edkerk Mar 20, 2023
a70811a
doc: clarify minimization of protein pool exchange
edkerk Mar 20, 2023
b51926d
feat: getStandardKcat take custom nonEnzymeRxns
edkerk Mar 20, 2023
73aa43c
fix: getStandardKcat ec.source output
edkerk Mar 20, 2023
f92693e
fix: getStandardKcat removes old standard kcats
edkerk Mar 20, 2023
62f2ed9
refactor: rename to pseudoRxns.tsv
edkerk Mar 20, 2023
10da2da
fix: if organism is in KEGG multiple times
edkerk Mar 20, 2023
2f2036c
Merge pull request #276 from SysBioChalmers/fix/updateProtPool
edkerk Mar 20, 2023
75064ce
fix: error if any match in customKcats
ae-tafur Mar 20, 2023
0a18754
Merge pull request #292 from SysBioChalmers/fix/customKcats
edkerk Mar 20, 2023
808f740
Merge pull request #289 from SysBioChalmers/fix/fuzzyKcatMatching
edkerk Mar 20, 2023
5c70dd6
Merge remote-tracking branch 'origin/develop' into fix/standardKcat
edkerk Mar 20, 2023
6c13a8d
Update applyKcatConstraints.m
edkerk Mar 20, 2023
0dfbffd
Merge pull request #288 from SysBioChalmers/fix/standardKcat
edkerk Mar 20, 2023
7d4db31
fix: resolves issues raised in #288
edkerk Mar 21, 2023
886a15e
doc: fuzzyKcatMatching matching org_name in KEGG
edkerk Mar 21, 2023
e0df921
Merge pull request #295 from SysBioChalmers/fix/standardKcat
edkerk Mar 21, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
86 changes: 65 additions & 21 deletions protocol.m
Original file line number Diff line number Diff line change
Expand Up @@ -70,7 +70,9 @@
% (https://www.ebi.ac.uk/complexportal/), its data is included in
% ecYeastGEM.
complexInfo = getComplexData(); % No need to run, as data is already in adapter folder
[ecModel, foundComplex, proposedComplex] = applyComplexData(ecModel, complexInfo);
[ecModel, foundComplex, proposedComplex] = applyComplexData(ecModel);
% complexInfo can be given as second input, but not needed here, as it will
% read the file that was written by getComplexData

% STEP 5 Store model in YAML format
saveEcModel(ecModel,ModelAdapter,'yml','ecYeastGEMfull');
Expand Down Expand Up @@ -99,7 +101,7 @@
% DLKcat, while the output file is read back into MATLAB.
writeDLKcatInput(ecModel);
% runDLKcat will run the DLKcat algorithm via a Docker image
%runDLKcat();
runDLKcat();
kcatList_DLKcat = readDLKcatOutput(ecModel);

% STEP 8 Combine kcat from BRENDA and DLKcat
Expand All @@ -115,28 +117,66 @@
% summarized under /data/customKcats.tsv, and applied here.
[ecModel, rxnUpdated, notMatch] = applyCustomKcats(ecModel);
% To modify the S-matrix, to actually implement the kcat/MW constraints,
% you run applyKcatConstraints.
% applyKcatConstraints should be run. This takes the values from
% ecModel.ec.kcat, considers any complex/subunit data that is tracked in
% ecModel.ec.rxnEnzMat, together with the MW in ecModel.ec.mw, and uses
% this to modify the enzyme usage coefficients directly in ecModel.S. Any
% time a change is made to the .kcat, .rxnEnzMat or .mw fields, the
% applyKcatConstraints function should be run again to reapply the new
% constraints onto the metabolic model.
ecModel = applyKcatConstraints(ecModel);

% STEP 11 Get kcat values across isoenzymes
ecModel = getKcatAcrossIsoenzymes(ecModel);

% STEP 12 Get standard kcat
% Assign an enzyme cost to reactions without gene assocation (except
% exchange, transport and pseudoreactions)
% Assign an enzyme cost to reactions without gene assocation. These
% reactions are identified as those with empty entry in ecModel.grRules.
% The following reactions are exempted:
% A Exchange reactions: exchanging a metabolite across the model boundary,
% not representing a real enzymatic reaction.
% B Transport reactions: transporting a metabolite with the same name from
% one compartment to another. Real transport reactions should already be
% annotated with grRules, so that the remaining non-annotated reactions
% are mostly representing diffusion or pseudotransport processes such as
% vesicles moving from ER to Golgi. While proteins are involved in such
% processes, they are not catalyzed by enzymes.
% C Pseudoreactions: any other reaction that should be considered to be
% catalyzed by an enzyme. getStandardKcat recognizes these from the
% reaction name contaning "pseudoreaction".
% D Custom list of non-enzyme reactions: if the above approaches does not
% correctly identify all non-enzyme reactions that should be ignored by
% getStandardKcat, /data/pseudoRxns.tsv can be specified in adapter
% folder.

[ecModel, rxnsMissingGPR, standardMW, standardKcat] = getStandardKcat(ecModel);

% STEP 13 Apply kcat constraints from ecModel.ec.kcats to ecModel.S
% This function can be run at any point to re-apply the kcat contraints on
% the model. It also considers the complex data
% STEP 13 Apply kcat constraints from ecModel.ec.kcat to ecModel.S
% As the above functions have modified ecModel.ec.kcat,
% applyKcatConstraints is rerun as explained in step 11.
ecModel = applyKcatConstraints(ecModel);

% STEP 14 Set upper bound of protein pool
% Calculate f-factor (how much proteins are enzymes), based on paxDB.tsv
% that is provided for the model. Other proteomics data can also be used,
% while an alternative approximation of 0.5 is typically quite realistic.
f = calculateFfactor(ecModel);
ecModel = setProtPoolSize(ecModel,[],f);
% The protein pool exchange is constrained by the total protein content
% (Ptot), multiplied by the f-factor (ratio of enzymes/proteins) and the
% sigma-factor (how saturated enzymes are on average: how close to their
% Vmax to they function based on e.g. metabolite concentrations). In
% modelAdapter Ptot, f- and sigma-factors can all be specified (as rough
% estimates, 0.5 for each of the three parameters is reasonable).
Ptot = params.Ptot;
f = params.f;
sigma = params.sigma;
% But these values can also be defined separately. The f-factor can be
% calculated from quantitative proteomics data, for instance with data that
% is available via PAXdb (https://pax-db.org/). calculateFfactor can be used to estimate the f-factor.
%f = calculateFfactor(ecModel); % Optional

ecModel = setProtPoolSize(ecModel,Ptot,f,sigma);

% Note that at a later stage (after stage 3), the sigma factor be further
% adjusted with sigmaFitter, to get a model that is able to reach a
% particular maximum growth rate. This will not be done here, as we first
% need to tune the kcat values in Stage 3.

%% STAGE 3: model tuning
% Test whether the model is able to reach maximum growth if glucose uptake
Expand Down Expand Up @@ -204,7 +244,7 @@

%% STAGE 5: simulation and analysis
% If starting from here, load some basic assets
ecModel = loadEcModel('ecYeastGEMfull.yml',ModelAdapter);
ecModel = loadEcModel('ecYeastGEMfull.yml');
modelY = loadConventionalGEM();
fluxData = loadFluxData;

Expand All @@ -216,7 +256,7 @@
% % Set the objective function to maximize reaction biomassRxn
% ecModel = setParam(ecModel,'obj','r_4041',1);
% % Set the objective function to minimize protein usage
% ecModel = setParam(ecModel,'obj','prot_pool_exchange',-1);
% ecModel = setParam(ecModel,'obj','prot_pool_exchange',1);
% % Perform flux balance analysis (FBA)
% sol = solveLP(ecModel);
% % Perform parsimonious FBA (minimum total flux)
Expand All @@ -234,9 +274,13 @@
disp(['Growth rate reached: ' num2str(abs(sol.f))])
% Set growth lower bound to 99% of the previous value
ecModel = setParam(ecModel,'lb',params.bioRxn,0.99*abs(sol.f));
ecModel = setParam(ecModel,'obj','prot_pool_exchange',-1);
% Minimize protein pool usage. As protein pool exchange is defined in the
% reverse direction (with negative flux), minimization of protein pool
% usage is computationally represented by maximizing the prot_pool_exchange
% reaction.
ecModel = setParam(ecModel,'obj','prot_pool_exchange',1);
sol = solveLP(ecModel)
disp(['Minimum protein pool usage: ' num2str(abs(sol.f)) ' ug/gDCW'])
disp(['Minimum protein pool usage: ' num2str(abs(sol.f)) ' mg/gDCW'])

% STEP 23 Compare fluxes from ecModel and starting model
% Constrain with the same conditions to model and ecModel. We now fix the
Expand All @@ -245,8 +289,8 @@
% We know that growth can only reach 0.088, so use this instead of 0.1.
fluxData.grRate(1) = 0.088;
ecModel = constrainFluxData(ecModel,fluxData,1,'min',5);
% Minimize protein pool usage.
ecModel = setParam(ecModel,'obj','prot_pool_exchange',-1);
% Minimize protein pool usage.
ecModel = setParam(ecModel,'obj','prot_pool_exchange',1);
solEC = solveLP(ecModel,1)

% Apply (almost) the same to non-ecModel. Same constraints on fluxes, but
Expand Down Expand Up @@ -281,15 +325,15 @@
[minFluxY, maxFluxY] = ecFVA(modelY, modelY);

% Write results to output file
output = [modelY.rxns, modelY.rxnNames, num2cell([minFluxY, maxFluxY, minFlux, maxFlux])]';
output = [modelY.rxns, modelY.rxnNames, num2cell([minFluxY, maxFluxY, minFluxEc, maxFluxEc])]';
fID = fopen(fullfile(params.path,'output','ecFVA.tsv'),'w');
fprintf(fID,'%s %s %s %s %s %s\n','rxnIDs', 'rxnNames', 'minFlux', ...
'maxFlux', 'ec-minFlux', 'ec-maxFlux');
fprintf(fID,'%s %s %g %g %g %g\n',output{:});
fclose(fID);

% Look at flux ranges to indicate reaction-level variability
fluxRange = maxFlux - minFlux;
fluxRange = maxFluxEc - minFluxEc;
fluxRangeY = maxFluxY - minFluxY;

% Plot variability distributions of both models in 1 plot
Expand Down
6 changes: 5 additions & 1 deletion src/geckomat/change_model/applyCustomKcats.m
Original file line number Diff line number Diff line change
Expand Up @@ -165,7 +165,11 @@
end

% Apply the new kcat values to the model
model = applyKcatConstraints(model, rxnToUpdate);
if ~isempty(find(rxnToUpdate, 1))
model = applyKcatConstraints(model, rxnToUpdate);
else
disp('No matches found. Consider checking the IDs or proteins in customKcats.')
end

rxnUpdated = model.ec.rxns(find(rxnToUpdate));

Expand Down
10 changes: 7 additions & 3 deletions src/geckomat/change_model/applyKcatConstraints.m
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,11 @@
updateRxnsIds = convertCharArray(updateRxns);
updateRxns = ismember(rxns,updateRxnsIds);
end


if isempty(find(updateRxns, 1)) || isempty(updateRxns)
error('No reaction to update or updateRxns is logical but without any true value')
end

if ~isfield(model,'ec')
error(['No model.ec structure could be found: the provided model is'...
' not a valid GECKO3 ecModel. First run makeEcModel(model).'])
Expand Down Expand Up @@ -73,11 +77,11 @@
end
newKcats(kcatLast+1:end,:)=[];

sel = newKcats(:,4) ~= 0; %assign zero cost instead of inf when kcat == 0
sel = newKcats(:,4) > 0; %Only apply to non-zero kcat
newKcats(sel,4) = newKcats(sel,4) * 3600; %per second -> per hour
newKcats(sel,4) = newKcats(sel,5) ./ newKcats(sel,4); %MW/kcat
newKcats(sel,4) = newKcats(sel,3) .* newKcats(sel,4); %Multicopy subunits.
newKcats(~sel,4) = 0;
newKcats(~sel,4) = 0; %Results in zero-cost

%Replace rxns and enzymes with their location in model
[~,newKcats(:,1)] = ismember(model.ec.rxns(newKcats(:,1)),model.rxns);
Expand Down
27 changes: 15 additions & 12 deletions src/geckomat/change_model/findMetSmiles.m
Original file line number Diff line number Diff line change
Expand Up @@ -66,32 +66,35 @@
%keep the first suggestion.
smileResult = regexp(smileResult,'(^\S*)\n','once','tokens');
uniqueSmiles{i,1} = smileResult{1,1};
% Append one line each time, in case internet connection is lost
% halfway. Open & close file each time to avoid leaving the file
% open when breaking the function.
out = [uniqueNames(i), uniqueSmiles(i)];
fID = fopen(smilesDBfile,'a');
fprintf(fID,'%s\t%s\n',out{:});
fclose(fID);
break % success? look at next metabolite
retry = 15; % success: no retry
catch exception
%Sometimes the call fails, for example since the server is busy. In those cases
%we will try 10 times. Some errors however are because the metabolite
%name does no exist in the database (404) or some other error (the metabolite contains
%a slash or similar, 400 or 500). In those cases we can
%immediately give up.
if (strcmp(exception.identifier, 'MATLAB:webservices:HTTP404StatusCodeError') || ...
strcmp(exception.identifier, 'MATLAB:webservices:HTTP400StatusCodeError') || ...
strcmp(exception.identifier, 'MATLAB:webservices:HTTP500StatusCodeError'))
break
strcmp(exception.identifier, 'MATLAB:webservices:HTTP400StatusCodeError') || ...
strcmp(exception.identifier, 'MATLAB:webservices:HTTP500StatusCodeError'))
uniqueSmiles(i) = {''};
retry = 15;
else
retry = retry + 1;
end
end
if retry == 10
error('Cannot reach PubChem. Check your internet connection and try again.')
end
end
% Append one line each time, in case internet connection is lost
% halfway. Open & close file each time to avoid leaving the file
% open when breaking the function.
out = [uniqueNames(i), uniqueSmiles(i)];
fID = fopen(smilesDBfile,'a');
fprintf(fID,'%s\t%s\n',out{:});
fclose(fID);
end
if verbose;
if verbose
fprintf('\b\b\b\b\b\b\b\b\b\b\b\b\bdone.\n');
fprintf('Model-specific SMILES database stored at %s\n',smilesDBfile);
end
Expand Down
2 changes: 0 additions & 2 deletions src/geckomat/change_model/makeEcModel.m
Original file line number Diff line number Diff line change
Expand Up @@ -105,15 +105,13 @@
end
end
params = modelAdapter.getParameters();
compartmentID = params.enzyme_comp;
compartmentID = strcmp(model.compNames,params.enzyme_comp);
if ~any(compartmentID)
error(['Compartment ' params.enzyme_comp ' (specified in params.enzyme_comp) '...
'cannot be found in model.compNames'])
end
compartmentID = model.comps(compartmentID);


if geckoLight
ec.geckoLight=true;
else
Expand Down
4 changes: 3 additions & 1 deletion src/geckomat/gather_kcats/fuzzyKcatMatching.m
Original file line number Diff line number Diff line change
Expand Up @@ -497,7 +497,9 @@
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function org_index = find_inKEGG(org_name,names)
org_index = find(strcmpi(org_name,names));
if isempty(org_index)
if numel(org_index)>1
org_index = org_index(1);
elseif isempty(org_index)
i=1;
while isempty(org_index) && i<length(names)
str = names{i};
Expand Down
Loading