# ----------------------------------------------------------------------------------------------------------------
## How to apply BayModTS - walk through
# ----------------------------------------------------------------------------------------------------------------
"""
In this notebook we walk through the complete BayModTS pipline. The notebook contains all steps needed to apply
it to your personal dataset. We use the Midazolam steatosis data with three different conditions (control, 2wks, 4wks),
that is based on excel sheets from the lab.
The Midazolam raw data set is published on [FairdomHub](https://fairdomhub.org/documents/4067?version=1)
(DOI: 10.1038/s41598-022-26483-6)
Measurement files are loaded from a "measurement_files" subfolder and results are stored in an "results" subfolder,
so please make sure they exist in your executing directory.
Note: It is highly recommended to use pure python files for execution of the BayModTS workflow as this allows
for easy parallelization of the ensemble creation.
"""
# ----------------------------------------------------------------------------------------------------------------
# ----------------------------------------------------------------------------------------------------------------
# ----------------------------------------------------------------------------------------------------------------
### Creation of PEtab measurements file
# ----------------------------------------------------------------------------------------------------------------
"""
First our data has to be converted to the PEtab measurement format. Information about the PEtab format can be found
at their [docs](https://petab.readthedocs.io/en/latest/).
Here, I given an example on how to create a PEtab measurement table that contains all different experimental conditions
we are interested in. For this demonstration, the curated Midazolam raw data is used in an excel format that has all
relevant information on one sheet (2021-09-15_Midazolam_Raw_data.xlsx).
The PEtab problem consists of a (1) Model (here RTF), (2) observables file, (3) experimental conditions file,
(4) parameter table, (5) measurements file, and (6) yaml file that maps the afore mentined files to one problem.
If the [Retarded Transient Functions]( https://doi.org/10.3389/fphy.2020.00070) (C. Kreutz) are used as a general
description model in BayModTS, (1) to (3) of the BayModTS core can be used.
In this walk-through we use the RTF. Further, the PEtab_files class is used to create a project specific PEtab
measurement table and parameter table file.
If a complete PEtab problem was already created based on the PEtab docs, this step can be skipped an proceeded
to the inference step.
"""
# ----------------------------------------------------------------------------------------------------------------
# ----------------------------------------------------------------------------------------------------------------
# Initialize condition names and path
import os
import pandas as pd
import numpy as np
from pathlib import Path
model_name = 'rtf'
Conditions = ['Control', '2Wks', '4Wks']
p = os.path.join(Path(__file__).parent, 'baymodts/walk_through/') # path for execution
path_to_measurement_files = os.path.join(p, 'measurement_files/')
# load curated experimental data
midazolam = pd.read_excel(f'{p}2021-09-15_Midazolam_Raw_data.xlsx',
sheet_name='curated')
data_dict = {'midazolam': midazolam}

class PEtab_files:
    """Create the PEtab measurement and parameter files from experimental data.

    This class exploits the data structure of the Midazolam excel sheet.
    For other data structures it must be adapted.

    Inputs:
    df_name = name of the dataframe object in data_dict (str)
    condition = experimental condition, i.e. group of the dataframe,
        here: (Control, 2Wks, 4Wks)
    """

    def __init__(self, df_name: str, condition: str):
        self.df = data_dict[df_name]
        self.df_name = df_name
        self.condition = condition
        self.number_of_replicates = len(self.df[self.df['Group'] == self.condition].index)
        self.t = [0, 0.167, 0.5, 1, 1.5, 2, 2.5, 3, 4, 6]  # time of samples
        self.column_names = ['t0', 't0.167', 't0.5hn', 't1h', 't1.5h',
                             't2h', 't2.5h', 't3h', 't4h', 't6h']
        # take only the data from the specified condition
        self.df_cond = self.df[self.df['Group'] == self.condition]
        # clean dataframe (only leave measurement values)
        self.df_cond = self.df_cond.drop(columns=['Column1', 'Group', 'diet'])

    def create_measurement_df(self, path):
        """Create a dataframe in the PEtab measurement format.

        Input:
        path: path to the measurement files
        Return:
        measurement_df: PEtab measurement table (tsv)
        """
        n_rows = self.number_of_replicates * len(self.t)
        observable_array = ['y_obs'] * n_rows
        simulation_cond_array = ['condition1'] * n_rows
        noiseParameters_array = ['sd1_y_obs'] * n_rows
        # replicate index (0 .. number_of_replicates - 1) for each time point
        replicateId_array = [replicate for timepoint in range(len(self.t))
                             for replicate in range(self.number_of_replicates)]
        # time array for all replicates
        tn = np.repeat(self.t, self.number_of_replicates)
        # concatenate the measurements of the replicates to one array:
        # for each time point (column), take the values of all replicates
        measurements_array = []
        for column in self.column_names:
            measurements_array.append(self.df_cond.loc[:, column].to_numpy())
        # convert the list of arrays to one array
        self.measurements_array = np.concatenate(measurements_array, axis=0)
        measurement_dict = {'observableId': observable_array,
                            'simulationConditionId': simulation_cond_array,
                            'measurement': self.measurements_array,
                            'time': tn,
                            'noiseParameters': noiseParameters_array,
                            'replicateId': replicateId_array}
        self.measurement_df = pd.DataFrame(measurement_dict)
        self.measurement_df.to_csv(f'{path}/{self.condition}_measurement_table.tsv',
                                   sep='\t', index=False)
        return self.measurement_df

    def create_parameters_table(self):
        """Create the PEtab parameter table for the current substance.

        The offset p0 is fixed to the first measurement (baseline).
        Return:
        parameter_df: PEtab parameter table (tsv)
        """
        # Calculate the offset for the current substance and condition.
        # As the offset is the same for all replicates, we can take the first value for simplicity.
        p0_curr = self.measurements_array[0]
        par_data = {'parameterId': ['Asus', 'Atrans', 't1', 't11', 't2', 'Tshift', 'Trange', 'p0', 'sd1_y_obs'],
                    'parameterScale': ['lin'] * 9,
                    'lowerBound': [-6000, 0, 0, 0, 0, -100, 0, 0, 0.1],
                    'upperBound': [6000, 6000, 400, 400, 400, 100, 100, 1000, 1000],
                    'estimate': [1, 1, 1, 1, 1, 0, 0, 0, 1],
                    'nominalValue': [1, 1, 1, 1, 1, -2, 6, p0_curr, 1]}
        self.parameter_df = pd.DataFrame(par_data)
        # Save the parameter table
        self.parameter_df.to_csv(os.path.join(p, 'rtf_parameters.tsv'), sep='\t', index=False)

for condition in Conditions:
    # Create the PEtab measurement and parameter tables for each condition
    df_cond = PEtab_files(df_name='midazolam', condition=condition)
    df_cond.create_measurement_df(path_to_measurement_files)
    df_cond.create_parameters_table()
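# Optional sanity check (not part of the original workflow): read back one of the
# generated measurement tables and inspect the PEtab long format.
example_table = pd.read_csv(
    os.path.join(path_to_measurement_files, 'Control_measurement_table.tsv'), sep='\t')
print(example_table.head())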
# ----------------------------------------------------------------------------------------------------------------
## Now you have all files you need to execute BayModTS
# ----------------------------------------------------------------------------------------------------------------
"""
If you already have PEtab problems for several conditions and want to test wheater they belong to the same data
generating process, you can start below.
Please make sure all PEtab subfiles are in your executing folder.
For this you can copy the rtf.xml model, observable file and experimental conditions file from baymodts core.
"""
# ----------------------------------------------------------------------------------------------------------------
# ----------------------------------------------------------------------------------------------------------------
# Import the BayModTS class baymodts and function all_conditions_plot
import sys
sys.path.append(os.path.join(Path(__file__).parent, 'baymodts/core/'))
from BayModTS_core import baymodts
from BayModTS_core import all_conditions_plot
# Optional: Define user specified colors to compare the different conditions
pal = ['#781F19', '#C2630B', '#F1C03E']
# Define credibility level
ci = 95
# Execute the UQ analysis for each condition.
UQ_conditions = {}  # dict to store the UQ results for each condition
for condition in Conditions:
    # Execute BayModTS
    UQ_conditions[condition] = baymodts(model_name='rtf',
                                        condition=condition,
                                        all_conditions=Conditions,
                                        Path=p,
                                        samples=100000,
                                        pal=pal)
    UQ_conditions[condition].MCMC_sampling()
    UQ_conditions[condition].MCMC_visualization()
    UQ_conditions[condition].Ensemble_creation()
    UQ_conditions[condition].plot_ensemble(ci, path_to_measurement_files)
all_conditions_plot(UQ_conditions, ci, p, pal=pal)
# ----------------------------------------------------------------------------------------------------------------
## After the execution of BayModTS
# ----------------------------------------------------------------------------------------------------------------
"""
Perfect, now you should have the results of the BayModTS workflow.
The results are stored in the results folder.
Please always check carefully:
- Was the Effective Sample Size (ESS) sufficient? For this example, we would need to increase the number of
drawn MCMC samples in order to get a sufficent ESS.
- Can you reproduce your results with a differenent/broader prior?
- Take a look at your traces - can you confirm a converged chain visually?
- Compare your Posterior Predicitve Distributions (PPDs) to your underlying data (ensemble_output_predictions.png).
Is your data explained reasonably? If not is your model flexible enough to describe the observed data?
Further good practices for Bayesian Inference can be found in the "Bayesian Analysis Reporting Guidelines" of J. Kruschke:
https://doi.org/10.1038/s41562-021-01177-7
"""
# ----------------------------------------------------------------------------------------------------------------
# ----------------------------------------------------------------------------------------------------------------