Remove metabolism constraints file #111

tahorst · 2018-03-10T00:42:30Z

I've been experimenting with kinetic parameters. Because we currently use a dynamically created file, I had to rerun the fitter to regenerate that file whenever I wanted to switch back and forth between sims. This is my attempt at keeping the dynamically generated code more self-contained, especially since we aren't tracking the generated files. I wanted to keep the compile and exec commands within sim_data, but that prevented pickling the object. I could have used eval instead, but that was about 8x slower than compiling once and running exec. I'm not sure how I feel about relying on exec, and I suspect there is a cleaner way to do this. If this implementation looks good, though, we could probably do roughly the same thing with the ODE constraints to remove those dynamically generated files as well.

I found this site useful in setting this up: http://lucumr.pocoo.org/2011/2/1/exec-in-python/

1fish2

I like this a lot.

See the comments on compile+eval.

1fish2 · 2018-03-10T05:24:36Z

models/ecoli/processes/metabolism.py

+			'enzymes': enzymes,
+			'kineticsSubstrates': substrates,
+			}
+		exec self.compiledConstraints in local


The exec statement documentation says that when only one dictionary is provided, exec uses it for both the globals and the locals. If locals are in fact faster than globals, then it's worth providing {}, local.

Might as well use the Python 3 compatible syntax exec(self.compiledConstraints, globals_dict, local). (global itself is a reserved word, so the globals dictionary needs some other name.)
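A minimal sketch of the two-dictionary form, written in Python 3 syntax. The source string and the names `source`, `global_ns`, and `local_ns` are made up for illustration; this is not the generated constraints code from this PR.

```python
# Hypothetical stand-in for the generated constraint source.
source = "constraints = [enzymes[0] * substrates[0], enzymes[1] * substrates[1]]\n"

# Compile once; the resulting code object can be executed many times.
code = compile(source, "<string>", "exec")

# With two dictionaries, names assigned by the code land in local_ns.
# global_ns only gains the __builtins__ entry that exec() inserts.
global_ns = {}
local_ns = {"enzymes": [1.0, 2.0], "substrates": [3.0, 4.0]}
exec(code, global_ns, local_ns)

constraints = local_ns["constraints"]
print(constraints)
```

The function-call form `exec(code, globals_dict, locals_dict)` is also accepted by Python 2, so the same line works in both.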

Actually, this module has already imported numpy so there's no need to compile code that redoes that. Just pass np in one of the dictionaries. It might be worth testing whether the locals or globals dictionary is faster for that -- np is usually a global.

And with that you can just compile and eval() an expression, which is simpler and I'd expect it to be more efficient since eval() never has to modify its locals dictionary. (Note that the expression source code must end with a \n.)

if not self.compiledConstraints:
    self.compiledConstraints = compile('np.array(%s).reshape(-1)\n' % self.kineticConstraintsStr, '<string>', 'eval')

local = {
    'enzymes': enzymes,
    'kineticsSubstrates': substrates,
    'np': np,
    }
return eval(self.compiledConstraints, {}, local)

Was the 8x performance penalty with eval() due to evaluating a string multiple times instead of compiling it once?

Does using eval() instead of exec() let you move the compile & eval code into sim_data?
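One way to check the compile-once question is to time both variants side by side. This is a rough sketch in Python 3 using only the standard library; the expression and input values are placeholders, and the timings it prints will vary by machine.

```python
import timeit

# Hypothetical expression standing in for the generated constraints string.
expr = "[enzymes[2] * substrates[2], enzymes[3] * substrates[3]]\n"
names = {"enzymes": [0.0, -1.0, 3.0, 2.0], "substrates": [1, 10, 100, 1000]}

# Compile once up front, then reuse the code object on every call.
compiled = compile(expr, "<string>", "eval")


def eval_string():
    # Parses and compiles the source text on every call.
    return eval(expr, {}, names)


def eval_compiled():
    # Skips parsing; only runs the precompiled code object.
    return eval(compiled, {}, names)


# Both variants must agree on the value before the timings mean anything.
assert eval_string() == eval_compiled()

n = 10000
t_string = timeit.timeit(eval_string, number=n)
t_compiled = timeit.timeit(eval_compiled, number=n)
print("eval(str):      %.2f usec/call" % (t_string / n * 1e6))
print("eval(compiled): %.2f usec/call" % (t_compiled / n * 1e6))
```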

1fish2 · 2018-03-10T06:53:49Z

reconstruction/ecoli/dataclasses/process/metabolism.py

-			"reconstruction", "ecoli", "dataclasses", "process", "metabolism_constraints.py"
-			)
-		writeMetabolicConstraintsFile(constraintsFile, constraints)
+		self.kineticConstraints = str(sp.Matrix(constraints))[7:-1]


Why 7 and -1? Should it use len(something) or symbolic constants?

This was just carried over from the kinetic_constraints.py file. Creating the sp.Matrix object adds Matrix() around the string, so the indexing just strips that. It looks like all we need to do is turn the constraints into a string; then we don't need to index the string or reshape the numpy array on the function call, so I can update that.
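For reference, the hard-coded slice could be expressed with named constants along these lines. This is only a sketch: the example string assumes the "Matrix(...)" wrapper described above, and `PREFIX`, `SUFFIX`, and `strip_wrapper` are hypothetical names.

```python
# Assumed string form; the thread says sp.Matrix wraps the text in "Matrix(...)".
matrix_str = "Matrix([[enzymes[0]*kineticsSubstrates[0]], [enzymes[1]*kineticsSubstrates[1]]])"

PREFIX = "Matrix("
SUFFIX = ")"


def strip_wrapper(text, prefix=PREFIX, suffix=SUFFIX):
    """Return text without the wrapper, failing loudly if the format changes."""
    if not (text.startswith(prefix) and text.endswith(suffix)):
        raise ValueError("unexpected format: %r" % text[:20])
    return text[len(prefix):-len(suffix)]


inner = strip_wrapper(matrix_str)

# Same result as the hard-coded slice, since len("Matrix(") == 7.
assert inner == matrix_str[7:-1]
print(inner)
```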

1fish2 · 2018-03-10T19:23:03Z

Even better: eval() a lambda expression once, then just invoke it.

if not self.compiledConstraints:
    self.compiledConstraints = eval('lambda enzymes, kineticsSubstrates: np.array(%s).reshape(-1)\n' % self.kineticConstraintsStr, {'np': np}, {})

return self.compiledConstraints(enzymes, substrates)

This proves to be faster the first time and subsequent times.

np has to be provided in the globals, not locals. I don't know why.

(Another variation passes in np as another lambda argument, but it doesn't make a noticeable performance difference.)
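A likely explanation for the globals requirement: a function created inside eval() resolves its free names through its own `__globals__`, which is the globals dictionary passed to eval(). The locals mapping is only consulted while the eval'd expression itself runs, not later when the lambda body executes. The sketch below uses the standard library `math` module in place of np so that it runs without numpy; it is written for Python 3.

```python
import math

src = "lambda x: math.sqrt(x)\n"

# math supplied only in the eval() locals: the lambda is created fine,
# but its body looks names up in its globals when called, so the call fails.
f_locals = eval(src, {}, {"math": math})
try:
    f_locals(9.0)
    locals_only_works = True
except NameError:
    locals_only_works = False

# math supplied in the eval() globals: this dict becomes the lambda's
# __globals__, so the lookup succeeds on every call.
f_globals = eval(src, {"math": math}, {})
result = f_globals(9.0)

print(locals_only_works, result)
```

Passing the module as a lambda argument, as in the other variation, sidesteps the lookup entirely because the name is then an ordinary parameter.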

About pickling sim_data
If the issue is a derived field that can't be pickled, the fix is to avoid pickling that field. See this extended example, where the object's __getstate__() method copies its __dict__ and removes the unpicklable field. The __setstate__() method reconstructs that field; in the present case, just setting it to None would suffice.

Test code

import numpy as np
import time


def measure(f):
	"""Return f() speed in microseconds."""
	clock = time.time
	start = clock()
	result = f()
	stop = clock()
	# return result, (stop - start) * 1e6
	return (stop - start) * 1e6

class Alternatives:
	expr = '[enzymes[2] * substrates[2], enzymes[3] * substrates[3]]'

	def __init__(self):
		self.x1 = self.x2 = self.x3 = None

	def v1(self, enzymes, substrates):
		if not self.x1:
			expr = 'np.array(%s)\n' % self.expr
			self.x1 = compile(expr, '<string>', 'eval')
		local = {'np': np, 'enzymes': enzymes, 'substrates': substrates}
		return eval(self.x1, {}, local)

	def v2(self, enzymes, substrates):
		if not self.x2:
			expr = 'lambda enzymes, substrates: np.array(%s)\n' % self.expr
			self.x2 = eval(expr, {'np': np}, {})
		return self.x2(enzymes, substrates)

	def v3(self, enzymes, substrates):
		if not self.x3:
			expr = 'lambda np, enzymes, substrates: np.array(%s)\n' % self.expr
			self.x3 = eval(expr, {}, {})
		return self.x3(np, enzymes, substrates)


def test():
	enzymes = [0.0, -1.0, 3.14159, 2.71828]
	substrates = [10 ** x for x in xrange(5)]
	a = Alternatives()
	l1 = lambda: a.v1(enzymes, substrates)
	l2 = lambda: a.v2(enzymes, substrates)
	l3 = lambda: a.v3(enzymes, substrates)

	for _ in xrange(10):
		print '\t'.join(
			[str(x) for x in [measure(l1), measure(l2), measure(l3)]])

test()

1159.19113159	55.0746917725	41.0079956055
5.00679016113	2.14576721191	1.90734863281
2.86102294922	2.14576721191	0.953674316406
3.09944152832	0.953674316406	0.953674316406
3.09944152832	1.90734863281	2.14576721191
2.86102294922	0.953674316406	1.90734863281
3.09944152832	0.953674316406	0.953674316406
2.86102294922	2.14576721191	0.953674316406
3.09944152832	1.90734863281	0.953674316406
3.09944152832	0.953674316406	2.14576721191

1fish2 · 2018-03-12T07:58:13Z

BTW, the penalty for v1 mostly goes away by changing eval(self.x1, {}, local) to eval(self.x1, {'np': np}, local). v1's first run duration drops to match v2, while v1's later runs average about 4 µsec vs. 3 µsec for v2.

That's really surprising since the first-run step compile(expr, '<string>', 'eval') is identical in both cases. It suggests that v1's eval() step takes a long time to look up np the first time and then caches the essential result.

tahorst · 2018-03-12T18:57:32Z

Thanks for doing the testing, Jerry! It seems I can put the same getKineticConstraints() function that's in metabolism into sim_data, where it compiles the eval statement only on the first call, and everything works fine. I think this just means the compiled object in sim_data isn't created until after pickling/multiprocessing (issues similar to the pickling ones came up when the fitter created new threads with a compiled object). That could cause problems if we ever try to pickle sim_data again afterwards. It also seems a little weird, and contrary to our programming design, to modify the sim_data object after the fitter, even if it's just storing some temporary compiled code.
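The ordering concern can be reproduced in a few lines. This is a sketch in Python 3 with a hypothetical `SimDataSketch` class standing in for the real sim_data object; the constraint string is a placeholder.

```python
import pickle


class SimDataSketch(object):
    """Hypothetical stand-in for the sim_data metabolism object."""

    def __init__(self, constraints_str):
        self.constraints_str = constraints_str
        self._compiled = None  # filled in lazily on first use

    def get_kinetic_constraints(self, enzymes, substrates):
        if self._compiled is None:
            self._compiled = eval(
                "lambda enzymes, substrates: %s\n" % self.constraints_str, {}, {})
        return self._compiled(enzymes, substrates)


data = SimDataSketch("[enzymes[0] * substrates[0]]")

# Before the first call there is no lambda on the object, so pickling works.
blob = pickle.dumps(data)

# The first call caches the lambda on the object...
value = data.get_kinetic_constraints([2.0], [5.0])

# ...and after that the object can no longer be pickled.
try:
    pickle.dumps(data)
    pickled_after_call = True
except (pickle.PicklingError, AttributeError, TypeError):
    pickled_after_call = False

print(value, pickled_after_call)
```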

jmason42

This is definitely an improvement, although I don't think I've ever used exec or compile so it's a bit mysterious to me.

If you want to evaluate the kinetic constraints in a more algorithmic fashion (e.g. something that could be assembled and then evaluated in Cython) let me know. My parameter estimation work provides guidelines for disassembling a kinetic rate law into standard component parts. This would also put you one (big) step closer towards being able to use my parameter estimation techniques. 😉

jmason42 · 2018-03-12T19:12:53Z

models/ecoli/processes/metabolism.py

+
+	def getKineticConstraints(self, enzymes, substrates):
+		'''
+		Allows for dynamic programming for kinetic constraint calculation from sim_data


I wouldn't call this "dynamic programming" - dynamic programming is a specific algorithmic technique.

jmason42 · 2018-03-12T19:13:51Z

models/ecoli/processes/metabolism.py

+		Returns np.array of the kinetic constraint target for each reaction with kinetic parameters
+		Inputs:
+			enzymes (np.array) - concentrations of enzymes associated with kinetics constraints
+			substrates (np.array) - concentrations of substrates associated with kinetics constraints


Documenting inputs is great, something I'm also trying to do more. With arrays I like to annotate the dtype and shape if possible.
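As an illustration of that annotation style, a docstring could spell out dtype and shape roughly like this. The function name `kinetic_targets` and the elementwise product are placeholders, not the generated rate laws, and the equal-length assumption is made only to keep the sketch small.

```python
import numpy as np


def kinetic_targets(enzymes, substrates):
    """Return a placeholder kinetic target for each constrained reaction.

    Hypothetical example: the elementwise product stands in for the
    generated rate laws.

    Inputs:
        enzymes (np.ndarray, dtype float64, shape (n_constraints,)) -
            enzyme concentrations, one per kinetic constraint
        substrates (np.ndarray, dtype float64, shape (n_constraints,)) -
            substrate concentrations, one per kinetic constraint

    Returns:
        np.ndarray, dtype float64, shape (n_constraints,) - target for
        each reaction with kinetic parameters
    """
    enzymes = np.asarray(enzymes, dtype=np.float64)
    substrates = np.asarray(substrates, dtype=np.float64)
    if enzymes.ndim != 1 or enzymes.shape != substrates.shape:
        raise ValueError("expected two 1-D arrays of equal length")
    return enzymes * substrates


targets = kinetic_targets([1.0, 2.0], [3.0, 4.0])
print(targets.dtype, targets.shape)
```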

1fish2 · 2018-03-13T08:11:09Z

This looks good!

Revised notes on pickling:

This code lazily sets _compiledConstraints, and if pickling only happens before that, it's fine.
If pickling ever happens after that, it'll raise a PicklingError.
The simplest fix is to add a __getstate__() method that sets self._compiledConstraints = None then returns self.__dict__, although that discards the cached _compiledConstraints.
A higher end fix is to implement __getstate__() and __setstate__() like in the example.
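A sketch of the higher-end variant, assuming a class shaped like the one discussed here. `SimDataSketch` and its constraint string are hypothetical; only the `_compiledConstraints` field name comes from the notes above. Unlike the simplest fix, `__getstate__()` works on a copy of the instance dictionary, so the live object keeps its cached lambda.

```python
import pickle


class SimDataSketch(object):
    """Hypothetical stand-in for the sim_data metabolism object."""

    def __init__(self, constraints_str):
        self.constraints_str = constraints_str
        self._compiledConstraints = None

    def get_kinetic_constraints(self, enzymes, substrates):
        # Lazily build and cache the lambda on first use.
        if self._compiledConstraints is None:
            self._compiledConstraints = eval(
                "lambda enzymes, substrates: %s\n" % self.constraints_str, {}, {})
        return self._compiledConstraints(enzymes, substrates)

    def __getstate__(self):
        # Pickle a copy of the instance dict without the cached lambda,
        # leaving the live object's cache untouched.
        state = self.__dict__.copy()
        state["_compiledConstraints"] = None
        return state

    def __setstate__(self, state):
        # Restore the fields; the lambda is rebuilt lazily on the next call.
        self.__dict__.update(state)


data = SimDataSketch("[enzymes[0] * substrates[0]]")
before = data.get_kinetic_constraints([2.0], [5.0])   # caches the lambda
clone = pickle.loads(pickle.dumps(data))              # works despite the cache
after = clone.get_kinetic_constraints([2.0], [5.0])   # recompiles in the clone
print(before, after)
```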

1fish2 · 2018-03-13T08:15:06Z

models/ecoli/processes/metabolism.py

@@ -111,8 +111,7 @@ def initialize(self, sim, sim_data):
 		self.catalyzedReactionBoundsPrev = np.inf * np.ones(len(self.reactionsWithCatalystsList))

 		# Data structures to compute reaction targets based on kinetic parameters


Maybe reword "Data structures" as "Function" or something.

Good catch. I feel like it's easy to forget about comments when updating code. It's probably something we need to be more careful about going forward, so we don't get mismatches between what the comments say and what the code actually does.

tahorst added 2 commits March 9, 2018 16:29

Compile kinetic constraints function on the fly to remove metabolism_…

f75792d

…constraints.py

Remove wholecell/utils/write_metabolic_constraints_file.py

a7cd86b

tahorst requested a review from 1fish2 March 10, 2018 00:42

1fish2 reviewed Mar 10, 2018

jmason42 reviewed Mar 12, 2018

Move getKineticConstraints() from model process to reconstruction met…

bbbee31

…abolism

1fish2 approved these changes Mar 13, 2018

Update comment

5660187

tahorst merged commit f317048 into master Mar 13, 2018

tahorst deleted the remove-metabolism-constraints-file branch March 13, 2018 23:46

1fish2 mentioned this pull request Mar 17, 2018

Procedural code generation #63

Closed

tahorst mentioned this pull request May 9, 2019

make clean should delete the generated ODE .py files #539

Merged
