Remove metabolism constraints file #111

Merged
tahorst merged 4 commits into master from remove-metabolism-constraints-file
Mar 13, 2018

tahorst
Member

@tahorst tahorst commented Mar 10, 2018

I've been playing around with kinetic parameters, and because we currently use a dynamically created file, I needed to rerun the fitter to regenerate the file whenever I wanted to switch back and forth between sims. This was my attempt at keeping the dynamically generated code more self-contained, especially since we aren't tracking the dynamically generated files. I had wanted to keep the compile and exec commands contained within sim_data, but that prevented pickling the object. I could have done it with the eval command, but that is about 8x slower than compiling once and running exec. I'm not sure how I feel about relying on exec, and I feel like there should be a cleaner way to do this. If this implementation looks good, though, we could probably do roughly the same thing with the ode constraints to remove those dynamically generated files as well.

I found this site useful in setting this up: http://lucumr.pocoo.org/2011/2/1/exec-in-python/

@tahorst tahorst requested a review from 1fish2 March 10, 2018 00:42
Contributor

@1fish2 1fish2 left a comment

I like this a lot.

See the comments on compile+eval.

'enzymes': enzymes,
'kineticsSubstrates': substrates,
}
exec self.compiledConstraints in local
Contributor

The exec statement says providing one dictionary to exec uses it for both the globals and the locals. If locals are in fact faster than globals, then it's worth providing {}, local.

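Might as well use the Python 3 compatible syntax exec(self.compiledConstraints, globals_dict, local), where globals_dict is the globals dictionary (global itself is a reserved word and can't be used as a name).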

Actually, this module has already imported numpy so there's no need to compile code that redoes that. Just pass np in one of the dictionaries. It might be worth testing whether the locals or globals dictionary is faster for that -- np is usually a global.
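For reference, a minimal sketch of the two-dictionary form of exec suggested above. The source string and variable names are made up for illustration, and math stands in for np so the snippet runs without dependencies; it is not the code in this PR.

```python
import math

# Hypothetical one-line source standing in for the generated constraints code.
source = "rate = math.sqrt(enzymes[0]) * substrates[0]\n"
code = compile(source, "<string>", "exec")  # compile once, reuse afterwards

global_vars = {"math": math}  # module the caller has already imported
local_vars = {"enzymes": [4.0], "substrates": [10.0]}

# The function-call form is accepted by Python 2.7 and Python 3.
exec(code, global_vars, local_vars)

# Top-level assignments in the executed code land in the locals dictionary.
rate = local_vars["rate"]
print(rate)
```

With separate dictionaries the result has to be read back out of local_vars, which is one reason an eval-based expression can be simpler.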

And with that you can just compile and eval() an expression, which is simpler and I'd expect it to be more efficient since eval() never has to modify its locals dictionary. (Note that the expression source code must end with a \n.)

if not self.compiledConstraints:
    self.compiledConstraints = compile(
        'np.array(%s).reshape(-1)\n' % self.kineticConstraintsStr, '<string>', 'eval')

local = {
    'enzymes': enzymes,
    'kineticsSubstrates': substrates,
    'np': np,
    }
return eval(self.compiledConstraints, {}, local)

Was the 8x performance penalty with eval() due to evaluating a string multiple times instead of compiling it once?

Does using eval() instead of exec() let you move the compile & eval code into sim_data?
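One way to check the first question is to time both variants side by side. This is only a sketch: the expression and input values are invented, plain lists replace np arrays, and the numbers will depend on the machine and on the size of the real constraint string.

```python
import timeit

# Hypothetical expression standing in for the generated constraint string.
expr_source = "[enzymes[0] * substrates[0], enzymes[1] * substrates[1]]\n"
names = {"enzymes": [3.0, 2.0], "substrates": [10.0, 100.0]}

# Code object built once, outside the timed functions.
compiled = compile(expr_source, "<string>", "eval")


def eval_string():
    # Parses and compiles the source text on every call.
    return eval(expr_source, {}, names)


def eval_compiled():
    # Reuses the code object compiled above.
    return eval(compiled, {}, names)


assert eval_string() == eval_compiled()
string_time = timeit.timeit(eval_string, number=10000)
compiled_time = timeit.timeit(eval_compiled, number=10000)
print("string: %.4f s, compiled: %.4f s" % (string_time, compiled_time))
```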

"reconstruction", "ecoli", "dataclasses", "process", "metabolism_constraints.py"
)
writeMetabolicConstraintsFile(constraintsFile, constraints)
self.kineticConstraints = str(sp.Matrix(constraints))[7:-1]
Contributor

Why 7 and -1? Should it use len(something) or symbolic constants?

Member Author

This was just carryover from the kinetic_constraints.py file. Creating the sp.Matrix object adds Matrix() around the string, so the indexing just strips that: 7 is the length of "Matrix(" and -1 drops the closing ")". It looks like all we need to do is turn the constraints into a string, and then we don't need to index or reshape the numpy array on the function call, so I can update that.
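If the slicing were kept, the magic numbers could be derived from named constants, as the question suggests. A small sketch, assuming the string really has the form "Matrix(...)" as described; the sample string is written by hand here rather than produced by SymPy, and strip_matrix_wrapper is a hypothetical helper name.

```python
# Assumed string form, based on the description above; not generated by SymPy.
matrix_str = "Matrix([[a*b], [c*d]])"

PREFIX = "Matrix("
SUFFIX = ")"


def strip_matrix_wrapper(text):
    """Return the text between the leading 'Matrix(' and the trailing ')'."""
    if not (text.startswith(PREFIX) and text.endswith(SUFFIX)):
        raise ValueError("unexpected Matrix string: %r" % text)
    return text[len(PREFIX):-len(SUFFIX)]


inner = strip_matrix_wrapper(matrix_str)
print(inner)
```

The explicit check also fails loudly if the wrapper text ever changes, instead of silently trimming the wrong characters.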

@1fish2
Contributor

1fish2 commented Mar 10, 2018

Even better: eval() a lambda expression once, then just invoke it.

if not self.compiledConstraints:
    self.compiledConstraints = eval('lambda enzymes, kineticsSubstrates: np.array(%s).reshape(-1)\n' % self.kineticConstraintsStr, {'np': np}, {})

return self.compiledConstraints(enzymes, substrates)

This proves to be faster the first time and subsequent times.

np has to be provided in the globals, not locals. I don't know why.

(Another variation passes in np as another lambda argument, but it doesn't make a noticeable performance difference.)
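A likely explanation for np needing to be in the globals: a function created by eval() keeps the globals dictionary passed to eval() as its own globals and looks up free names there when it is called, while the locals dictionary passed to eval() is only used while the lambda expression itself is being evaluated. A small sketch of that behavior, using math in place of np so it runs without dependencies:

```python
import math

source = "lambda x: math.sqrt(x)"

# math supplied as a global: the lambda keeps this dictionary as its globals.
f_globals = eval(source, {"math": math}, {})

# math supplied only as a local: the dictionary is consulted while the lambda
# expression is evaluated, but not later when the lambda body runs.
f_locals = eval(source, {}, {"math": math})

print(f_globals(9.0))
try:
    f_locals(9.0)
except NameError as error:
    print("NameError: %s" % error)
```

Passing the module as an explicit lambda argument, as in the other variation, avoids the lookup question entirely.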

About pickling sim_data
If the issue is having a derived field that can't be pickled, the fix is to avoid pickling that field. See this extended example where the object's __getstate__() method copies its __dict__ and removes the unpicklable field. The __setstate__() method reconstructs that field; in the present case, just setting it to None would suffice.

Test code

import numpy as np
import time


def measure(f):
	"""Return f() speed in microseconds."""
	clock = time.time
	start = clock()
	result = f()
	stop = clock()
	# return result, (stop - start) * 1e6
	return (stop - start) * 1e6

class Alternatives:
	expr = '[enzymes[2] * substrates[2], enzymes[3] * substrates[3]]'

	def __init__(self):
		self.x1 = self.x2 = self.x3 = None

	def v1(self, enzymes, substrates):
		if not self.x1:
			expr = 'np.array(%s)\n' % self.expr
			self.x1 = compile(expr, '<string>', 'eval')
		local = {'np': np, 'enzymes': enzymes, 'substrates': substrates}
		return eval(self.x1, {}, local)

	def v2(self, enzymes, substrates):
		if not self.x2:
			expr = 'lambda enzymes, substrates: np.array(%s)\n' % self.expr
			self.x2 = eval(expr, {'np': np}, {})
		return self.x2(enzymes, substrates)

	def v3(self, enzymes, substrates):
		if not self.x3:
			expr = 'lambda np, enzymes, substrates: np.array(%s)\n' % self.expr
			self.x3 = eval(expr, {}, {})
		return self.x3(np, enzymes, substrates)


def test():
	enzymes = [0.0, -1.0, 3.14159, 2.71828]
	substrates = [10 ** x for x in xrange(5)]
	a = Alternatives()
	l1 = lambda: a.v1(enzymes, substrates)
	l2 = lambda: a.v2(enzymes, substrates)
	l3 = lambda: a.v3(enzymes, substrates)

	for _ in xrange(10):
		print '\t'.join(
			[str(x) for x in [measure(l1), measure(l2), measure(l3)]])

test()
1159.19113159	55.0746917725	41.0079956055
5.00679016113	2.14576721191	1.90734863281
2.86102294922	2.14576721191	0.953674316406
3.09944152832	0.953674316406	0.953674316406
3.09944152832	1.90734863281	2.14576721191
2.86102294922	0.953674316406	1.90734863281
3.09944152832	0.953674316406	0.953674316406
2.86102294922	2.14576721191	0.953674316406
3.09944152832	1.90734863281	0.953674316406
3.09944152832	0.953674316406	2.14576721191

@1fish2
Contributor

1fish2 commented Mar 12, 2018

BTW, the penalty for v1 mostly goes away by changing eval(self.x1, {}, local) to eval(self.x1, {'np': np}, local). v1's first run duration drops to match v2, while v1's later runs average about 4 µsec vs. 3 µsec for v2.

That's really surprising since the first-run step compile(expr, '<string>', 'eval') is identical in both cases. That suggests that v1's eval() step takes a long time to look up np the first time and then caches the essential result.

@tahorst
Member Author

tahorst commented Mar 12, 2018

Thanks for doing the testing, Jerry! It seems like I can create the same getKineticConstraint() function that's in metabolism as a function in sim_data, where it will only compile the eval statement when it is first called, and everything works fine. I think this just means it waits until after pickling/multiprocessing to create the compiled object in sim_data (some issues similar to the pickling ones came up when the fitter was creating new threads with a compiled object). That could present problems if we ever tried to pickle it again afterwards. It seems a little weird and contrary to our programming design to be modifying the sim_data object after the fitter, even if it's just storing some temporary compiled code.

Collaborator

@jmason42 jmason42 left a comment

This is definitely an improvement, although I don't think I've ever used exec or compile so it's a bit mysterious to me.

If you want to evaluate the kinetic constraints in a more algorithmic fashion (e.g. something that could be assembled and then evaluated in Cython) let me know. My parameter estimation work provides guidelines for disassembling a kinetic rate law into standard component parts. This would also put you one (big) step closer towards being able to use my parameter estimation techniques. 😉


def getKineticConstraints(self, enzymes, substrates):
'''
Allows for dynamic programming for kinetic constraint calculation from sim_data
Collaborator

I wouldn't call this "dynamic programming" - dynamic programming is a specific algorithmic technique.

Returns np.array of the kinetic constraint target for each reaction with kinetic parameters
Inputs:
enzymes (np.array) - concentrations of enzymes associated with kinetics constraints
substrates (np.array) - concentrations of substrates associated with kinetics constraints
Collaborator

Documenting inputs is great, something I'm also trying to do more. With arrays I like to annotate the dtype and shape if possible.

@1fish2
Contributor

1fish2 commented Mar 13, 2018

This looks good!

Revised notes on pickling:

  • This code lazily sets _compiledConstraints, and if pickling only happens before that, it's fine.
  • If pickling ever happens after that, it'll raise a PicklingError.
  • The simplest fix is to add a __getstate__() method that sets self._compiledConstraints = None then returns self or self.__dict__, although that discards the cached _compiledConstraints.
  • A higher end fix is to implement __getstate__() and __setstate__() like in the example.
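A minimal sketch of the higher end fix, assuming a cached lambda like the one discussed earlier. KineticData, constraints_str, and get_constraints are hypothetical names for illustration, not the actual sim_data code, and plain lists replace np arrays.

```python
import pickle


class KineticData(object):
    """Hypothetical stand-in for the sim_data piece holding the constraints."""

    def __init__(self, constraints_str):
        self.constraints_str = constraints_str
        self._compiled = None  # built lazily on first use, never pickled

    def get_constraints(self, enzymes, substrates):
        if self._compiled is None:
            source = "lambda enzymes, substrates: %s" % self.constraints_str
            self._compiled = eval(source, {}, {})
        return self._compiled(enzymes, substrates)

    def __getstate__(self):
        # Copy the instance dictionary and drop the unpicklable cache,
        # leaving the live object's cache untouched.
        state = self.__dict__.copy()
        state["_compiled"] = None
        return state

    def __setstate__(self, state):
        # The cache is rebuilt on the next get_constraints() call.
        self.__dict__.update(state)


data = KineticData("[enzymes[0] * substrates[0], enzymes[1] * substrates[1]]")
before = data.get_constraints([3.0, 2.0], [10.0, 100.0])  # fills the cache

restored = pickle.loads(pickle.dumps(data))  # succeeds despite the cached lambda
after = restored.get_constraints([3.0, 2.0], [10.0, 100.0])
print(before == after)
```

Because __getstate__() works on a copy, the original object keeps its compiled function and only the pickled copy has to recompile.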

@@ -111,8 +111,7 @@ def initialize(self, sim, sim_data):
self.catalyzedReactionBoundsPrev = np.inf * np.ones(len(self.reactionsWithCatalystsList))

# Data structures to compute reaction targets based on kinetic parameters
Contributor

Maybe reword "Data structures" as "Function" or something.

Member Author

Good catch. I feel like it's easy to forget about comments when updating code. It's probably something we need to be more careful about moving forward, so we don't get mismatches between what the comments say and what the code actually does.

@tahorst tahorst merged commit f317048 into master Mar 13, 2018
@tahorst tahorst deleted the remove-metabolism-constraints-file branch March 13, 2018 23:46
@1fish2 1fish2 mentioned this pull request Mar 17, 2018