Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use integer linear programming in balance_stoichiometry #86

Merged
merged 4 commits into from
Mar 24, 2018

Conversation

bjodah
Copy link
Owner

@bjodah bjodah commented Mar 18, 2018

To address gh-85.

This should in theory give canonical solutions (unless there are multiple degenerate solutions) to under-determined systems representing the stoichiometry of chemical reactions.

Currently the problems with glpk.ilp are:

  • the simplex algorithm does not always converge
  • sometimes a suboptimal (and therefore non-canonical) solution is returned.

It is possible that a hand-written special purpose algorithm is still preferable to a general ILP solver.

@bjodah bjodah force-pushed the fix-underdetermined-balance branch 2 times, most recently from e14c377 to 4248e5f Compare March 18, 2018 16:21
@bjodah bjodah force-pushed the fix-underdetermined-balance branch from 4248e5f to 3d44cf0 Compare March 18, 2018 16:38
@bjodah bjodah changed the title [WIP] Use integer linear programming in balance_stoichiometry Use integer linear programming in balance_stoichiometry Mar 18, 2018
@bjodah
Copy link
Owner Author

bjodah commented Mar 18, 2018

cc @saraaamin

would you mind trying this branch? (if that sounds difficult, don't worry -- I can make a release which you can "pip install")

@saraaamin
Copy link

saraaamin commented Mar 18, 2018

will this require that i install glpk?
because I am having problems doing that, and this is why i switched from another balancing reaction module to yours.

@bjodah
Copy link
Owner Author

bjodah commented Mar 18, 2018

cvxopt was using glpk, didn't look "under the hood" in pulp, so not sure i glpk is needed for pulp (or if it's bundled).

I'm quite sure the code could be made to work with plain SymPy if that's needed -- it's "just" a matter of coding up a genreal strategy.

Does "pip install pulp" work?
(or with conda: "conda install pulp")
if it does, then this branch should work.

@saraaamin
Copy link

saraaamin commented Mar 18, 2018

yes pip install pulp worked, and i downloaded this branch and installed it.
I tried it with one of the examples i had for underdetermined, and it returned none, so i'm assuming instead of throwing an exception it's now returning none if it couldn't find a solution, am i correct?

@bjodah
Copy link
Owner Author

bjodah commented Mar 18, 2018

hmmm.. no that sounds worrisome. It should raise an exception. Could you write a minimal self-contained example returning None?

@saraaamin
Copy link

I take it back, sorry about that i wasn't returning the right thing.
It's raising an exception like you said:

raise TypeError("can't convert %s to int" % r)
TypeError: can't convert nan to int

i'll extensively test with a bunch of equations and get back to you.
Thank you so much for all the help, highly appreciate it.

@bjodah
Copy link
Owner Author

bjodah commented Mar 20, 2018

I was thinking that we could merge this PR now (since it is fixing a previously failing test).
Any objections @saraaamin ? (have you found the time to evaluate the branch against your reaction set?)

@saraaamin
Copy link

I ran it through a reaction set, but it's only returning coefficients if the reaction is already balanced. I'm trying to find one of the reactions that is under-determined but can find one of the solutions for it and see if the fix will work for that.

or is it going to throw an error if it's under-determined no matter what?

@bjodah
Copy link
Owner Author

bjodah commented Mar 20, 2018

Not quite sure I follow. Could you write a minimal example together with what you'd expect?

@saraaamin
Copy link

saraaamin commented Mar 21, 2018

@bjodah Yes you can merge it. it's working now.
This is what i found, which i guess is expected when using a branch and bound algorithm to solve MIP problems.
I tested it with several examples. in under-determined system when there is a solution, a solution can be found, it's not usually the optimal solution, but it's a feasible solution.

example:

{'C21H28N7O14P2': 7, 'C3H4O3': 13, 'C5H9NO4': 3}
{'C5H6O5': 9, 'C21H29N7O14P2': 7, 'C3H5NO2': 3, 'H': 3}

While the solution identified is feasible, all the coeff could have been '1' and that would have minimized an Obj of summing the identified coeff values.

@saraaamin
Copy link

found another case where the result is unbalanced and not sure why it's giving back the wrong answer:

{'C3H4O3': 1, 'H3PO4': 1}
{'C3H6O3': 2}

@bjodah
Copy link
Owner Author

bjodah commented Mar 22, 2018

Thanks, I pushed a fix for your second example.

The one before though seems to be working for me:

>>> balance_stoichiometry({'C21H28N7O14P2', 'C3H4O3', 'C5H9NO4'}, {'C5H6O5', 'C21H29N7O14P2', 'C3H5NO2', 'H'}, underdetermined=None)
(OrderedDict([('C21H28N7O14P2', 1), ('C3H4O3', 1), ('C5H9NO4', 1)]),
 OrderedDict([('C21H29N7O14P2', 1), ('C3H5NO2', 1), ('C5H6O5', 1), ('H', 1)]))

@saraaamin
Copy link

should i download the same branch, and run the examples again?

@bjodah
Copy link
Owner Author

bjodah commented Mar 22, 2018

That would be great!

@saraaamin
Copy link

I hate to say this but i found more examples that should not be balanced and wrong coeff are identified for them:

{'C21H36N7O16P3S': 7, 'C3H4O3': 6}
{'H2O': 1, 'C5H8O3': 1, 'C24H38N7O18P3S': 7}

in this case the coeff are correct for all atoms, only 'C' is unbalanced. and according to a colleague in biochem, this is a wrong reaction and can't be balanced

another example:

{'C3H6O3': 1}
{'C3H4O3': 1}
This also can't be balanced because no matter what there will be extra 'H'

@bjodah
Copy link
Owner Author

bjodah commented Mar 22, 2018

Ah, yes. Thanks, didn't think about that. Since the linear programming solver performs a minimization there is no guarantee that the constraints are actually fulfilled. I added explicit checks which will raise a ValueError exception when the balancing fails. Would you mind giving this version a try?

@saraaamin
Copy link

I will test it later today, but i'm sitting right now with my colleague, and we noticed that the below example should not balance as well. would the new fix be able to detect that?

{'H2O': 1, 'C21H29N7O17P3': 1, 'C3H4O3': 1}
{'H': 1, 'C2H2O4(CH2)n': 1, 'C21H30N7O17P3': 1}

@bjodah
Copy link
Owner Author

bjodah commented Mar 22, 2018

Currently for n=1:

In [7]: balance_stoichiometry({'H2O', 'C21H29N7O17P3', 'C3H4O3'}, {'H', 'C2H2O4(CH2)1', 'C21H30N7O17P3'}, underdetermined=None)
Out[7]: 
(OrderedDict([('C21H29N7O17P3', 1), ('C3H4O3', 1), ('H2O', 1)]),
 OrderedDict([('C21H30N7O17P3', 1), ('C2H2O4(CH2)1', 1), ('H', 1)]))

for n == 2:

In [8]: balance_stoichiometry({'H2O', 'C21H29N7O17P3', 'C3H4O3'}, {'H', 'C2H2O4(CH2)2', 'C21H30N7O17P3'}, underdetermined=None)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-4f80fd0c1d03> in <module>()
----> 1 balance_stoichiometry({'H2O', 'C21H29N7O17P3', 'C3H4O3'}, {'H', 'C2H2O4(CH2)2', 'C21H30N7O17P3'}, underdetermined=None)

~/vc/chempy/chempy/chemistry.py in balance_stoichiometry(reactants, products, substances, substance_factory, parametric_symbols, underdetermined)
   1263                 raise ValueError("The system was under-determined")
   1264         if not all(residual == 0 for residual in A.dot(sol)):
-> 1265             raise ValueError("Failed to balance reaction")
   1266 
   1267     def _x(k):

ValueError: Failed to balance reaction

(since the number of carbons are affected)

@saraaamin
Copy link

what if i don't want to replace the n with a specific number, I just want to leave it as is? would it detect that it is unbalanced?

@bjodah
Copy link
Owner Author

bjodah commented Mar 22, 2018

Currently the parser does not translate "n" into e.g. SymPy symbols. It could be done but would require some work on the parser and balance_stoichiometry.

@bjodah bjodah force-pushed the fix-underdetermined-balance branch from 3f7ee48 to 5720dd9 Compare March 22, 2018 22:01
@bjodah
Copy link
Owner Author

bjodah commented Mar 22, 2018

Here's a work-around:

>>> balance_stoichiometry({'H2O', 'C21H29N7O17P3', 'C3H4O3'}, {'H', 'C2H2O4', 'CH2', 'C21H30N7O17P3'}, underdetermined=None)
(OrderedDict([('C21H29N7O17P3', 1), ('C3H4O3', 1), ('H2O', 1)]),
 OrderedDict([('C21H30N7O17P3', 1), ('C2H2O4', 1), ('CH2', 1), ('H', 1)]))

@saraaamin
Copy link

ok, let's keep it as is right now, and i'll try to deal with n's on my side.

@saraaamin
Copy link

I finished testing it, and all is good.
You can go ahead and merge.

@bjodah
Copy link
Owner Author

bjodah commented Mar 24, 2018

Thank you for the feedback.

@bjodah bjodah merged commit 2233adc into master Mar 24, 2018
@bjodah bjodah deleted the fix-underdetermined-balance branch March 24, 2018 14:50
@saraaamin
Copy link

Thanks for helping me out.
Side question, how can i cite the work later in the future?

@bjodah
Copy link
Owner Author

bjodah commented Mar 24, 2018

Thanks for considering it. I have submitted a manuscript to Journal of Open Source Software:
openjournals/joss-reviews#565

Hopefully it will get accepted, and when (if) it is, I will update the README with a "how to cite" section.

If you need a citation key before that, the second best solution is citing the zenodo doi:
https://doi.org/10.5281/zenodo.593188

@saraaamin
Copy link

Great, good luck!

@bjodah
Copy link
Owner Author

bjodah commented Apr 4, 2018

@saraaamin the manuscript got accepted and there is now a DOI in the README if you need to cite ChemPy.
Good luck with your project!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants