Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable enumeration of bond types present in a molecule #1525

Merged
merged 3 commits into from
Dec 31, 2018
Merged

Conversation

amarkpayne
Copy link
Member

In some cases in ARC (in applying BAC) and in generating isodesmic reactions, it is important to determine the number of each bond type present in the molecule. Currently single, double, triple, benzene, and hydrogen bonds have set symbols, but other bond orders can be managed (will return something like C<bond order 4>C)

The PR also includes a fix to prevent duplicating the number of edges in a Molecule object when generating H bonded structures. Thanks to @mjohnson541 for this fix!

To the reviewer, generate a molecule object and run mol.enumerate_bonds() and check the output dictionary

@amarkpayne
Copy link
Member Author

I am requesting that @alongd be the primary code reviewer, but I would also like @mjohnson541 and @mliu49 to take a quick look to make sure that the additions are okay (i.e. my fix for preventing edge duplication is okay and we are fine with the bond labels I have chosen) if you guys have time.

@codecov
Copy link

codecov bot commented Dec 15, 2018

Codecov Report

Merging #1525 into master will decrease coverage by <.01%.
The diff coverage is 0%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1525      +/-   ##
==========================================
- Coverage   42.06%   42.06%   -0.01%     
==========================================
  Files         165      165              
  Lines       27806    27821      +15     
  Branches     5666     5667       +1     
==========================================
+ Hits        11697    11702       +5     
- Misses      15325    15336      +11     
+ Partials      784      783       -1
Impacted Files Coverage Δ
rmgpy/molecule/molecule.py 0% <0%> (ø) ⬆️
rmgpy/data/kinetics/family.py 58.58% <0%> (+0.29%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e5a1606...f0af658. Read the comment docs.

@@ -245,4 +247,6 @@ cdef class Molecule(Graph):

cpdef bint isIdentical(self, Molecule other) except -2

cpdef dict enumerate_bonds(self)

cdef atom_id_counter
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a new line at the end of the file?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

the atom labels in alphabetical order (i.e. 'C-H' is possible but not 'H-C')
:return: str
"""
bond_symbol_mapping = {0: '~', 1: '-', 1.5: ':', 2: '=', 3: '#'}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't we say that ~ is visually too similar to -? How about _?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked @mjohnson541 what his preference was (since he is leading the effort for adding hydrogen bonds), and he liked ~ the best. I can change this though if need be.

bonds = self.getAllEdges()

for bond in bonds:
bond_count[bond.get_bond_string()] += 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is 0 the default value for an int?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, should I change this to 0 though to make this more clear for the future?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that's OK, I was just making sure

@@ -1593,12 +1593,12 @@ def generate_H_bonded_structures(self):
structs = []
Hbonds = self.find_H_bonds()
for i,bd1 in enumerate(Hbonds):
molc = deepcopy(self)
molc = self.copy(deep=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain what this change does?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you pass a Molecule object to copy.deepcopy, all of the edges in the graph get duplicated because it copies all of the attributes of each atom (and there are always two atoms that have the same bond object as an attribute), but is not smart enough to know that it has already created a copy of the bonds. To overcome this, Molecule objects have a method that behaves like you think deepcopy would but without duplicating the number of bonds. Without this change, enumerating the bonds will double count everything if hydrogen bonded structures are generated.

def test_enumerate_bonds(self):
"""Test that generating a count of bond labels works properly."""
mol = Molecule().fromSMILES('c1cc(O)c(CCO)cc1CC#C')
mol = mol.generate_resonance_structures()[1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you only use a particular resonance structure, perhaps give it directly as SMILES? Otherwise, changes to the resonance module that might change the order of resonance structures in the resulting list would cause this test to fail.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I'll hand it the adjacency list then (I can't get the H-bonded structure from SMILES)

Copy link
Member

@alongd alongd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK to me. Please see some minor comments.

@amarkpayne
Copy link
Member Author

@alongd I have made the requested changes, though let me know what I should do about the hydrogen-bond label and about the default of int. Otherwise if this is good to go then I will squash the fixup commits and this can get merged after the release.

@alongd
Copy link
Member

alongd commented Dec 26, 2018

@amarkpayne, looking good!

@amarkpayne
Copy link
Member Author

Thanks @alongd ! I have rebased this branch and squashed the fixup commits. Once the tests run this branch should be ready for your final approval.

@alongd alongd merged commit d45415c into master Dec 31, 2018
@alongd alongd deleted the enumerate_bonds branch December 31, 2018 03:07
@mliu49 mliu49 mentioned this pull request May 15, 2019
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants