Enable enumeration of bond types present in a molecule #1525
I am requesting that @alongd be the primary code reviewer, but I would also like @mjohnson541 and @mliu49 to take a quick look, if you have time, to make sure that the additions are okay (i.e. that my fix for preventing edge duplication is sound and that we are fine with the bond labels I have chosen).
Codecov Report
@@ Coverage Diff @@
## master #1525 +/- ##
==========================================
- Coverage 42.06% 42.06% -0.01%
==========================================
Files 165 165
Lines 27806 27821 +15
Branches 5666 5667 +1
==========================================
+ Hits 11697 11702 +5
- Misses 15325 15336 +11
+ Partials 784 783 -1
rmgpy/molecule/molecule.pxd
@@ -245,4 +247,6 @@ cdef class Molecule(Graph):

    cpdef bint isIdentical(self, Molecule other) except -2

    cpdef dict enumerate_bonds(self)

    cdef atom_id_counter
Can you add a new line at the end of the file?
Done!
        the atom labels in alphabetical order (i.e. 'C-H' is possible but not 'H-C')
        :return: str
        """
        bond_symbol_mapping = {0: '~', 1: '-', 1.5: ':', 2: '=', 3: '#'}
Didn't we say that `~` is visually too similar to `-`? How about `_`?
I asked @mjohnson541 what his preference was (since he is leading the effort for adding hydrogen bonds), and he liked `~` the best. I can change this though if need be.
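For readers following the label discussion, the convention can be sketched in a few lines. This is an illustrative stand-alone function, not the code in this PR: the name `bond_label` and its arguments are hypothetical, the symbol mapping is copied from the diff above, and the fallback format is an assumption modelled on the `C<bond order 4>C` example in the PR description.

```python
# Illustrative sketch only; `bond_label` is a hypothetical helper, not RMG's API.
BOND_SYMBOLS = {0: '~', 1: '-', 1.5: ':', 2: '=', 3: '#'}


def bond_label(symbol1, symbol2, order):
    """Return a label such as 'C-H', with atom symbols in alphabetical order."""
    first, second = sorted([symbol1, symbol2])
    try:
        connector = BOND_SYMBOLS[order]
    except KeyError:
        # Assumed fallback format, modelled on the 'C<bond order 4>C' example.
        connector = '<bond order {0}>'.format(order)
    return first + connector + second


print(bond_label('H', 'C', 1))  # C-H
print(bond_label('O', 'H', 0))  # H~O
print(bond_label('C', 'C', 4))  # C<bond order 4>C
```

Sorting the two atom symbols before joining them is what guarantees that 'C-H' and 'H-C' collapse into a single dictionary key.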
        bonds = self.getAllEdges()

        for bond in bonds:
            bond_count[bond.get_bond_string()] += 1
Is `0` the default value for an `int`?
Yes. Should I change this to `0` though, to make this clearer for the future?
No, that's OK, I was just making sure
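The behaviour confirmed in this thread can be checked in isolation. The sketch below assumes the counter is a `collections.defaultdict(int)`, which the thread implies but the visible diff does not show; the list of labels is made up and stands in for the strings returned by `bond.get_bond_string()`.

```python
from collections import defaultdict

# Hypothetical bond labels standing in for bond.get_bond_string() results.
labels = ['C-H', 'C-H', 'C-C', 'C=O', 'C-H']

bond_count = defaultdict(int)  # int() returns 0, so unseen keys start at 0
for label in labels:
    bond_count[label] += 1

print(int())             # 0
print(dict(bond_count))  # {'C-H': 3, 'C-C': 1, 'C=O': 1}
```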
@@ -1593,12 +1593,12 @@ def generate_H_bonded_structures(self):
         structs = []
         Hbonds = self.find_H_bonds()
         for i,bd1 in enumerate(Hbonds):
-            molc = deepcopy(self)
+            molc = self.copy(deep=True)
Can you explain what this change does?
If you pass a Molecule object to copy.deepcopy, all of the edges in the graph get duplicated: it copies every attribute of each atom (and each bond object is always an attribute of two atoms), but it is not smart enough to know that it has already created a copy of a given bond. To overcome this, Molecule objects have a copy method that behaves the way you would expect deepcopy to, but without duplicating the bonds. Without this change, enumerating the bonds double counts everything if hydrogen-bonded structures are generated.
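One way to picture a copy that avoids this double counting is to copy the atoms first and then create each bond exactly once. The sketch below uses toy `Atom` and `Bond` classes and hypothetical helpers (`add_bond`, `unique_bonds`, `copy_graph`); it is not RMG's implementation of `Molecule.copy`, only an illustration of the idea under those assumptions.

```python
# Toy classes for illustration; hypothetical, not RMG's Graph/Atom/Bond.
class Atom(object):
    def __init__(self, symbol):
        self.symbol = symbol
        self.bonds = {}  # neighbouring Atom -> Bond


class Bond(object):
    def __init__(self, atom1, atom2, order):
        self.atom1, self.atom2, self.order = atom1, atom2, order


def add_bond(atom1, atom2, order):
    bond = Bond(atom1, atom2, order)
    # Both atoms hold a reference to the same Bond object.
    atom1.bonds[atom2] = bond
    atom2.bonds[atom1] = bond
    return bond


def unique_bonds(atoms):
    """Collect distinct Bond objects by identity."""
    seen = {}
    for atom in atoms:
        for bond in atom.bonds.values():
            seen[id(bond)] = bond
    return list(seen.values())


def copy_graph(atoms):
    """Copy atoms first, then create each bond exactly once."""
    mapping = {atom: Atom(atom.symbol) for atom in atoms}
    for bond in unique_bonds(atoms):
        add_bond(mapping[bond.atom1], mapping[bond.atom2], bond.order)
    return [mapping[atom] for atom in atoms]


# Water: O bonded to two H atoms.
o, h1, h2 = Atom('O'), Atom('H'), Atom('H')
add_bond(o, h1, 1)
add_bond(o, h2, 1)

copied = copy_graph([o, h1, h2])
print(len(unique_bonds(copied)))  # 2
```

Because the old-to-new atom mapping is built before any bond is created, both copied atoms end up sharing one new bond object, so counting bonds on the copy gives the same result as on the original.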
rmgpy/molecule/moleculeTest.py
    def test_enumerate_bonds(self):
        """Test that generating a count of bond labels works properly."""
        mol = Molecule().fromSMILES('c1cc(O)c(CCO)cc1CC#C')
        mol = mol.generate_resonance_structures()[1]
If you only use a particular resonance structure, perhaps give it directly as SMILES? Otherwise, changes to the resonance module that might change the order of resonance structures in the resulting list would cause this test to fail.
Good catch! I'll hand it the adjacency list then (I can't get the H-bonded structure from SMILES).
Looks OK to me. Please see some minor comments.
fff5143 to b454d45
@alongd I have made the requested changes, though let me know what I should do about the hydrogen-bond label and about the default of
@amarkpayne, looking good!
Returns a dictionary of the number of each bond type present in the molecule
b454d45 to f0af658
Thanks @alongd! I have rebased this branch and squashed the fixup commits. Once the tests run, this branch should be ready for your final approval.
In some cases in ARC (in applying BAC) and in generating isodesmic reactions, it is important to determine the number of each bond type present in the molecule. Currently single, double, triple, benzene, and hydrogen bonds have set symbols, but other bond orders can be managed (these will return something like `C<bond order 4>C`).

The PR also includes a fix to prevent duplicating the number of edges in a Molecule object when generating H-bonded structures. Thanks to @mjohnson541 for this fix!
To the reviewer: generate a Molecule object, run `mol.enumerate_bonds()`, and check the output dictionary.