#425 Smiles with attachment points is not read correctly (valences are wrong) #2783

jblack-mestre
Contributor

Generic request

  • PR name follows the pattern #1234 – issue name
  • branch name does not contain '#'
  • base branch (master or release/xx) is correct
  • PR is linked with the issue
  • task status changed to "Code review"
  • code follows product standards
  • regression tests updated

@jblack-mestre
Contributor Author

Similar bug to #2746; need to remove the old bonds before removing atoms (and thereby recalculating the implicit hydrogens)

to_remove.push(i);
_bmol->removeBonds(bondsToRemove);
Collaborator

Maybe it would be better to fix void BaseMolecule::removeAtoms(const Array<int>& indices) so that it also removes the bonds connected to the removed atoms?
Edges without a vertex look weird.
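To illustrate the suggestion, here is a toy graph sketch in Python. It is not Indigo code: the class ToyGraph and all of its methods are hypothetical, and it only shows the general idea that removing vertices can also drop every incident edge, so that no edge is left without an endpoint.

```python
class ToyGraph:
    """Hypothetical minimal graph; not part of Indigo's API."""

    def __init__(self):
        self.vertices = set()
        self.edges = {}  # edge id -> (begin vertex, end vertex)
        self._next_edge = 0

    def add_vertex(self, v):
        self.vertices.add(v)

    def add_edge(self, beg, end):
        idx = self._next_edge
        self.edges[idx] = (beg, end)
        self._next_edge += 1
        return idx

    def remove_vertices(self, to_remove):
        doomed = set(to_remove)
        # Drop incident edges first so that no edge is left dangling.
        self.edges = {
            i: (b, e)
            for i, (b, e) in self.edges.items()
            if b not in doomed and e not in doomed
        }
        self.vertices -= doomed

    def degree(self, v):
        # Number of edges that still touch vertex v.
        return sum(v in ends for ends in self.edges.values())


# "N" stands for a real atom, "AP" for the placeholder atom of an attachment point.
g = ToyGraph()
for v in ("N", "AP"):
    g.add_vertex(v)
g.add_edge("N", "AP")
g.remove_vertices(["AP"])
```

In this toy model, anything derived from the degree of "N" after the removal sees no leftover edge. Whether the same change is safe inside the real removeAtoms is a separate question.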

Contributor Author

I can add some code to remove the edges in removeAtoms, but there are problems. If I put this code after line 813 (_removeAtoms), it doesn't fix this bug. If I put it before that line, I break some integration tests.

These tests seem to be about folding/unfolding molecules. There is a comment in Molecule::_removeAtoms about this. I suspect some other parts of the code are relying on having the 'incorrect' number of bonds to an atom.

I think this is a deeper problem that can't be solved by adding code to BaseMolecule::removeAtoms.

@jblack-mestre
Contributor Author

I don't understand why the tests fail with:
[screenshot attachment, not preserved]

"N": "[NH3+]%91.[*:1]%91 |$;_AP1$|",
}

for key, value in smiles.items():
Collaborator

Code like this causes unit tests to fail because IronPython orders the dictionary keys differently.
To avoid such failures, I usually use the following code:

for key in sorted(smiles.keys()):
    value = smiles[key]
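A minimal, self-contained sketch of the same idea follows. The helper name ordered_items and the second dictionary entry are hypothetical and added only for illustration; the "N" entry is the one from the test under review.

```python
smiles = {
    "N": "[NH3+]%91.[*:1]%91 |$;_AP1$|",
    # Hypothetical second entry, present only so there is something to sort.
    "C": "C%91.[*:1]%91 |$;_AP1$|",
}


def ordered_items(mapping):
    """Return (key, value) pairs sorted by key, independent of dict ordering."""
    return [(key, mapping[key]) for key in sorted(mapping)]


# The loop visits keys in the same order on every Python implementation.
for key, value in ordered_items(smiles):
    print(key, value)
```

Sorting the keys makes the printed lines appear in the same order regardless of how a given interpreter orders dictionary keys, so the reference output stays stable.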

@AliaksandrDziarkach
Collaborator

AliaksandrDziarkach commented Mar 4, 2025

I don't understand why the tests fail with:

It is unclear without context, but I have already hit the same issue: it is caused by a different key order in IronPython, so the line moved to a different position.
I added a comment to the code.

@AliaksandrDziarkach
Collaborator

Rendering on Linux and macOS can generate files that differ from those generated on Windows, so you will see failures like:

rendering/render_chemaxon_smiles.py..................................[FAILED]    0.09 sec
Diff:
+ chemaxon_smiles/smiles_attachment_points_452_N.png rendering status: Problem: PNG similarity is 0.89 for alpha channel
+ iVBORw0KGgoAAAANSUhEUgAAAGgAAABtCAYAAABN0SQSAAAABmJLR0QA/wD/AP+gvaeTAAAFq0lEQVR4nO2dbYgVVRjHf6uyCb3s3q2NXkztQ2/08iESe5OQiF5I6lsFkVmpRZQbUZERUQQRha5RQRZaiwkalaFWGEi4lel+iYxeIMhKy1C3DVHS3N0+PDPeM+O917lz58w89+7zg2HnzDz3OWfmP/ec55w5d894opwPzAVKwA7gPyqj3a4lWQiMOts3QHcT2rUkNxK9+HBb12R2LcsWyhc9QvQmTG8iu5ZkCtELfhTY66RfahK7lmUasBEYQhrdDmA55RuwLbCb7xyr126Fc+yHDPxVsrsUmBmcb0nagMnB/m1Eq5RJwIfOsU112rkCjWbgr5bdCPBuozdDO13AYco3YTlw0Ek/VafdM0QFatRfUruWxn1y3e0QcHaddo9VsUnrrx67luVi4ABH34AlKexqCZTGXz12Lc11wCBy4YeRqmRCCjtXoOEM/NVr19JMBC4HzmjAzhXotwz8pbEzauAK9GvBZWkqxhVdAKM2JpByTCDlmEDKMYGUYwIpxwRSjgmkHBNIOSaQckwg5ZhAyjGBlGMCKccEUo4JpBwTSDkmkHJMIOWYQMoxgZRjAinHBFKOCaQcE0g5JpByTCDlmEDKMYGUYwIpxwRSjgmkHBNIOSaQckwg5ZhAyjGBlGMCKccEUo4JpBwTSDkmkHJMIOWYQMoxgZRjAinHBFKOCaQcE0g5JpByxtw/YFVAO/BCsL8VWFVgWY7wAPBnsA0UXJaiOYHoP2Q3lFGXQNYGKccEUo4JpByL4vzRDcyqcPw4Z/8c4J4KNruBtT4KZZSZTvUFRY61bQ2dZPENWoqsn0Dg/AlgVwo/E4BlTvrzWPpZysvOjAB3p8gD4BRgkZPeAryW0pd32jLwsQ8JHUM+Am5N4acdWaQp5E1gnpMeAC4L9keA8SnyAFkTb7uTXo2s6JU1HcCVFY5PBD4I9j8DFlewGQI2g5826BbgTmRZtLHMP8AnFY67D/POKjZH8BXFvQKc7sn3mMKXQCXgDU++xxQ++0GzgLs8+h8TZC3Q7li6F1sZqyGyDhL6kUjsjiBdQsLwmzPOp5k5iKxIBvBdHhnuo9zBeh84GekHuR2v2Qn8tMc+szR2fsA5N9xAeafE8lHxPqYaPtqgvcD9sWO9wJke8mp5fAUJa4CVTrqTo78RRgJ8RnEPER3yuQmY4zE/n3QhVXfu+BzNHkSqujXOscXI8MaOBn2PIzpcUw9Jh4g6gKeRrkI30l79CHwJPE8TLTcaDxLirCDaKFcb2qgnSMhyqxQknAb8HpwfAX4BvqW8oPtO4KIq15Epebywe5hoVXcDcG8O+TbCe8Ak4A9gBjKKfgkSAW5E+nb9wIm+C5LHC7tBYD4yyh2yCNiAPKVpGAWeS/nZTmBBjfPTgauD/TlIlRayCxn53h74uZZoFa6SY1VxIX1Eq5ZPY+e19IN6gH+p3U5+HXz29QbKkYg85yQsQObFhVwPzM0x/6T0AscD02rYhH26tDVAYvIU6G+kqnN5GZicYxmSMkz0YXK5HWmfANb5Lkjes3rWIlVdyEnAWzmXIQ1TgfuQN6F9SGT3JLCtwDIlJmkbFFJCwlS3HZiHnjYoTlfMfhR4vIH866KIeXHVqrqp+RclEWcBPyGTWMJq70VgPdKZVU+936CQt4k+lZvQ+Q2KMxvYH3z2iwbKkYgiZ5b2IB3BkBmx81nMOPLBO8jUMoCrkGjUG0UKNER0WlUzsQwJFACu8JlR0XOz1yNVnSa6kZt+YQ2bA8CeYL/kszBFCwTwCBLVaWEl8BWwpIZNCTg12Pf62lqDQNqqug3B32uAc6vY9Dj7m30WRoNAAB+j5+eAfcBfyEDyKqIjHW3IzzkXBulXyWnihxFlJnAICaX3IwO7q4GfKYfn/ZR/NGAUwHlIdRcfRdgDPEhOtY/WvoYmOoELkHHD78lhBNswDM
MwDMNodv4H5odtNL3RPFAAAAAASUVORK5CYII=
- chemaxon_smiles/smiles_attachment_points_452_N.png rendering status: OK

The line after "rendering status: Problem:" contains the render result encoded in base64, so you can take this string (after the + or from the raw logs) and convert it into the corresponding PNG file.
I use the following code for this:

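#!python3
# Decode a base64 string stored in a text file into a binary (PNG) file.
import base64
import sys

if len(sys.argv) < 3:
    print("Usage: b64 infile outfile")
    sys.exit(1)

# Read the base64 text copied from the test log.
with open(sys.argv[1]) as b64file:
    b64data = b64file.read()

# Decode the text and write the raw bytes to the output file.
with open(sys.argv[2], 'wb') as file:
    file.write(base64.b64decode(b64data))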
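If copying the string by hand is inconvenient, the extraction step could be sketched as below. This is only an illustration under stated assumptions: the function names extract_png and looks_like_png are hypothetical, the log is assumed to look like the diff shown above, and the base64 payload is assumed to sit on the single line that directly follows the "rendering status: Problem" line.

```python
import base64

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def extract_png(log_text):
    """Decode the base64 line that follows a 'rendering status: Problem' line.

    Returns the decoded bytes, or None if no such line is found.
    """
    lines = log_text.splitlines()
    for i, line in enumerate(lines[:-1]):
        if "rendering status: Problem" in line:
            payload = lines[i + 1].strip()
            # Drop the diff marker; base64 itself never contains a space.
            if payload.startswith("+ "):
                payload = payload[2:]
            return base64.b64decode(payload)
    return None


def looks_like_png(data):
    """Cheap sanity check: does the data begin with the PNG signature?"""
    return data is not None and data.startswith(PNG_SIGNATURE)
```

The signature check is a quick way to notice a truncated or wrongly copied string before writing the result to a file.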

@AlexanderSavelyev AlexanderSavelyev merged commit d71f8f8 into epam:master Mar 6, 2025
57 checks passed
Successfully merging this pull request may close these issues.

Smiles with attachment points is not read correctly (valences are wrong)
3 participants