-
Notifications
You must be signed in to change notification settings - Fork 880
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Argument check_occu
is not working as intended (in _get_structure
and parse_structures
)
#3816
Comments
I'm not a maintainer, but if you could provide the cif file and code snippet to reproduce this issue, that would be very helpful :) |
Here is a minimally reproducible example with the structure (Crystal.cif) attached, highlighting that the argument To reproduce the bug: from pymatgen.io.cif import CifParser
parser = CifParser("Crystal.cif", site_tolerance = 1e-4, occupancy_tolerance = 1.0)
b = parser.parse_structures(check_occu = False) Error that I encountered: Some occupancies ([2.0, 2.0, 2.0, 1.0, 1.0]) sum to > 1! If they are within the occupancy_tolerance, they will be rescaled. The current occupancy_tolerance is set to: 1.0
No structure parsed for section 1 in CIF.
...
in CifParser.parse_structures(self, primitive, symmetrized, check_occu, on_error)
1219 warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
1221 if len(structures) == 0:
-> 1222 raise ValueError("Invalid CIF file with no structures!")
1223 return structures
ValueError: Invalid CIF file with no structures! Expected behavior: occupancy would not be checked and a structure with all atoms should be given. I think code review could be more helpful to see the issue though. |
Hi @KunhuanLiu, I just had a look at this issue, and the inconsistency between comment and implementation you mentioned is indeed an issue: Lines 985 to 989 in 2e1c301
But in our case, that's not the cause of the failure. Cause of code failureThe issue is the bad default value of Line 290 in 2e1c301
And the condition for scaling occupancy: Lines 1085 to 1086 in 2e1c301
Which means the default Line 1105 in 2e1c301
I would talk to @janosh on changing the bad default value of To fix your codeTo fix your code, you need to set the from pymatgen.io.cif import CifParser
parser = CifParser("Crystal_min.cif", occupancy_tolerance=2.0)
structures = parser.parse_structures(check_occu=False) Hope it helps :) |
Thanks @DanielYang59 , for investigating the case. Totally agree with your thoughts; I came up with a similar manual fix (I offered a test case where manually setting the In addition to attending to the behavior of Again thank you for the attention and time! |
Maybe the behavior should be changed such that
No worries at all. |
Sorry about the delayed response -- I noticed another thing about the current
I think that's what I thought the argument should do based on the API documentation description. When I want to note the following behavior when I use
Returned atoms do not have labels:
|
Hi thanks for following up. I have updated the behavior and docstring in #3819. Meanwhile I have a question though, as per the docstring, "unphysical occupancy" refers to I would look into the site label soon, thanks for reporting. |
The label issue is fixed by 05c11d4. Thanks. |
Hi @KunhuanLiu : the CIF file you provided does not appear to be valid, as the three sites at the end, Pymatgen does not currently include functionality to fix bad CIFs. This isn't an issue with the occupancy tolerance function |
Hello @esoteric-ephemera , this ticket inquiries whether the implementation of the
Example structure serves as a minimal example to highlight the impact of the issue.
Thank you for sharing. Can you clarify what you mean? I am not requesting such functionality here. Can you comment on how or why I also don't think that "bad cifs" (a term that should be defined more clearly) are the sole reasons for such errors. A structure with large unit cell and dense atoms can have atoms such as H that are "close" to each other under the 1e-4 default site tolerance (I cannot publish the structure here). If
I never thought it was in this case. Setting a higher occupancy tolerance is an unorthodox trick to let pymatgen return my structure. The question concerns the proper behavior with the |
@KunhuanLiu: the example is not a reasonable CIF because it contains multiply-occupied sites. The site occupancy represents fractionally what elements occupy a given site, e.g., a disordered structure will have at least two sites with occupancies < 1, and an ordered structure will have all site occupancies == 1. A site occupancy > 1 is not physical A "bad CIF" in this context means a CIF that is either out of spec per the IUCr or is unphysical. The ICUr specifically states that the occupancy represents:
Right now, pymatgen does not support this kind of "dummy site" because it's not compatible with other functionality in pymatgen. pymatgen does support disordered structures, where at least two elements occupy the same site with fractional (< 1) occupancy.
|
Hi @esoteric-ephemera thanks a lot for your comment. Combine of sites
It appears Lines 306 to 329 in bb68c78
And: Lines 328 to 329 in bb68c78
The cif @KunhuanLiu provided:
Handle of "unphysical" (occupancy > 1) sites
In Lines 1263 to 1266 in bb68c78
Which should allow user to parse cif with occupancy > 1, once |
@esoteric-ephemera Thanks so much for clarifying, and I apologize for my oversight on the fact that in a realistic crystal structure, no atom sites would have summed occupancy > 1. I understand and support the design that raises the error I am still unsure what the algorithm should do when I'm very thankful to the attention and help from @DanielYang59, and I want to make sure we are not changing the code implementation unnecessarily. It looks like I had some misunderstanding caused by warning messages and documentation. To make sure we are aligned on the intended behavior, here's my (corrected) understanding:
@DanielYang59 thanks for your attention. Based on our discussion I think the original version has a very reasonable design and raises ValueError as intended. It sounds reasonable to me to refer the structures with occupancy > 1 as Lines 1011 to 1018 in 2e1c301
Here, ({sum_occu}) sum to > 1! is confusing and can lead to a very very long warning message when there are lots of atoms. This happened in my case and made it difficult to correctly understand the warning message.I suggest the following:
@esoteric-ephemera Can please you confirm that the behavior described above on |
Sorry for the delay. @DanielYang59:
They're duplicates in terms of position and chemical species, their labels differ. The right way to specify a site like this is with 0.5 occupancy on
It might let you do that but I'm not sure what it parses will be meaningful / usable in pymatgen. I think pymatgen should parse CIFs that conform (within a tolerance) to the IUCr standards and not extrapolate when a CIF deviates too far from those standards.
This is right except
As for closing the issue / @DanielYang59's PR: I would prefer to just amend the ValueError message for clarity - the site tolerance shouldn't be changed unless there's an in-spec CIF that isn't being parsed correctly |
All good. Thanks a lot for your comments :)
Sorry I don't think they're duplicates, as their x/y coordinates differ (slightly by 1e-4):
So if the tolerance is set to a stricter value, they could be two individual sites. But in our case, the default tolerance is such that these two sites are indeed considered at the same position (but I would not consider them duplicates). Lines 328 to 329 in 14b7357
I'm unsure about the purpose of having such behavior either, but as per the docstring: Lines 1263 to 1266 in 14b7357
So I might assume there is a reason to allow for such cif files. If you or anyone else with more knowledge on cif decide to remove this functionality, we might need to remove
Yes I have already updated the code with a more descriptive error message. I didn't change the site tolerance but updated the behavior of |
Python version
3.11.8
Pymatgen version
2024.2.8
Operating system version
No response
Current behavior
Hi all,
I was trying to deal with a crystal structure (in .cif format) that had warnings on having occupancy issues:
I am unsure if it is the structure's error given that many other programs (ase, VESTA, etc) were not reporting error, so I tried to turn off the occupancy check and check myself. When I called CifParser function
parse_structures
withcheck_occu
= False, the same warning and errors were logged.Checking the source code I think there is something implemented that is inconsistent with the algorithm, by a quick glance:
Under
_get_structures()
:I think
not check_occu
is not consistent with the algorithm commented above.Meanwhile, the warning I'm receiving comes from the following code snippet that is not switched off by "check_occu":
My temporarily hacky solution to my structure is to raise the occupancy_tolerance to 1000 to get the structure.
Expected Behavior
Sites that are too close to each other are kept without raising warning and errors that prevent getting the structures.
Minimal example
No response
Relevant files to reproduce this bug
No response
The text was updated successfully, but these errors were encountered: