-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update microstates of pKa challenge #20
Conversation
…metric isomers eliminated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me in terms of organization/content, but I'm not doing any checking on the states. Should we get @bannanc to review also or go ahead and merge?
Also, what do you think -- is this a substantial enough change that we should send out an e-mail blast with news of it (once merged) too in case anyone is not watching the GitHub repo?
@davidlmobley I removed many replicate microstates (about 25% of initial microstates list ) and added some missing microstates. I think once we merge this PR, it will be appropriate to send out an e-mail blast stating following changes:
|
I am done with my updates on this branch. |
OK, thanks. Tag me once @bannanc reviews and we can send out an e-mail blast. In the meantime I'll ask for the latest e-mail list (I don't have access). Thanks. |
pKa_challenge_instructions.md
Outdated
@@ -158,6 +167,9 @@ These and blank lines will be ignored. | |||
|
|||
The file must contain the following four components in the following order: your predictions, a name for your computational protocol, a list of the major software packages used, and a long-form methods description. | |||
Each of these components must begin with a line containing only the corresponding keyword: `Predictions:`, `Name:`, `Software:`, and `Method:`, as illustrated in the example files. | |||
|
|||
Example submission files can be found in `physical_properties/pKa/example_submission_files/` directory to illustrate expected format when filling submission templates of the pKa challenge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this directory existed before this PR, could you make this a hyperlink so people can click on the path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a link.
pKa_challenge_instructions.md
Outdated
@@ -130,6 +130,15 @@ Newly added microstates will also be shared with participants and microstates li | |||
Please do not create a microstate ID yourself. | |||
It is important that challenge organizers assign unique microstate IDs and keep track. | |||
|
|||
### Updates on microstates lists | |||
|
|||
Microstate SMILES strings and microstate IDs can be found in `physical_properties/pKa/microstates/` directory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whenever possible, I think it is useful to provide links so people don't have to click around to figure out where you're pointing them.
If this directory existed before this PR, could you make this a hyperlink so people can click on the path?
If the directory is new a link will need to be added after this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I added a link.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bannanc FTR, one can add links in PRs that add the directory (a link is just formatted text). Relative links (like here) will be active even before merging since they'll just refer elsewhere in the branch; absolute links might not be active until merging.
pKa_challenge_instructions.md
Outdated
Due to replicate and missing microstates present in the first release of microstate lists, we have updated `SMXX_microstates.csv` files with necessary corrections in Version 1.4 of this repository. | ||
The main correction of this update was the removal of resonance structures that were causing dublicate representation of same microstates. | ||
We have also added new microstates suggested by participants. | ||
Newly added microstates were assigned unique microstate IDs, as recorded in `SMXX_microstates.csv` files. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing I said about the directory paths, it would be convenient for the file names to also link to the relevant file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These files are all in the directory mentioned at the beginning of paragraph. I added a link to the directory. I won't add link to the files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think since it's multiple files you can't have each be a link.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct.
# | ||
# PREDICTION SECTION | ||
# | ||
# Fill one typeI_microscopic_pKas_and_microstates.csv template for all molecules predicted with the same method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand what you are trying to say here, but its worded in a way that is a little confusing. Maybe something like:
You may submit predictions from multiple methods, but people make a separate [file name] for each different method.
@davidlmobley help on wording?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Fill one typeI_microscopic_pKas_and_microstates.csv template for all molecules predicted with the same method."
Agree this is awkward. Maybe, "Each submission (method) must have a completed typeI_microscopic_pKas_and_microstates.csv
file for all molecules the prediction encompasses," if that's what's meant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I liked how you clarified this rule. I added these sentences to pKa_challenge_instructions.md
# Please include the values of key parameters, with units, and explain how any statistical uncertainties were estimated. | ||
# Use as many lines of text as you need. | ||
# All text following the "Method:" keyword will be regarded as part of your free text methods description. | ||
Method: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be informative to have a real example here, maybe with the reference calculation protocol (or approximately that).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that if you can provide people an example of what these would look like it will probably help them (and help prevent an e-mail deluge when people start trying to submit).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created example submission files directory with 3 filled templates. Method section has a few sentences as example, not exactly the full description of a method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I read your example when I wrote this initial comment. Maybe it is fine as is, in the past we put a method for our reference calculations here as an example of how much detail to include, but I guess its up to them how to describe it anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it would be nicer to have a full detail description here. I can add that in the future when Bas is done with reference calculations.
# | ||
# PREDICTION SECTION | ||
# | ||
# Fill one typeII_microstate_fractional_populations.csv template file for all molecules and microstates you have predictions for. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my comment for this line in the typeI example.
# e.g. In column 3 report ln(fractional microstate population) at pH 2.10 of your predicted microstate. | ||
# Use scientific notation with 3 decimals of precision (e.g. -2.30e0 or 3.14e-4). | ||
# For microstates predicted to have 0 populations, instead of ln(0), report as "-infinity". | ||
# Do not report error estimates in this section. You can report error estimates in Method secton. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I read this I would provide the same lines as below with uncertainty estimates for each measurement. Is that what you're expecting people to do? If you want people to have the option of reporting uncertainties maybe you should make an uncertainties:
section for this file type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't plan to evaluate uncertainties of this submission type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, maybe just say that and not tell people they can report error estimates in the method section?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. This is reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed " You can report error estimates in Method section." sentence from the template file.
# For molecules with multiple macroscopic pKas report each macroscopic pKa in a new line. | ||
# Report pKa values to two decimal places (e.g. 10.71). | ||
# Reporting the standard error of the mean (SEM) is optional and encouraged. If it is reported, SEM should be reported to two decimal places (e.g. 1.02). | ||
# For values for which you don't have an estimate, leave that cell of the csv table empty. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this apply to both the uncertainty and the pKa?
For example, if I can only calculate macro pkas for 5 of the molecules can I still participate? Do I have to list all of the molecule IDs with empty cells next them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It applies both to uncertainty and pKa.
We can handle both cases for a molecule without prediction:
- Excluding a molecule ID line completely
- A line with just molecule ID followed by empty cvs cells
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok cool, I just wanted to make sure you had thought about it in a file parsing perspective.
@@ -0,0 +1,63 @@ | |||
# Submission Type II |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should specify at the top of each file that any line that begins with a #
is considered a comment and will be ignored when parsing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good idea. I will add this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -1,11 +1,9 @@ | |||
microstate ID,canonical isomeric SMILES | |||
SM01_micro001,c1cc2c(cc1O)c3c(o2)C(=[OH+])NCCC3 | |||
SM01_micro002,c1cc2c(cc1[O-])c3c(o2)C(=NCCC3)[O-] | |||
SM01_micro003,c1cc2=[O+]C3=C(NCCCC3=c2cc1O)[O-] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to be honest, I'm not good enough at reading SMILES strings to check these so I'm going to trust your visualization and the openeye code we worked on together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I am also relying on participants to tell us, if they catch another mistake.
I had a few minor things that I think should be addressed before this is accepted. I have a general question about resonance vs. microstates/tautomers. What if my method comes up with the resonance structure that doesn't agree with the microstate ID you have assigned it? Should I use the deprecated ID or is it my responsibility to figure out which ID it was supposed to be? Related to that, if you can't use a deprecated microstate ID are you going to check IDs when files are submitted? |
@bannanc I expect people to find the matching ID in the updated microstate file, i.e. find a resonance structure of the same microstate from the SMILES list. If they can't find it, it is they can email me their requested microstate in which case I will check if it is unique or resonance of others first. But if they insist on using the deprecated microstate ID in their submissions, that would be fine too. When I analyze the prediction, I will pool together microstate IDs that are resonance structures of one another (sum their predicted populations). |
@davidlmobley @bannanc I am done with these edits. |
@@ -87,7 +87,7 @@ Participants are encouraged to submit their results in all or multiple submissio | |||
#### Prediction Type I - microscopic pKas and related microstates | |||
Predicting microscopic pKas and related microstate structures. | |||
Different protonation states and tautomer combinations constitute different microstates. | |||
- Fill one `typeI_microscopic_pKas_and_microstates.csv` template for all molecules. | |||
- Fill one `typeI_microscopic_pKas_and_microstates.csv` template for all molecules predicted with one method. You may submit predictions from multiple methods, but you should fill a separate template file for each different method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is more clear, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a few follow-up questions, but I think in general its fine to push these changes.
I think we addressed everything so far. I am anxious to merge this branch because of major changes in microstate lists. The sooner we announce this correction the better. |
Thanks, @MehtapIsik . I have the e-mail list; would you like to do the honors? LMK on Slack. |
Also, @MehtapIsik - please go ahead and do a release. Tag me if you need anything. |
Release version 1.4 is done. |
I have updated the microstate lists of pKa challenge molecules, according to following corrections:
We should release these microstate lists with major updates as Version 1.4 of this repo, as soon as possible.