Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update microstates of pKa challenge #20

Merged
merged 25 commits into from
Nov 28, 2017
Merged

Conversation

MehtapIsik
Copy link
Collaborator

I have updated the microstate lists of pKa challenge molecules, according to following corrections:

  1. Incorporated missing microstates according to suggestions from participants.
  2. Removed duplicate microstates (due to resonance structures and geometric isomers) by visual inspection.
  3. Removed duplicate microstates using an OpenEye script based on @bannanc's method and Geoff Skillman's suggestion to compare the number of hydrogens bound to heavy atoms to detect resonance structures.

We should release these microstate lists with major updates as Version 1.4 of this repo, as soon as possible.

Copy link
Contributor

@davidlmobley davidlmobley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me in terms of organization/content, but I'm not doing any checking on the states. Should we get @bannanc to review also or go ahead and merge?

Also, what do you think -- is this a substantial enough change that we should send out an e-mail blast with news of it (once merged) too in case anyone is not watching the GitHub repo?

@MehtapIsik
Copy link
Collaborator Author

@davidlmobley I removed many replicate microstates (about 25% of initial microstates list ) and added some missing microstates. I think once we merge this PR, it will be appropriate to send out an e-mail blast stating following changes:

  1. Microstate files of pKa challenge (SMXX_microstates.csv files) were updated. Some microstates of the initial version were removed and some missing microstates are added.
  2. Deprecated microstates were replicates of others (resonance structures or stereoisomers). Deprecated microstates and their microstate IDs were recorded in SMXX_microstates_deprecated.csv files with "deprecated" comment in "remarks" column. Their microstate IDs were preserved.
  3. Example submission files were added to illustrate expected format when filling 3 types of submission templates of the pKa challenge.

@MehtapIsik
Copy link
Collaborator Author

I am done with my updates on this branch.
@bannanc Could you please have a look at the updated microstates and example submission files when you get a chance before we merge?

@davidlmobley
Copy link
Contributor

OK, thanks. Tag me once @bannanc reviews and we can send out an e-mail blast. In the meantime I'll ask for the latest e-mail list (I don't have access).

Thanks.

@@ -158,6 +167,9 @@ These and blank lines will be ignored.

The file must contain the following four components in the following order: your predictions, a name for your computational protocol, a list of the major software packages used, and a long-form methods description.
Each of these components must begin with a line containing only the corresponding keyword: `Predictions:`, `Name:`, `Software:`, and `Method:`, as illustrated in the example files.

Example submission files can be found in `physical_properties/pKa/example_submission_files/` directory to illustrate expected format when filling submission templates of the pKa challenge.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this directory existed before this PR, could you make this a hyperlink so people can click on the path?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a link.

@@ -130,6 +130,15 @@ Newly added microstates will also be shared with participants and microstates li
Please do not create a microstate ID yourself.
It is important that challenge organizers assign unique microstate IDs and keep track.

### Updates on microstates lists

Microstate SMILES strings and microstate IDs can be found in `physical_properties/pKa/microstates/` directory.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whenever possible, I think it is useful to provide links so people don't have to click around to figure out where you're pointing them.

If this directory existed before this PR, could you make this a hyperlink so people can click on the path?
If the directory is new a link will need to be added after this PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I added a link.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bannanc FTR, one can add links in PRs that add the directory (a link is just formatted text). Relative links (like here) will be active even before merging since they'll just refer elsewhere in the branch; absolute links might not be active until merging.

Due to replicate and missing microstates present in the first release of microstate lists, we have updated `SMXX_microstates.csv` files with necessary corrections in Version 1.4 of this repository.
The main correction of this update was the removal of resonance structures that were causing dublicate representation of same microstates.
We have also added new microstates suggested by participants.
Newly added microstates were assigned unique microstate IDs, as recorded in `SMXX_microstates.csv` files.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing I said about the directory paths, it would be convenient for the file names to also link to the relevant file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These files are all in the directory mentioned at the beginning of paragraph. I added a link to the directory. I won't add link to the files.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think since it's multiple files you can't have each be a link.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct.

#
# PREDICTION SECTION
#
# Fill one typeI_microscopic_pKas_and_microstates.csv template for all molecules predicted with the same method.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand what you are trying to say here, but its worded in a way that is a little confusing. Maybe something like:

You may submit predictions from multiple methods, but people make a separate [file name] for each different method.
@davidlmobley help on wording?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Fill one typeI_microscopic_pKas_and_microstates.csv template for all molecules predicted with the same method."

Agree this is awkward. Maybe, "Each submission (method) must have a completed typeI_microscopic_pKas_and_microstates.csv file for all molecules the prediction encompasses," if that's what's meant?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I liked how you clarified this rule. I added these sentences to pKa_challenge_instructions.md

# Please include the values of key parameters, with units, and explain how any statistical uncertainties were estimated.
# Use as many lines of text as you need.
# All text following the "Method:" keyword will be regarded as part of your free text methods description.
Method:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be informative to have a real example here, maybe with the reference calculation protocol (or approximately that).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that if you can provide people an example of what these would look like it will probably help them (and help prevent an e-mail deluge when people start trying to submit).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created example submission files directory with 3 filled templates. Method section has a few sentences as example, not exactly the full description of a method.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I read your example when I wrote this initial comment. Maybe it is fine as is, in the past we put a method for our reference calculations here as an example of how much detail to include, but I guess its up to them how to describe it anyway.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it would be nicer to have a full detail description here. I can add that in the future when Bas is done with reference calculations.

#
# PREDICTION SECTION
#
# Fill one typeII_microstate_fractional_populations.csv template file for all molecules and microstates you have predictions for.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my comment for this line in the typeI example.

# e.g. In column 3 report ln(fractional microstate population) at pH 2.10 of your predicted microstate.
# Use scientific notation with 3 decimals of precision (e.g. -2.30e0 or 3.14e-4).
# For microstates predicted to have 0 populations, instead of ln(0), report as "-infinity".
# Do not report error estimates in this section. You can report error estimates in Method secton.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I read this I would provide the same lines as below with uncertainty estimates for each measurement. Is that what you're expecting people to do? If you want people to have the option of reporting uncertainties maybe you should make an uncertainties: section for this file type.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't plan to evaluate uncertainties of this submission type.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, maybe just say that and not tell people they can report error estimates in the method section?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. This is reasonable.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed " You can report error estimates in Method section." sentence from the template file.

# For molecules with multiple macroscopic pKas report each macroscopic pKa in a new line.
# Report pKa values to two decimal places (e.g. 10.71).
# Reporting the standard error of the mean (SEM) is optional and encouraged. If it is reported, SEM should be reported to two decimal places (e.g. 1.02).
# For values for which you don't have an estimate, leave that cell of the csv table empty.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this apply to both the uncertainty and the pKa?
For example, if I can only calculate macro pkas for 5 of the molecules can I still participate? Do I have to list all of the molecule IDs with empty cells next them?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It applies both to uncertainty and pKa.

We can handle both cases for a molecule without prediction:

  1. Excluding a molecule ID line completely
  2. A line with just molecule ID followed by empty cvs cells

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok cool, I just wanted to make sure you had thought about it in a file parsing perspective.

@@ -0,0 +1,63 @@
# Submission Type II
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should specify at the top of each file that any line that begins with a # is considered a comment and will be ignored when parsing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good idea. I will add this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -1,11 +1,9 @@
microstate ID,canonical isomeric SMILES
SM01_micro001,c1cc2c(cc1O)c3c(o2)C(=[OH+])NCCC3
SM01_micro002,c1cc2c(cc1[O-])c3c(o2)C(=NCCC3)[O-]
SM01_micro003,c1cc2=[O+]C3=C(NCCCC3=c2cc1O)[O-]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to be honest, I'm not good enough at reading SMILES strings to check these so I'm going to trust your visualization and the openeye code we worked on together.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I am also relying on participants to tell us, if they catch another mistake.

@bannanc
Copy link

bannanc commented Nov 28, 2017

I had a few minor things that I think should be addressed before this is accepted.

I have a general question about resonance vs. microstates/tautomers. What if my method comes up with the resonance structure that doesn't agree with the microstate ID you have assigned it? Should I use the deprecated ID or is it my responsibility to figure out which ID it was supposed to be?

Related to that, if you can't use a deprecated microstate ID are you going to check IDs when files are submitted?

@MehtapIsik
Copy link
Collaborator Author

@bannanc I expect people to find the matching ID in the updated microstate file, i.e. find a resonance structure of the same microstate from the SMILES list. If they can't find it, it is they can email me their requested microstate in which case I will check if it is unique or resonance of others first.

But if they insist on using the deprecated microstate ID in their submissions, that would be fine too. When I analyze the prediction, I will pool together microstate IDs that are resonance structures of one another (sum their predicted populations).

@MehtapIsik
Copy link
Collaborator Author

@davidlmobley @bannanc I am done with these edits.

@@ -87,7 +87,7 @@ Participants are encouraged to submit their results in all or multiple submissio
#### Prediction Type I - microscopic pKas and related microstates
Predicting microscopic pKas and related microstate structures.
Different protonation states and tautomer combinations constitute different microstates.
- Fill one `typeI_microscopic_pKas_and_microstates.csv` template for all molecules.
- Fill one `typeI_microscopic_pKas_and_microstates.csv` template for all molecules predicted with one method. You may submit predictions from multiple methods, but you should fill a separate template file for each different method.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more clear, thanks!

Copy link

@bannanc bannanc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a few follow-up questions, but I think in general its fine to push these changes.

@MehtapIsik
Copy link
Collaborator Author

I think we addressed everything so far.

I am anxious to merge this branch because of major changes in microstate lists. The sooner we announce this correction the better.

@davidlmobley
Copy link
Contributor

Thanks, @MehtapIsik . I have the e-mail list; would you like to do the honors? LMK on Slack.

@davidlmobley davidlmobley merged commit 9be5023 into master Nov 28, 2017
@davidlmobley
Copy link
Contributor

Also, @MehtapIsik - please go ahead and do a release. Tag me if you need anything.

@davidlmobley davidlmobley deleted the pKa_microstate_corr branch November 28, 2017 19:58
@MehtapIsik
Copy link
Collaborator Author

Release version 1.4 is done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants