Added PPI section with MHC subsection #638
Looks good - only minor suggestions. My one fear is that this is fairly technically detailed, perhaps more so than a lot of the rest of the ms.
sections/04_study.md
Outdated
However, because many PPIs are transient or dependent on biological context, high-throughput methods can fail to capture a number of interactions.
Additionally, common types of high-throughput screens for PPIs, such as the yeast two-hybrid, can have issues with high rates of false positive results [@doi:10.1186/s12964-015-0116-8 @doi:10.1002/pmic.200800150].

This section will focus on advances in *de novo* PPI prediction.
Might benefit from a more explicit linking statement of the need for PPI prediction and thus DL.
sections/04_study.md
Outdated
Beyond predicting whether or not two proteins interact, Du et al. [@doi:10.1016/j.ymeth.2016.06.001] showed that a tandem stacked-autoencoder/deep-neural-network method could be used to predict residue contacts for the interfacial regions of interacting proteins.
A combination of a hidden Markov model with Fisher scores yielded uniform-length features for each residue. Their method significantly exceeded classical machine learning accuracy.

Because many studies used predefined higher-level features, one of the benefits of deep learning— automatic feature extraction— is not fully leveraged.
space before emdash
sections/04_study.md
Outdated
Because MHCnuggets had to be trained for every MHC allele, performance was far better for alleles with abundant, balanced training data.

In a comparison of several current methods, Bhattacharya et al. found that the top methods— NetMHC, NetMHCpan, MHCflurry, and MHCnuggets— showed comparable performance, but large differences in speed.
In the authors analysis, convolutional neural networks (in this case, HLA-CNN) showed comparatively poor performance, while shallow and recurrent neural networks performed the best.
Delete "in the authors analysis" as unnecessary
@cgreene I had planned to incorporate those changes here. Sorry for the delay in getting those updates ready. I want to get the section finished ASAP, and I hope to push some changes by the beginning of next week.
@zietzm 👍 will wait to review further until then
Thanks for the great contributions. I have several suggestions, and my main comment is to think about how to summarize the many MHC methods. In some places I tried trimming text that isn't critical.
@cgreene do you think we need to further shorten this section? I think that if we can condense some of the MHC paragraphs we'll be okay.
content/04.study.md
Outdated
@@ -424,6 +424,92 @@ summarized above also apply to interfacial contact prediction for protein
complexes but may be less effective since on average protein complexes have
fewer sequence homologs.

### Protein-Protein Interactions

Protein-protein interactions (PPIs) are highly specific and non-accidental physical contacts between proteins which occur for purposes other than generic protein production or degradation [@doi:10.1371/journal.pcbi.1000807].
Comma before which
content/04.study.md
Outdated
@@ -424,6 +424,92 @@ summarized above also apply to interfacial contact prediction for protein
complexes but may be less effective since on average protein complexes have
fewer sequence homologs.

### Protein-Protein Interactions
Capitalize only the first Protein
content/04.study.md
Outdated
### Protein-Protein Interactions

Protein-protein interactions (PPIs) are highly specific and non-accidental physical contacts between proteins which occur for purposes other than generic protein production or degradation [@doi:10.1371/journal.pcbi.1000807].
PPIs are key to many cellular processes like metabolism and immune responses.
PPIs are involved in almost all cellular processes. Perhaps we could cut this line? I'm looking for places to shorten the text.
content/04.study.md
Outdated
PPIs are key to many cellular processes like metabolism and immune responses.
Abundant interaction data have been generated in-part thanks to advances in high-throughput screening methods, such as yeast two-hybrid and affinity-purification with mass spectrometry.
However, because many PPIs are transient or dependent on biological context, high-throughput methods can fail to capture a number of interactions.
Additionally, common types of high-throughput screens for PPIs, such as the yeast two-hybrid, can have issues with high rates of false positive results [@doi:10.1186/s12964-015-0116-8 @doi:10.1002/pmic.200800150].
If we keep this line, the new manubot style requires `;` between references.
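The separator change requested above could be applied mechanically across a file. The following is only a minimal sketch and is not part of Manubot: the function name `add_citation_semicolons` is hypothetical, and it assumes that citation groups are bracketed and contain nothing but `@`-prefixed keys separated by whitespace. Groups with other content (for example page locators) are left untouched.

```python
import re


def add_citation_semicolons(text: str) -> str:
    """Join whitespace-separated citation keys inside [...] with '; '."""

    def fix(match: "re.Match[str]") -> str:
        tokens = match.group(1).split()
        # Only rewrite groups made up purely of @-prefixed keys.
        if not all(token.startswith("@") for token in tokens):
            return match.group(0)
        # Drop any existing trailing ';' so already-correct groups are unchanged.
        return "[" + "; ".join(token.rstrip(";") for token in tokens) + "]"

    return re.sub(r"\[(@[^\]]+)\]", fix, text)


line = "false positive results [@doi:10.1186/s12964-015-0116-8 @doi:10.1002/pmic.200800150]."
print(add_citation_semicolons(line))
```

Running it on groups that already use `;` leaves them as they are, so it should be safe to apply more than once.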
content/04.study.md
Outdated
Protein-protein interactions (PPIs) are highly specific and non-accidental physical contacts between proteins which occur for purposes other than generic protein production or degradation [@doi:10.1371/journal.pcbi.1000807].
PPIs are key to many cellular processes like metabolism and immune responses.
Abundant interaction data have been generated in-part thanks to advances in high-throughput screening methods, such as yeast two-hybrid and affinity-purification with mass spectrometry.
However, because many PPIs are transient or dependent on biological context, high-throughput methods can fail to capture a number of interactions.
This sentence alone might be enough to motivate the need for PPI prediction. Then you could cut the line about false positive rates, because a reader might wonder whether computational predictions really have lower false positive rates than Y2H.
content/04.study.md
Outdated
A way of working with different network types was shown by Gligorijevic et al., [@doi:10.1101/223339] who developed a multimodal deep autoencoder, deepNF, to find a feature representation common among several different PPI networks.
This common lower-level representation allows for the combination of various PPI data sources towards a single predictive task.
An SVM classifier trained on the compressed features from the middle layer of the autoencoder outperformed previous methods in predicting protein function.
The key advancement of this method is the use of deep learning to incorporate higher-order network information for protein function prediction.
This might already be clear enough from the rest of the paragraph. You could cut it.
content/04.study.md
Outdated
The key advancement of this method is the use of deep learning to incorporate higher-order network information for protein function prediction.

Hamilton et al. addressed the issue of large, heterogeneous, and changing networks with an inductive approach called GraphSAGE [@arxiv:1706.02216v2].
By finding node embeddings through learned aggregator functions which describe the node and its neighbors in the network, the GraphSAGE approach allows for the generalization of the model to unknown nodes.
Change `which` to `that`.
content/04.study.md
Outdated
Hamilton et al. addressed the issue of large, heterogeneous, and changing networks with an inductive approach called GraphSAGE [@arxiv:1706.02216v2].
By finding node embeddings through learned aggregator functions which describe the node and its neighbors in the network, the GraphSAGE approach allows for the generalization of the model to unknown nodes.
Generalization to unseen nodes is especially useful for PPI networks, as these networks represent various types of interactions between proteins in a variety of species, and they can be updated frequently.
I don't think I'm following this. Isn't an unseen node a new protein in a PPI network? Do we encounter new proteins? Or is the idea that a trained model generalizes to new graphs?
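To make the question above concrete, here is a minimal sketch of why an aggregator-based embedding can be computed for a node (or a whole graph) that was not present during training. It assumes a plain mean aggregator and omits the learned weight matrices and nonlinearity, so it is not the GraphSAGE implementation. The toy graph, protein names, and feature values are invented for illustration.

```python
from typing import Dict, List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    count = len(vectors)
    return [sum(v[i] for v in vectors) / count for i in range(len(vectors[0]))]


def embed(node: str, features: Dict[str, Vector], neighbors: Dict[str, List[str]]) -> Vector:
    """Concatenate a node's own features with the mean of its neighbors' features.

    The function depends only on features and local edges, not on a stored
    per-node embedding table, so it applies equally to a node added later.
    """
    neighbor_features = [features[n] for n in neighbors.get(node, [])]
    if neighbor_features:
        aggregated = mean_vector(neighbor_features)
    else:
        aggregated = [0.0] * len(features[node])
    return features[node] + aggregated


# Hypothetical toy network: protein_C is treated as "unseen" and added afterwards.
features = {"protein_A": [1.0, 0.0], "protein_B": [0.0, 1.0], "protein_C": [1.0, 1.0]}
neighbors = {"protein_C": ["protein_A", "protein_B"]}
print(embed("protein_C", features, neighbors))
```

Under this reading, "unseen" can mean either a newly added protein or every node of a different graph (for example another species), since the same function applies in both cases.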
content/04.study.md
Outdated
Hamilton et al. addressed the issue of large, heterogeneous, and changing networks with an inductive approach called GraphSAGE [@arxiv:1706.02216v2].
By finding node embeddings through learned aggregator functions which describe the node and its neighbors in the network, the GraphSAGE approach allows for the generalization of the model to unknown nodes.
Generalization to unseen nodes is especially useful for PPI networks, as these networks represent various types of interactions between proteins in a variety of species, and they can be updated frequently.
In a classification task for the prediction of protein function, Chen and Zhu [@arxiv:1710.10568v1] optimized this approach and enhanced the graph convolutional network with a preprocessing step to improve significantly both training time and prediction accuracy.
What is the preprocessing step?
content/04.study.md
Outdated
They found that MHCnuggets — the recurrent neural network — was by far the fastest training among the top performing methods.
In predicting interactions between proteins, deep learning has achieved state-of-the-art results and shows promise to overcome previous challenges in the field.

### PPI networks and graph analysis
We also discuss graph convolutions in the drug discovery section and could link those topics. We have a sentence `Modern neural networks can operate directly on the molecular graph as input.` that could be changed to `Modern neural networks, such as those discussed previously for PPI networks, can operate directly on the molecular graph as input.`
Thanks, these are excellent revisions and address all of my initial comments. The only remaining items to resolve before merging are:
- two minor commas noted here
- decide what you'd like to do with the `####` header
- resolve conflicts with master
content/04.study.md
Outdated
Shallow, feed-forward neural networks are competitive methods and have made progress toward pan-allele and pan-length peptide representations.
Sequence alignment techniques are useful for representing variable-length peptides as uniform-length features [@doi:10.1110/ps.0239403; @doi:10.1093/bioinformatics/btv639].
For pan-allelic prediction, NetMHCpan [@doi:10.1007/s00251-008-0341-z; @doi:10.1186/s13073-016-0288-x] used a pseudo-sequence representation of the MHC class I molecule which included only polymorphic peptide contact residues.
Comma before which
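For readers unfamiliar with the pseudo-sequence idea in the sentence under review, the selection step might look roughly like the sketch below. Everything here is assumed for illustration: the allele names, the short sequences, and the contact positions are invented, and the actual residue positions used by NetMHCpan are not reproduced.

```python
from typing import Dict, List


def polymorphic_positions(alleles: Dict[str, str], contact_positions: List[int]) -> List[int]:
    """Keep contact positions where the aligned alleles differ in at least one residue."""
    return [
        position
        for position in contact_positions
        if len({sequence[position] for sequence in alleles.values()}) > 1
    ]


def pseudo_sequences(alleles: Dict[str, str], contact_positions: List[int]) -> Dict[str, str]:
    """Reduce each aligned allele to its residues at polymorphic contact positions."""
    kept = polymorphic_positions(alleles, contact_positions)
    return {name: "".join(sequence[p] for p in kept) for name, sequence in alleles.items()}


# Hypothetical aligned sequences of equal length and hypothetical contact positions.
toy_alleles = {
    "allele_1": "GSHSMRYF",
    "allele_2": "GSHSLRYS",
    "allele_3": "GAHSMKYF",
}
toy_contacts = [1, 4, 6, 7]
print(pseudo_sequences(toy_alleles, toy_contacts))
```

Position 6 is dropped in this toy case because all three sequences share the same residue there, which is the sense in which only polymorphic contact residues are retained.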
content/04.study.md
Outdated
MHCflurry's imputation method increases its performance on poorly characterized alleles, making it competitive with NetMHCpan for this task.
Kuksa et al. [@doi:10.1093/bioinformatics/btv371] developed a shallow, higher-order neural network (HONN) comprised of both mean and covariance hidden units to capture some of the higher-order dependencies between amino acid locations.
Nice improvement, the HONN makes sense now.
content/04.study.md
Outdated
An important challenge in PPI network prediction is the task of combining different networks and types of networks.
A way of working with different network types was shown by Gligorijevic et al., [@doi:10.1101/223339] who developed a multimodal deep autoencoder, deepNF, to find a feature representation common among several different PPI networks.
Gligorijevic et al., [@doi:10.1101/223339] developed a multimodal deep autoencoder, deepNF, to find a feature representation common among several different PPI networks.
Can remove the comma after et al.
@agitter thanks for your help on these sections! I think my last few commits should now have the PR ready.
This build is based on 75f0dc2. This commit was created by the following Travis CI build and job:
https://travis-ci.org/greenelab/deep-review/builds/325010006
https://travis-ci.org/greenelab/deep-review/jobs/325010007

[ci skip]

The full commit message that triggered this build is copied below:

Added PPI section with MHC subsection (#638)

* Added PPI and MHC sections
* Updates to PPI/MHC subsection
* PPI network section
* Updates to all PPI sections
* Commas and header
* Remove accidental newline
* Re-add PPI section reference
References #575 and includes MHC-peptide papers
Added a section on Protein-Protein Interactions (PPI) with a subsection on MHC-peptide binding prediction.
@agitter mentioned PPI networks as a possible area of interest in #575, but this has been neglected here for the sake of not adding too much to an already long section. If a PPI network subsection is still desired, I would be more than happy to add one, but I understand the necessity to minimize additional length being added to this paper.