Fill in more of the introduction #135
Conversation
I'll add more comments later, perhaps after you finish your commits.
if there are unique challenges posed by biomedical data that render deep
learning methods more challenging or less fruitful.

`TODO: not sure if it should go here, but somewhere we should talk about how we
I agree this is interesting to discuss in the review, but I suggest it come later in the paper. Perhaps in a non-traditional Author Contributions section? I think it is distracting in the intro when we are trying to engage readers.
@agitter : agree - moved it after conclusions as a placeholder for now.
biological "cats" hidden in our data - the patterns that exist but that we don't
know to look for - and could act on them.

Deep learning has transformed image analysis, but researchers' initial forays
I'm very glad you added this. I often see something like "deep learning was great for ImageNet => deep learning will transform biology" in the introduction of papers, and we want to think critically about that.
@agitter : removed the [WIP] tag. Had some questions in the TODOs though and I'm wondering what you and other reviewers think. Thanks!
Computer scientists are now building many-layered neural networks from
collections of millions of images. In a famous example, scientists from Google
demonstrated that a neural network could learn to identify cats simply by
watching online videos [@doi:10.1109/ICASSP.2013.6639343]. Such approaches,
The original reference to the Google cat study should actually be the following:
Building high-level features using large scale unsupervised learning. ICML'12. http://research.google.com/archive/unsupervised_icml2012.html
Unfortunately these conferences do not issue DOIs.
Yea - need some help figuring out what to do here. Paging reference/citation/licensing guru @dhimmel.
The current DOI (https://doi.org/10.1109/icassp.2013.6639343) is actually correct. That is the version of record for "Building high-level features using large scale unsupervised learning".
@dhimmel : I don't have access to the ICASSP full paper, but the author list is different (just Le, as opposed to the ICML paper). The full paper should actually be the ICML citation (this is the full list: http://icml.cc/2012/papers/). It looks like a series of PDFs on a random webserver.
Ah good point! I overlooked the crucial detail that there are two distinct conferences: ICML and ICASSP. Here is the ICASSP PDF. It only has one author and looks to be a bit shorter. However, the abstract is nearly identical. The similarities are high enough that Google Scholar has (incorrectly perhaps) grouped these papers into a single record.
So we have a difficult decision to make. I'm leaning towards keeping the current ICASSP DOI, at least for now. While not the ideal reference, it's not terrible (same abstract). We can reevaluate if we have more papers whose metadata isn't in standardized repositories. I think this is a good example of why conferences should get their act together and start issuing DOIs.
#140 is also a non-standard reference. I suggest we annotate these with some sort of TODO in the markdown and see how many we have at the end. That will tell us if we can use a one-off (or two-off...) solution or need something more general.
@agitter, how about @url:http://openreview.net/pdf?id=Sk-oDY9ge
for now? Then we'll see all the URL citations at the end and figure out a solution.
These means might contain drug combinations selected based on personalized
predictions.
Concurrent with this explosive growth in biomedical data, a new class of machine
learning algorithm has become widespread in the domain of image analysis.
image and speech processing.
node, has inputs, an activation function, and outputs. Each value from the
inputs is usually multiplied by some weight and combined and summarized by the
activation function. The value of the activation function is then multiplied by
another set of weights to produce the output `TODO: we probably need a figure
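The node computation described in this hunk can be sketched in a few lines of Python (a minimal illustration only; the specific input values, weights, and sigmoid activation are arbitrary choices for demonstration, not anything from the manuscript):

```python
import math

def node(inputs, weights, bias):
    # Each input value is multiplied by a weight; the weighted values are
    # combined (summed here) and passed through an activation function
    # (a sigmoid in this sketch).
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Example input values and weights (arbitrary, not learned parameters).
x = [0.5, -1.0, 2.0]
hidden = [node(x, [0.1, 0.4, -0.2], 0.0),
          node(x, [-0.3, 0.2, 0.5], 0.1)]

# The hidden activations are in turn multiplied by another set of
# weights to produce the network's output.
output = node(hidden, [0.7, -0.6], 0.0)
```

A figure would still communicate this far better, but the sketch shows the two stages the text describes: weight-and-combine inputs, then feed each node's activation forward through another weighted layer.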
I agree a figure is needed, and maybe even several sub-figures if subsequent sections want to describe a few popular DNN architectures: autoencoders, CNN, RNN, LSTM.
perhaps similar to and citing http://www.asimovinstitute.org/neural-network-zoo/?
I also think we may want to be careful to not too closely reproduce the idea behind the introduction of Figure 1 in #28 since we are not necessarily focusing on specific applications of each architecture in the review.
Does anybody know the people from @asimovinstitute who put the nn zoo together? I would be more comfortable citing a version of the figure deposited in zenodo or figshare than a webpage that could change. The metadata for zenodo or figshare could provide a link to the webpage.
It might also be great to see if they want to participate in this specific aspect (or more broadly). I agree with @gwaygenomics that we don't want to dive deeply into specific architectures at this time.
I don't know him, but author Fjodor van Veen is responding to comments on that page. He may be receptive to adding this to zenodo or figshare to be cited because he wrote:
I’d strongly recommend citing original papers, but feel free to use the images depicted here. Conventional citation methods should suffice just fine. Thank you for your interest!
We would need to substantially simplify the figure though. It would take more time, but we could also introduce different architectures when they first appear in the review. Then we could have a figure that shows a specific example of biomedical input data, e.g. gene expression for multi-layer perceptron. That could make the architectures less abstract and directly show why they are useful for different types of data.
I don't have any illustrative talent. Should we search for appropriately-licensed content that we can reuse or adapt for some of these generic neural network figures (along the lines of the neural network zoo suggestion)?
I am pretty handy with PGF/TikZ and vector illustration, and can make the figures if I have a description or cartoon of what is wanted.
Thanks @XieConnect @evancofer @gwaygenomics . Corresponded with the NN Zoo author and he is happy for us to use those images if we wish to. He does not have vector versions and is not planning to make more. Let's keep this option on our desks, and plan on some sort of illustration here - TBD in the future.
@@ -1,97 +1,155 @@
## Introduction

### Potential writing prompt
One suggestion: once we decide on other main subsections, maybe we can come up with one central guiding story here to motivate/orient the whole work, and to integrate different subsections that seem isolated. i.e., what is the ultimate vision? Maybe a patient comes into hospital, gets his molecules measured, grants access to his EHR. Then our deep learning based engine/knowledge-base gives diagnosis/categorization/medication advice. Meanwhile, his data will be "shared" and recorded into our system, allowing refinement of our deep learning system.
Maybe a patient comes into hospital, gets his molecules measured, grants access to his EHR. Then our deep learning based engine/knowledge-base gives diagnosis/categorization/medication advice.
I like the idea, but I think it would be better suited for a short discussion about the future of deep learning in precision medicine "if everything lives up to the hype".
@XieConnect It's a fun idea, but I agree it might fit better in the Discussion and future opportunities. Some of the papers don't fit well in that narrative, such as those oriented more toward chemoinformatics.
Minor comments throughout - mainly structural suggestions. I stayed away from too many granular comments about style and flow (but there wouldn't have been many anyway).
in data generation and analysis within the next decade
[@doi:10.1371/journal.pbio.1002195]. These data present new opportunities, but
also new challenges. We expect that algorithms to automatically extract
meaningful patterns and provide sufficient context to enable us to act will be
Not sure how granular you'd like comments to be at this time...
I think "act" in this sentence is a bit too vague. Perhaps something like:
We expect that algorithms to automatically extract meaningful patterns and provide actionable knowledge allowing us to better treat, categorize, or study disease will be required.
I also think referencing the goal of the review early would set the tone of the review nicely.
I think it's good to get words into GitHub and not necessarily focus too much on the specific style yet, so I will try to refrain from too many of these types of comments to not stall progress!
only become widespread to describe analysis methods in the last decade. For the
purposes of this review, we identify deep learning approaches as those that use
multi-layer neural networks to construct complex features from large-scale
datasets. `TODO: somehow, I feel like we should work in some of the early
I agree. Although I probably wouldn't devote an entire paragraph to the history. "The first class of biomedical applications of deep learning include..." - maybe something like that would be sufficient
I would stay away from the history, in part because I don't know all of the details well enough to give appropriate credit. There are also very early examples of neural networks in biology that we don't necessarily want to discuss (e.g. secondary structure prediction in 1988). We could refer to something like http://www.deeplearningbook.org/ to cover the history of neural networks instead of doing it ourselves.
I do think we need to clarify "deep" here, which you did. To me at least, the term has come to represent a set of strategies for constructing and training neural networks even if there are few layers.
This review has a comprehensive discussion of the history of neural networks.
Significant heterogeneity still remains within these four subtypes
[@doi:10.1200/JCO.2008.18.1370 @doi:10.1158/1078-0432.CCR-13-0583]. Given the
increasing wealth of molecular data available, it seems that a more
One important topic in the biomedical field is the accurate classification of
Are these blurbs included in the intro, or are they placeholders for the larger sections? I don't think they belong in the intro given that your last paragraph read like a concluding paragraph in an intro that set up discussion of these topics.
@traversc Wrote these for the intro. Maybe we need to shorten it, but the goal was to lay out the major questions and topics to orient the reader where we are going.
### Author contributions

`TODO: not sure if it should go here, but somewhere we should talk about how we
yes! I think that would be pretty cool too
We'll revise the Introduction again later, but I think this can be merged soon. Perhaps leave some additional TODOs to remind us to revisit things like the neural network zoo figure and the Google cat citation.
watching online videos [@doi:10.1109/ICASSP.2013.6639343]. Such approaches,
termed deep learning, seem like a solution to the challenge presented by the
growth of data in biomedicine. Perhaps these algorithms could identify the
biological "cats" hidden in our data - the patterns that exist but that we don't
In a zoological sense, "cats" are biological already 😄
### If this happens, is deep learning required for any of it? Are we any closer
### because of the advent of deep learning?
Deep learning has transformed image analysis, but researchers' initial forays
into the use of these techniques in biomedicine have been relatively limited.
I don't want to get too granular, but "relatively limited" suggests to me that not much has been tried. On the contrary, I think there have been many biomedical applications and well over 100 papers. We might say they are "less conclusive" or something.
Fill in more of the introduction
This is a work in progress and not quite ready for a final review (though if you want to make some edits or suggest some changes now anyway, feel free!). I'm working to flesh out some bits in the intro, and to give some background on deep learning, our process, etc.