
Fill in more of the introduction #135

Merged: 12 commits (more-intro-work into master), Dec 21, 2016

Conversation

@cgreene (Member) commented Nov 7, 2016

This is a work in progress and not quite ready for a final review (though if you want to make some edits or suggest some changes now anyway, feel free!). I'm working to flesh out some bits in the intro, and to give some background on deep learning, our process, etc.

@agitter (Collaborator) left a comment

I'll add more comments later, perhaps after you finish your commits.

if there are unique challenges posed by biomedical data that render deep
learning methods more challenging or less fruitful.

`TODO: not sure if it should go here, but somewhere we should talk about how we
Collaborator

I agree this is interesting to discuss in the review, but I suggest it come later in the paper. Perhaps in a non-traditional Author Contributions section? I think it is distracting in the intro when we are trying to engage readers.

Member Author

@agitter : agree - moved it after conclusions as a placeholder for now.

biological "cats" hidden in our data - the patterns that exist but that we don't
know to look for - and could act on them.

Deep learning has transformed image analysis, but researchers' initial forays
Collaborator

I'm very glad you added this. I often see something like "deep learning was great for ImageNet => deep learning will transform biology" in the introduction of papers, and we want to think critically about that.

@cgreene changed the title from "[WIP] Fill in more of the introduction" to "Fill in more of the introduction" Nov 8, 2016
@cgreene (Member Author) commented Nov 8, 2016

@agitter : removed the [WIP] tag. Had some questions in the TODOs though and I'm wondering what you and other reviewers think. Thanks!

Computer scientists are now building many-layered neural networks from
collections of millions of images. In a famous example, scientists from Google
demonstrated that a neural network could learn to identify cats simply by
watching online videos [@doi:10.1109/ICASSP.2013.6639343]. Such approaches,
Contributor

The original reference to the Google cat study should actually be the following:

Building high-level features using large scale unsupervised learning. ICML'12. http://research.google.com/archive/unsupervised_icml2012.html

Unfortunately these conferences do not have DOIs.

Member Author

Yea - need some help figuring out what to do here. Paging reference/citation/licensing guru @dhimmel.

Collaborator

The current DOI (https://doi.org/10.1109/icassp.2013.6639343) is actually correct. That is the version of record for "Building high-level features using large scale unsupervised learning".

Member Author

@dhimmel : I don't have access to the ICASSP full paper, but the author list is different (just Le, as opposed to the ICML paper). The full paper should actually be the ICML citation (this is the full list - http://icml.cc/2012/papers/ ). It looks like a series of PDFs on a random webserver.

@dhimmel (Collaborator) Nov 8, 2016

Ah good point! I overlooked the crucial detail that there are two distinct conferences: ICML and ICASSP. Here is the ICASSP PDF. It only has one author and looks to be a bit shorter. However, the abstract is nearly identical. The similarities are high enough that Google Scholar has (incorrectly perhaps) grouped these papers into a single record.

So we have a difficult decision to make. I'm leaning towards keeping the current ICASSP DOI, at least for now. While not the ideal reference, it's not terrible (same abstract). We can reevaluate if we have more papers whose metadata isn't in standardized repositories. I think this is a good example of why conferences should get their act together and start issuing DOIs.

Collaborator

#140 is also a non-standard reference. I suggest we annotate these with some sort of TODO in the markdown and see how many we have at the end. That will tell us if we can use a one-off (or two-off...) solution or need something more general.

Collaborator

@agitter, how about @url:http://openreview.net/pdf?id=Sk-oDY9ge for now? Then we'll see all the URL citations at the end and figure out a solution.

Collaborator

@dhimmel Good idea. I created #143 for this so that we can use the proposed @url citation for the ICML paper here.

These means might contain drug combinations selected based on personalized
predictions.
Concurrent with this explosive growth in biomedical data, a new class of machine
learning algorithm has become widespread in the domain of image analysis.
Contributor

image and speech processing.

node, has inputs, an activation function, and outputs. Each value from the
inputs is usually multiplied by some weight and combined and summarized by the
activation function. The value of the activation function is then multiplied by
another set of weights to produce the output `TODO: we probably need a figure
Contributor

I agree a figure is needed, and maybe even several sub-figures if subsequent sections want to describe a few popular DNN architectures: autoencoders, CNN, RNN, LSTM.

Contributor

perhaps similar to and citing http://www.asimovinstitute.org/neural-network-zoo/?

I also think we may want to be careful to not too closely reproduce the idea behind the introduction of Figure 1 in #28 since we are not necessarily focusing on specific applications of each architecture in the review.

Member Author

Does anybody know the people from @asimovinstitute who put the nn zoo together? I would be more comfortable citing a version of the figure deposited in zenodo or figshare than a webpage that could change. The metadata for zenodo or figshare could provide a link to the webpage.

It might also be great to see if they want to participate in this specific aspect (or more broadly). I agree with @gwaygenomics that we don't want to dive deeply into specific architectures at this time.

Collaborator

I don't know him, but author Fjodor van Veen is responding to comments on that page. He may be receptive to adding this to zenodo or figshare to be cited because he wrote:

I’d strongly recommend citing original papers, but feel free to use the images depicted here. Conventional citation methods should suffice just fine. Thank you for your interest!

We would need to substantially simplify the figure though. It would take more time, but we could also introduce different architectures when they first appear in the review. Then we could have a figure that shows a specific example of biomedical input data, e.g. gene expression for multi-layer perceptron. That could make the architectures less abstract and directly show why they are useful for different types of data.

Collaborator

I don't have any illustrative talent. Should we search for appropriately-licensed content that we can reuse or adapt for some of these generic neural network figures (along the lines of the neural network zoo suggestion)?

Collaborator

I am pretty handy with PGF/TikZ and vector illustration, and can make the figures if I have a description or cartoon of what is wanted.

Member Author

Thanks @XieConnect @evancofer @gwaygenomics. Corresponded with the NN Zoo author and he is happy for us to use those images if we wish to. He does not have vector versions and is not planning to make more. Let's keep this option on our desks, and plan on some sort of illustration here - TBD in the future.
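The forward pass described in the draft excerpt above (each input multiplied by a weight, combined by an activation function, then re-weighted to produce the output) can be sketched in a few lines. This is a minimal, illustrative pure-Python sketch, not code from the review; the function names and the choice of a sigmoid activation are assumptions made here for concreteness:

```python
import math

def neuron(inputs, weights, bias):
    # Each input value is multiplied by a weight; the results are combined
    # (summed, plus a bias term)...
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # ...and passed through an activation function (sigmoid, one common choice).
    return 1.0 / (1.0 + math.exp(-z))

def two_layer(inputs, hidden_params, out_weights, out_bias):
    # The hidden nodes' activation values are themselves multiplied by another
    # set of weights to produce the final output, as the draft text describes.
    hidden = [neuron(inputs, w, b) for (w, b) in hidden_params]
    return neuron(hidden, out_weights, out_bias)

# Tiny example: two inputs, two hidden nodes, one output node.
out = two_layer([0.5, -1.0],
                [([0.1, 0.4], 0.0), ([-0.3, 0.2], 0.1)],
                [0.7, -0.2], 0.05)
```

With a sigmoid activation the output is always bounded in (0, 1), and a node with all-zero weights and bias outputs exactly 0.5.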

@@ -1,97 +1,155 @@
## Introduction

### Potential writing prompt
Contributor

One suggestion: once we decide on other main subsections, maybe we can come up with one central guiding story here to motivate/orient the whole work, and to integrate different subsections that seem isolated. i.e., what is the ultimate vision? Maybe a patient comes into hospital, gets his molecules measured, grants access to his EHR. Then our deep learning based engine/knowledge-base gives diagnosis/categorization/medication advice. Meanwhile, his data will be "shared" and recorded into our system, allowing refinement of our deep learning system.

Contributor

Maybe a patient comes into hospital, gets his molecules measured, grants access to his EHR. Then our deep learning based engine/knowledge-base gives diagnosis/categorization/medication advice.

I like the idea but I think it would be better suited for short discussion about the future of deep learning in precision medicine "if everything lives up to the hype"

Collaborator

@XieConnect It's a fun idea, but I agree it might fit better in the Discussion and future opportunities. Some of the papers don't fit well in that narrative, such as those oriented more toward chemoinformatics.

@gwaybio (Contributor) left a comment

Minor comments throughout - mainly structural suggestions. I stayed away from too many granular comments about style and flow (but there wouldn't have been many anyway).


in data generation and analysis within the next decade
[@doi:10.1371/journal.pbio.1002195]. These data present new opportunities, but
also new challenges. We expect that algorithms to automatically extract
meaningful patterns and provide sufficient context to enable us to act will be
Contributor

Not sure how granular you'd like comments to be at this time...

I think act in this sentence is a bit too vague. Perhaps something like:

We expect that algorithms to automatically extract meaningful patterns and provide actionable knowledge allowing us to better treat, categorize, or study disease will be required.

I also think referencing the goal of the review early would set the tone of the review nicely.

I think it's good to get words into GitHub and not necessarily focus too much on the specific style yet, so I will try to refrain from too many of these types of comments so as not to stall progress!


only become widespread to describe analysis methods in the last decade. For the
purposes of this review, we identify deep learning approaches as those that use
multi-layer neural networks to construct complex features from large-scale
datasets. `TODO: somehow, I feel like we should work in some of the early
Contributor

I agree. Although I probably wouldn't devote an entire paragraph to the history. "The first class of biomedical applications of deep learning include..." - maybe something like that would be sufficient

Collaborator

I would stay away from the history, in part because I don't know all of the details well enough to give appropriate credit. There are also very early examples of neural networks in biology that we don't necessarily want to discuss (e.g. secondary structure prediction in 1988). We could refer to something like http://www.deeplearningbook.org/ to cover the history of neural networks instead of doing it ourselves.

I do think we need to clarify "deep" here, which you did. To me at least, the term has come to represent a set of strategies for constructing and training neural networks even if there are few layers.

@evancofer (Collaborator) Nov 26, 2016

This review has a comprehensive discussion of the history of neural networks.

Significant heterogeneity still remains within these four subtypes
[@doi:10.1200/JCO.2008.18.1370 @doi:10.1158/1078-0432.CCR-13-0583]. Given the
increasing wealth of molecular data available, it seems that a more
One important topic in the biomedical field is the accurate classification of
Contributor

Are these blurbs included in the intro, or are they placeholders for the larger sections? I don't think they belong in the intro given that your last paragraph read like a concluding paragraph in an intro that set up discussion of these topics.

Collaborator

@traversc Wrote these for the intro. Maybe we need to shorten it, but the goal was to lay out the major questions and topics to orient the reader where we are going.


### Author contributions

`TODO: not sure if it should go here, but somewhere we should talk about how we
Contributor

yes! I think that would be pretty cool too

@agitter (Collaborator) left a comment

We'll revise the Introduction again later, but I think this can be merged soon. Perhaps leave some additional TODOs to remind us to revisit things like the neural network zoo figure and the Google cat citation.

watching online videos [@doi:10.1109/ICASSP.2013.6639343]. Such approaches,
termed deep learning, seem like a solution to the challenge presented by the
growth of data in biomedicine. Perhaps these algorithms could identify the
biological "cats" hidden in our data - the patterns that exist but that we don't
Collaborator

In a zoological sense, "cats" are biological already 😄

### If this happens, is deep learning required for any of it? Are we any closer
### because of the advent of deep learning?
Deep learning has transformed image analysis, but researchers' initial forays
into the use of these techniques in biomedicine have been relatively limited.
Collaborator

I don't want to get too granular, but "relatively limited" suggests to me that not much has been tried. On the contrary, I think there have been many biomedical applications and well over 100 papers. We might say they are "less conclusive" or something.



@agitter mentioned this pull request Nov 13, 2016
@cgreene merged commit 92edc3d into master Dec 21, 2016
@cgreene deleted the more-intro-work branch December 21, 2016 14:53
dhimmel pushed a commit to dhimmel/deep-review that referenced this pull request Nov 3, 2017