
Fill in more of the introduction #135

Merged: 12 commits (more-intro-work into master), Dec 21, 2016

Conversation

@cgreene (Member) commented Nov 7, 2016

This is a work in progress and not quite ready for a final review (though if you want to make some edits or suggest some changes now anyway, feel free!). I'm working to flesh out some bits in the intro, and to give some background on deep learning, our process, etc.

@agitter (Collaborator) left a comment

I'll add more comments later, perhaps after you finish your commits.

if there are unique challenges posed by biomedical data that render deep
learning methods more challenging or less fruitful.

`TODO: not sure if it should go here, but somewhere we should talk about how we
Collaborator

I agree this is interesting to discuss in the review, but I suggest it come later in the paper. Perhaps in a non-traditional Author Contributions section? I think it is distracting in the intro when we are trying to engage readers.

Member Author

@agitter : agree - moved it after conclusions as a placeholder for now.

biological "cats" hidden in our data - the patterns that exist but that we don't
know to look for - and could act on them.

Deep learning has transformed image analysis, but researchers' initial forays
Collaborator

I'm very glad you added this. I often see something like "deep learning was great for ImageNet => deep learning will transform biology" in the introduction of papers, and we want to think critically about that.

@cgreene changed the title from "[WIP] Fill in more of the introduction" to "Fill in more of the introduction" Nov 8, 2016
@cgreene (Member Author) commented Nov 8, 2016

@agitter : removed the [WIP] tag. Had some questions in the TODOs though and I'm wondering what you and other reviewers think. Thanks!

Computer scientists are now building many-layered neural networks from
collections of millions of images. In a famous example, scientists from Google
demonstrated that a neural network could learn to identify cats simply by
watching online videos [@doi:10.1109/ICASSP.2013.6639343]. Such approaches,
Contributor

The original reference to the Google cat study should actually be the following:

Building high-level features using large scale unsupervised learning. ICML'12. http://research.google.com/archive/unsupervised_icml2012.html

Unfortunately these conferences do not have DOIs.

Member Author

Yea - need some help figuring out what to do here. Paging reference/citation/licensing guru @dhimmel.

Collaborator

The current DOI (https://doi.org/10.1109/icassp.2013.6639343) is actually correct. That is the version of record for "Building high-level features using large scale unsupervised learning".

Member Author

@dhimmel : I don't have access to the ICASSP full paper, but the author list is different (just Le, as opposed to the ICML paper). The full paper should actually be the ICML citation (this is the full list - http://icml.cc/2012/papers/ ). It looks like a series of PDFs on a random webserver.

@dhimmel (Collaborator) Nov 8, 2016

Ah good point! I overlooked the crucial detail that there are two distinct conferences: ICML and ICASSP. Here is the ICASSP PDF. It only has one author and looks to be a bit shorter. However, the abstract is nearly identical. The similarities are high enough that Google Scholar has (incorrectly perhaps) grouped these papers into a single record.

So we have a difficult decision to make. I'm leaning towards keeping the current ICASSP DOI, at least for now. While not the ideal reference, it's not terrible (same abstract). We can reevaluate if we have more papers whose metadata isn't in standardized repositories. I think this is a good example of why conferences should get their act together and start issuing DOIs.

Collaborator

#140 is also a non-standard reference. I suggest we annotate these with some sort of TODO in the markdown and see how many we have at the end. That will tell us if we can use a one-off (or two-off...) solution or need something more general.

Collaborator

@agitter, how about @url:http://openreview.net/pdf?id=Sk-oDY9ge for now? Then we'll see all the URL citations at the end and figure out a solution.

Collaborator

@dhimmel Good idea. I created #143 for this so that we can use the proposed @url citation for the ICML paper here.

These means might contain drug combinations selected based on personalized
predictions.
Concurrent with this explosive growth in biomedical data, a new class of machine
learning algorithm has become widespread in the domain of image analysis.
Contributor

image and speech processing.

node, has inputs, an activation function, and outputs. Each value from the
inputs is usually multiplied by some weight and combined and summarized by the
activation function. The value of the activation function is then multiplied by
another set of weights to produce the output `TODO: we probably need a figure
Contributor

I agree a figure is needed, and maybe even several sub-figures if subsequent sections want to describe a few popular DNN architectures: autoencoders, CNN, RNN, LSTM.

Contributor

perhaps similar to and citing http://www.asimovinstitute.org/neural-network-zoo/?

I also think we may want to be careful to not too closely reproduce the idea behind the introduction of Figure 1 in #28 since we are not necessarily focusing on specific applications of each architecture in the review.

Member Author

Does anybody know the people from @asimovinstitute who put the nn zoo together? I would be more comfortable citing a version of the figure deposited in zenodo or figshare than a webpage that could change. The metadata for zenodo or figshare could provide a link to the webpage.

It might also be great to see if they want to participate in this specific aspect (or more broadly). I agree with @gwaygenomics that we don't want to dive deeply into specific architectures at this time.

Collaborator

I don't know him, but author Fjodor van Veen is responding to comments on that page. He may be receptive to adding this to zenodo or figshare to be cited because he wrote:

I’d strongly recommend citing original papers, but feel free to use the images depicted here. Conventional citation methods should suffice just fine. Thank you for your interest!

We would need to substantially simplify the figure though. It would take more time, but we could also introduce different architectures when they first appear in the review. Then we could have a figure that shows a specific example of biomedical input data, e.g. gene expression for multi-layer perceptron. That could make the architectures less abstract and directly show why they are useful for different types of data.

Collaborator

I don't have any illustrative talent. Should we search for appropriately-licensed content that we can reuse or adapt for some of these generic neural network figures (along the lines of the neural network zoo suggestion)?

Collaborator

I am pretty handy with PGF/TikZ and vector illustration, and can make the figures if I have a description or cartoon of what is wanted.

Member Author

Thanks @XieConnect @evancofer @gwaygenomics. Corresponded with the NN Zoo author and he is happy for us to use those images if we wish to. He does not have vector versions and is not planning to make more. Let's keep this option on our desks, and plan on some sort of illustration here - TBD in the future.
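The forward pass described in the draft excerpt above (each input multiplied by a weight, combined by an activation function, then re-weighted to produce the output) can be sketched in a few lines. This is a minimal, illustrative pure-Python sketch, not code from the review; the function names and the choice of a sigmoid activation are assumptions made here for concreteness:

```python
import math

def neuron(inputs, weights, bias):
    # Each input value is multiplied by a weight; the results are combined
    # (summed, plus a bias term)...
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # ...and passed through an activation function (sigmoid, one common choice).
    return 1.0 / (1.0 + math.exp(-z))

def two_layer(inputs, hidden_params, out_weights, out_bias):
    # The hidden nodes' activation values are themselves multiplied by another
    # set of weights to produce the final output, as the draft text describes.
    hidden = [neuron(inputs, w, b) for (w, b) in hidden_params]
    return neuron(hidden, out_weights, out_bias)

# Tiny example: two inputs, two hidden nodes, one output node.
out = two_layer([0.5, -1.0],
                [([0.1, 0.4], 0.0), ([-0.3, 0.2], 0.1)],
                [0.7, -0.2], 0.05)
```

With a sigmoid activation the output is always bounded in (0, 1), and a node with all-zero weights and bias outputs exactly 0.5.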

@@ -1,97 +1,155 @@
## Introduction

### Potential writing prompt
Contributor

One suggestion: once we decide on other main subsections, maybe we can come up with one central guiding story here to motivate/orient the whole work, and to integrate different subsections that seem isolated. i.e., what is the ultimate vision? Maybe a patient comes into hospital, gets his molecules measured, grants access to his EHR. Then our deep learning based engine/knowledge-base gives diagnosis/categorization/medication advice. Meanwhile, his data will be "shared" and recorded into our system, allowing refinement of our deep learning system.

Contributor

Maybe a patient comes into hospital, gets his molecules measured, grants access to his EHR. Then our deep learning based engine/knowledge-base gives diagnosis/categorization/medication advice.

I like the idea but I think it would be better suited for short discussion about the future of deep learning in precision medicine "if everything lives up to the hype"

Collaborator

@XieConnect It's a fun idea, but I agree it might fit better in the Discussion and future opportunities. Some of the papers don't fit well in that narrative, such as those oriented more toward chemoinformatics.

@gwaybio (Contributor) left a comment

Minor comments throughout - mainly structural suggestions. I stayed away from too many granular comments about style and flow (but there wouldn't have been many anyway).


in data generation and analysis within the next decade
[@doi:10.1371/journal.pbio.1002195]. These data present new opportunities, but
also new challenges. We expect that algorithms to automatically extract
meaningful patterns and provide sufficient context to enable us to act will be
Contributor

Not sure how granular you'd like comments to be at this time...

I think act in this sentence is a bit too vague. Perhaps something like:

We expect that algorithms to automatically extract meaningful patterns and provide actionable knowledge allowing us to better treat, categorize, or study disease will be required.

I also think referencing the goal of the review early would set the tone of the review nicely.

I think it's good to get words into GitHub and not necessarily focus too much on the specific style yet, so I will try to refrain from too many of these types of comments so as not to stall progress!


only become widespread to describe analysis methods in the last decade. For the
purposes of this review, we identify deep learning approaches as those that use
multi-layer neural networks to construct complex features from large-scale
datasets. `TODO: somehow, I feel like we should work in some of the early
Contributor

I agree. Although I probably wouldn't devote an entire paragraph to the history. "The first class of biomedical applications of deep learning include..." - maybe something like that would be sufficient

Collaborator

I would stay away from the history, in part because I don't know all of the details well enough to give appropriate credit. There are also very early examples of neural networks in biology that we don't necessarily want to discuss (e.g. secondary structure prediction in 1988). We could refer to something like http://www.deeplearningbook.org/ to cover the history of neural networks instead of doing it ourselves.

I do think we need to clarify "deep" here, which you did. To me at least, the term has come to represent a set of strategies for constructing and training neural networks even if there are few layers.

@evancofer (Collaborator) Nov 26, 2016

This review has a comprehensive discussion of the history of neural networks.

Significant heterogeneity still remains within these four subtypes
[@doi:10.1200/JCO.2008.18.1370 @doi:10.1158/1078-0432.CCR-13-0583]. Given the
increasing wealth of molecular data available, it seems that a more
One important topic in the biomedical field is the accurate classification of
Contributor

Are these blurbs included in the intro, or are they placeholders for the larger sections? I don't think they belong in the intro given that your last paragraph read like a concluding paragraph in an intro that set up discussion of these topics.

Collaborator

@traversc Wrote these for the intro. Maybe we need to shorten it, but the goal was to lay out the major questions and topics to orient the reader where we are going.


### Author contributions

`TODO: not sure if it should go here, but somewhere we should talk about how we
Contributor

yes! I think that would be pretty cool too

@agitter (Collaborator) left a comment

We'll revise the Introduction again later, but I think this can be merged soon. Perhaps leave some additional TODOs to remind us to revisit things like the neural network zoo figure and the Google cat citation.

watching online videos [@doi:10.1109/ICASSP.2013.6639343]. Such approaches,
termed deep learning, seem like a solution to the challenge presented by the
growth of data in biomedicine. Perhaps these algorithms could identify the
biological "cats" hidden in our data - the patterns that exist but that we don't
Collaborator

In a zoological sense, "cats" are biological already 😄

### If this happens, is deep learning required for any of it? Are we any closer
### because of the advent of deep learning?
Deep learning has transformed image analysis, but researchers' initial forays
into the use of these techniques in biomedicine have been relatively limited.
Collaborator

I don't want to get too granular, but "relatively limited" suggests to me that not much has been tried. On the contrary, I think there have been many biomedical applications and well over 100 papers. We might say they are "less conclusive" or something.



@agitter mentioned this pull request Nov 13, 2016
@cgreene merged commit 92edc3d into master Dec 21, 2016
@cgreene deleted the more-intro-work branch December 21, 2016 14:53
dhimmel pushed a commit to dhimmel/deep-review that referenced this pull request Nov 3, 2017