Draft Response to Reviewers #799

Merged 16 commits on Jan 19, 2018
45 changes: 23 additions & 22 deletions content/02.intro.md
@@ -55,28 +55,29 @@ We sketch out a few simple example architectures in Figure @fig:nn-petting-zoo.
If data have a natural adjacency structure, a convolutional neural network (CNN) can take advantage of that structure by emphasizing local relationships, especially when convolutional layers are used in early layers of the neural network.
Other neural network architectures such as autoencoders require no labels and are now regularly used for unsupervised tasks.
In this review, we do not exhaustively discuss the different types of deep neural network architectures; an overview of the principal terms used herein is given in Table @tbl:glossary.
A recent book from Goodfellow et al. covers neural network architectures in detail [@url:http://www.deeplearningbook.org/].

| Term | Definition |
|----------------------------|-------------------------------------------------------------------------------------------|
| Supervised learning | Machine-learning approaches with goal of prediction of labels or outcomes |
| Unsupervised learning | Machine-learning approaches with goal of data summarization or pattern identification |
| Neural network (NN) | Machine-learning approach inspired by biological neurons where inputs are fed into one or more layers, producing an output layer |
| Deep neural network | NN with multiple hidden layers. Training happens over the network, and consequently such architectures allow for feature construction to occur alongside optimization of the overall training objective. |
| Feed-forward neural network (FFNN) | NN that does not have cycles between nodes in the same layer |
| Multi-layer perceptron (MLP) | Type of FFNN with at least one hidden layer where each deeper layer is a nonlinear function of each earlier layer |
| Convolutional neural network (CNN) | A NN with layers in which connectivity preserves local structure. These are used for sequence or grid data---such as images or equally-spaced time points. _If the data meet the underlying assumptions_ performance is often good, and such networks can require fewer examples to train effectively because they have fewer parameters and also provide improved efficiency. |
| Recurrent neural network (RNN) | A neural network with cycles between nodes within a hidden layer. This NN architecture is used for sequential data---such as time series or genome sequences. |
| Long short-term memory (LSTM) neural network | This special type of RNN has features that enable models constructed via this approach to capture longer-term dependencies. |
| Autoencoder (AE) | A NN where the training objective is to minimize the error between the output layer and the input layer. Such neural networks are unsupervised and are often used for dimensionality reduction. |
| Variational Autoencoder (VAE) | This special type of AE has the added constraint that the model is trained to learning normally distributed features. |
| Denoising Autoencoder (DA) | This special type of AE includes a step where noise is added to the input during the training process. |
| Generative neural network | Neural networks that fall into this class can be used to generate data similar to input data. These models can be sampled to produce hypothetical examples. |
| Restricted Bolzmann machine (RBM) | Generative NN that forms the building block for many deep learning approaches, having a single input layer and a single hidden layer, with no connections between the nodes within each layer |
| Deep belief network (DBN) | Generative NN with several hidden layers, which can be obtained from combining multiple RBMs |
| Generative adversarial network (GAN) | A generative NN approach where two neural networks are trained. One neural network, the generator, is provided with a set of randomly generated inputs and tasked with generating samples. The second, the discriminator, is trained to differentiate real and generated samples. After the two neural networks are trained against each other, the resulting generator can be used to produce new examples. |
| Adversarial training | A process by which artificial training examples are maliciously designed to fool a NN and then input as training examples to make the resulting NN robust (no relation to GANs) |
| Data augmentation | A process by which transformations that do not affect relevant properties of the input data (e.g., arbitrary rotations of histopathology images) are applied to training examples to increase the size of the training set. |
Table @tbl:glossary also provides select example applications, though in practice each neural network architecture has been broadly applied across multiple types of biomedical data.
A recent book from Goodfellow et al. covers neural network architectures in detail [@url:http://www.deeplearningbook.org/], and LeCun et al. provide a more general introduction [@doi:10.1038/nature14539].

| Term | Definition | Example applications |
|----------------------------|-----------------------------|----------------------------------------------------------------|
| Supervised learning | Machine-learning approaches with the goal of predicting labels or outcomes | |
| Unsupervised learning | Machine-learning approaches with the goal of summarizing data or identifying patterns | |
| Neural network (NN) | Machine-learning approach inspired by biological neurons, in which inputs are fed into one or more layers to produce an output layer | |
| Deep neural network | NN with multiple hidden layers. Training happens over the network, and consequently such architectures allow for feature construction to occur alongside optimization of the overall training objective. | |
| Feed-forward neural network (FFNN) | NN that does not have cycles between nodes in the same layer | Most of the examples below are special cases of FFNNs, except recurrent neural networks. |
| Multi-layer perceptron (MLP) | Type of FFNN with at least one hidden layer where each deeper layer is a nonlinear function of each earlier layer | MLPs do not impose structure and are frequently used when there is no natural ordering of the inputs (e.g. as with gene expression measurements). |
| Convolutional neural network (CNN) | A NN with layers in which connectivity preserves local structure. _If the data meet the underlying assumptions_, performance is often good, and such networks can require fewer examples to train effectively because they have fewer parameters; they can also be more computationally efficient. | CNNs are used for sequence data---such as DNA sequences---or grid data---such as medical and microscopy images. |
| Recurrent neural network (RNN) | A neural network with cycles between nodes within a hidden layer. | The RNN architecture is used for sequential data---such as clinical time series and text or genome sequences. |
| Long short-term memory (LSTM) neural network | This special type of RNN has features that enable models to capture longer-term dependencies. | LSTMs are gaining a substantial foothold in the analysis of natural language, and may become more widely applied to biological sequence data. |
| Autoencoder (AE) | A NN where the training objective is to minimize the error between the output layer and the input layer. Such neural networks are unsupervised and are often used for dimensionality reduction. | Autoencoders have been used for unsupervised analysis of gene expression data as well as data extracted from the electronic health record. |
| Variational autoencoder (VAE) | This special type of AE has the added constraint that the model is trained to learn normally-distributed features. | VAEs have a track record of producing a valuable reduced representation in the imaging domain, and some early publications have used VAEs to analyze gene expression data. |
| Denoising autoencoder (DA) | This special type of AE includes a step where noise is added to the input during the training process. The denoising step acts as a form of smoothing and may allow for effective use on input data that are inherently noisy. | Like AEs, DAs have been used for unsupervised analysis of gene expression data as well as data extracted from the electronic health record. |
| Generative neural network | Neural networks that fall into this class can be used to generate data similar to input data. These models can be sampled to produce hypothetical examples. | A number of the unsupervised learning neural network architectures that are summarized here can be used in a generative fashion. |
| Restricted Boltzmann machine (RBM) | A generative NN that forms the building block for many deep learning approaches, having a single input layer and a single hidden layer, with no connections between the nodes within each layer | RBMs have been applied to combine multiple types of omic data (e.g. DNA methylation, mRNA expression, and miRNA expression). |
| Deep belief network (DBN) | Generative NN with several hidden layers, which can be obtained from combining multiple RBMs | DBNs can be used to predict new relationships in a drug-target interaction network. |
| Generative adversarial network (GAN) | A generative NN approach where two neural networks are trained. One neural network, the generator, is provided with a set of randomly generated inputs and tasked with generating samples. The second, the discriminator, is trained to differentiate real and generated samples. After the two neural networks are trained against each other, the resulting generator can be used to produce new examples. | GANs can synthesize new examples with the same statistical properties as datasets that contain individual-level records and are subject to sharing restrictions. They have also been applied to generate microscopy images. |
| Adversarial training | A process in which artificial examples that are maliciously designed to fool a NN are added to the training set so that the resulting NN becomes robust to such examples (no relation to GANs) | Adversarial training has been used in image analysis. |
| Data augmentation | A process by which transformations that do not affect relevant properties of the input data (e.g. arbitrary rotations of histopathology images) are applied to training examples to increase the size of the training set. | Data augmentation is widely used in the analysis of images because rotation transformations for biomedical images often do not change relevant properties of the image. |

Table: Glossary.
{#tbl:glossary}
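
To make a few of the glossary entries concrete, the sketch below shows a multi-layer perceptron and an autoencoder side by side. It is a minimal illustration only: the choice of PyTorch, the layer sizes, the gene-expression framing, and the random data are all assumptions made for this example, not details from the manuscript.

```python
# Minimal sketches of two glossary entries: a multi-layer perceptron (a
# supervised feed-forward network) and an autoencoder (unsupervised, trained
# to reconstruct its own input). All sizes and data are placeholders.
import torch
import torch.nn as nn

n_genes, n_classes = 5000, 2

# MLP: each hidden layer is a nonlinear function of the previous layer.
mlp = nn.Sequential(
    nn.Linear(n_genes, 256), nn.ReLU(),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, n_classes),            # scores for e.g. tumor vs. normal labels
)

# Autoencoder: the output layer is compared to the input layer itself.
autoencoder = nn.Sequential(
    nn.Linear(n_genes, 64), nn.ReLU(),   # encoder: low-dimensional features
    nn.Linear(64, n_genes),              # decoder: reconstruction of the input
)

x = torch.randn(32, n_genes)             # a batch of 32 hypothetical samples
class_scores = mlp(x)                    # supervised use would also need labels
reconstruction_loss = nn.MSELoss()(autoencoder(x), x)
reconstruction_loss.backward()           # gradients for one unsupervised update
```

The only structural difference between the two models in this sketch is the training target: the MLP is scored against labels, while the autoencoder is scored against its own input.
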
2 changes: 1 addition & 1 deletion content/03.categorize.md
@@ -91,7 +91,7 @@ Ensembles of deep learning and human experts may help overcome some of the chall

One source of training examples with rich phenotypic annotations is the EHR.
Billing information in the form of ICD codes provides simple annotations, but phenotypic algorithms can combine laboratory tests, medication prescriptions, and patient notes to generate more reliable phenotypes.
Recently, Lee et al.[@tag:Lee2016_emr_oct_amd] developed an approach to distinguish individuals with age-related macular degeneration from control individuals.
Recently, Lee et al. [@tag:Lee2016_emr_oct_amd] developed an approach to distinguish individuals with age-related macular degeneration from control individuals.
They trained a deep neural network on approximately 100,000 images extracted from structured electronic health records, reaching greater than 93% accuracy.
The authors used their test set to evaluate when to stop training.
In other domains, this has resulted in a minimal change in the estimated accuracy [@tag:Krizhevsky2013_nips_cnn], but we recommend the use of an independent test set whenever feasible.
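
The recommendation above amounts to a splitting scheme in which early stopping is driven by a validation set while an independent test set is reserved for the final accuracy estimate. The sketch below illustrates one such split; scikit-learn, the array names, and the split fractions are assumptions for the example, not values taken from the Lee et al. study.

```python
# Sketch of the recommended data split: early stopping monitors a validation
# set, while a separate test set is reserved for the final accuracy estimate.
# scikit-learn is used only for splitting; the arrays are placeholders.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 50)             # placeholder features
y = np.random.randint(0, 2, 1000)        # placeholder binary labels

# Hold out 20% as an independent test set, untouched during training.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Split the remainder into training and validation sets; the validation set
# (not the test set) decides when to stop training.
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0)

# ... train the network on (X_train, y_train), stop when validation performance
# stops improving, and report accuracy on (X_test, y_test) exactly once.
```
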