A deep learning package for supervised learning built in Julia.
The package is not yet registered, so to install it run the command:

```julia
Pkg.clone("https://github.com/Wedg/Alice.jl.git")
```
There are 3 demos that display almost all the functionality. All the data for each one is contained in the repository.
Demo 1 (MNIST digits) - predict the digit label (0 to 9) of each image:
- Feedforward Neural Network and Convolutional Neural Network - html file (view in browser) or ipynb file (run in Jupyter)

Demo 2 - predict the object in each image from 4 classes (airplane, car, cat, dog):
- Part 1 - Sparse Autoencoder - html file (view in browser) or ipynb file (run in Jupyter)
- Part 2 - Convolutional Neural Network - html file (view in browser) or ipynb file (run in Jupyter)

Demo 3 - NLP (Natural Language Processing) example - learn feature representations of words by learning to predict the next word in a given sequence of words:
- Word Embedding Network - html file (view in browser) or ipynb file (run in Jupyter)
There are a number of types/structs used to build a neural network: a data container that holds the training, validation and test sets, and a number of network layers (input, hidden and output). The currently available types are:
Data container:
- Data
Input Layer:
- InputLayer
Hidden Layers:
- FullyConnectedLayer
- WordEmbeddingLayer
- SparseEncoderLayer
- ConvolutionLayer
- MeanPoolLayer
- MaxPoolLayer
Output Layers:
- LinearOutputLayer
- MultiLinearOutputLayer
- LogisticOutputLayer
- SoftmaxOutputLayer
Step 1 - Create a data container
The `Data` container has 4 constructor methods:
- `Data(X_train)`
- `Data(X_train, y_train)`
- `Data(X_train, y_train, X_val, y_val)`
- `Data(X_train, y_train, X_val, y_val, X_test, y_test)`
`X_train` is an array of training data, `y_train` is an array of target data for the training set, `X_val` is an array of validation data, `y_val` is an array of target data for the validation set, `X_test` is an array of test data, and `y_test` is an array of target data for the test set.

The `X_~` arrays contain floating point numbers and can be 2-dimensional (`num_feats x num_obs`, i.e. number of features x number of observations), 3-dimensional, e.g. greyscale images (`dim x dim x num_obs`), or 4-dimensional, e.g. colour images (`dim x dim x num_channels x num_obs`).

The `y_~` arrays can contain either floating point numbers (for regression) or integers (for classification) and can be either a vector (for Linear or Logistic output) or an array (for MultiLinear or Softmax output).
Note that, for better memory management, a reference to the original data is stored rather than a copy, so changing the original arrays will also change the data in the container.
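As a minimal sketch (the array names follow the definitions above, but the sizes and contents are made up for illustration), a data container for a regression problem with 20 features could be built like this:

```julia
# Illustrative arrays: 20 features, 1000 training and 200 validation observations
X_train = rand(Float64, 20, 1000)
y_train = rand(Float64, 1000)      # floating point targets, i.e. regression
X_val   = rand(Float64, 20, 200)
y_val   = rand(Float64, 200)

# Place the arrays in a data container (references, not copies, are stored)
databox = Data(X_train, y_train, X_val, y_val)
```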
Step 2 - Create the input layer
There is only 1 constructor method:

- `InputLayer(databox, batch_size)`

(`databox` is a data container as described above and `batch_size` is an integer giving the number of observations of the training set in each mini-batch.)
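Continuing the sketch above (the batch size is an arbitrary choice), an input layer feeding mini-batches of 100 observations would be:

```julia
batch_size = 100
input = InputLayer(databox, batch_size)
```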
Step 3 - Create hidden layers
Each hidden layer has its own constructor. Each constructor starts with the same two arguments: the first (`datatype`, a floating point data type) is optional and defaults to `Float64` if excluded, and the second (`input_dims`, a tuple of the size dimensions of the previous layer) is required.
Following the first two (or one, if `datatype` is left out), each hidden layer has its own positional arguments (shown below).
After the positional arguments there are two optional keyword arguments - `activation` (for choosing the activation function) and `init` (for choosing the initialisation of the weights). These are described in more detail in the sections below.
The constructor functions are (the square brackets just indicate that the argument is optional):
- `FullyConnectedLayer([datatype, ]input_dims, fan_out)` (`fan_out` is the number of neurons in the layer)
- `WordEmbeddingLayer([datatype, ]input_dims, vocab_size, num_feats)` (`vocab_size` is the number of words in the vocabulary and `num_feats` is the number of features / the length of the feature vector given to each word)
- `SparseEncoderLayer([datatype, ]input_dims, fan_out, ρ, β)` (`ρ` is the sparsity parameter and `β` is the parameter controlling the weight of the sparsity penalty term)
- `ConvolutionLayer([datatype, ]input_dims, patch_dims)` (`patch_dims` is a tuple giving the size dimensions of the patch / filter used for the convolution operation)
- `MeanPoolLayer([datatype, ]input_dims, stride)`
- `MaxPoolLayer([datatype, ]input_dims, stride)` (`stride` is the pooling stride used for the pooling operation - either mean or max)
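As a rough sketch continuing the example above (the layer sizes and sparsity parameters are arbitrary choices), a sparse encoder layer followed by a fully connected layer could be created like this, with each layer taking `size(previous_layer)` as its `input_dims`:

```julia
# Sparse encoder layer with 200 neurons, sparsity target ρ = 0.05
# and sparsity penalty weight β = 3.0
sae = SparseEncoderLayer(size(input), 200, 0.05, 3.0)

# Fully connected layer with 50 neurons and a tanh activation
fc = FullyConnectedLayer(size(sae), 50, activation = :tanh)
```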
Hidden layer activation functions
The keyword argument `activation` is used to select the activation function (if the layer has an activation - the `WordEmbeddingLayer`, `MeanPoolLayer` and `MaxPoolLayer` do not). The default (i.e. applied if no selection is made) is `:logistic` and the options are:

- `:logistic` - logistic
- `:tanh` - hyperbolic tangent
- `:relu` - rectified linear unit
Initialising weights in a hidden layer
The keyword argument `init` is used to select the distribution that is sampled from to initialise the layer weights. (The bias values are set to zero.) The options are to provide either a distribution (any distribution from the Distributions package, e.g. `Normal(0, 0.01)` or `Uniform(-0.5, 0.5)`) or one of the following named options:

- `:glorot_logistic_uniform`
- `:glorot_logistic_normal`
- `:glorot_tanh_uniform`
- `:glorot_tanh_normal`
- `:he_uniform`
- `:he_normal`
- `:lecun_uniform`
- `:lecun_normal`

The default selection (i.e. applied if no selection is made) is `:glorot_logistic_uniform` for layers with logistic activations, `:glorot_tanh_uniform` for layers with tanh activations, `:he_uniform` for layers with relu activations, and `Normal(0, 0.01)` for the `WordEmbeddingLayer`.
See here for my understanding of the merits of the different named options - html file (view in browser) or ipynb file (run in Jupyter)
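For example (the layer sizes here are arbitrary, and `Normal` comes from the Distributions package as noted above), `init` can be given either a named option or a distribution:

```julia
using Distributions

# Named initialisation option matched to the relu activation
fc_a = FullyConnectedLayer(size(input), 100, activation = :relu, init = :he_normal)

# Explicit distribution from the Distributions package
fc_b = FullyConnectedLayer(size(fc_a), 50, activation = :tanh, init = Normal(0, 0.01))
```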
Step 4 - Create the output layer
The output layers are constructed in a similar way to the hidden layers:
- `LinearOutputLayer([datatype, ]databox, input_dims)`
- `MultiLinearOutputLayer([datatype, ]databox, input_dims)`
- `LogisticOutputLayer([datatype, ]databox, input_dims)`
- `SoftmaxOutputLayer([datatype, ]databox, input_dims, num_classes)`

(`datatype`, `databox` and `input_dims` are as defined above and `num_classes` is an integer giving the number of classes / categories)

There is also the `init` keyword argument, which can be used in the same way as for the hidden layers; the output layers are initialised to zero by default.
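For example, continuing the regression sketch above (where `databox` holds a floating point target vector and `fc` is the last hidden layer), a linear output layer would be:

```julia
# Linear output layer for a regression target held as a vector in databox
output = LinearOutputLayer(databox, size(fc))
```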
Step 5 - Create the neural network
There is 1 constructor function with 2 methods and 1 keyword argument:
- `NeuralNet(databox, layers)`
- `NeuralNet(databox, layers, λ)`
- `NeuralNet(databox, layers, λ, regularisation = :L2)`

(`databox` is as defined above, `layers` is a vector of layers (typically 1 input layer, many hidden layers, and 1 output layer), `λ` is the regularisation parameter, and `regularisation` is the keyword argument for the type of regularisation - options are `:L1` and `:L2` (the default))
In the MNIST demo the first feedforward neural network is created as follows:
```julia
# Data Box and Input Layer
databox = Data(train_images, train_labels, val_images, val_labels)
batch_size = 128
input = InputLayer(databox, batch_size)

# Fully connected hidden layers
dim = 30
fc1 = FullyConnectedLayer(size(input), dim, activation = :tanh)
fc2 = FullyConnectedLayer(size(fc1), dim, activation = :tanh)

# Softmax Output Layer
num_classes = 10
output = SoftmaxOutputLayer(databox, size(fc2), num_classes)

# Model
λ = 1e-3  # Regularisation
net = NeuralNet(databox, [input, fc1, fc2, output], λ, regularisation=:L2)
```
This creates the following:
```
Neural Network
Training Data Dimensions - (28,28,50000)
Layers:
Layer 1 - InputLayer{Float64}, Dimensions - (28,28,128)
Layer 2 - FullyConnectedLayer{Float64}, Activation - tanh, Dimensions - (30,128)
Layer 3 - FullyConnectedLayer{Float64}, Activation - tanh, Dimensions - (30,128)
Layer 4 - SoftmaxOutputLayer{Float64,Int64}, Dimensions - (10,128)
```
Step 6 - Train the network
There are 2 broad options for training:

- `train` function for stochastic mini-batch training by gradient descent with momentum
- `train_nlopt` function for full batch training using the NLopt package, which provides an interface to the open-source NLopt library for nonlinear optimisation
`train` function:

- `train(net, num_epochs, α, μ[, nesterov = true, shuffle = false, last_train_every = 1, full_train_every = num_epochs, val_every = num_epochs])`

(Positional arguments - `net` is the neural network, `num_epochs` is the total number of epochs to run through, `α` is the learning rate, and `μ` is the momentum parameter.)

(Keyword arguments - `nesterov` is whether to use Nesterov's accelerated gradient method (default is `true`; if `false` the standard momentum method is used), `shuffle` is whether to randomly shuffle the data before each epoch (default is `false`), `last_train_every` selects the epoch interval at which to display the last batch training error (default is `1`, i.e. every epoch), `full_train_every` selects the epoch interval at which to display the loss on the full training set (default is `num_epochs`, i.e. only at the end), and `val_every` selects the epoch interval at which to display the loss on the validation set (default is `num_epochs`, i.e. only at the end).)
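As an illustrative sketch (the hyperparameter values are arbitrary), training the MNIST network above for 20 epochs with a learning rate of 0.1 and momentum of 0.9, shuffling each epoch and reporting the validation loss every 5 epochs, might look like:

```julia
num_epochs = 20
α = 0.1   # learning rate
μ = 0.9   # momentum parameter

train(net, num_epochs, α, μ, shuffle = true, val_every = 5)
```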
`train_nlopt` function:

- `train_nlopt(net[, maxiter, algorithm])`

(Positional arguments - `net` is the neural network.)

(Keyword arguments - `maxiter` is the maximum number of iterations through the training set (it will stop earlier if a tolerance is achieved; default is `100`), and `algorithm` is any of the NLopt provided algorithms (default is `:LD_LBFGS`).)
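For example (the iteration cap is an arbitrary choice), a full batch run with the default L-BFGS algorithm:

```julia
# Full batch training, stopping after at most 200 iterations
train_nlopt(net, maxiter = 200)
```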
If the `train` function has been used, there is a plotting function to display the training progress:

- `plot_loss_history(net, last_train_every, full_train_every, val_every)`

(`net` is the neural net; `last_train_every`, `full_train_every` and `val_every` are as defined above in the `train` function, but here they are plain positional integers, i.e. not keywords.)
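For example, matching the intervals used in the `train` sketch above (1, `num_epochs` = 20 and 5):

```julia
plot_loss_history(net, 1, 20, 5)
```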
This will produce a plot of the loss history over the training run.
The `train` function will display the results of training on the training and validation sets (unless you've chosen not to display them). But to manually evaluate performance there are a number of functions:

- `loss(net, X, y)` - provides the training loss on inputs `X` and targets `y`, e.g. `X_train` and `y_train` as defined above
- `val_loss(net, X, y)` - provides the validation loss (without the regularisation cost) on `X` and `y`
- `loss_and_accuracy(net, X, y)` - provides the training loss and accuracy percentage on `X` and `y`
- `val_loss_and_accuracy(net, X, y)` - provides the validation loss (without the regularisation cost) and accuracy percentage on `X` and `y`
- `accuracy(net, X, y)` - provides the accuracy percentage on `X` and `y`

Note that the accuracy functions will only work for Logistic and Softmax output layers.
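For example, using the MNIST arrays from the example above:

```julia
# Training loss on the full training set
loss(net, train_images, train_labels)

# Validation loss (without the regularisation cost) and accuracy percentage
val_loss_and_accuracy(net, val_images, val_labels)

# Accuracy percentage alone (Logistic and Softmax output layers only)
accuracy(net, val_images, val_labels)
```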