The code has been written in MATLAB (basic version, with no extra toolboxes) and tested on version 8.3 (R2014a). The main references for the equations used are *How the backpropagation algorithm works* and *Improving the way neural networks learn*, respectively Chapters 2 and 3 of the book *Neural Networks and Deep Learning*.
- Works with any number of inputs/classes.
- Variable number of nodes in the input layer.
- Variable number of nodes in the output layer.
- Variable number of hidden layers, each one with a variable number of nodes.
- Logistic sigmoid activation for hidden/output layers.
- Cross-entropy with L2 regularization as cost function.
- Variable learning rate.
- No early-stopping criterion.
Functions | Use |
---|---|
NNBP | Main function (script) |
ReadData | Read the data from a text file and return the training/validation/test datasets |
FeedForward | Perform the feedforward step |
f_activation | Sigmoid activation function |
f1_activation | First derivative of the sigmoid activation function |
CostFunction | Return the basic cost function for a full dataset |
Results | Determine the actual output activation vector and the accuracy for a full dataset |
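As a rough illustration of the helpers listed above, the sketch below shows what the logistic sigmoid, its derivative, and the regularized cross-entropy cost could look like. The bodies are assumptions based on the feature list, not the repository code, and the actual signatures in `f_activation.m`, `f1_activation.m`, and `CostFunction.m` may differ.

```matlab
% Sketch only (assumed implementations; each function would live in its own .m file).

% f_activation.m: logistic sigmoid, applied element-wise
function a = f_activation(z)
    a = 1 ./ (1 + exp(-z));
end

% f1_activation.m: first derivative of the logistic sigmoid
function d = f1_activation(z)
    s = 1 ./ (1 + exp(-z));
    d = s .* (1 - s);   % sigma'(z) = sigma(z).*(1 - sigma(z))
end
```

For a full dataset of `n` examples with desired outputs `Y` and output activations `A` (both of size `nL(end)` x `n`), the cross-entropy cost with L2 regularization would then be computed as below, with `sumW2` the sum of the squared weights of all layers (the `CostFunction` helper in the table returns the basic cost, i.e. the part without the `lambda` term):

```matlab
cost = -sum(sum(Y.*log(A) + (1-Y).*log(1-A)))/n + lambda/(2*n)*sumW2;
```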
Parameters | Use |
---|---|
nL | Row-vector defining the number of nodes |
name | Name of the text file with the dataset to analyze |
split | Row-vector defining the number of data in the training/validation/test datasets |
maxEpoch | Max. number of iterations |
eta | Learning rate |
etaCoeff | Coefficient for the learning rate strategy |
lambda | Regularization parameter |
Main Variables | Use |
---|---|
NNs | Array of structures with the data associated with each layer. The components are: W = weight matrix, B = bias vector, Z = weighted input vector, A = activation vector, D = delta error vector, dB = derivatives of the basic cost function wrt the biases, dW = derivatives of the basic cost function wrt the weights. |
costTR,costVA,costTE | Cost function of the training, validation, and test datasets |
accTR,accVA,accTE | Accuracy of the training, validation, and test datasets |
InTR,InVA,InTE | Input of the training, validation, and test datasets |
OutTR,OutVA,OutTE | Desired output of the training, validation, and test datasets |
ResTR,ResVA,ResTE | Actual output of the training, validation, and test datasets |
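To make the table above concrete, here is a minimal sketch of how the `NNs` array of structures could be initialized for a layout `nL`, and of the feedforward step that fills `Z` and `A`. The Gaussian initialization with `randn` and the loop structure are assumptions; the repository code (`NNBP`, `FeedForward`) may be organized differently.

```matlab
% Sketch: one structure per layer 2..L (the input layer carries no weights).
% NNs(k).W is sized nL(k+1) x nL(k), so that Z = W*Aprev + B.
for k = 1:numel(nL)-1
    NNs(k).W  = randn(nL(k+1), nL(k));   % weight matrix
    NNs(k).B  = randn(nL(k+1), 1);       % bias vector
    NNs(k).Z  = zeros(nL(k+1), 1);       % weighted input vector
    NNs(k).A  = zeros(nL(k+1), 1);       % activation vector
    NNs(k).D  = zeros(nL(k+1), 1);       % delta error vector
    NNs(k).dW = zeros(nL(k+1), nL(k));   % derivatives of the basic cost wrt the weights
    NNs(k).dB = zeros(nL(k+1), 1);       % derivatives of the basic cost wrt the biases
end

% Feedforward pass for a single input column vector x (sketch of FeedForward):
Aprev = x;
for k = 1:numel(NNs)
    NNs(k).Z = NNs(k).W * Aprev + NNs(k).B;   % weighted input
    NNs(k).A = f_activation(NNs(k).Z);        % logistic sigmoid activation
    Aprev = NNs(k).A;
end
```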
- The code works with any number of inputs/classes as long as the dataset is organized correctly and the problem is a classification problem.
- The layout of the neural network system is defined in the row-vector `nL = [n1 n2 ... nL-1 nL]`, where `n1` is the number of nodes in the input layer, `n2` to `nL-1` are the numbers of nodes in the hidden layers, and `nL` is the number of nodes in the output layer.
- Data must be in a text file (specified by the parameter `name`), with each column representing one of the inputs and with the data belonging to different classes in sequence (see the examples). If the number of columns is larger than `n1`, the extra column(s) are eliminated. The number of classes is defined by `nL`. The input data are mapped to the interval `[-1,+1]` (a sketch follows this list).
- The data partition is defined in the row-vector `split = [nTR nVA nTE]`. Any values can be used as long as their sum is not larger than the number of data in each class. The three datasets are returned in the matrices `InTR`, `InVA`, and `InTE`. Data are taken sequentially (for instance, if `split = [15 5 8]`, then the first 28 rows of each class are used).
- A very simple strategy is implemented to change the learning rate `eta`: at every 10% of the number of iterations, `eta` is recalculated as `etaCoeff` times `eta` (a sketch follows this list).
- Class membership is defined by a 1 in the corresponding column of the matrices `OutTR`, `OutVA`, and `OutTE`. For instance, for the Iris example it is `[1,0,0]` for setosa, `[0,1,0]` for versicolor, and `[0,0,1]` for virginica.
- The cost function for the three partitions is in `costTR`, `costVA`, and `costTE`. These values are plotted versus the iterations at the end of the computation.
- The accuracy for the three partitions is in `accTR`, `accVA`, and `accTE`. These values are plotted versus the iterations at the end of the computation. The actual output of the system is taken to be the output node with the highest activation value among all nodes in the output layer.
- The matrices `ResTR`, `ResVA`, and `ResTE` contain the activation values of all nodes in the output layer at the last iteration.
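The mapping to `[-1,+1]` and the sequential per-class split described in the list above could be implemented along these lines. Here `X` is the raw matrix read from the text file (one row per sample), and `nClasses`, `rowsPerClass`, and `base` are illustrative names, not taken from `ReadData`, which may work differently.

```matlab
% Map every input column to the interval [-1,+1]
% (repmat is used because R2014a has no implicit expansion).
Xmin = min(X, [], 1);
Xmax = max(X, [], 1);
Xmap = 2*(X - repmat(Xmin, size(X,1), 1)) ./ repmat(Xmax - Xmin, size(X,1), 1) - 1;

% Sequential per-class split with split = [nTR nVA nTE];
% only the first nL(1) columns are kept (extra columns are eliminated).
nTR = split(1);  nVA = split(2);  nTE = split(3);
InTR = [];  InVA = [];  InTE = [];
for c = 1:nClasses
    base = (c-1)*rowsPerClass;   % rows before the first row of class c
    InTR = [InTR; Xmap(base+1         : base+nTR,         1:nL(1))];
    InVA = [InVA; Xmap(base+nTR+1     : base+nTR+nVA,     1:nL(1))];
    InTE = [InTE; Xmap(base+nTR+nVA+1 : base+nTR+nVA+nTE, 1:nL(1))];
end
```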
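Similarly, the learning-rate strategy and the accuracy rule can be sketched as follows, again with illustrative variable names; `Res` and `Out` stand for the actual and desired outputs of one dataset, one row per sample:

```matlab
% Learning-rate schedule: at every 10% of the iterations, eta shrinks by etaCoeff.
if mod(epoch, round(0.1*maxEpoch)) == 0
    eta = etaCoeff * eta;
end

% Accuracy: the predicted class is the output node with the highest activation.
[~, predicted] = max(Res, [], 2);
[~, target]    = max(Out, [], 2);
acc = sum(predicted == target) / size(Out, 1);
```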
The Iris dataset has 3 classes (setosa, versicolor, and virginica), and the input data are organized in a 150 x 4 matrix, with 50 rows for each class. The number of nodes in the input layer can assume any value from 1 to 4, while the number of nodes in the output layer (i.e. the number of classes) must be 3. The parameters used in the example are:
nL = [4 5 3]
name = 'IrisDataset.txt'
split = [34 8 8]
maxEpoch = 500
eta = 2.0
etaCoeff = 0.75
lambda = 0.0
- Layout: 4 input nodes, one hidden layer with 5 nodes, 3 nodes in the output layer.
- Input data are split with a 68/16/16 percent ratio.
- Every 10% of the iterations the learning rate is reduced by 25%, starting from an initial value of 2.
- No regularization is used.
These values have been set after a quick tuning of the following hyper-parameters: number of hidden layers, number of nodes in the hidden layers, learning rate, coefficient of the learning rate strategy, regularization parameter. The cost function and accuracy of the training/validation datasets were used to evaluate the results and set the hyper-parameters.
The resulting cost functions and accuracies are here.
The Wheat Seeds dataset has 3 classes, and the input data are organized in a 210 x 7 matrix, with 70 rows for each class. The number of nodes in the input layer can assume any value from 1 to 7, while the number of nodes in the output layer (i.e. the number of classes) must be 3. The parameters used in the example are:
nL = [7 5 3]
name = 'WheatSeedsDataset.txt'
split = [50 10 10]
maxEpoch = 500
eta = 2.0
etaCoeff = 0.75
lambda = 0.0
- Layout: 7 input nodes, one hidden layer with 5 nodes, 3 nodes in the output layer.
- Input data are split with a 72/14/14 percent ratio.
- Every 10% of the iterations the learning rate is reduced by 25%, starting from an initial value of 2.
- No regularization is used.
These values have been set after a quick tuning of the following hyper-parameters: number of hidden layers, number of nodes in the hidden layers, learning rate, coefficient of the learning rate strategy, regularization parameter. The cost function and accuracy of the training/validation datasets were used to evaluate the results and set the hyper-parameters.
The resulting cost functions and accuracies are here.