The task is to classify whether a patient has diabetes (class 1) or not (class 0), based on the diagnostic measurements provided in the dataset, using logistic regression and a neural network as classifiers. The dataset in use is the Pima Indians Diabetes Database (diabetes.csv). The code is written in Python.
Extract feature values from the data: process the original CSV data file into a NumPy matrix or Pandas DataFrame. For this we first import the required libraries, and then use the pandas library to load the CSV data into a pandas DataFrame.
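A minimal sketch of this loading step. The column names match the standard Pima Indians Diabetes Database layout; the two rows here are illustrative stand-ins so the snippet is self-contained, and in practice you would pass the path to diabetes.csv instead of the in-memory buffer:

```python
import io
import pandas as pd

# Two stand-in rows in the same format as diabetes.csv (values are
# illustrative, not real patient records).
csv_data = io.StringIO(
    "Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,"
    "DiabetesPedigreeFunction,Age,Outcome\n"
    "6,148,72,35,0,33.6,0.627,50,1\n"
    "1,85,66,29,0,26.6,0.351,31,0\n"
)
df = pd.read_csv(csv_data)  # for the real file: pd.read_csv("diabetes.csv")

X = df.drop(columns=["Outcome"]).to_numpy()  # 8 diagnostic features per row
y = df["Outcome"].to_numpy()                 # class label: 1 = diabetic
print(X.shape, y.shape)  # (2, 8) (2,)
```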
Data Partitioning:
For this we first separate the features and the target, and then normalize the features. Using scikit-learn's train_test_split, we partition the data into training, validation and testing sets. Here we have randomly chosen 60% of the data for training, 20% for validation and the remaining 20% for testing.
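The 60/20/20 split can be obtained with two calls to train_test_split: first hold out 40%, then split that half-and-half. The feature matrix below is random stand-in data, and min–max scaling is one common normalization choice (the original may use a different scaler):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))      # stand-in feature matrix
y = rng.integers(0, 2, size=100)   # stand-in binary labels

# Min-max normalize each feature to [0, 1].
X_norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# 60% train; the remaining 40% is split evenly into validation and test.
X_train, X_rest, y_train, y_rest = train_test_split(
    X_norm, y, test_size=0.4, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```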
Train using Logistic Regression:
We will then define a sigmoid function.
A sigmoid function is an activation function whose output always lies in the range 0 to 1. Next we define a function for training the model. In this function we define a cost/loss variable, using the sigmoid function to compute the loss, and apply gradient descent for logistic regression to train the model. Finally we call the model function, passing the training set, learning rate and number of iterations as parameters. We then test the performance of the model on the validation set and the testing set, which shows the generalization power the model has gained by learning.
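The steps above can be sketched as follows. This is a minimal from-scratch version using batch gradient descent on the cross-entropy loss; function names, the learning rate and the toy data are illustrative, not taken from the original code:

```python
import numpy as np

def sigmoid(z):
    # Squashes any real-valued input into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, lr=0.1, iters=1000):
    """Batch gradient descent for logistic regression (cross-entropy loss)."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(iters):
        p = sigmoid(X @ w + b)       # predicted probabilities
        grad_w = X.T @ (p - y) / n   # gradient of the loss w.r.t. w
        grad_b = np.mean(p - y)      # gradient of the loss w.r.t. b
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Tiny linearly separable toy problem (illustrative, not the real dataset).
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
w, b = train_logistic(X, y, lr=0.5, iters=2000)
preds = (sigmoid(X @ w + b) >= 0.5).astype(int)
print(preds)  # [0 0 1 1]
```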
Train using Neural networks:
For training the neural network model we have used 3 hidden layers with different regularization methods (L2, L1). As model complexity increases, it becomes likely that we overfit. One way to control overfitting is to add a regularization term to the error function. Regularization improves the generalization power the model gains by learning: it helps avoid overfitting by appending penalties to the loss function.
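To make the penalty terms concrete, here is a small sketch of how L1 and L2 terms augment an otherwise-computed data loss. The weight vector, the loss value and the coefficient λ are all hypothetical values for illustration:

```python
import numpy as np

def l1_penalty(w, lam):
    # L1: lambda * sum of absolute weights (encourages sparse weights).
    return lam * np.sum(np.abs(w))

def l2_penalty(w, lam):
    # L2: (lambda / 2) * sum of squared weights (shrinks weights smoothly).
    return 0.5 * lam * np.sum(w ** 2)

w = np.array([0.5, -2.0, 0.0, 1.5])  # hypothetical weights
data_loss = 0.8                      # hypothetical unregularized loss E_D(w)
lam = 0.1                            # regularization coefficient

print(data_loss + l1_penalty(w, lam))  # ~1.2   (0.8 + 0.1 * 4.0)
print(data_loss + l2_penalty(w, lam))  # ~1.125 (0.8 + 0.05 * 6.5)
```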
L1 regularization uses the absolute value of the magnitude of the coefficients as the penalty term to the loss:

E(w) = E_D(w) + λ‖w‖₁

where λ is the regularization coefficient that controls the relative importance of the data-dependent error E_D(w) and the regularization term.
After training our model, when we evaluate it we get an accuracy of about 88%.
We will then plot the accuracy and loss.
L2 regularization uses the squared magnitude of the coefficients as the penalty term to the loss:

E(w) = E_D(w) + (λ/2)‖w‖²
Here, after training our model, evaluation gives an accuracy of about 98%. This is better than L1 regularization, which shrinks the coefficients of unimportant features to zero. L1 is the better choice when we have a huge number of features.
We will then plot the accuracy and the loss for training and valid data.
In the dropout regularization technique, neurons are randomly dropped out during training. Here I have applied dropout between two hidden layers. After training our model, evaluation gives an accuracy of about 93%.
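A minimal sketch of what dropout does to a layer's activations, using the common "inverted dropout" formulation (the rate and the all-ones activations are illustrative; a framework such as Keras applies this automatically inside its Dropout layer):

```python
import numpy as np

def dropout(activations, rate, rng):
    # Inverted dropout: zero each unit with probability `rate`, then rescale
    # the survivors by 1 / (1 - rate) so the expected activation is unchanged.
    mask = rng.random(activations.shape) >= rate
    return activations * mask / (1.0 - rate)

rng = np.random.default_rng(0)
h = np.ones((4, 10))                       # stand-in hidden-layer activations
h_dropped = dropout(h, rate=0.5, rng=rng)

# Dropped units become exactly 0; kept units are rescaled from 1.0 to 2.0.
print(sorted(set(h_dropped.ravel())))  # [0.0, 2.0]
```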
We will then plot the accuracy and the loss for training and valid data.
For a small number of hidden neurons, we observe that the accuracy with L2 regularization is better than with dropout.