In this project our goal is to implement the gradient descent algorithm for learning a logistic regression model, and then use it with early stopping regularization to make predictions on several real data sets.
See our implementation of the gradient descent algorithm in R here.
Because the project is written in R, you need to have R installed on your machine.
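For readers who want a picture of the technique before opening the code, here is a minimal sketch of gradient descent for logistic regression with early stopping. It is not the code in GradientDescent.R: the function names (`logistic_gd`, `mean_logistic_loss`), the parameters (`step_size`, `max_iterations`), the label encoding in {-1, 1}, and the toy data are all assumptions made for illustration.

```r
# Hypothetical sketch: batch gradient descent on the mean logistic loss.
# X: numeric matrix (n x p), assumed already scaled; y: labels in {-1, 1}.
logistic_gd <- function(X, y, step_size = 0.1, max_iterations = 100) {
  weights <- rep(0, ncol(X))
  # Column k of weight_matrix holds the weights after k steps.
  weight_matrix <- matrix(0, nrow = ncol(X), ncol = max_iterations)
  for (k in seq_len(max_iterations)) {
    margin <- y * as.vector(X %*% weights)
    # Gradient of mean(log(1 + exp(-margin))) with respect to the weights.
    gradient <- -as.vector(t(X) %*% (y / (1 + exp(margin)))) / nrow(X)
    weights <- weights - step_size * gradient
    weight_matrix[, k] <- weights
  }
  weight_matrix
}

mean_logistic_loss <- function(X, y, weights) {
  mean(log(1 + exp(-y * as.vector(X %*% weights))))
}

# Toy data, made up for illustration only.
set.seed(1)
n <- 200
X <- cbind(1, matrix(rnorm(n * 2), ncol = 2))
y <- ifelse(X[, 2] - X[, 3] + rnorm(n, sd = 0.5) > 0, 1, -1)
is_train <- seq_len(n) <= 150

weight_matrix <- logistic_gd(X[is_train, ], y[is_train])
validation_loss <- apply(weight_matrix, 2, function(w) {
  mean_logistic_loss(X[!is_train, ], y[!is_train], w)
})
# Early stopping: keep the iteration with the lowest validation loss.
best_iteration <- which.min(validation_loss)
best_weights <- weight_matrix[, best_iteration]
```

Storing the weights from every iteration makes early stopping a simple lookup: the number of iterations acts as the regularization parameter and is chosen on held-out data.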
You can download R here and install it.
RStudio is the easiest way for everyone to run the project.
You can download RStudio here and install it.
You can use git clone
to download the project or just click the green button to download a ZIP file.
Use any tools you like to unzip the project into the folder you want.
This step is important; skipping it may cause problems when reading the data. Open RStudio and, in the 'console' command line at the bottom left corner, type
setwd('PATH')
where PATH is the folder where you unzipped the R and data files on your machine.
WARNING: In your path, use '/' instead of '\'.
Click 'File'->'Open File' at the top left corner, then choose
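R treats a backslash inside a string as an escape character, which is why a path copied from Windows Explorer needs its separators changed. A small sketch of the conversion follows; the path shown is a made-up example, not a location the project requires.

```r
# Hypothetical Windows-style path, used only for illustration.
# In R source code each backslash has to be written twice.
windows_path <- "C:\\Users\\me\\Downloads\\project"

# Replace every backslash with a forward slash.
r_path <- gsub("\\", "/", windows_path, fixed = TRUE)
print(r_path)

# setwd(r_path)  # uncomment after substituting your own folder
```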
Scale.R
GradientDescent.R
Experiments_spam.R
Extra_Credits.R
to open them.
First go to Scale.R and click the 'run' button in the file section (not the one for the whole window) once. This runs the current line, which is the whole file in this case.
Then go to GradientDescent.R and click 'run' repeatedly until nothing is running in the console section.
Then go to Experiments.R and click 'run' until you see that the 'ggplot2' package has been imported (sorry for the inconvenience). Click twice more and wait a few seconds; a graph showing the relationship between error rate and number of iterations will appear in the bottom right section. Click twice more, and a graph will show the relationship between logistic loss and number of iterations. Then keep clicking to the end of the file, and a graph will show the ROC curve.
At the end of the eighth line in Experiments.R there is a '1', which selects the first dataset, 'spam'. To switch to another dataset, replace the '1' with '2' for the SAheart dataset or with '3' for zip.train.
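To make the three graphs concrete, the sketch below computes an error rate, a mean logistic loss, and the points of an ROC curve from real-valued scores. It is an illustration on made-up data, not the code in the project's scripts: the labels in {-1, 1}, the variable names, and the use of base `plot` instead of ggplot2 are all assumptions.

```r
# Toy scores and labels, made up for illustration only.
set.seed(1)
labels <- rep(c(1, -1), each = 50)
scores <- rnorm(100, mean = ifelse(labels == 1, 1, 0))

# Error rate: fraction of examples whose predicted sign is wrong.
error_rate <- mean(ifelse(scores > 0, 1, -1) != labels)
# Mean logistic loss of the same scores.
logistic_loss <- mean(log(1 + exp(-labels * scores)))

# ROC curve: sweep the decision threshold from high to low.
thresholds <- c(Inf, sort(unique(scores), decreasing = TRUE))
tpr <- sapply(thresholds, function(threshold) {
  mean(scores[labels == 1] >= threshold)
})
fpr <- sapply(thresholds, function(threshold) {
  mean(scores[labels == -1] >= threshold)
})

plot(fpr, tpr, type = "s",
     xlab = "False positive rate", ylab = "True positive rate")
abline(0, 1, lty = 2)  # chance line for reference
```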
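If clicking 'run' many times is tedious, the scripts can probably also be run from the console with `source()`. This is a sketch under two assumptions: the working directory has already been set with `setwd`, and the scripts do not depend on being stepped through interactively. The file names are taken from the steps above and may need adjusting.

```r
# File names taken from the steps above; adjust them if yours differ
# (for example, the file list names Experiments_spam.R).
scripts <- c("Scale.R", "GradientDescent.R", "Experiments.R")

for (script in scripts) {
  if (file.exists(script)) {
    # echo = TRUE also auto-prints top-level values, such as plot objects.
    source(script, echo = TRUE)
  } else {
    message("Not found in ", getwd(), ": ", script)
  }
}
```

With the default `echo = FALSE`, `source()` does not auto-print top-level values, so plots that rely on auto-printing would not appear.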
For the extra credits, go to Extra_Credits.R and run it from the first line to the end. Partway through you will need to wait for some processes to finish, because the script generates the tables required by Question 2 of the extra credits. At the end you will see a graph with one baseline and three different models trained on three different data sets.
Similarly, at the end of the eighth line in Experiments.R there is a '1', which is used to switch datasets in the same way as described above.
This is our first group project for the CS499 Deep Learning course in Spring 2020 at NAU.
You can find the requirements for this project here.
Dr. T.D.Hocking - tdhock at SICCS
Any cloning or downloading before the project due date constitutes an infringement of our intellectual property rights, and after that it goes to open source. For any of the aforementioned infringements, Zhenyu Lei and Jianxuan Yao will report this to the NAU Academic Integrity Hearing Board.