ML class project repo
Taking team name submissions... Vote at some point during Wednesday's meeting
- Number Crunchers
- Feature Freakers
- Logistic Aggression
- ?
Kevin Kim, Jonathan Park, Timothy Lung, Eliot Tong
The application project comprises tasks and milestones: milestones are mandatory and have to be completed by all teams; tasks are optional. Each team needs to complete a total number of milestones and tasks equal to 2 times its group size. Each team member needs to be present at at least one demo.
For a group of 5 students your team will have to complete all milestones and tasks. For a group of 4 students (recommended) your team will have to complete the 5 milestones and 3 tasks (of your choice). For a group of 3 students your team will have to complete the 5 milestones and 1 of the tasks.
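The workload rule above is simple arithmetic: the total is twice the group size, the 5 milestones are fixed, and the remainder is optional tasks. A minimal sketch of that calculation follows; the function name `required_work` and the constants are illustrative, not part of the course material.

```python
N_MILESTONES = 5  # milestones listed in this README (all mandatory)
N_TASKS = 5       # optional tasks listed in this README


def required_work(group_size: int) -> dict:
    """Return how many milestones and optional tasks a team must complete."""
    total = 2 * group_size          # rule: total items = 2 x group size
    tasks = total - N_MILESTONES    # whatever is left after the milestones
    if not 0 <= tasks <= N_TASKS:
        raise ValueError(f"group size {group_size} is outside the described range")
    return {"total": total, "milestones": N_MILESTONES, "tasks": tasks}


for size in (3, 4, 5):
    print(size, required_work(size))
```

For this team of 4, that works out to 8 items: the 5 milestones plus 3 tasks of our choice.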
Note that some milestones/tasks are demos and some are competitions. It is the team's responsibility to meet the deadlines and sign up in time for the provided demo slots.
[milestone 1 - Setup and Linear Classifier] Create your team and team repository. Register your team at https://www.kaggle.com/c/distance-to-fire-points. Compute basic dataset statistics and describe/visualize them. Train and run a linear model (of your choice). Submit the predictions at https://www.kaggle.com/c/distance-to-fire-points.
[task 1 - Random Forest] Train and run a random forest model. Submit the predictions at https://www.kaggle.com/t/625886cd9dd549fd979145ed7c16caac.
[milestone 2 - Gaussian Process] Train and run a Gaussian process model. Remember to experiment with different kernels, and remember that we can evaluate GPs using the negative log predictive density, which incorporates the predictive uncertainty. Submit the predictions here.
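As a reminder of what the negative log predictive density (NLPD) measures, here is a minimal sketch assuming the GP returns an independent Gaussian prediction (mean and standard deviation) for each test point. The function name `gaussian_nlpd` and the toy numbers are made up for illustration. With scikit-learn, the mean and standard deviation could come from `GaussianProcessRegressor.predict(X, return_std=True)`.

```python
import math


def gaussian_nlpd(y_true, pred_mean, pred_std):
    """Mean negative log density of y_true under N(pred_mean, pred_std**2)."""
    if not (len(y_true) == len(pred_mean) == len(pred_std)):
        raise ValueError("all inputs must have the same length")
    total = 0.0
    for y, mu, sigma in zip(y_true, pred_mean, pred_std):
        var = sigma ** 2
        # -log N(y | mu, var) = 0.5*log(2*pi*var) + (y - mu)^2 / (2*var)
        total += 0.5 * math.log(2 * math.pi * var) + (y - mu) ** 2 / (2 * var)
    return total / len(y_true)


# Toy values (not real results): identical means, different claimed uncertainty.
y = [1.0, 2.0, 3.0]
mean = [1.5, 2.5, 2.0]
overconfident = gaussian_nlpd(y, mean, [0.1, 0.1, 0.1])
honest = gaussian_nlpd(y, mean, [1.0, 1.0, 1.0])
print(f"overconfident: {overconfident:.3f}, honest: {honest:.3f}")
```

Lower is better. Both predictors here have the same squared error, but the one that claims a tiny standard deviation is penalized heavily for being wrong with high confidence, which is the sense in which NLPD incorporates predictive uncertainty.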
[task 2 - Support Vector Machine] Train and run a (kernel) Support Vector Machine. Submit the predictions here.
[milestone 3 - Model Evaluation] Compare at least two different methods (that you applied previously) using 10 re-runs of a 10-fold cross-validation and perform a suitable statistical test to assess whether one of them performs significantly better than the other(s). Sign up for the demo here.
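One possible way to organize this evaluation is sketched below using only the standard library. The helper names `repeated_kfold` and `paired_t_statistic` are hypothetical. The test shown is a paired t statistic over the 100 per-fold score differences, with an optional variance correction (the corrected resampled t-test of Nadeau and Bengio) because scores from overlapping training sets are not independent. Whether this counts as the "suitable" test for the course is an assumption on our side.

```python
import math
import random
import statistics


def repeated_kfold(n_samples, n_splits=10, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) for n_repeats reshuffled n_splits-fold partitions."""
    if n_samples < n_splits:
        raise ValueError("need at least one sample per fold")
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # new random partition for every re-run
        folds = [indices[i::n_splits] for i in range(n_splits)]
        for k, test_idx in enumerate(folds):
            train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield train_idx, test_idx


def paired_t_statistic(scores_a, scores_b, test_train_ratio=0.0):
    """t statistic for paired per-fold scores of two models.

    test_train_ratio=0.0 gives the plain paired t statistic; for 10-fold CV,
    passing 1/9 applies the corrected resampled variance.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    var = statistics.variance(diffs)
    if var == 0:
        raise ValueError("all score differences are identical")
    scale = 1.0 / len(diffs) + test_train_ratio
    return statistics.mean(diffs) / math.sqrt(scale * var)


# 10 re-runs x 10 folds = 100 train/test splits per model.
n_splits_total = len(list(repeated_kfold(n_samples=25)))
print(n_splits_total)  # 100
```

Each model is fit and scored on the same 100 splits (same seed), giving two score lists of length 100 to pass to `paired_t_statistic`. The statistic is compared against a t distribution with 99 degrees of freedom; `scipy.stats.ttest_rel` performs the uncorrected version of the test and also returns a p-value.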
[milestone 4 - Dimensionality Reduction] Perform dimensionality reduction (PCA/SVD) in order to visualize the data (incorporate the target variable in your visualization). Perform dimensionality reduction (PCA/SVD) and use the new feature representation in a model you applied before. Compare the prediction results using the model evaluation strategy derived in milestone 3. Sign up for the demo here.
[task 3 - Neural Network] Train and run a Neural Network model. Submit the predictions here.
[task 4 - Efficiency] Compare the training time and test time of at least three models (from the ones applied in milestone 1, 2, and/or task 1, 2, 3). Use the average (or mode) runtime over 10 re-runs and perform a suitable statistical test to assess whether one of the models performs significantly better than the others w.r.t. efficiency of training and test time (two comparisons). Sign up for the demo here.
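For the efficiency task, the runtime measurements could be collected along these lines. This is a rough sketch: `time_call` is a hypothetical helper and `toy_fit` is a stand-in workload, to be replaced by the actual `fit` and `predict` calls of each model.

```python
import statistics
import time


def time_call(fn, *args, n_runs=10):
    """Return the wall-clock duration in seconds of each of n_runs calls to fn(*args)."""
    durations = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn(*args)
        durations.append(time.perf_counter() - start)
    return durations


def toy_fit(n):
    """Stand-in workload; not a real model."""
    return sum(i * i for i in range(n))


train_times = time_call(toy_fit, 50_000)
print(f"mean {statistics.mean(train_times):.6f}s, "
      f"stdev {statistics.stdev(train_times):.6f}s over {len(train_times)} runs")
```

Keeping the 10 raw durations per model, rather than only their average, leaves the per-run values available for the statistical test. Training time and test time are measured separately, since the task asks for two comparisons.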
[task 5 - Semi-Supervised Learning] Perform semi-supervised learning and compare your evaluations to a comparable method (you want to achieve a fair comparison) using the model evaluation strategy derived in milestone 3. Sign up for the demo here.
[milestone 5 - Final Competition] Train and run a model of your choice (really anything you want!). Submit the predictions here.