Computer Science 499/599 at Northern Arizona University, Fall 2021
Topic: Unsupervised Learning
Dates: Aug 23 - Dec 10.
Meeting time/place:
- CS499, MWF 8-8:50AM, Engineering building, room 314.
- CS599, TuTh 8-9:15AM, Health Professions building, room 229.
Syllabus: Google Doc.
Class Discord Link (will expire on 10/4/2021): https://discord.gg/WrM7QAxv
The following references provide background/theory about the algorithms we study in this class.
MLAPP by Murphy
- Author’s web page https://www.cs.ubc.ca/~murphyk/MLbook/
- full book online describing many machine learning algorithms from a computer science perspective.
ESL by Hastie, Tibshirani, Friedman
- Free PDF available from author’s web page https://web.stanford.edu/~hastie/ElemStatLearn/
- book describing many machine learning algorithms from a statistics perspective.
About computational complexity:
- The SICP book, 1.2.3 “Orders of Growth,” has a brief description in general terms (not specific to machine learning).
- The CLRS book has a more detailed description in Chapter 3, “Growth of Functions” (not specific to machine learning).
- Wikipedia usually has a good/accurate characterization of the machine learning algorithms we study. For example K-means clustering, section Complexity.
The following resources provide practical advice about how to write the R code necessary for the homeworks.
Getting Started in R: Tinyverse Edition by Saghir Bashir and Dirk Eddelbuettel.
Impatient R by Burns
fasteR: Becoming productive in R, as fast as possible, by Norm Matloff
Introductions to data.table (efficient R package for data manipulation).
- A gentle introduction to data.table by Atrebas.
- Official datatable-intro vignette.
- RStudio data.table cheat sheet.
Tao Te Programming by Burns
- selected chapters from the book about how to become a good programmer.
- web page with details about how to purchase the full book.
Data visualization with ggplot2
- Grammar of graphics chapter of my Animint2 Manual (animint2 code is almost identical to ggplot2 code),
- Thomas Lin Pedersen’s 150 minute webinar “Plotting Anything With ggplot2”,
- One web page UC ggplot intro,
- Data visualization chapter of R for Data Science.
For CS599 grad students: guides to writing an R package with C/C++ code.
- Rcpp for Everyone by Masaki E. Tsuda.
- The C book by Mike Banahan, Declan Brady and Mark Doran.
- When and how to write low-level (C/C++) instead of high-level (R/Python) code?
- Make an R package with C++ code, my tutorial screencast videos.
- R packages book by Wickham.
To do the homeworks you need to install the most recent version of R (4.1.1) with either the RStudio IDE (for beginners) or the ESS IDE (for students who already know/use emacs, or who want to learn it; see my emacs tutorials).
Folder of all class recordings and code demos from last year. Folder with all code demos from this year. Yes, you can copy and modify these code demos for your homework, since they are part of the class material. In general, however, copying code for your homework from classmates or internet sources is strictly forbidden and will be pursued as an academic integrity violation.
Your content and responses to each homework question will be graded as follows:
- Full credit for figures which show correct results, along with code which seems correct and is of high quality.
- This General Usage Rubric will be used to grade the code quality/style/efficiency in each of your homeworks, -5 for each violation of these good coding rules.
- Some code and result figures, but clearly incorrect, -5 to -10.
- Missing code or figure, -10 to -20.
- Missing code and figure, -20 to -40.
Homework topics and readings for each week are listed below. The date of the Monday of each week is written. Each homework is due Friday of that week, 11:59PM.
- Aug 23, Homework 1: installing R, reading CSV, data visualization using ggplot2.
- My slides on applications of ML, My 20 minute intro to R video, for more introductions to R and data visualization, see links under “For homeworks” above.
- Quizzes Due Tues Aug 24: R Basics 1, R ggplot 1. Due Thurs Aug 26: R ggplot 2, R ggplot 3.
- Aug 30, Homework 2: K-means.
- Slides, Introduction to clustering, MLAPP 25.1. Clustering evaluation, MLAPP-25.1.2. K-means is discussed in ESL-14.3.6, MLAPP-11.4.2.5.
- Quizzes due Sun Aug 29: Clustering 1. Due Tues Aug 31: Kmeans 1.
- Sept 6, Labor day 9/6. Homework 3: Gaussian mixture models
- Slides, ESL-14.3.7, MLAPP-11.4.2. mclust model names figure.
- Quiz due Tues Sept 7: mixture 1.
- Sept 13, Homework 4: Hierarchical Clustering
- Slides, ESL-14.3.12, MLAPP-25.5.1.
- Quizzes Hierarchical 1-3.
- Sept 20, Homework 5: Clustering model selection
- Slides, Estimating the number of clusters, ESL-14.3.11. Model selection for latent variable models, MLAPP-11.5.
- Quizzes model selection 1-3.
- Sept 27, Review and exam, CS599 grad student R package coding project 1 due, CS499 no week 6 homework.
- Oct 4, Homework week 7: Binary segmentation
- Slides, Intro to changepoint detection, Truong et al. sections 1-2. Binary segmentation, section 5.2.2. Estimating the number of changes, section 6.
- Quizzes changepoint 1, binary segmentation 1.
- Oct 11, Homework week 8: Optimal segmentation via dynamic programming.
- Slides, Truong et al. sections 4.1.1 (Models and Cost functions, Parametric Models, Maximum likelihood estimation), 5.1 (Optimal detection).
- Quizzes changepoint 2, optimal segmentation 1-2.
- Oct 18, Homework week 9: Hidden Markov Models
- Slides, depmixS4 vignette section 2. Markov Models, MLAPP-17.2. Hidden Markov Models, MLAPP-17.3-5. Learning for HMMs, MLAPP-17.5.
- Quizzes HMM 1-3.
- Oct 25, Homework week 10: Segmentation model selection
- Slides, for AIC/BIC read MLAPP-5.3.2.4 (BIC approximation to log marginal likelihood) and ESL-7.5 (Estimates of In-Sample Prediction Error) and ESL-7.7 (The Bayesian Approach and BIC). Changepoint ROC curve interactive data viz 1, data viz 2
- Quizzes HMM 4-5, model selection 4.
- Nov 1, Review and exam. CS599 grad student R package coding project 2 due, CS499 no homework week 11.
- Nov 8, Veterans day 11/11. Homework week 12: Principal Components Analysis
- Slides, Principal Components Analysis, ESL-14.5. MLAPP-12.2.
- Quizzes PCA 1-3.
- Nov 15, Homework week 13: Auto-encoders
- Slides, torch+luz coding demo, Deep generative models, MLAPP-28.2 to 28.3. Deep auto-encoders, MLAPP-28.3.2. MLAPP-28.4.2 to 28.4.3.
- Quizzes Autoencoders 1-3.
- Nov 22, Thanksgiving 11/25-26. Homework week 14.
- Slides, Reading: dimRed vignette, no quiz.
- Nov 29, Reading week, final exam review questions, CS599 grad student extra credit R package coding project 3 due (this project is not required; you only have to do this if you want extra credit points).
- Final exams. CS499 Mon Dec 6, 7:30-9:30. CS599 Thurs Dec 9, 7:30-9:30.
- Can I do my homework with an older version of R? Maybe; try it if you want. However, homeworks will typically require R packages, which are only tested with the most recent versions of R, so if you are getting errors with an old version of R, try upgrading to the most recent version.
- Some function gives me a NULL result, how can I work around that? Try if(!is.null(result)){save your results}
- Some for loop over N items takes a long time, but failed/errored at the N-1’th iteration. How can I re-start computations where I left off? Try if(!some_key %in% names(result_list)){do the computations and save result with name some_key in result_list}
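The two workarounds above can be combined, as in the following minimal sketch. The names `slow_computation` and `keys` are hypothetical placeholders (not from any class demo), and the sketch assumes that each item can be identified by a unique character key.

```r
## Hypothetical stand-in for a slow computation which may return NULL.
slow_computation <- function(key){
  if(key == "b") NULL else nchar(key)
}
keys <- c("a", "b", "cc")
## Create result_list only if it does not exist yet, so that re-running
## this code after an error keeps the results which were already saved.
if(!exists("result_list")) result_list <- list()
for(some_key in keys){
  ## Skip keys which already have a saved result.
  if(!some_key %in% names(result_list)){
    result <- slow_computation(some_key)
    ## Only save non-NULL results.
    if(!is.null(result)){
      result_list[[some_key]] <- result
    }
  }
}
```

If the loop stops with an error, running the same code again should skip every key which is already in names(result_list). Keys with a NULL result are not saved, so they are re-computed on each run.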
Before class you should prepare by doing the suggested readings/videos. When you do that, write a summary in your own words of every section. Also write questions that you have during your reading so you can ask in class or office hours.
During class, take notes by writing what you understood in your own words. I also suggest asking questions in class as soon as you need clarification.
After class, you should review your notes with one of your classmates (ask one of the students who seem to be correctly answering a lot of questions). Ask each other questions and try to teach/summarize some of the material to each other; that is one of the best ways to learn.
Finally after doing all of the above, please come to office hours (see syllabus), or email me to schedule a meeting.