Instructors:
Class Location: San Francisco Campus - gU Classroom, 3rd Floor
Class Time: 11:00 AM - 1:00 PM and 3:00 PM - 5:00 PM M,T
Lab Time: 2:00 PM - 3:00 PM and 5:00 PM - 6:00 PM M,T
Office Hours: Wednesday by Appointment
Advanced Applied Statistics provides solid training in advanced, practical statistical data analysis and modeling. The course covers data analysis in R, experimental design, A/B testing, time series analysis, Bayesian data analysis, stochastic processes, estimation methods, and case studies.
The course focuses on applying statistical methods to real-world problems. Students will deliver a statistical solution to a real data science problem every 1-2 weeks.
By the end of this course, you will be able to:
- Carry out advanced data analysis in R
- Design experiments to test causal effects
- Perform A/B testing
- Make forecasts based on time series data
- Construct Bayesian models to account for statistical uncertainty
- Make principled estimates using both frequentist and Bayesian methods
Prerequisites:
- DSCI 6001: Linear Algebra
- DSCI 6002: Data Exploration, Feature Engineering, and Statistics for Data Scientists
- DSCI 6003: Machine Learning I
There is no required textbook for this course. Here is a list of recommended readings:
- Machine Learning: A Probabilistic Perspective, by Kevin P. Murphy
- Software for Data Analysis: Programming with R, by John M. Chambers
- Bayesian Methods for Hackers, by Cameron Davidson-Pilon
- Bayesian Data Analysis, by Andrew Gelman, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, Donald B. Rubin
- Introduction to Time Series and Forecasting, by P. Brockwell and R. Davis
- Time Series Analysis and Its Applications: With R Examples, by Robert H. Shumway and David S. Stoffer
- Statistical Analysis with Missing Data, by Roderick J. A. Little and Donald B. Rubin
Students are expected to be present and on time for all class meetings.
You will learn more easily and enjoyably if you actively participate. Student contribution to class discussions is highly valued and is critical to the learning process. Students will be asked to participate in class activities designed to encourage open conversation about and involvement in course material.
Participation in and completion of lab exercises is a requirement for this course. Each unit includes exercises that provide practice applying the techniques discussed in class and reveal gaps in understanding in preparation for skills tests. Some will be individual efforts; others will involve pair or group programming.
The readiness assessment tests (RATs) are intended to ensure that students have understood the material assigned between classes. Students unsure of their comprehension should bring questions to be addressed before the individual RAT.
At Galvanize, Mastery Tracking is used to evaluate real-time student performance across standards and to issue final grades at the end of the course.
Standards are the core competencies of Galvanize graduates: the knowledge, skills, and habits every student should possess by the time they graduate. Standards are measurable, student-focused outcomes that state what students are expected to be able to do by the end of the course. Instructors continually provide formative assessments to monitor student performance and inform their teaching practices. Students who are below ‘mastery’ on a standard are expected to continue practicing that standard (with the instructor’s guidance) until they reach mastery. What matters is that students eventually learn the material, not how many attempts it takes to get there.
Mastery Tracking uses a 4-point scale. Every student is expected to achieve a 3 (mastery) across all standards by the time they complete the course. 1s and 2s indicate areas where students need further practice and/or interventions to reach mastery. Unlike grades for individual assignments, mastery tracking can always be adjusted according to performance, up to and including the final exam.
4-point scale:
- 1: Falling far below mastery - Meeting none of the success criteria or containing egregious errors
- 2: Approaching mastery - Meeting some of the success criteria
- 3: Mastery - Meeting all of the success criteria
- 4: Exceeding mastery - Truly exceeding expectations and demonstrating proficiency at a higher level of rigor
More details on this later.
The University of New Haven is an academic community based on the principles of honesty, trust, fairness, respect, and responsibility. Academic integrity is a core University value which ensures respect for the academic reputation of the University, its students, faculty and staff, and the degrees it confers.
The University expects that all students will learn in an environment where they work independently in the pursuit of knowledge, conduct themselves in an honest and ethical manner and respect the intellectual work of others. Each member of the University community has a responsibility to be familiar with the definitions contained in, and adhere to, the Academic Integrity Policy. Violations of the Academic Integrity Policy include, but are not limited to:
- Cheating -- e.g., reading answers off your neighbor's exam
- Collusion -- Group work is encouraged except on evaluative exams*. When working together (on exercises, etc.), acknowledgment of collaboration is required.
- Plagiarism -- Reusing code presented in labs and lectures is expected, but copying someone else's solution to a problem is a form of plagiarism (even if you change the formatting or variable names).
- Facilitating academic dishonesty
Students who are dishonest in any class assignment or exam will receive an "F" in this course. More information on UNH’s official academic integrity policies is outlined here.
The breakdown of the grade will be as follows:
- RATs: 20%
- Labs: 40%
- Final Project: 30%
- Participation: 10%
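As a rough illustration of how the weights above combine, the sketch below computes a weighted course percentage. It assumes each component is scored on a 0-100 scale, which the syllabus does not state; the function name `weighted_grade` and the example scores are hypothetical, and the conversion from the 4-point mastery scale to component scores is not covered here.

```python
# Weights taken from the grade breakdown above.
WEIGHTS = {
    "RATs": 0.20,
    "Labs": 0.40,
    "Final Project": 0.30,
    "Participation": 0.10,
}


def weighted_grade(scores):
    """Combine per-component scores (assumed 0-100) into one course percentage."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {"RATs": 80, "Labs": 90, "Final Project": 85, "Participation": 100}
print(round(weighted_grade(example), 1))
```

Because labs carry the largest weight, a 10-point change in the lab score moves the total by 4 points, compared with 1 point for the same change in participation.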
- Week 1 - Introduction to R programming
  - 1.1: R basics
  - 1.2: Functions
  - 1.3: Data manipulation
  - 1.4: Modeling in R
- Week 2 - Probability review & estimation methods
  - 6.1: Probability review
  - 6.2: Maximum likelihood estimation (MLE), method of moments (MOM)
  - 6.3: Maximum a posteriori (MAP) estimation
  - 6.4: Expectation–maximization (EM) algorithm
- Week 3 - Experimental design & A/B testing
  - 2.1: Design of experiments
  - 2.2: Analysis of experiments
  - 2.3: Bayesian A/B testing
  - 2.4: Multi-armed bandit
- Week 4 - Bayesian data analysis
  - 4.1: Prior, likelihood and posterior
  - 4.2: Basic Bayesian modeling
  - 4.3: Bayesian hypothesis tests
  - 4.4: Bayesian regression
- Week 5 - Time series analysis
  - 3.1: The components of time series data
  - 3.2: Exponential smoothing
  - 3.3: ARIMA models
  - 3.4: Bayesian Structural Time Series (BSTS) model
- Week 6 - Stochastic processes
  - 5.1: Markov chains
  - 5.2: Gibbs sampling
  - 5.3: Markov chain Monte Carlo (MCMC)
  - 5.4: Hidden Markov model (HMM)
- Week 7 - Statistics review & case studies
  - 7.1: Regression analysis
  - 7.2: Case study I
  - 7.3: Case study II
  - 7.4: Review III
- Week 8 - Project presentations