Machine Learning for Ecology

EBIO 5460 Spring 2022

Instructor: Dr Brett Melbourne
Pronouns: he, him, his
email: [email protected]
Office: Ramaley N336 and Zoom
Office hours: Any time by appointment
Contacting me: I prefer email
Class meeting times: Monday & Wednesday 3:35-4:50 PM
Location: Ramaley N183 and sometimes Zoom
Zoom details:

Course description

Machine learning is an exciting topic and recent advances are set to revolutionize many areas of ecology and fundamentally change the way ecologists work in the big data era. It will soon be essential for all ecologists to have a grasp of these tools.

Topics

  • Machine learning: algorithms for prediction
  • Predictive accuracy and the bias-variance tradeoff
  • Inference algorithm: cross validation
  • Model algorithm: knn, smoothers, polynomials
  • Training algorithm (optimizer + cost function)
  • Regularization and penalized optimization
  • Decision tree models (regression, classification)
  • Ensemble algorithms: stacking, bagging, random forests, boosting
  • Neural network models and deep learning
  • Stochastic gradient descent and back propagation
  • Neural network model architectures for different problems
  • Training strategies for deep learning
  • Contemporary and emerging applications of machine learning in ecology

Schedule of topics is here.

Where is it pitched?

This is a practical class. Our aim is to understand the concepts, theory, and algorithms behind the main categories of machine learning. In particular, you should be able to place any machine learning tool, which typically consists of a collection of algorithms, within a wider data-science context, identify its strengths and weaknesses, and judge when and where it can appropriately be used. A further goal is to be able to combine fundamental algorithms to make your own tools for specific applications. I will try to develop an intuitive understanding of the foundational algorithms and concepts, mostly through stochastic simulation and coding plus a bit of basic math. You will code and apply these algorithms to your own datasets. From this foundation you should be able to go deeper through self-study.

Prerequisites

We will use R throughout. This is not a beginner's class in either R or data science! Ideally you will have taken Part 1 of my EBIO5460 sequence (Part 1: Data Science for Biological Research). If you haven't taken that class it is possible to take the present class but a little catch-up will be necessary. I strongly suggest that you have at least taken some sort of statistics or computation class, for example, either my grad class "QEE: Quantitative Ecology and Evolution" or "Biometry", or Sam Flaxman's "Computational Biology", or an equivalent intermediate-level data science class. You must already be proficient with R, preferably with experience with your own data. Alternatively, if you are proficient in another language, such as Python, then you should be fine (R is much the same). Proficiency with GitHub is presumed but I will provide materials to get you up to speed quickly if you are not already proficient (don't worry, you will initially only need some basic skills). Please see me to discuss any prerequisite areas where you think you might need guidance.

The ultimate learning goals

You will be confident using machine learning algorithms in your own research. You will have a broad overview of how ecologists are currently using machine learning algorithms to revolutionize ecological research.

Learning format

I'm envisioning a collaborative learning atmosphere. Machine learning is a rapidly advancing field and many new techniques have not yet been applied or are only starting to be applied in ecology. We will explore these cutting-edge areas. Early on (about the first third), I will mostly lecture on basic concepts and you will have weekly reading and practical coding assignments. Collaboration is encouraged both in and out of class. Next, in a student-driven discussion and presentation format, we will explore the recent literature in ecology to consider where machine learning is having a revolutionizing impact on research. In an individual project you will apply algorithms to your own data, which ideally will help you address your thesis research.

I hope that we can create an environment that is relaxed and nonjudgmental so that we will all feel comfortable participating and also that all contributions are valued. I also hope that we can create an environment of respect for each other's learning processes and ideas.

Computing

You'll need access to computing. If you have a reasonably modern laptop, that's perfect. If you don't have access to computing please let me know as soon as possible. I can provide access to some great alternative compute resources.

R computing environment

Please upgrade to the latest versions of R and RStudio. I'm assuming you use these tools regularly and are proficient at upgrading already.

Texts

This class is a mash-up and I will sample from several texts and papers. Most of the material is available free to you but not all of the textbook material can be posted publicly, so some will be posted to the class Google Drive folder. One book that we'll base a fair bit of the first part of the semester around is:

  • James G, Witten D, Hastie T, Tibshirani R (2021). An Introduction to Statistical Learning: With Applications in R, Second edition. Springer, New York.
  • Book website

Other more advanced texts that I personally am referencing extensively include:

  • Goodfellow I, Bengio Y, Courville A (2016). Deep Learning. The MIT Press, Cambridge, Massachusetts.
  • Hastie T, Tibshirani R, Friedman JH (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Springer, New York, NY.

A text heavily influencing the way I think about the broader topic of data science from an interdisciplinary perspective is:

  • Efron B & Hastie T (2016). Computer Age Statistical Inference: Algorithms, Evidence, and Data Science. Cambridge.
  • This text is aimed at masters or first-year PhD students in statistics and data science. After completing this course (and the complementary semester in Part 1), you might find this a useful book with a friendly level of math.

Paperback versions of just about any Springer text can be purchased by University of Colorado students for $25 each. To purchase, do a Chinook search via the CU library (you must be on the campus network or connected via VPN, not the guest wifi), follow the link to the ebook version, and look for the Springer offer to buy the paperback version. Be aware, however, that these versions are not in color, which is certainly an issue for interpreting plots!

Grading & assessment scheme

GitHub portfolio of assignment code commits 30%

Present and lead discussion of contemporary papers from the literature 30%

Participation and peer reviews 10%

Individual assignment presentation 30%

  • Choose between a data project or a literature synthesis project

Exams

There will not be an exam. This material is not suited to exams.

GitHub

Almost everything will be on GitHub rather than Canvas. I have set up a GitHub organization that is restricted to our class (i.e. your individual repositories are not public). Bookmark this URL:

All of your work will be tracked and submitted here (via a Git commit and push).

Your fieldwork

I realize that as graduate students you may have fieldwork or other research to complete during the semester. Please see me early on so we can talk about how we can work around your fieldwork.

My commitments

I may need to travel for short periods for talks or fieldwork. If that happens I will endeavor to give the class via Zoom at the usual time. If I'm unable to attend a scheduled class, I will provide materials for that class and you should show up and participate as usual.

Classroom behavior

Both students and faculty are responsible for maintaining an appropriate learning environment in all instructional settings, whether in person, remote or online. Those who fail to adhere to such behavioral standards may be subject to discipline. Professional courtesy and sensitivity are especially important with respect to individuals and topics dealing with race, color, national origin, sex, pregnancy, age, disability, creed, religion, sexual orientation, gender identity, gender expression, veteran status, political affiliation or political philosophy. For more information, see the policies on classroom behavior and the Student Conduct & Conflict Resolution policies.

Requirement for COVID-19

As a matter of public health and safety due to the pandemic, all members of the CU Boulder community and all visitors to campus must follow university, department and building requirements and all public health orders in place to reduce the risk of spreading infectious disease. Students who fail to adhere to these requirements will be asked to leave class, and students who do not leave class when asked or who refuse to comply with these requirements will be referred to Student Conduct and Conflict Resolution. For more information, see the policy on classroom behavior and the Student Code of Conduct. If you require accommodation because a disability prevents you from fulfilling these safety measures, please follow the steps in the “Accommodation for Disabilities” statement on this syllabus.

CU Boulder currently requires masks in classrooms and laboratories regardless of vaccination status. This requirement is a precaution to supplement CU Boulder’s COVID-19 vaccine requirement. Exemptions include individuals who cannot medically tolerate a face covering, as well as those who are hearing-impaired or otherwise disabled or who are communicating with someone who is hearing-impaired or otherwise disabled and where the ability to see the mouth is essential to communication. If you qualify for a mask-related accommodation, please follow the steps in the “Accommodation for Disabilities” statement on this syllabus. In addition, vaccinated instructional faculty who are engaged in an indoor instructional activity and are separated by at least 6 feet from the nearest person are exempt from wearing masks if they so choose.

If you feel ill and think you might have COVID-19, if you have tested positive for COVID-19, or if you are unvaccinated or partially vaccinated and have been in close contact with someone who has COVID-19, you should stay home and follow the further guidance of the Public Health Office ([email protected]). If you are fully vaccinated and have been in close contact with someone who has COVID-19, you do not need to stay home; rather, you should self-monitor for symptoms and follow the further guidance of the Public Health Office ([email protected]).

In this class, if you are sick or quarantined, please let me know as soon as possible so we can make a plan for you to continue participation and complete the course. You don't need to state the nature of your illness when alerting me.

Accommodation for disabilities

If you qualify for accommodations because of a disability, please submit your accommodation letter from Disability Services to your faculty member in a timely manner so that your needs can be addressed. Disability Services determines accommodations based on documented disabilities in the academic environment. Information on requesting accommodations is located on the Disability Services website. Contact Disability Services at 303-492-8671 or [email protected] for further assistance. If you have a temporary medical condition, see Temporary Medical Conditions on the Disability Services website.

Preferred student names and pronouns

CU Boulder recognizes that students' legal information doesn't always align with how they identify. Students may update their preferred names and pronouns via the student portal; those preferred names and pronouns are listed on instructors' class rosters. In the absence of such updates, the name that appears on the class roster is the student's legal name. I will gladly honor your request to address you by an alternate name or gender pronoun.

Honor code

All students enrolled in a University of Colorado Boulder course are responsible for knowing and adhering to the Honor Code. Violations of the policy may include: plagiarism, cheating, fabrication, lying, bribery, threat, unauthorized access to academic materials, clicker fraud, submitting the same or similar work in more than one course without permission from all course instructors involved, and aiding academic dishonesty. All incidents of academic misconduct will be reported to the Honor Code ([email protected]; 303-492-5550). Students found responsible for violating the academic integrity policy will be subject to nonacademic sanctions from the Honor Code as well as academic sanctions from the faculty member. Additional information regarding the Honor Code academic integrity policy can be found at the Honor Code Office website.

Sexual misconduct, discrimination, harassment and/or related retaliation

The University of Colorado Boulder (CU Boulder) is committed to fostering an inclusive and welcoming learning, working, and living environment. CU Boulder will not tolerate acts of sexual misconduct (harassment, exploitation, and assault), intimate partner violence (dating or domestic violence), stalking, or protected-class discrimination or harassment by members of our community. Individuals who believe they have been subject to misconduct or retaliatory actions for reporting a concern should contact the Office of Institutional Equity and Compliance (OIEC) at 303-492-2127 or [email protected]. Information about OIEC, university policies, reporting options, and the campus resources can be found on the OIEC website.

Please know that faculty and graduate instructors have a responsibility to inform OIEC when they are made aware of incidents of sexual misconduct, dating and domestic violence, stalking, discrimination, harassment and/or related retaliation, to ensure that individuals impacted receive information about their rights, support resources, and reporting options. To learn more about reporting and support options for a variety of concerns, visit Don’t Ignore It.

Religious holidays

Campus policy regarding religious observances requires that faculty make every effort to deal reasonably and fairly with all students who, because of religious obligations, have conflicts with scheduled exams, assignments or required attendance. In this class, in most cases you should have sufficient time to complete the assignments and submit them on time, or early if appropriate. If this does not work for your situation, please notify me at least two weeks in advance of the conflict to request special accommodation. See the campus policy regarding religious observances for full details.