Home
Welcome to the PLA wiki!
This project contains code and data associated with the University of Michigan Practical Learning Analytics course. It parallels the content of the course videos.
A body of code and two data tables (student.course.csv and student.record.csv, described below) form the basis of this project.
Content of the code is driven by an initial list of questions addressed by the course, and several advanced analyses that overlap with these questions are also included. Each .R file contains a main-level function as well as several subroutines, each with accompanying header material that describes what it does. Everything is written in R, and we make use of the R packages gplots, treemap, and optmatch. These are loaded with library() in R, but must be installed locally (from the command line or RStudio) if they are not already.
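If you prefer to install the three packages from within R, the following is a minimal sketch using base R's requireNamespace() and install.packages(). The helper names missing_pkgs() and install_if_missing() are hypothetical (they are not part of the PLA code), and the sketch assumes gplots, treemap, and optmatch are available from your configured CRAN mirror.

```r
# Hypothetical helpers (not part of the PLA code).

# Return the subset of `pkgs` that cannot be loaded from the local library.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Install only the packages that are missing; return their names invisibly.
install_if_missing <- function(pkgs) {
  need <- missing_pkgs(pkgs)
  if (length(need) > 0) {
    install.packages(need)
  }
  invisible(need)
}
```

Usage: R> install_if_missing(c("gplots", "treemap", "optmatch"))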
Once you have a local copy of these .R files, you can cut and paste the commands below (the lines beginning with R>). You'll need to change the paths (e.g. "~/aim-analytics/PLA-MOOC/") to reflect where these files are downloaded and stored on your machine.
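To avoid editing every path by hand, one option is to set the download directory once and build the file paths with file.path(). This is a sketch: the function name load_pla_tables() and its default directory are illustrative assumptions, and the file names are those of the two data tables described below.

```r
# Hypothetical helper: read both PLA tables from one directory.
# Change `pla_dir` to wherever you saved the files.
load_pla_tables <- function(pla_dir = "~/aim-analytics/PLA-MOOC") {
  list(
    sr = read.csv(file.path(pla_dir, "student.record.csv")),
    sc = read.csv(file.path(pla_dir, "student.course.csv"))
  )
}
```

Usage:
R> pla <- load_pla_tables("~/aim-analytics/PLA-MOOC")
R> sr <- pla$sr
R> sc <- pla$sc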
grade.penalty.module.R: This does course-by-course grade penalty analysis, surveying grade penalties among courses and between groups of students. It also has options to use regression and matching to try to isolate particular effects. Ex: What is the grade penalty in Physics 135, and is it different between genders? Are there differences after matching or regression? Note that this makes use of the optmatch package. Run these commands in R (replacing the paths as needed!):
R> sr <- read.csv("~/aim-analytics/PLA-MOOC/student.record.csv")
R> sc <- read.csv("~/aim-analytics/PLA-MOOC/student.course.csv")
R> source('~/aim-analytics/PLA-MOOC/grade.penalty.module.R')
R> out <- grade.penalty(sr,sc,'PHYSICS',135,GROUP='GENDER',REGRESSION=TRUE,MATCHING=TRUE,PDF=FALSE)
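To make the idea of a grade penalty concrete, the sketch below computes, for one course, the mean of grade minus GPAO within each level of a grouping variable, using the GRD_PTS_PER_UNIT and GPAO fields described below. It assumes the grade penalty is measured as this difference and that the student-record table has a GENDER column (as the GROUP='GENDER' call above implies). grade_penalty_sketch() is hypothetical, not the grade.penalty() implementation, and it does none of the regression or matching.

```r
# Hypothetical sketch, not grade.penalty(): mean (grade - GPAO) for one
# course, split by a student-record column such as GENDER.
grade_penalty_sketch <- function(sr, sc, subject, catalog_nbr, group = "GENDER") {
  # Rows of the student-course table for the course of interest.
  crs <- sc[sc$SUBJECT == subject & sc$CATALOG_NBR == catalog_nbr, ]
  # Attach the grouping column from the student-record table.
  crs <- merge(crs, sr[, c("ANONID", group)], by = "ANONID")
  # Per-enrollment difference between the course grade and GPAO.
  crs$PENALTY <- crs$GRD_PTS_PER_UNIT - crs$GPAO
  aggregate(crs["PENALTY"], by = crs[group], FUN = mean, na.rm = TRUE)
}
```

Usage: R> grade_penalty_sketch(sr, sc, 'PHYSICS', 135, group = 'GENDER')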
course.persistence.module.R: This does course-to-course persistence analysis: given the grade a student received in one course, what is the probability that they took another course? Ex: What is the probability that a student who got a B in Physics 140 (Physics I) later took Physics 240 (Physics II)? Run these commands in R (replacing the paths as needed!):
R> sr <- read.csv("~/aim-analytics/PLA-MOOC/student.record.csv")
R> sc <- read.csv("~/aim-analytics/PLA-MOOC/student.course.csv")
R> source('~/aim-analytics/PLA-MOOC/course.persistence.module.R')
R> hh <- course.persistence.setup(sr,sc,'PHYSICS','PHYSICS',140,240,TITLE='Physics 140 -- > 240: Gender',PDF=TRUE)
R> hh <- course.persistence.setup(sr,sc,'PHYSICS','PHYSICS',140,240,TYPE='MAJOR1_DEPT', GROUP1='Physics Department',GROUP2='Chemistry Department', TITLE='Physics 140 -- > 240: MAJOR',PDF=FALSE)
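The underlying probability can be sketched in a few lines of base R: for each grade earned in the first course, the fraction of those students who took the second course in a later TERM. persistence_sketch() is hypothetical, not course.persistence.setup(); it treats each enrollment in the first course as one observation and uses only the ANONID, SUBJECT, CATALOG_NBR, TERM, and GRD_PTS_PER_UNIT columns of the student-course table.

```r
# Hypothetical sketch, not course.persistence.setup(): probability of taking
# course 2 in a later term, as a function of the grade earned in course 1.
persistence_sketch <- function(sc, subj1, subj2, nbr1, nbr2) {
  first <- sc[sc$SUBJECT == subj1 & sc$CATALOG_NBR == nbr1,
              c("ANONID", "TERM", "GRD_PTS_PER_UNIT")]
  second <- sc[sc$SUBJECT == subj2 & sc$CATALOG_NBR == nbr2, c("ANONID", "TERM")]
  # Keep the latest term in which each student took course 2.
  if (nrow(second) > 0) {
    second <- aggregate(TERM ~ ANONID, data = second, FUN = max)
  }
  names(second)[2] <- "TERM2"
  # Students who never took course 2 get TERM2 = NA.
  both <- merge(first, second, by = "ANONID", all.x = TRUE)
  both$LATER <- !is.na(both$TERM2) & both$TERM2 > both$TERM
  # Fraction of students at each grade who later took course 2.
  aggregate(LATER ~ GRD_PTS_PER_UNIT, data = both, FUN = mean)
}
```

Usage: R> persistence_sketch(sc, 'PHYSICS', 'PHYSICS', 140, 240)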
course.pathways.treemaps.R: This asks two sets of questions. First, for some course of interest, which courses did students take before, during, and after that course, and what kinds of grades did they get? Second, what were the eventual majors of those students? Ex: What courses did students take before, during, and after Physics 140, and what were their eventual majors? This makes use of the treemap package. Run these commands in R (replacing the paths as needed!):
R> sr <- read.csv("~/aim-analytics/PLA-MOOC/student.record.csv")
R> sc <- read.csv("~/aim-analytics/PLA-MOOC/student.course.csv")
R> source('~/aim-analytics/PLA-MOOC/course.pathways.treemaps.R')
R> course.pathway.treemaps(sr,sc,"PHYSICS",140,TERM_RANGE=c(100,156), PDF=FALSE)
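The before/during/after classification behind these pathway plots can be sketched by comparing each course's TERM with the term in which a student first took the course of interest. pathway_counts() is a hypothetical illustration, not course.pathway.treemaps(); it ignores grades and the TERM_RANGE argument and simply counts enrollments.

```r
# Hypothetical sketch, not course.pathway.treemaps(): count enrollments in
# other courses taken before, during, or after a course of interest.
pathway_counts <- function(sc, subject, catalog_nbr) {
  is_focal <- sc$SUBJECT == subject & sc$CATALOG_NBR == catalog_nbr
  # First term in which each student took the course of interest.
  focal <- aggregate(TERM ~ ANONID, data = sc[is_focal, ], FUN = min)
  names(focal)[2] <- "FOCAL_TERM"
  # Keep only students who took the course of interest, minus that course itself.
  other <- merge(sc[!is_focal, ], focal, by = "ANONID")
  other$WHEN <- ifelse(other$TERM < other$FOCAL_TERM, "before",
                ifelse(other$TERM == other$FOCAL_TERM, "during", "after"))
  other$COURSE <- paste(other$SUBJECT, other$CATALOG_NBR)
  counts <- as.data.frame(table(WHEN = other$WHEN, COURSE = other$COURSE),
                          responseName = "N")
  counts[counts$N > 0, ]
}
```

Usage: R> pathway_counts(sc, "PHYSICS", 140)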
course.pathways.barplots.R: This asks the same two sets of questions. First, for some course of interest, which courses did students take before, during, and after that course, and what kinds of grades did they get? Second, what were the eventual majors of those students? This is essentially an alternative visualization of the data rendered by course.pathways.treemaps.R. Ex: What courses did students take before, during, and after Physics 140, and what were their eventual majors? Run these commands in R (replacing the paths as needed!):
R> sr <- read.csv("~/aim-analytics/PLA-MOOC/student.record.csv")
R> sc <- read.csv("~/aim-analytics/PLA-MOOC/student.course.csv")
R> source('~/aim-analytics/PLA-MOOC/course.pathways.barplots.R')
R> course.pathway.barplots(sr,sc,"PHYSICS",140,TERM_RANGE=c(100,156), PDF=FALSE)
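As a rough companion to the barplot view, the second question (eventual majors) can be tabulated from the MAJOR1_DESCR field of the student-record table and passed to base R's barplot(). eventual_majors() is a hypothetical helper, not course.pathway.barplots(), and it looks only at the first major.

```r
# Hypothetical helper: count first majors (MAJOR1_DESCR) among students who
# took a given course, most common first.
eventual_majors <- function(sr, sc, subject, catalog_nbr) {
  ids <- unique(sc$ANONID[sc$SUBJECT == subject & sc$CATALOG_NBR == catalog_nbr])
  majors <- as.character(sr$MAJOR1_DESCR[sr$ANONID %in% ids])
  # Drop students with no recorded major.
  majors <- majors[!is.na(majors) & majors != ""]
  sort(table(majors), decreasing = TRUE)
}
```

Usage: R> barplot(eventual_majors(sr, sc, "PHYSICS", 140), las = 2)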
student.term.GPA.R: This reduces a student-course table (formatted like the one we provide) into a one-line-per-student-term track of GPA. Beware when running this on the student-course table we provide: because that table (and therefore its grades, GPAOs, and individuals) is synthetic, the GPAs computed from its grades may not be consistent with the synthetic GPAOs.
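A minimal sketch of this reduction in base R follows. Because no course-units column is documented for the student-course table, it assumes every course carries equal weight, so its term and running GPAs are simple averages of GRD_PTS_PER_UNIT rather than unit-weighted GPAs. The function and output column names are hypothetical, and this is not the code in student.term.GPA.R.

```r
# Hypothetical sketch, not student.term.GPA.R: one row per student-term with
# an equal-weight term GPA and a running (cumulative) GPA.
term_gpa_sketch <- function(sc) {
  g <- sc[!is.na(sc$GRD_PTS_PER_UNIT), c("ANONID", "TERM", "GRD_PTS_PER_UNIT")]
  # Sum of grade points and number of graded courses per student-term.
  pts <- aggregate(GRD_PTS_PER_UNIT ~ ANONID + TERM, data = g, FUN = sum)
  cnt <- aggregate(GRD_PTS_PER_UNIT ~ ANONID + TERM, data = g, FUN = length)
  out <- data.frame(ANONID = pts$ANONID, TERM = pts$TERM,
                    PTS = pts$GRD_PTS_PER_UNIT, N = cnt$GRD_PTS_PER_UNIT)
  # Sort terms chronologically within each student before accumulating.
  out <- out[order(out$ANONID, out$TERM), ]
  out$TERM_GPA <- out$PTS / out$N
  out$RUNNING_GPA <- ave(out$PTS, out$ANONID, FUN = cumsum) /
    ave(out$N, out$ANONID, FUN = cumsum)
  rownames(out) <- NULL
  out[, c("ANONID", "TERM", "TERM_GPA", "RUNNING_GPA")]
}
```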
These functions all run on the synthetic data provided with this project, but may in principle be run on similarly formatted data from local sources. The data come in two tables: a student-course table (student.course.csv) and a student-record table (student.record.csv).
A separate lookup table maps our integer academic TERMs to human-readable terms: FA 2006 = Fall 2006, WN = Winter, SP = Spring, SS = Spring/Summer, Su = Summer, etc.
The student-record table (student.record.csv) includes one-time information about a student: major, gender, etc. It has one line per student.
ANONID: Anonymous ID of the student, used to merge with columns of the student course table.
ADMIT_TERM: Term of admission. Terms have been renumbered to preserve anonymity, and the same consistent numbering convention is used for all “TERM” fields. These TERMs go back to TERM=53.
HSGPA: High-school GPA as recomputed by admissions. Note that this field also contains ‘0’ values, whose meaning is unclear.
LAST_ACT_MATH_SCORE: ACT Math score.
LAST_ACT_ENGL_SCORE: ACT English score.
LAST_ACT_READ_SCORE: ACT Reading score.
LAST_ACT_SCIRE_SCORE: ACT Science score.
LAST_ACT_COMP_SCORE: ACT Composite score.
LAST_SATI_VERB_SCORE: SAT I Verbal score.
LAST_SATI_MATH_SCORE: SAT I Math score.
LAST_SATI_TOTAL_SCORE: SAT I Total score.
MAJOR1_DESCR: Full name of first undergraduate major degree.
MAJOR2_DESCR: Full name of second undergraduate major degree.
MAJOR3_DESCR: Full name of third undergraduate major degree.
MAJOR1_TERM: The term in which MAJOR1 was received; NA otherwise. Degree data become incomplete before TERM 80, although degree information goes back to at least TERM 10.
MAJOR2_TERM: The term in which MAJOR2 was received; NA otherwise.
MAJOR3_TERM: The term in which MAJOR3 was received; NA otherwise.
MAJOR1_DEPT: The department that awarded MAJOR1. This collapses some rare majors and may be preferable for anonymity.
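MAJOR2_DEPT: The department that awarded MAJOR2.
MAJOR3_DEPT: The department that awarded MAJOR3.
STDNT_GROUP1: Student group. Each student may belong to up to two of the seven available groups, denoted A-G.
STDNT_GROUP2: Second student group, coded the same way as STDNT_GROUP1.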
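Since the meaning of HSGPA = 0 is unclear, one cautious option before using HSGPA in an analysis is to treat those zeros as missing. The sketch below does that; whether zeros should be dropped is an assumption to check against your own data, and clean_hsgpa() is a hypothetical name.

```r
# Hypothetical helper: recode HSGPA values of 0 (meaning unclear) as NA,
# leaving all other values and existing NAs unchanged.
clean_hsgpa <- function(sr) {
  sr$HSGPA[!is.na(sr$HSGPA) & sr$HSGPA == 0] <- NA
  sr
}
```

Usage: R> sr <- clean_hsgpa(sr)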
The student-course table (student.course.csv) records the courses taken by each student and the grade received in each. It may have multiple lines per student.
ANONID: Anonymous ID of the student, used to merge with columns of the student record table.
SUBJECT: Subject area of course.
CATALOG_NBR: Catalog number of the course.
GRD_PTS_PER_UNIT: Discrete numerical field ranging from 0 to 4, indicating the grade received.
GPAO: Grade point average in all other classes over the student's career, up to and including the term in which the course was taken.
CUM_GPA: Actual cumulative GPA as of the term in which the course was taken.
DIV: The division of the SUBJECT (P = Professional, H = Humanities, SS = Social Sciences, S = Science, E = Engineering, O = Other).
ANON_INSTR_ID: Anonymized instructor ID. I haven’t used this field much yet.
TERM: Term the course was taken. This reaches TERM=60, which is also the minimum TERM for the ADMIT_TERM field in the student-record table.
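Because both tables carry ANONID, student-record fields such as MAJOR1_DEPT can be attached to every course row with base R's merge(). This is a sketch; join_pla_tables() is a hypothetical name, and all.x = TRUE keeps course rows whose ANONID has no matching student record (their student-record columns become NA).

```r
# Hypothetical helper: attach student-record columns to each course row.
join_pla_tables <- function(sc, sr) {
  joined <- merge(sc, sr, by = "ANONID", all.x = TRUE)
  # Warn if some students in the course table have no student record.
  unmatched <- setdiff(unique(sc$ANONID), sr$ANONID)
  if (length(unmatched) > 0) {
    warning(length(unmatched), " ANONID value(s) have no student record")
  }
  joined
}
```

Usage: R> scr <- join_pla_tables(sc, sr)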
Below is a full list of the courses in the student-course table. You can also generate the list yourself:
R> sc <- read.csv("~/aim-analytics/PLA-MOOC/student.course.csv")
R> cnames <- paste(sc$SUBJECT,sc$CATALOG_NBR,sep=" ")
R> cnames <- cnames[!duplicated(cnames)]
R> cnames <- cnames[order(cnames)]
R> print(cnames)
ACC 271
ACC 272
AMCULT 240
AMCULT 374
ANTHRBIO 161
ANTHRBIO 368
ANTHRCUL 101
ARTDES 300
ASIAN 230
ASTRO 106
BE 300
BIOLOGY 105
BIOLOGY 118
BIOLOGY 171
BIOLOGY 172
BIOLOGY 173
BIOLOGY 225
BIOLOGY 226
BIOLOGY 305
BIOLOGY 310
BIOLOGY 311
BIT 200
BUDDHST 230
CHEM 125
CHEM 126
CHEM 130
CHEM 210
CHEM 211
CHEM 215
CHEM 216
CHEM 230
CICS 101
CLCIV 101
CLCIV 372
CLCIV 385
CMPTRSC 183
CMPTRSC 280
CMPTRSC 370
COMM 101
COMM 102
DANCE 100
ECON 101
ECON 102
ECON 401
ECON 402
EECS 183
EECS 203
EECS 280
EECS 281
EECS 370
ENGLISH 124
ENGLISH 125
ENGLISH 223
ENGLISH 225
ENGLISH 239
ENGLISH 240
ENGLISH 297
ENGLISH 298
ENGLISH 325
ENGR 100
ENGR 101
ENGR 110
FIN 300
FRENCH 232
GEOSCI 100
GEOSCI 103
GEOSCI 106
GEOSCI 107
GEOSCI 114
GTBOOKS 191
HISTORY 201
HISTORY 374
LHC 250
LHC 306
LHC 350
LING 111
LING 211
MATH 105
MATH 115
MATH 116
MATH 215
MATH 216
MATH 425
MCDB 310
MECHENG 211
MECHENG 240
MKT 300
MO 300
NURS 220
OB 300
OM 311
OMS 301
OMS 311
PHIL 230
PHYSICS 125
PHYSICS 126
PHYSICS 127
PHYSICS 128
PHYSICS 135
PHYSICS 136
PHYSICS 140
PHYSICS 141
PHYSICS 235
PHYSICS 236
PHYSICS 240
PHYSICS 241
POLSCI 101
POLSCI 111
POLSCI 160
POLSCI 300
POLSCI 389
POLSCI 489
PSYCH 111
PSYCH 112
PSYCH 230
PSYCH 240
PSYCH 250
PSYCH 260
PSYCH 270
PSYCH 280
PSYCH 290
PSYCH 303
PSYCH 330
PSYCH 340
PSYCH 350
PSYCH 360
PSYCH 370
PSYCH 380
PSYCH 390
RELIGION 230
SMS 301
SOC 100
SOC 101
SOC 102
SPANISH 101
SPANISH 103
SPANISH 231
SPANISH 232
SPANISH 275
SPANISH 276
SPANISH 277
STATS 100
STATS 250
STATS 350
STATS 425
STRATEGY 390
UC 280
WOMENSTD 220
WOMENSTD 240
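The same two columns can also show how many enrollment rows each course has, which may help when choosing a course with enough students for the analyses above. A sketch using base R's table(); course_enrollments() is a hypothetical name, and a student who retakes a course is counted once per row.

```r
# Hypothetical helper: number of enrollment rows per course, largest first.
course_enrollments <- function(sc) {
  cnames <- paste(sc$SUBJECT, sc$CATALOG_NBR, sep = " ")
  sort(table(cnames), decreasing = TRUE)
}
```

Usage: R> head(course_enrollments(sc), 20)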