13-longitudinal_gpa.Rmd

# MLM, Longitudinal: Hox ch 5 - student GPA


```{r, error=FALSE, warning=FALSE, message=FALSE, results='hide'}
library(tidyverse)    # all things tidy
library(haven)

library(pander)       # nice looking genderal tabulations
library(furniture)    # nice Table1() descriptives
library(texreg)       # Convert Regression Output to LaTeX or HTML Tables

library(psych)        # contains some useful functions, like headTail
library(sjstats)      # ICC calculations
library(sjPlot)       # Visualization for Models

library(lme4)         # non-linear mixed-effects models
library(lmerTest)
library(effects)      # Effect displays for Models
library(effectsize)
library(interactions)
library(performance)

```


## Background


```{block type='rmdlink', echo=TRUE}
The text **"Multilevel Analysis: Techniques and Applications, Third Edition"** [@hox2017] has a companion [website](https://multilevel-analysis.sites.uu.nl/) which includes links to all the data files used throughout the book (housed on the [book's GitHub repository](https://github.com/MultiLevelAnalysis)).  
```


The following example is used through out @hox2017's chapater 5.

The GPA for 200 college students were followed for 6 consecutive semesters (simulated).  Job status was also measured as number of hours worked for the same size occations.  Time-invariant covariates are the student's gender and high school GPA.  The variable `admitted` will not be used.  


```{r}
data_raw <- haven::read_sav("https://github.com/MultiLevelAnalysis/Datasets-third-edition-Multilevel-book/raw/master/chapter%205/GPA2/gpa2long.sav") %>% 
  haven::as_factor()             # retain the labels from SPSS --> factor

tibble::glimpse(data_raw) 
```

```{r}
data_raw %>% 
  dplyr::select(occas, job) %>% 
  table(useNA = "always")
```



```{r}
data_long <- data_raw %>% 
  dplyr::mutate(student = factor(student)) %>% 
  dplyr::mutate(sem = case_when(occas == "year 1 semester 1" ~ 1,
                                occas == "year 1 semester 2" ~ 2,
                                occas == "year 2 semester 1" ~ 3,
                                occas == "year 2 semester 2" ~ 4,
                                occas == "year 3 semester 1" ~ 5,
                                occas == "year 3 semester 2" ~ 6)) %>% 
  dplyr::mutate(job = fct_drop(job)) %>% 
  dplyr::mutate(hrs = case_when(job == "no job" ~ 0,
                                job == "1 hour" ~ 1,
                                job == "2 hours" ~ 2,
                                job == "3 hours" ~ 3,
                                job == "4 or more hours" ~ 4)) %>%  
  dplyr::select(student, sex, highgpa, sem, job, hrs, gpa) %>% 
  dplyr::arrange(student, sem)

psych::headTail(data_long, top = 10)
```


```{r}
data_wide <- data_long %>% 
  tidyr::pivot_wider(names_from = sem,
                     values_from = c(job, hrs, gpa),
                     names_sep = "_")

psych::headTail(data_wide)
```


```{r}
data_wide %>% 
  dplyr::group_by(sex) %>% 
  furniture::table1("High School GPA" = highgpa, 
                    "Initial College GPA" = gpa_1, 
                    "Initial Job" = job_1, 
                    "Initial Hrs" = hrs_1,
                    output = "markdown",
                    digits = 3,
                    total = TRUE,
                    test = TRUE)
```



```{r}
data_wide %>% 
  dplyr::group_by(sex) %>% 
  furniture::table1(gpa_1, gpa_2, gpa_3, gpa_4, gpa_5, gpa_6,
                    output = "markdown",
                    digits = 3,
                    total = TRUE,
                    test = TRUE,
                    caption = "Hox Table 5.2 (page 77) GPA means at six occations, for male and female students",
                    caption.above = TRUE)
```

## MLM

### Null Models and ICC



```{r}
fit_lmer_0_re <- lmerTest::lmer(gpa ~ 1 + (1|student),
                                data = data_long,
                                REML = TRUE)
```

```{r}
performance::icc(fit_lmer_0_re)
```


Over a third of the variance in the 6 GPA measures is variance between individuals, and about two-thirds is variance within individuals across time, $\rho = .369$.


```{r}
fit_lmer_1_re <- lmerTest::lmer(gpa ~ I(sem - 1) + (1|student),
                                data = data_long,
                                REML = TRUE)
```

```{r}
performance::icc(fit_lmer_1_re)
```

After accounting for the linear change in GPA over semesters, about half of the remaining variance in GPA scores is attribuTable person-to-person difference.  

> Another way to say it, is about half of the variance in initial GPAs is due to student-to-student differences.


### Fixed Effects


```{r}
fit_lmer_0_ml <- lmerTest::lmer(gpa ~ 1 + (1|student),
                                data = data_long,
                                REML = FALSE)

fit_lmer_1_ml <- lmerTest::lmer(gpa ~ I(sem - 1) + (1|student),
                                data = data_long,
                                REML = FALSE)


fit_lmer_2_ml <- lmerTest::lmer(gpa ~ I(sem - 1) + hrs + (1|student),
                                data = data_long,
                                REML = FALSE)


fit_lmer_3_ml <- lmerTest::lmer(gpa ~ I(sem - 1) + hrs + highgpa + sex + (1|student),
                                data = data_long,
                                REML = FALSE)
```


```{r, results="asis"}
texreg::knitreg(list(fit_lmer_0_ml, fit_lmer_1_ml, fit_lmer_2_ml, fit_lmer_3_ml))
```

```{r, results="asis"}
texreg::knitreg(list(fit_lmer_0_ml, fit_lmer_1_ml, fit_lmer_2_ml, fit_lmer_3_ml),
                custom.model.names = c("M1: Null",
                                       "M2: Occ",
                                       "M3: Job",
                                       "M4: GPA, Sex"),
                custom.coef.map = list("(Intercept)" = "(Intercept)",
                                       "I(sem - 1)" = "Semester",
                                       "hrs" = "Hours Working",
                                       "highgpa" = "High School GPA",
                                       "sexfemale" = "Female vs. Male"),
                groups = list("Level 1 Main Effects, Occasion-Specific" = 2:3,
                              "Level 2 Main Effects,  Person-Specific" = 4:5),
                custom.note = "%stars. \nNote: Intercept refers to population mean for a Male who is not working during their first semester",
                caption = "Hox Table 5.3 (page 78) Results of Multilevel Anlaysis of GPA, Fixed Effects",
                caption.above = TRUE,
                digits = 3)
```

```{r}
anova(fit_lmer_0_ml, fit_lmer_1_ml, fit_lmer_2_ml, fit_lmer_3_ml)
```

### Variance Explained by linear TIME at Level ONE

```{r}
lme4::VarCorr(fit_lmer_0_ml)  # baseline
```

```{r}
lme4::VarCorr(fit_lmer_1_ml)  # model to compare
```


```{r}
lme4::VarCorr(fit_lmer_0_ml) %>%  # baseline
  print(comp = c("Variance", "Std.Dev"),
        digits = 3)
```

```{r}
lme4::VarCorr(fit_lmer_1_ml) %>%  # model to compare
  print(comp = c("Variance", "Std.Dev"),
        digits = 3)
```



#### Raudenbush and Bryk

* Explained variance is a proportion of first-level variance only
* A good option when the multilevel sampling process is is close to two-stage simple random sampling

```{block type='genericEq', echo=TRUE}
**Raudenbush and Bryk Approximate Formula - Level 1 ** *approximate*
$$
approx \;R^2_1 = \frac{\sigma^2_{e-BL} - \sigma^2_{e-MC}}
             {\sigma^2_{e-BL} }
\tag{Hox 4.8}
$$
```



```{r}
(0.098 - 0.058) / 0.098
```


#### Snijders and Bosker


```{block type='genericEq', echo=TRUE}
**Snijders and Bosker Formula - Level 1 ** 

Random Intercepts Models Only, *address potential negative $R^2$ issue*
$$
R^2_1 = 1 - \frac{\sigma^2_{e-MC} + \sigma^2_{u0-MC}}
                 {\sigma^2_{e-BL} + \sigma^2_{u0-BL}}
$$
```




```{r}
1 - (0.058 + 0.063)/(0.098 + 0.057)
```


### Variance Explained by linear TIME at Level TWO

#### Raudenbush and Bryk

```{block type='genericEq', echo=TRUE}
**Raudenbush and Bryk Approximate Formula - Level 2 ** 
$$
approx \; R^2_s = \frac{\sigma^2_{u0-BL} - \sigma^2_{u0-MC}}
             {\sigma^2_{u0-BL} }
\tag{Hox 4.9}
$$
```





```{r}
(0.057 - 0.063)/ 0.057
```

YIKES!  Negative Variance explained!


#### Snijders and Bosker


```{block type='genericEq', echo=TRUE}
**Snijders and Bosker Formula Extended - Level 2 ** 
$$
R^2_2 = 1 - \frac{\frac{\sigma^2_{e-MC}}{B} + \sigma^2_{u0-MC}}
                 {\frac{\sigma^2_{e-BL}}{B} + \sigma^2_{u0-BL}}
$$

$B$ is the average size of the Level 2 units.  Technically, you should use the *harmonic mean*, but unless the clusters differ greatly in size, it doesn't make a huge difference.
```


```{r}
1 - (0.058/6 + 0.063) / (0.098/6 + 0.057)
```

Reason: The intercept only model overestimates the variance at the occasion level and underestimates the variance at the subject level (see chapter 4 of @hox2017)


### Random Effects



```{r}
fit_lmer_3_re <- lmerTest::lmer(gpa ~ I(sem-1) + hrs + highgpa + sex + 
                                  (1|student),
                            data = data_long,
                            REML = TRUE)


fit_lmer_4_re <- lmerTest::lmer(gpa ~ I(sem-1) + hrs + highgpa + sex +
                                  (I(sem-1)|student),
                            data = data_long,
                            REML = TRUE,
                            control = lmerControl(optimizer ="Nelder_Mead"))
```


```{r}
anova(fit_lmer_3_re, fit_lmer_4_re, refit = FALSE)
```

### Cross-Level Interaction

```{r}
fit_lmer_4_ml <- lmerTest::lmer(gpa ~ I(sem-1) + hrs + highgpa + sex + 
                                  (I(sem-1)|student),
                            data = data_long,
                            REML = FALSE)

fit_lmer_4_re <- lmerTest::lmer(gpa ~ I(sem-1) + hrs + highgpa + sex + 
                                  (I(sem-1)|student),
                            data = data_long,
                            REML = TRUE,
                            control = lmerControl(optimizer ="Nelder_Mead"))

fit_lmer_5_ml <- lmerTest::lmer(gpa ~ I(sem-1) + hrs + highgpa + sex + I(sem-1):sex + 
                                  (I(sem-1)|student),
                            data = data_long,
                            REML = FALSE)

fit_lmer_5_re <- lmerTest::lmer(gpa ~ I(sem-1) + hrs + highgpa + sex + I(sem-1):sex + 
                                  (I(sem-1)|student),
                            data = data_long,
                            REML = FALSE)
```


```{r, results='asis'}
texreg::knitreg(list(fit_lmer_4_re, fit_lmer_5_re),
                single.row = TRUE)
```



```{r, results='asis'}
texreg::knitreg(list(fit_lmer_4_re, fit_lmer_5_re),
                single.row = TRUE,
                custom.model.names = c("M5: Occ Rand",
                                       "M6: Xlevel Int"),
                custom.coef.map = list("(Intercept)" = "(Intercept)",
                                       "I(sem - 1)" = "Semester",
                                       "hrs" = "Hours Working",
                                       "sexfemale" = "Sex: Female vs. Male",
                                       "highgpa" = "High School GPA",
                                       "I(sem - 1):sexfemale" = "Semester X Sex"),
                groups = list("Level 1 Main Effects, Occasion-Specific" = 2:3,
                              "Level 2 Main Effects,  Person-Specific" = 4:5,
                              "Cross Level Interaction" = 6),
                custom.note = "%stars. \nNote: Intercept refers to population mean for a Male who is not working during their first semester",
                caption = "Hox Table 5.4 (page 80) Results of Multilevel Anlaysis of GPA, Varying Effects for Occation",
                caption.above = TRUE,
                digits = 3)
```


```{r}
anova(fit_lmer_4_ml, fit_lmer_5_ml)
```


### Visualize the Model


```{r, fig.cap="Hox Figure 5.4 (page 82) Multilevel model (M6) comapring linear increase in GPA over semester, but student's sex.", warning=FALSE, message=FALSE}
interactions::interact_plot(model = fit_lmer_5_re,
                            pred = sem,
                            modx = sex,
                            interval = TRUE)
```



```{r, warning=FALSE, message=FALSE}
interactions::interact_plot(model = fit_lmer_5_re,
                            pred = sem,
                            modx = sex,
                            mod2 = hrs,
                            mod2.values = c(1, 2, 3),
                            interval = TRUE)
```

```{r, warning=FALSE, message=FALSE}
interactions::interact_plot(model = fit_lmer_5_re,
                            pred = sem,
                            modx = sex,
                            mod2 = highgpa,
                            mod2.values = c(2, 3, 4),
                            interval = TRUE)
```


### Effect Sizes

#### Standardized Parameters

```{r}
effectsize::standardize_parameters(fit_lmer_5_re)
```


#### R-squared type measures

```{r}
performance::r2(fit_lmer_5_re, by_group = TRUE)
```


### Significance

#### Fixed Effects

> The Likelyhood Ratio Test (Deviance Difference Test) is best for establishing significance of fixed effects.

Wald-tests

```{r}
summary(fit_lmer_5_re)
```

F-test with Satterthwaite adjusted degrees of freedom 

```{r}
anova(fit_lmer_5_re)
```



#### Random Effects

Likelyhood Ratio Tests (Deviance Difference Test), by single term deletion

```{r}
lmerTest::ranova(fit_lmer_5_re)
```