70_example_glmm_contraception.Rmd

# GLMM, Binary Outcome: Contraception & Amenorrhea

```{r, include=FALSE}
knitr::opts_chunk$set(comment     = "",
                      echo        = TRUE, 
                      warning     = FALSE, 
                      message     = FALSE,
                      fig.align   = "center", # center all figures
                      fig.width   = 6,        # set default figure width to 4 inches
                      fig.height  = 4)        # set default figure height to 3 inches
```

## Packages

### CRAN

```{r, message=FALSE, error=FALSE}
library(tidyverse)    # all things tidy
library(pander)       # nice looking genderal tabulations
library(furniture)    # nice table1() descriptives
library(texreg)       # Convert Regression Output to LaTeX or HTML Tables
library(psych)        # contains some useful functions, like headTail

library(lme4)         # Linear, generalized linear, & nonlinear mixed models
library(effects)      # Plotting estimated marginal means
library(emmeans)
library(interactions)
library(performance)

library(optimx)       # Unify and streamline optimization capabilities in R
```


### GitHub

Helper `extract` functions for exponentiating parameters form generalized regression models within a `texreg` table of model parameters.

```{r, message=FALSE, error=FALSE}
# install.packages("devtools")
# library(devtools)
# install_github("SarBearSchwartz/texreghelpr")

library(texreghelpr)
```


## Data Prep

> Data on Amenorrhea from Clinical Trial of Contracepting Women.

**Source:** 

Table 1 (page 168) of Machin et al. (1988).
With permission of Elsevier.

**Reference:** 

Machin D, Farley T, Busca B, Campbell M and d'Arcangues C. (1988). Assessing changes in vaginal bleeding patterns in contracepting women. Contraception, 38, 165-179.

**Description:**

The data are from a *longitudinal* clinical trial of contracepting women. In this trial women received an injection of either 100 mg or 150 mg of depot-medroxyprogesterone acetate (DMPA) on the day of randomization  and three additional injections at 90-day intervals. There was a final follow-up visit 90 days after the fourth injection, i.e., one year after the first injection. Throughout the study each woman completed a menstrual diary that recorded any vaginal bleeding pattern disturbances. The diary data were used to determine whether a women experienced *amenorrhea*, the absence of menstrual bleeding for a specified number of days.  A total of 1151 women completed the menstrual diaries and the diary data were used to generate a binary sequence for each woman according to whether or not she had experienced **amenorrhea in the four successive three month intervals**. 

In clinical trials of modern hormonal contraceptives, pregnancy is exceedingly rare (and would be regarded as a failure of the contraceptive method), and is not the main outcome of interest in this study. Instead, the outcome of interest is a binary response indicating whether a woman experienced amenorrhea in the four successive three month intervals. A feature of this clinical trial is that there was **substantial dropout**. More than one third of the women dropped out before the completion of the trial.

**Variable List:**

* Indicators   

    + `id` participant identification
    + `occasion` denotes the four 90-day periods
    
* Outcome or dependent variable    

    + `amenorrhea` Amenorrhea Status: 1=Amenorrhea, 0=No Amenorrhea
    
* Main predictor or independent variable of interest   

    + `dose` 0 = Low (100 mg), 1 = High (150 mg)


### Import


```{r}
data_raw <- read.table("https://raw.githubusercontent.com/CEHS-research/data/master/MLM/RCTcontraception.txt", header=TRUE)
```


```{r}
str(data_raw)
```


```{r}
psych::headTail(data_raw, top = 10)
```

### Long Format

```{r}
data_long <- data_raw %>% 
  dplyr::mutate(id = factor(id)) %>% 
  dplyr::mutate(dose = factor(dose,
                              levels = c("0", "1"),
                              labels = c("Low", "High"))) %>% 
  dplyr::mutate(time = occasion - 1) %>% 
  dplyr::mutate(amenorrhea = amenorrhea %>%       # outcome needs to be numeric
                  as.character() %>% 
                  as.numeric()) %>% 
  dplyr::filter(complete.cases(amenorrhea)) %>%   # dump missing occations
  dplyr::arrange(id, time)
```


```{r}
str(data_long)
```


```{r}
psych::headTail(data_long, bottom = 10)
```


### Wide Format


```{r}
data_wide <- data_long %>% 
  dplyr::select(-time) %>% 
  tidyr::pivot_wider(id_cols = c(id, dose),
                     names_from = occasion,
                     names_prefix = "occasion_",
                     values_from = amenorrhea)
```


```{r}
str(data_wide)
```


```{r}
psych::headTail(data_wide, bottom = 10)
```


## Exploratory Data Analysis

### Summary Statistics

```{r}
data_summary <- data_long %>% 
  dplyr::group_by(dose, occasion) %>% 
  dplyr::summarise(N = n(),
                   M = mean(amenorrhea),
                   SD = sd(amenorrhea),
                   SE = SD/sqrt(N))

data_summary
```

### Visualize

```{r}
data_summary %>% 
  ggplot(aes(x = occasion,
             y = M,
             fill = dose)) +
  geom_col(position = "dodge") +
  theme_bw() +
  theme(legend.position = c(0, 1),
        legend.justification = c(-0.1, 1.1),
        legend.background = element_rect(color = "black"),
        legend.key.width = unit(1.5, "cm")) +
  labs(x = "90-day windows",
       y = "Observed Proportion of Amenorrhea",
       fill = "Dosage") +
  scale_x_continuous(breaks = 1:4,
                     labels = c("First",
                                "Second",
                                "Third",
                                "Fourth"))
```


```{r}
data_summary %>% 
  ggplot(aes(x = occasion,
             y = M,
             color = dose %>% fct_rev())) +
  geom_errorbar(aes(ymin = M - SE,
                    ymax = M + SE),
                width = .3,
                position = position_dodge(width = .25)) +
  geom_point(position = position_dodge(width = .25)) +
  geom_line(position = position_dodge(width = .25)) +
  theme_bw() +
  theme(legend.position = c(0, 1),
        legend.justification = c(-0.1, 1.1),
        legend.background = element_rect(color = "black"),
        legend.key.width = unit(1.5, "cm")) +
  labs(x = "90-day windows",
       y = "Observed Proportion of Amenorrhea",
       color = "Dosage") +
  scale_x_continuous(breaks = 1:4,
                     labels = c("First",
                                "Second",
                                "Third",
                                "Fourth"))
```


## GLMM - Basic

### Fit Models

```{r}
fit_1 <- lme4::glmer(amenorrhea ~ time*dose + (1 | id),
                     data = data_long,
                     family = binomial(link = "logit"))

fit_2 <- lme4::glmer(amenorrhea ~ time + dose + (1 | id),
                     data = data_long,
                     family = binomial(link = "logit"))
```


#### Compare via LRT

Should the interaction be included?  No.

```{r}
anova(fit_1, fit_2)
```


```{r}
performance::compare_performance(fit_1, fit_2, rank = TRUE)
```


### Model Parameter Tables

#### Logit Scale



```{r, results="asis"}
texreg::knitreg(list(fit_1, fit_2),
                custom.model.names = c("with", "without"),
                single.row = TRUE,
                caption = "MLM Parameter Estimates: Inclusion of Interaction (SE and p-values)")
```




#### Odds ratio scale



```{r, results="asis"}
texreg::knitreg(list(extract_glmer_exp(fit_1), 
                     extract_glmer_exp(fit_2)),
                custom.model.names = c("with", "without"),
                ci.test = 1,
                single.row = TRUE,
                caption = "MLM Parameter Estimates: Inclusion of Interaction (95% CI's)")
```


### Visualize the Model

#### Sclae = Likert

```{r}
interactions::interact_plot(model = fit_2,
                            pred = time,
                            modx = dose,
                            interval = TRUE,
                            outcome.scale = "link",
                            y.label = "Likert Scale for Ammenorea")
```




#### Scale = Probability


```{r}
interactions::interact_plot(model = fit_2,
                            pred = time,
                            modx = dose,
                            interval = TRUE,
                            outcome.scale = "response",
                            y.label = "Estimated Marginal Probability of Ammenorea") 
```


```{r}
interactions::interact_plot(model = fit_2,
                            pred = time,
                            modx = dose,
                            interval = TRUE,
                            outcome.scale = "response",
                            x.label = "Months",
                            y.label = "Predicted Probability of Amenorrhea",
                            legend.main = "Dosage:",
                            colors = c("black", "black")) +
  geom_hline(yintercept = 0.5,         # reference lines
             color = "gray",
             size = 1.5,
             alpha = .5) +
  theme_bw() +
  theme(legend.position = c(0, 1),
        legend.justification = c(-0.1, 1.1),
        legend.background = element_rect(color = "black"),
        legend.key.width = unit(1.5, "cm")) +
  scale_x_continuous(breaks = 0:3,
                     labels = c("0",
                                "3",
                                "6",
                                "9"))
```


```{r}
effects::Effect(focal.predictors = c("dose", "time"),
                xlevels = list(time = seq(from = 0, to = 3, by = .1)),
                mod = fit_2) %>% 
  data.frame %>% 
  ggplot(aes(x = time,
             y = fit)) +
  geom_hline(yintercept = c(0, 0.5, 1),         # reference lines
             color = "gray",
             size = 1.5) +
  geom_ribbon(aes(ymin = fit - se,
                  ymax = fit + se,
                  fill = dose),
              alpha = .2) +
  geom_line(aes(color = dose),
            size = 1.5) +
  theme_bw() +
  labs(y = "Predicted Probability")
```

Remove the error bands:

```{r}
effects::Effect(focal.predictors = c("dose", "time"),
                xlevels = list(time = seq(from = 0, to = 3, by = .1)),
                mod = fit_2) %>% 
  data.frame %>% 
  ggplot(aes(x = time,
             y = fit)) +
  geom_hline(yintercept = c(0, 0.5),       
             color = "gray",
             size = 1.5) +
  geom_line(aes(linetype = dose),
            size = 1) +
  theme_bw() +
  theme(legend.position = c(0, 1),
        legend.justification = c(-0.1, 1.1),
        legend.background = element_rect(color = "black"),
        legend.key.width = unit(1.5, "cm")) +
  labs(x = "90-day Window",
       y = "Predicted Probability of Amenorrhea",
       linetype = "Dosage:") +
  scale_x_continuous(breaks = 0:3,
                     labels = c("First",
                                "Second",
                                "Third",
                                "Fourth"))
```



## GLMM - Optimizers

From the documentation:

The `lme4::glmer()` function fits a generalized linear mixed model, which incorporates both fixed-effects parameters and random effects in a linear predictor, **via maximum likelihood**. The linear predictor is related to the conditional mean of the response through the inverse link function defined in the GLM family.

The expression for the likelihood of a mixed-effects model is an integral over the random effects space. For a linear mixed-effects model (LMM), as fit by lmer, this integral can be evaluated exactly. For a GLMM the **integral must be approximated**. The most reliable approximation for GLMMs is adaptive **Gauss-Hermite quadrature**, at present implemented only for models with a single scalar random effect. The `nAGQ` argument controls the number of nodes in the quadrature formula. A model with a single, scalar random-effects term could reasonably use up to 25 quadrature points per scalar integral.

The `lme4::lmerControl()` function includes an argument for the `optimizer`, which is the name of a optimizing function(s). IT is a character vector or list of functions: length 1 for lmer or glmer, possibly length 2 for glmer). The built-in optimizers are **Nelder_Mead** and **bobyqa** (from the minqa package). Other minimizing functions are allows (constraints do apply).

Special provisions are made for **bobyqa**, **Nelder_Mead**, and optimizers wrapped in the `optimx` package; to use the optimx optimizers (including **L-BFGS-B** from base `optim` and `nlminb`), pass the method argument to `optim` in the `optCtrl` argument (you may also need to load the `optimx` package manually using `library(optimx)`.


### Adaptive Gauss-Hermite Quadrature: Increase the number of quadrature points

> `nAGQ` (integer scalar) the number of points per axis for evaluating the adaptive Gauss-Hermite approximation to the log-likelihood. Defaults to 1, corresponding to the Laplace approximation. Values greater than 1 produce greater accuracy in the evaluation of the log-likelihood at the expense of speed. A value of zero uses a faster but less exact form of parameter estimation for GLMMs by optimizing the random effects and the fixed-effects coefficients in the penalized iteratively reweighted least squares step. (See Details.)

```{r}
fit_3a <- lme4::glmer(amenorrhea ~ time + I(time^2) + time:dose + I(time^2):dose + (1 | id),
                     data = data_long,
                     nAGQ = 50,             # increase the number of points
                     family = binomial)
```

### Laplace Approximation: switch to the Nelder_Mead optimizer

```{r}
fit_3b <- lme4::glmer(amenorrhea ~ time + I(time^2) + time:dose + I(time^2):dose + (1 | id),
                     data = data_long,
                     control = glmerControl(optimizer ="Nelder_Mead"),
                     family = binomial)
```

### Laplace Approximation: Switch to the `L-BFGS-B` method

```{r, message = TRUE, error=TRUE}
fit_3c <- lme4::glmer(amenorrhea ~ time + I(time^2) + time:dose + I(time^2):dose + (1 | id),
                     data = data_long,
                     control = glmerControl(optimizer ='optimx', optCtrl=list(method='L-BFGS-B')),
                     family = binomial)
```

### Laplace Approximation: Switch to the `nlminb` method

```{r}
fit_3d <- lme4::glmer(amenorrhea ~ time + I(time^2) + time:dose + I(time^2):dose + (1 | id),
                     data = data_long,
                     control = glmerControl(optimizer ='optimx', optCtrl=list(method='nlminb')),
                     family = binomial)
```

## Quadratic Time?

Assess need for quadratic time with the LRT

```{r}
anova(fit_2, fit_3d)
```



```{r, results="asis"}
texreg::knitreg(list(fit_3a, fit_3b, fit_3c, fit_3d),
                custom.model.names = c("nAGQ", "Nelder_Mead",
                                       "L BFGS B", "nlminb"),
                caption = "GLMM: Various methods of ML approximation",
                digits = 4)
```


```{r}
interactions::interact_plot(model = fit_3d,
                            pred = time,
                            modx = dose,
                            interval = TRUE)
```



```{r}
effects::Effect(focal.predictors = c("dose", "time"),
                xlevels = list(time = seq(from = 0, to = 3, by = .1)),
                mod = fit_3d) %>% 
  data.frame %>% 
  ggplot(aes(x = time,
             y = fit)) +
  geom_hline(yintercept = c(0, 0.5),       
             color = "gray",
             size = 1.5) +
  geom_line(aes(linetype = dose),
            size = 1) +
  theme_bw() +
  theme(legend.position = c(0, 1),
        legend.justification = c(-0.1, 1.1),
        legend.background = element_rect(color = "black"),
        legend.key.width = unit(1.5, "cm")) +
  labs(x = "90-day Window",
       y = "Predicted Probability of Amenorrhea",
       linetype = "Dosage:") +
  scale_x_continuous(breaks = 0:3,
                     labels = c("First",
                                "Second",
                                "Third",
                                "Fourth"))
```


### Post hoc compairsons

```{r}
fit_3d %>% 
  emmeans::emmeans(pairwise ~ dose, at = c(time = 0))
```

```{r}
fit_3d %>% 
  emmeans::emmeans(pairwise ~ dose, at = c(time = 0), type = "response")
```



```{r}
fit_3d %>% 
  emmeans::emmeans(pairwise ~ dose, at = c(time = 1))
```

```{r}
fit_3d %>% 
  emmeans::emmeans(pairwise ~ dose, at = c(time = 1), type = "response")
```

```{r}
fit_3d %>% 
  emmeans::emmeans(pairwise ~ dose, at = c(time = 2), type = "response")
```




```{r}
fit_3d %>% 
  emmeans::emmeans(pairwise ~ dose, at = c(time = 3), type = "response")
```