16-PracticalExamples.Rmd

---
output:
  bookdown::gitbook:
    lib_dir: "book_assets"
    includes:
      in_header: google_analytics.html
  pdf_document: default
  html_document: default
---
<!-- # Practical statistical modeling {#practical-example} -->
# Modelación estadística práctica {#practical-example}

```{r echo=FALSE,warning=FALSE,message=FALSE}
library(tidyverse)
library(ggplot2)
library(BayesFactor)
library(emmeans)
library(cowplot)
library(knitr)
library(ggfortify)

set.seed(123456) # set random seed to exactly replicate results

# load the NHANES data library
library(NHANES)

# drop duplicated IDs within the NHANES dataset
NHANES <- 
  NHANES %>% 
  dplyr::distinct(ID,.keep_all=TRUE)

NHANES_adult <- 
  NHANES %>%
  subset(Age>=18)

```

<!-- In this chapter we will bring together everything that we have learned, by applying our knowledge to a practical example.  In 2007, Christopher Gardner and colleagues from Stanford published a study in the *Journal of the American Medical Association* titled "Comparison of the Atkins, Zone, Ornish, and LEARN Diets for Change in Weight and Related Risk Factors Among Overweight Premenopausal Women -- The A TO Z Weight Loss Study: A Randomized Trial" [@gard:kiaz:alha:2007]. We will use this study to show how one would go about analyzing an experimental dataset from start to finish. -->
En este capítulo reuniremos todo lo que hemos aprendido, aplicando nuestro conocimiento a un ejemplo práctico. En 2007, Christopher Gardner y colegas de Stanford publicaron un estudio en el *Journal of the American Medical Association* titulado "Comparison of the Atkins, Zone, Ornish, and LEARN Diets for Change in Weight and Related Risk Factors Among Overweight Premenopausal Women -- The A TO Z Weight Loss Study: A Randomized Trial" [@gard:kiaz:alha:2007]. Usaremos este estudio para mostrar cómo analizaríamos un conjunto de datos experimental de inicio a fin.

<!-- ## The process of statistical modeling -->
## El proceso de modelación estadística

<!-- There is a set of steps that we generally go through when we want to use our statistical model to test a scientific hypothesis: -->
Hay un conjunto de pasos que generalmente seguimos cuando queremos usar nuestro modelo estadístico para probar hipótesis científicas:

<!-- 1. Specify your question of interest -->
1. Especificar nuestra pregunta de interés.
<!-- 2. Identify or collect the appropriate data -->
2. Identificar o recolectar los datos apropiados.
<!-- 3. Prepare the data for analysis -->
3. Preparar los datos para el análisis.
<!-- 4. Determine the appropriate model -->
4. Determinar el modelo apropiado.
<!-- 5. Fit the model to the data -->
5. Ajustar el modelo a los datos.
<!-- 6. Criticize the model to make sure it fits properly -->
6. Criticar el modelo para asegurarnos que se ajusta apropiadamente.
<!-- 7. Test hypothesis and quantify effect size -->
7. Probar hipótesis y cuantificar el tamaño del efecto.

<!-- ### 1: Specify your question of interest -->
### 1: Especificar nuestra pregunta de interés.

<!-- According to the authors, the goal of their study was: -->
De acuerdo a los autores, el objetivo de su estudio fue:

<!-- > To compare 4 weight-loss diets representing a spectrum of low to high carbohydrate intake for effects on weight loss and related metabolic variables. -->
> Comparar 4 dietas para perder peso que representan un espectro de ingesta de calorías de bajo a alto según sus efectos en la pérdida de peso y variables metabólicas relacionadas.

<!-- ### 2: Identify or collect the appropriate data -->
### 2: Identificar o recolectar los datos apropiados.

<!-- To answer their question, the investigators randomly assigned each of 311 overweight/obese women to one of four different diets (Atkins, Zone, Ornish, or LEARN), and measured their weight along with many other measures of health over time.  The authors recorded a large number of variables, but for the main question of interest let's focus on a single variable: Body Mass Index (BMI).  Further, since our goal is to measure lasting changes in BMI, we will only look at the measurement taken at 12 months after onset of the diet. -->
Para responder la pregunta, los investigadores asignaron aleatoriamente a 311 mujeres con sobrepeso u obesidad a una de cuatro diferentes dietas (Atkins, Zone, Ornish, o LEARN), y midieron su peso junto con otras variables de salud a lo largo del tiempo. Los autores registraron un gran número de variables, pero para la pregunta principal de interés nos enfocaremos en una variable sencilla: Índice de Masa Corporal (IMC; *Body Mass Index*, BMI). Además, como nuestra meta es medir cambios perdurables en IMC, revisaremos únicamente la medición tomada 12 meses después del inicio de la dieta.

<!-- ### 3: Prepare the data for analysis -->
### 3: Preparar los datos para el análisis.


```{r echo=FALSE, message=FALSE}
# generate a dataset based on the results of Gardner et al. Table 3

set.seed(123456)
dietDf <- 
  data.frame(diet=c(rep('Atkins',77),
                    rep('Zone',79),
                    rep('LEARN',79),
                    rep('Ornish',76))) %>%
  mutate(
    BMIChange12Months=ifelse(diet=='Atkins',
                             rnorm(n=77,mean=-1.65,sd=2.54),
                      ifelse(diet=='Zone',
                             rnorm(n=79,mean=-0.53,sd=2.0),
                      ifelse(diet=='LEARN',
                             rnorm(n=79,mean=-0.92,sd=2.0),
                      rnorm(n=76,mean=-0.77,sd=2.14)))),
    BMIInitial=ifelse(diet=='Atkins',
                             rnorm(n=77,mean=-32,sd=4),
                      ifelse(diet=='Zone',
                             rnorm(n=79,mean=31,sd=3),
                      ifelse(diet=='LEARN',
                             rnorm(n=79,mean=31,sd=4),
                      rnorm(n=76,mean=32,sd=3)))),
    BMI12months=BMIInitial + BMIChange12Months,
    physicalActivity=ifelse(diet=='Atkins',
                            rnorm(n=77,mean=34,sd=6),
                     ifelse(diet=='Zone',
                            rnorm(n=79,mean=34,sd=6.0),
                     ifelse(diet=='LEARN',
                            rnorm(n=79,mean=34,sd=5.0),
                      rnorm(n=76,mean=35,sd=7) )))
  )

summaryDf <- 
  dietDf %>% 
  group_by(diet) %>% 
  summarize(
    n=n(),
    meanBMIChange12Months=mean(BMIChange12Months),
    varBMIChange12Months=var(BMIChange12Months)
  ) %>%
  mutate(
    crit_val_lower = qt(.05, n - 1),
    crit_val_upper = qt(.95, n - 1),
    ci.lower=meanBMIChange12Months+(sqrt(varBMIChange12Months)*crit_val_lower)/sqrt(n),
    ci.upper=meanBMIChange12Months+(sqrt(varBMIChange12Months)*crit_val_upper)/sqrt(n)
  )

tableDf <- summaryDf %>%
  dplyr::select(-crit_val_lower,
                -crit_val_upper, 
                -varBMIChange12Months) %>%
  rename(Diet = diet,
         N = n,
         `Mean BMI change (12 months)`=meanBMIChange12Months,
         `CI (lower limit)`=ci.lower,
         `CI (upper limit)`=ci.upper)
```


<!-- Box plots for each condition, with the 50th percentile (i.e the median) shown as a black line for each group. -->
```{r AtoZBMIChangeDensity,echo=FALSE,fig.cap="Boxplots para cada condición, con el percentil 50 (i.e. la mediana) mostrada con una línea negra para cada grupo.", fig.width=4, fig.height=4, out.width="50%"}
ggplot(dietDf,aes(diet,BMIChange12Months)) + 
  geom_boxplot()

```

<!-- The actual data from the A to Z study are not publicly available, so we will use the summary data reported in their paper to generate some synthetic data that roughly match the data obtained in their study, with the same means and standard deviations for each group. Once we have the data, we can visualize them to make sure that there are no outliers. Box plots are useful to see the shape of the distributions, as shown in Figure \@ref(fig:AtoZBMIChangeDensity). Those data look fairly reasonable - there are a couple of outliers within individual groups (denoted by the dots outside of the box plots), but they don't seem to be extreme with regard to the other groups. We can also see that the distributions seem to differ a bit in their variance, with Atkins showing somewhat greater variability than the others.  This means that any analyses that assume the variances are equal across groups might be inappropriate.  Fortunately, the ANOVA model that we plan to use is fairly robust to this. -->
Los datos reales del estudio A a la Z no están disponibles públicamente, por lo que usaremos el resumen de los datos reportado en su artículo para generar datos sintéticos que aproximadamente coincidan con los datos obtenidos en su estudio, con las mismas medias y desviaciones estándar para cada grupo. Una vez que tenemos los datos, podemos visualizarlos para asegurarnos que no haya valores atípicos (outliers). Los diagramas de caja (boxplots) son útiles para ver la forma de la distribución, como se muestra en la Figura \@ref(fig:AtoZBMIChangeDensity). Esos datos se ven bastante razonables - hay algunos outliers dentro de cada grupo individual (denotados por los puntos que quedan fuera de los boxplots), pero no se ve que sean extremos en comparación con los otros grupos. También podemos ver que las distribuciones parecen diferir un poco en sus varianzas, la dieta Atkins parece mostrar una variabilidad un poco mayor que las demás. Esto significa que cualquier análisis que asuma que las varianzas son iguales entre los grupos podría resultar inapropiado. Afortunadamente, el modelo ANOVA que planeamos usar es bastante robusto a esto.

<!-- ### 4. Determine the appropriate model -->
### 4: Determinar el modelo apropiado.

<!-- There are several questions that we need to ask in order to determine the appropriate statistical model for our analysis. -->
Hay varias preguntas que necesitamos hacer para poder determinar el modelo estadístico apropiado para nuestro análisis.

<!-- * What kind of dependent variable? -->
<!--     * BMI: continuous, roughly normally distributed -->
<!-- * What are we comparing? -->
<!--     * mean BMI across four diet groups -->
<!--     * ANOVA is appropriate -->
<!-- * Are observations independent? -->
<!--     * Random assignment should ensure that the assumption of independence is appropriate -->
<!--     * The use of difference scores (in this case the difference between starting weight and weight after 12 months) is somewhat controversial, especially when the starting points differ between the groups.  In this case the starting weights are very similar between the groups, so we will use the difference scores, but in general one would want to consult a statistician before applying such a model to real data. -->
* ¿Qué tipo de variable dependiente?
    * IMC: continua, aproximadamente distribuida de manera normal.
* ¿Qué estamos comparando?
    * Media de IMC de cuatro grupos de dieta.
    * ANOVA es apropiado.
* ¿Las observaciones son independientes?
    * La asignación aleatoria debería asegurar que la suposición de independencia es apropiada.
    * El uso de puntuaciones de diferencias (en este caso la diferencia entre el peso inicial y el peso después de 12 meses) es algo controversial, especialmente cuando los puntos de inicio difieren entre los grupos. En este caso los puntos de inicio son muy similares entre los grupos, así que usaremos las puntuaciones de diferencias, pero en general uno querrá consultar a un estadístico antes de aplicar ese tipo de modelo a datos reales.

<!-- ### 5. Fit the model to the data -->
### 5: Ajustar el modelo a los datos.

<!-- Let's run an ANOVA on BMI change to compare it across the four diets. Most statistical software will automatically convert a nominal variable into a set of dummy variables.  A common way of specifying a statistical model is using *formula notation*, in which the model is specified using a formula of the form: -->
Realicemos un ANOVA sobre los cambios en IMC para compararlo entre los cuatro tipos de dieta. La mayoría de los softwares estadísticos automáticamente convertirán una variable nominal en un conjunto de variables ficticias (dummy). Una manera común de especificar el modelo estadístico es usando la *notación de fórmula*, en la cual el modelo es especificado usando una fórmula con la estructura:

$$ 
\text{variable dependiente} \sim \text{variables independientes}
$$

<!-- In this case, we want to look at the change in BMI (which is stored in a variable called *BMIChange12Months*) as a function of diet (which is stored in a variable called *diet), so we use the formula: -->
En este caso, queremos revisar los cambios en IMC (que están guardados en una variable llamada *BMIChange12Months*) como una función de la dieta (que está guardada en la variable llamada *diet*), así que usamos la siguiente fórmula:

$$
BMIChange12Months \sim diet
$$

<!-- Most statistical software (including R) will automatically create a set of dummy variables when the model includes a nominal variable (such as the *diet* variable, which contains the name of the diet that each person received).  Here are the results from this model fitted to our data: -->
La mayoría del software estadístico (incluyendo R) automáticamente creará un conjunto de variables ficticias (dummy) cuando el modelo incluye una variable nominal (como la variable *diet*, que contiene el nombre de la dieta que cada persona recibió). Aquí están los resultados de este modelo ajustado a nuestros datos:

```{r echo=FALSE}
# perform ANOVA and print result

lmResult <- lm(BMIChange12Months ~ diet, data = dietDf)
summary(lmResult)
```

<!-- Note that the software automatically generated dummy variables that correspond to three of the four diets, leaving the Atkins diet without a dummy variable. This means that the intercept represents the mean of the Atkins diet group, and the other three variables model the difference between each of those diets and the Atkins diet. Atkins was chosen as the unmodeled baseline variable simply because it is first in alphabetical order. -->
Nota que el software estadístico automáticamente generó las variables ficticias (*dummy variable*) que corresponden a tres de las cuatro dietas, dejando la dieta Atkins sin una variable ficticia (*dummy variable*). Esto significa que la constante representa la media del grupo de la dieta Atkins, y las otras tres variables modelan la diferencia entre las medias de cada una de las dietas y la media de la dieta Atkins. Atkins fue elegida como la variable de línea base sin modelar simplemente porque es la primera en orden alfabético.

<!-- ### 6. Criticize the model to make sure it fits properly -->
### 6: Criticar el modelo para asegurarnos que se ajusta apropiadamente.

<!-- The first thing we want to do is to critique the model to make sure that it is appropriate. One thing we can do is to look at the residuals from the model. In Figure \@ref(fig:residualPlot), we plot the residuals for each individual grouped by diet. There are no obvious differences in the distributions of residuals across conditions, we can go ahead with the analysis. -->
La primera cosa que queremos hacer es criticar el modelo para asegurarnos que es apropiado. Una cosa que podemos hacer es ver los residuales del modelo. En la Figura \@ref(fig:residualPlot), graficamos los residuales para cada persona agrupados según la dieta. No hay diferencias obvias en las distribuciones de los residuales entre las condiciones, podemos seguir adelante con el análisis.

<!-- Distribution of residuals for for each condition -->
```{r residualPlot, echo=FALSE, fig.cap="Distribución de residuales para cada condición.", fig.width=4, fig.height=4}
dietDf <- dietDf %>%
  mutate(lmResid=lmResult$residuals)

ggplot(dietDf, aes(x=lmResid, group=diet, color=diet)) + 
  geom_density() + 
  xlab('Residuals')
```

<!-- Another important assumption of the statistical tests that we apply to linear models is that the residuals from the model are normally distributed.  It is a common misconception that linear models require that the *data* are normally distributed, but this is not the case; the only requirement for the statistics to be correct is that the residual errors are normally distributed.  The right panel of Figure \@ref(fig:diagnosticQQPlot) shows a Q-Q (quantile-quantile) plot, which plots the residuals against their expected values based on their quantiles in the normal distribution. If the residuals are normally distributed then the data points should fall along the dashed line --- in this case it looks pretty good, except for a couple of outliers that are apparent at the very bottom.  Because this model is also relatively robust to violations of normality, and these are fairly small, we will go ahead and use the results. -->
Otra suposición importante de las pruebas estadísticas que aplicamos a modelos lineales es que los residuales del modelo estén normalmente distribuidos.  Es una idea equivocada algo común pensar que los modelos lineales requieren que los *datos* estén distribuidos de manera normal, pero esto no es correcto; el único requisito para que las estadísticas estén correctas es que los errores residuales estén normalmente distribuidos.   El panel derecho de la Figura \@ref(fig:diagnosticQQPlot) muestra una gráfica Q-Q (*quantile-quantile*, cuantil-cuantil) que grafica los residuales contra su valor esperado basado en sus cuantiles en la distribución normal. Si los residuales están normalmente distribuidos entonces los datos caerían a lo largo de la línea punteada --- en este caso se ven bastante bien, excepto por un par de valores atípicos (*outliers*) que son evidentes hasta abajo. Debido a que este modelo es también relativamente robusto a desviaciones de la normalidad, y a que éstas son relativamente pequeñas, seguiremos adelante y usaremos estos resultados.


<!-- Q-Q plot of actual residual values against theoretical residual values -->
```{r diagnosticQQPlot, echo=FALSE, fig.cap="Gráfica Q-Q de los valores residuales reales contra sus valores residuales teóricos.", fig.width=4, fig.height=4}

ggplot(dietDf, aes(sample = lmResid)) +
  stat_qq() + stat_qq_line()
```

<!-- ### 7. Test hypothesis and quantify effect size -->
### 7: Probar hipótesis y cuantificar el tamaño del efecto.

<!-- First let's look back at the summary of results from the ANOVA, shown in Step 5 above. The significant F test shows us that there is a significant difference between diets, but we should also note that the model doesn't actually account for much variance in the data; the R-squared value is only 0.03, showing that the model is only accounting for a few percent of the variance in weight loss.  Thus, we would not want to overinterpret this result. -->
Primero, veamos de nuevo el resumen de resultados del ANOVA, mostrado en el Paso 5 arriba. La prueba F significativa nos muestra que existe una diferencia significativa entre las dietas, pero deberíamos notar también que el modelo realmente no explica mucha varianza en los datos; el valor de R-cuadrada (*R-squared*) es sólo 0.03, mostrando que el modelo sólo explica muy poco porcentaje de la varianza en la pérdida de peso. Por lo tanto, no querremos sobre-interpretar este resultado.

<!-- The significant result in the omnibus F test also doesn't tell us which diets differ from which others. We can find out more by comparing means across conditions.  Because we are doing several comparisons, we need to correct for those comparisons, which is accomplished using a procedure known as the Tukey method, which is implemented by our statistical software: -->
El resultado significativo en la prueba F omnibus tampoco nos dice cuáles dietas difieren de cuáles otras. Podemos averiguar más si comparamos las medias entre condiciones. Como estaríamos haciendo varias comparaciones, necesitamos aplicar una corrección por esas comparaciones múltiples, esto se puede lograr usando un procedimiento conocido como método Tukey, que se puede calcular con nuestro software estadístico:

```{r echo=FALSE}
# compute the differences between each of the means
leastsquare <- emmeans(lmResult, 
                      pairwise ~ diet,
                      adjust="tukey")
 
# display the results by grouping using letters

multcomp::cld(leastsquare$emmeans, 
    alpha=.05,  
    Letters=letters)

```

<!-- The letters in the rightmost column show us which of the groups differ from one another, using a method that adjusts for the number of comparisons being performed; conditions that share a letter are not significantly different from one another.  This shows that Atkins and LEARN diets don't differ from one another (since they share the letter a), and the LEARN, Ornish, and Zone diets don't differ from one another (since they share the letter b), but the Atkins diet differs from the Ornish and Zone diets (since they share no letters). -->
Las letras en la columna hasta la derecha nos muestran cuáles de los grupos difieren de los otros, usando un método que ajusta por el número de comparaciones siendo realizadas; las condiciones que comparten una letra no son significativamente diferentes entre ellas. Esto nos muestra que las dietas Atkins y LEARN no difieren entre sí (porque comparten la letra a), y las dietas LEARN, Ornish, y Zone no difieren entre sí (porque comparten la letra b), pero la dieta Atkins difiere de las dietas Ornish y Zone (porque no comparten letras).


<!-- ### What about possible confounds? -->
### ¿Qué pasa con los posibles factores de confusión (confounds)?

<!-- If we look more closely at the Garder paper, we will see that they also report statistics on how many individuals in each group had been diagnosed with *metabolic syndrome*, which is a syndrome characterized by high blood pressure, high blood glucose, excess body fat around the waist, and abnormal cholesterol levels and is associated with increased risk for cardiovascular problems. The data from the Gardner paper are presented in Table \@ref(tab:metsymData). -->
Si vemos más de cerca el artículo de Gardner, veremos que también reportaron estadísticas sobre cuántas personas en cada grupo habían sido diagnosticadas con *síndrome metabólico*, que es un síndrome caracterizado por presión sanguínea alta, alta glucosa en sangre, exceso de grasa corporal alrededor de la cintura, y niveles de colesterol anormales; este síndrome está asociado con mayor riesgo de problemas cardiovasculares. Los datos del artículo de Gardner se presentan en la Tabla \@ref(tab:metsymData).

```{r metsymData, echo=FALSE}
summaryDf <- 
  summaryDf %>% 
  mutate(
    nMetSym=c(22,20,29,27),
    nNoMetSym=n-nMetSym,
    pMetSym=nMetSym/(nMetSym+nNoMetSym)
  ) 

displayDf <- summaryDf %>%
  dplyr::select(diet,n,pMetSym) %>%
  rename(`P(metabolic syndrome)`=pMetSym,
         N=n,
         Diet=diet)

# Presence of metabolic syndrome in each group in the AtoZ study.
kable(displayDf, caption="Presencia de síndrome metabólico en cada grupo del estudio AtoZ.")
```

<!-- Looking at the data it seems that the rates are slightly different across groups, with more metabolic syndrome cases in the Ornish and Zone diets -- which were exactly the diets with poorer outcomes.  Let's say that we are interested in testing whether the rate of metabolic syndrome was significantly different between the groups, since this might make us concerned that these differences could have affected the results of the diet outcomes.  -->
Mirando estos datos parece que las proporciones son ligeramente distintas entre los grupos, con más casos de síndrome metabólico en las dietas Ornish y Zone -- que fueron justamente las dietas con peores resultados. Digamos que estamos interesados en probar si la proporción de personas con síndrome metabólico fue significativamente diferente entre los grupos, porque esto nos podría llevar a preocuparnos de que estas diferencias hayan podido haber afectado los resultados de las dietas.

<!-- #### Determine the appropriate model -->
#### Determinar el modelo apropiado 

<!-- * What kind of dependent variable? -->
<!--    * proportions -->
<!-- * What are we comparing? -->
<!--     * proportion with metabolic syndrome across four diet groups -->
<!--     * chi-squared test for goodness of fit is appropriate against null hypothesis of no difference -->
* ¿Qué tipo de variable dependiente?
	* proporciones
* ¿Qué estamos comparando?
	* proporción de síndrome metabólico en los cuatro grupos de dieta
	* prueba ji-cuadrada de bondad de ajuste (*goodness of fit*) es apropiada contra la hipótesis nula de no diferencia

<!-- Let's first compute that statistic, using the chi-squared test function in our statistical software: -->
Primero calculemos el estadístico, usando una función de la prueba ji-cuadrada en nuestro software estadístico:

```{r echo=FALSE}
contTable <- as.matrix(summaryDf[,9:10])
chisq.test(contTable)
```

<!-- This test shows that there is not a significant difference between means. However, it doesn't tell us how certain we are that there is no difference; remember that under NHST, we are always working under the assumption that the null is true unless the data show us enough evidence to cause us to reject the null hypothesis. -->
Esta prueba muestra que no hay una diferencia significativa entre los grupos. Sin embargo, no nos dice qué tan seguros estamos de que no haya una diferencia; recuerda que bajo la NHST, siempre estamos trabajando bajo la suposición de que la nula es verdadera a menos que los datos nos muestren suficiente evidencia que nos lleve a rechazar la hipótesis nula.

<!-- What if we want to quantify the evidence for or against the null?  We can do this using the Bayes factor. -->
¿Qué pasa si queremos cuantificar la evidencia a favor o en contra de la nula? Podemos hacer esto usando el factor de Bayes.

```{r echo=FALSE}

bf <- contingencyTableBF(contTable, 
                         sampleType = "indepMulti", 
                         fixedMargin = "cols")
bf
```

<!-- This shows us that the alternative hypothesis is 0.058 times more likely than the null hypothesis, which means that the null hypothesis is 1/0.058 ~ 17 times more likely than the alternative hypothesis given these data. This is fairly strong, if not completely overwhelming, evidence in favor of the null hypothesis. -->
Esto nos muestra que la hipótesis alternativa es 0.058 veces más probable que la hipótesis nula, que significa que la hipótesis nula es 1/0.058 ~ 17 veces más probable que la hipótesis alternativa dados estos datos. Esto es evidencia bastante fuerte, si no abrumadoramente fuerte, en favor de la hipótesis nula.

<!-- ## Getting help -->
## Obtener ayuda

<!-- Whenever one is analyzing real data, it's useful to check your analysis plan with a trained statistician, as there are many potential problems that could arise in real data.  In fact, it's best to speak to a statistician before you even start the project, as their advice regarding the design or implementation of the study could save you major headaches down the road.  Most universities have statistical consulting offices that offer free assistance to members of the university community.  Understanding the content of this book won't prevent you from needing their help at some point, but it will help you have a more informed conversation with them and better understand the advice that they offer. -->
Siempre que uno analiza datos reales, es útil verificar nuestro plan de análisis con una persona entrenada en estadística, porque hay muchos problemas potenciales que podrían surgir en datos reales. De hecho, es mucho mejor hablar con un estadísticx antes de siquiera comenzar el proyecto, pues su asesoría acerca del diseño o implementación del estudio podrían salvarte de grandes dolores de cabeza posteriormente. La mayoría de las universidades tienen oficinas o departamentos de consultoría estadística que ofrecen asistencia gratuita a los miembros de la comunidad universitaria. Entender el contenido de este libro no evitará que necesites su ayuda en algún punto, pero sí te ayudará a tener una conversación más informada con ese departamento y a entender mejor la asesoría que puedan ofrecerte.