---
title: "Stress Echo Descriptive Analysis"
author: "Frank Harrell"
date: last-modified
format:
  html:
    cap-location: margin
    reference-location: margin
    code-tools: true
    code-fold: show
    code-block-bg: "#f1f3f5"
    code-block-border-left: "#31BAE9"
    self-contained: true
execute:
  warning: false
  message: false
major: descriptive statistics; graphics
---
## Introduction
This report reads an existing R dataset (officially a "data frame") and makes various descriptive summaries, packaging text, code, and output into an html file. Some of the graphics are rendered both as static and as interactive graphs. The interactive ones allow you to pan, zoom, and hover with the mouse to see more details. Interactive graphics are constructed using the R `plotly` package, which the `Hmisc` package loads automatically. To run this report you'll need to install the `Hmisc` and `plotly` packages and their dependencies (`RStudio` will take care of the dependencies).
## Loading User-Contributed R Packages
In R a _package_ is a collection of functions, function documentation ("help files"), and example datasets.
When you install R and `RStudio` many standard packages are automatically installed. There are over 10,000 user-contributed packages that extend what R can do. These include specialty packages for biomedical research (e.g., flow cytometry analysis, genomics, interfaces to REDCap) and a huge number of packages for analysis, table making, and graphics. After using the `RStudio` `Tools` menu to install add-on packages, you load packages into R to make them available when running your report by using `require` or `library` commands. In the following we get access to the [`Hmisc` package](https://hbiostat.org/R/Hmisc).
```{r setup}
require(Hmisc) # Hmisc must already be installed
# For certain Hmisc & rms functions set default output format to html
options(prType='html')
```
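If you are unsure whether the packages are already installed, a small check before loading can help. The following is a minimal sketch in base R; the helper name `missing_packages` is made up for illustration and is not part of `Hmisc`.

```r
# Hypothetical helper: return the names of packages that are not installed
missing_packages <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  unname(pkgs[! ok])
}

needed <- c('Hmisc', 'plotly')
todo   <- missing_packages(needed)
if(length(todo)) message('Install first: ', paste(todo, collapse = ', '))
# To install them (needs an internet connection):  install.packages(todo)
```

The installation call is left as a comment so that rendering the report never triggers a download.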
You can load this complete report script into your `RStudio` script editor window by running the command (usually from the R `console` under `RStudio`) `getRs('stressecho.qmd', put='rstudio')`.
## Accessing Data
There are many ways to import data into R, as described [here](https://hbiostat.org/rflow/fcreate.html). Much of the time you will be importing `.csv` or `REDCap` files or binary files created by `Stata`, `SPSS`, or `SAS`. For learning R there is a wide variety of ready-to-use datasets on the Vanderbilt Department of Biostatistics dataset repository at [hbiostat.org/data](https://hbiostat.org/data). These datasets are already in R binary format and many of them are fully documented and annotated. In the `Hmisc` package there is a function `getHdata` that finds these datasets, downloads them from the web server, and loads them into R's memory for easy access.
Use the `getHdata` function to download the [stress echocardiogram dataset](https://hbiostat.org/data/repo/stressEcho.html) and read it into R's memory. This dataset is annotated with variable labels and, for continuous variables, units of measurement. These annotations are used by `contents`, `describe`, and certain graphics functions.
Sometimes datasets have long names that are tedious to type. I use a convention of copying the currently active dataset into an R data frame called `d` to save typing during analysis.
```{r import}
getHdata(stressEcho)
d <- stressEcho # copy dataset so can refer to short name d
```
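When your own data arrive as a `.csv` file instead of coming from the repository, the import step might look like the following sketch. The file contents and variable names (`id`, `age`, `sex`) are invented for illustration, and the data frame is called `mydata` so that it does not overwrite `d`.

```r
# Write a tiny made-up CSV file so the example is self-contained
f <- tempfile(fileext = '.csv')
writeLines(c('id,age,sex',
             '1,54,female',
             '2,61,male',
             '3,,female'), f)

# Read it; the empty age field becomes NA, and sex is stored as a factor
mydata <- read.csv(f, stringsAsFactors = TRUE)
str(mydata)
```

A data frame read this way carries no variable labels or units, so those annotations would have to be added separately.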
## Data Dictionary
The names of variables in the stress echo dataset can be printed by using the command `names(d)`. But let's use an `Hmisc` package function to provide more metadata:
```{r}
contents(d)
```
In the main output you'll find that the number of levels of each categorical variable (R `factor` variable) is highlighted. These numbers are hyperlinks; if you click on one, the page will jump to the corresponding list of levels below.
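For comparison, some of the same metadata can be assembled with base R alone. This is only a rough sketch on a made-up data frame; `basic_contents` is a hypothetical helper, and unlike `contents` it knows nothing about variable labels or units.

```r
# Hypothetical helper: one row per variable with its class, number of
# factor levels (0 for non-factors), and number of missing values
basic_contents <- function(df) {
  data.frame(
    variable  = names(df),
    class     = vapply(df, function(x) class(x)[1], character(1)),
    levels    = vapply(df, nlevels, integer(1)),
    NAs       = vapply(df, function(x) sum(is.na(x)), integer(1)),
    row.names = NULL
  )
}

# Toy data, not the stress echo dataset
toy <- data.frame(age = c(54, 61, NA),
                  sex = factor(c('female', 'male', 'female')))
basic_contents(toy)
```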
## Descriptive Statistics
The `Hmisc` package `describe` function produces appropriate descriptive statistics for each variable in a data frame depending on the variable's nature (binary, categorical, continuous, etc.). [See [this](https://hbiostat.org/R/glossary.html) for definitions of some of the items in `describe` output.]{.aside} If you just type `describe(d)` you'll get plain text output with no graphics. Running the output of `describe` through the `html` function converts it to html format which contains small graphics files to render high-resolution spike histograms for continuous variables.
```{r}
des <- describe(d) # store results of describe() in des
des # can also use html(describe(d))
```
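To get a feel for what lies behind a few of the numbers that `describe` prints for a continuous variable, here is a base R sketch. `basic_describe` is a hypothetical helper that covers only a handful of quantities, and it makes no attempt to reproduce the output of `describe` exactly.

```r
# Hypothetical helper: counts, mean, and selected quantiles of a numeric vector
basic_describe <- function(x) {
  y <- x[! is.na(x)]                     # non-missing values only
  c(n        = length(y),
    missing  = sum(is.na(x)),
    distinct = length(unique(y)),
    mean     = mean(y),
    quantile(y, c(.05, .25, .5, .75, .95)))
}

x <- c(2, 4, 4, 7, NA, 10)               # made-up values
basic_describe(x)
```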
`describe` also has a `plot` method that translates descriptive statistics to purely graphical form and draws larger high-resolution histograms for continuous variables. The `plot` method produces two graphics objects: one for categorical variables and one for continuous variables. We save the overall `plot` result in object `p`, then reference its two sub-objects.^[For these static graphics we could have just typed `plot(des)` and two plots would appear in succession. That approach does not work when making interactive graphs later.]
```{r}
p <- plot(des)
```
```{r}
#| label: fig-cat
#| fig-cap: "Categorical variables"
p$Categorical
```
```{r}
#| label: fig-cont
#| fig-cap: "Continuous variables"
p$Continuous
```
To render these two graphs in interactive format using `plotly` we set a system option that the `Hmisc` package monitors: `grType`. Its default value is `'plain'`.
```{r}
options(grType='plotly')
p <- plot(des)
```
```{r}
#| label: fig-cat-interactive
#| fig-cap: "Categorical variables. Hover over points to see numerators and denominators of proportions, and hover over the leftmost point to see more information."
p$Categorical
```
```{r}
#| label: fig-cont-interactive
#| fig-cap: "Continuous variables. Hover over the leftmost spike to see a full list of descriptive statistics. Hover over other spikes to see $x$-axis values and frequency counts."
p$Continuous
```
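Because `grType` is a global option, it stays in effect for any code that runs later in the session. One way to switch it on only temporarily is to save the previous setting and restore it afterwards, as in this minimal base R sketch.

```r
old <- options(grType = 'plotly')   # options() returns the previous setting
getOption('grType')                 # the value currently in effect
# ... create interactive graphs here ...
options(old)                        # restore whatever was set before
```

This relies only on the standard behavior of `options()`, so the same pattern works for any other option.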
## Computing Environment
A function `session`, stored in the `markupSpecs` object in the `Hmisc` package, prints the current computing environment in a nice format.
```{r echo=FALSE,results='asis'}
markupSpecs$html$session()
```
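If `Hmisc` is not available, base R offers a plainer way to record the computing environment. The sketch below uses only standard R functions.

```r
si <- sessionInfo()    # R version, platform, and attached packages
print(si)
R.version.string       # just the R version as a single string
```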