---
title: "acro R Notebook"
output: html_notebook
---
### Check whether acro is installed and, if not, install it from CRAN
```{r}
# Check if acro is installed
if (!requireNamespace("acro", quietly = TRUE)) {
# If not installed, install it
install.packages("acro")
}
```
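The later chunks also call `farff::readARFF()` and use the `lung` data from the survival package, so the same check can be extended to those packages. A minimal sketch, assuming all three are available from CRAN; the names `needed_pkgs`, `is_available` and `missing_pkgs` are illustrative only:

```{r}
# Packages this notebook relies on (acro, plus farff and survival for the data)
needed_pkgs <- c("acro", "farff", "survival")
# requireNamespace() returns TRUE when a package can be loaded
is_available <- vapply(needed_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed_pkgs[!is_available]
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install them
}
```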
### Load the acro package
```{r}
library("acro")
```
### Initiate acro
First, call `acro_init()` to initialise an acro object. The function takes a `suppress` parameter, which can be `TRUE` or `FALSE`, to choose whether suppression is applied to the results automatically. The default is no suppression.
```{r}
acro_init()
```
### Load the data
- The dataset used in this example notebook is the nursery dataset from OpenML.
- The code below reads the data from a folder called "nursery_data", which we assume sits inside your current working directory.
- The path might need to be changed if the data has been downloaded and stored elsewhere.
```{r}
data <- farff::readARFF("nursery_data/nursery.arff")
data <- as.data.frame(data)
names(data)[names(data) == "class"] <- "recommend"
```
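Because the path may differ between machines, it can help to confirm that the file exists before reading it. A small sketch using base R only; `arff_path` is an illustrative name and not part of acro:

```{r}
# Build the relative path in a platform-independent way
arff_path <- file.path("nursery_data", "nursery.arff")
if (!file.exists(arff_path)) {
  # Report where R is looking so the path can be adjusted
  message("Could not find ", arff_path, "; working directory is ", getwd())
}
```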
- Convert the children column to numeric values, replacing 'more' with a random integer between 4 and 10.
```{r}
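# "more" cannot be parsed as a number, so it becomes NA (the coercion warning is expected)
data$children <- suppressWarnings(as.numeric(as.character(data$children)))
# replace only the NAs in the children column with random integers from 4 to 10
is_more <- is.na(data$children)
data$children[is_more] <- round(runif(sum(is_more), min = 4, max = 10), 0)
unique(data$children)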
```
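The conversion goes through `as.character()` because calling `as.numeric()` directly on a factor returns its internal level codes rather than the labels. The sketch below illustrates this on a made-up factor (`toy_children` is hypothetical and stands in for the real column):

```{r}
# Toy factor standing in for the children column (not the real data)
toy_children <- factor(c("3", "1", "more", "2", "more"))
# as.numeric() on a factor gives the internal level codes, not the labels
toy_level_codes <- as.numeric(toy_children)
# Going through as.character() recovers the labels; "more" becomes NA
toy_numeric <- suppressWarnings(as.numeric(as.character(toy_children)))
# Fill only the NA positions with random whole numbers between 4 and 10
toy_is_more <- is.na(toy_numeric)
toy_numeric[toy_is_more] <- round(runif(sum(toy_is_more), min = 4, max = 10), 0)
toy_numeric
```

The first, second and fourth values keep their original labels (3, 1 and 2), while the two 'more' entries are replaced by random values.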
### Examples of producing tabular output using acro
#### ACRO Crosstab
```{r}
index <- data[, c("recommend")]
columns <- data[, c("parents")]
values <- data[, c("children")]
# convert the values to a one-column matrix
values <- matrix(values, ncol = 1)
table <- acro_crosstab(index = index, columns = columns, values = values, aggfunc = "sum")
table
```
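For intuition, the sum crosstab above is similar in spirit to a base R cross-tabulation that sums `children` within each combination of `recommend` and `parents`. The sketch below uses `xtabs()` on a made-up data frame (`toy_ct` is hypothetical, not the nursery data) and performs none of acro's disclosure checks:

```{r}
# Made-up rows that mimic the three columns used in the crosstab
toy_ct <- data.frame(
  recommend = c("priority", "priority", "not_recom", "not_recom", "priority"),
  parents = c("usual", "usual", "usual", "great_pret", "great_pret"),
  children = c(1, 2, 3, 1, 2)
)
# With a left-hand side, xtabs() sums that variable within each cell
toy_ct_sum <- xtabs(children ~ recommend + parents, data = toy_ct)
toy_ct_sum
```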
#### ACRO table
```{r}
index <- data[, c("parents")]
columns <- data[, c("social")]
table <- acro_table(index = index, columns = columns, deparse.level = 1)
table
```
#### ACRO pivot table
```{r}
index <- "parents"
values <- "children"
aggfunc <- list("mean", "std")
table <- acro_pivot_table(data, values = values, index = index, aggfunc = aggfunc)
table
```
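The pivot table above groups `children` by `parents` and reports a mean and a standard deviation per group. A rough base R analogue on made-up data, assuming the `"std"` aggregation corresponds to the sample standard deviation computed by `sd()`; `toy_pv` is a hypothetical data frame and no disclosure checks are run:

```{r}
# Made-up data with two groups of two rows each
toy_pv <- data.frame(
  parents = c("usual", "usual", "pretentious", "pretentious"),
  children = c(1, 3, 2, 4)
)
# aggregate() applies the function to children within each parents group
toy_pv_summary <- aggregate(
  children ~ parents,
  data = toy_pv,
  FUN = function(x) c(mean = mean(x), std = sd(x))
)
toy_pv_summary
```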
#### ACRO histogram
```{r}
acro_hist(data, "children")
```
#### ACRO survival analysis
This example uses a different dataset: the lung dataset from the survival package.
```{r}
# Load the lung dataset from the survival package (which must be installed)
lung <- survival::lung
# head(lung)
acro_surv_func(time = lung$time, status = lung$status, output = "plot")
```
### Examples of producing regression outputs using acro
#### Start by manipulating the nursery data to get two numeric variables
- The 'recommend' column is converted to an integer scale
```{r}
data$recommend <- as.character(data$recommend)
data$recommend[which(data$recommend == "not_recom")] <- "0"
data$recommend[which(data$recommend == "recommend")] <- "1"
data$recommend[which(data$recommend == "very_recom")] <- "2"
data$recommend[which(data$recommend == "priority")] <- "3"
data$recommend[which(data$recommend == "spec_prior")] <- "4"
data$recommend <- as.numeric(data$recommend)
```
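The same recoding can be written more compactly with a named lookup vector. This is an alternative sketch, not the approach used in the chunk above; `recommend_codes` and `toy_labels` are illustrative names:

```{r}
# Map each label to its position on the integer scale
recommend_codes <- c(
  not_recom = 0, recommend = 1, very_recom = 2, priority = 3, spec_prior = 4
)
# Made-up labels standing in for the recommend column
toy_labels <- c("priority", "not_recom", "spec_prior", "recommend")
# Indexing the named vector by label looks up the matching code
toy_scores <- unname(recommend_codes[toy_labels])
toy_scores
```

Any label that is missing from the lookup vector would come back as `NA`, which makes unexpected categories easy to spot.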
```{r}
# extract relevant columns
df <- data[, c("recommend", "children")]
# drop rows with missing values
df <- df[complete.cases(df), ]
# formula to fit
formula <- "recommend ~ children"
```
#### ACRO Linear Model
```{r}
acro_lm(formula = formula, data = df)
```
#### ACRO Logit Model
This is an example of logit regression using ACRO.
We use a different combination of variables from the original dataset.
```{r}
# extract relevant columns
df <- data[, c("finance", "children")]
# drop rows with missing values
df <- df[complete.cases(df), ]
# convert finance to numeric
df <- transform(df, finance = as.numeric(finance))
# subtract 1 so that the 1s and 2s become 0s and 1s
df$finance <- df$finance - 1
# formula to fit
formula <- "finance ~ children"
```
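The subtraction works because `as.numeric()` on a two-level factor returns the level codes 1 and 2. A short sketch on a made-up factor, assuming `finance` is read in as a factor with the two levels shown (`toy_finance` is hypothetical):

```{r}
# Made-up two-level factor standing in for the finance column
toy_finance <- factor(c("convenient", "inconv", "inconv", "convenient"))
# Level codes are 1 for the first level and 2 for the second
toy_finance_codes <- as.numeric(toy_finance)
# Shift the codes to the 0/1 coding expected for a binary response
toy_finance_binary <- toy_finance_codes - 1
toy_finance_binary
```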
```{r}
acro_glm(formula = formula, data = df, family = "logit")
```
#### ACRO Probit Model
```{r}
acro_glm(formula = formula, data = df, family = "probit")
```
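For comparison, base R fits the same two model types with `glm()` and a binomial family. The sketch below uses made-up data (`toy_glm_df` is hypothetical) and is only an analogue: it does not go through acro, so no disclosure checks are applied and the output format differs.

```{r}
# Made-up binary response with a numeric predictor (not perfectly separable)
toy_glm_df <- data.frame(
  children = 1:10,
  finance = c(0, 0, 0, 1, 0, 1, 0, 1, 1, 1)
)
# The formula is stored as a string, as in the chunks above
toy_formula <- as.formula("finance ~ children")
# The link function is the only difference between the two models
toy_logit <- glm(toy_formula, data = toy_glm_df, family = binomial(link = "logit"))
toy_probit <- glm(toy_formula, data = toy_glm_df, family = binomial(link = "probit"))
coef(toy_logit)
coef(toy_probit)
```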
### Examples of functionality to let users manage their output
#### Rename Output
This function can be used to rename the outputs to a more descriptive name.
```{r}
acro_rename_output("output_0", "crosstab")
```
#### Remove Output
This function can be used to delete outputs from the acro object.
```{r}
acro_remove_output("output_3")
```
#### Display Outputs
This function can be used to list all the outputs created so far.
```{r}
acro_print_outputs()
```
#### Add Comments to Output
This function is used to add comments to the outputs. It can be used to provide a description or to pass additional information to the output checkers.
```{r}
acro_add_comments("output_1", "This is a crosstab on the nursery dataset.")
```
#### Finalise
- Users must call `acro_finalise()` at the end of each session.
- Each output is saved to a CSV file.
- The SDC analysis for each output is saved to a JSON file or an Excel file, depending on the extension passed to the function.
```{r}
# acro_finalise("RTEST", "xlsx")
acro_finalise("RTEST", "json")
```