---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# proffer <img src="https://r-prof.github.io/proffer/reference/figures/logo.png" align="right" alt="logo" width="120" height="139" style="border: none; float: right;">
[![CRAN](https://www.r-pkg.org/badges/version/proffer)](https://cran.r-project.org/package=proffer)
[![license](https://img.shields.io/badge/licence-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![active](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)
[![check](https://github.com/r-prof/proffer/workflows/check/badge.svg)](https://github.com/r-prof/proffer/actions?workflow=check)
[![codecov](https://codecov.io/github/r-prof/proffer/coverage.svg?branch=main)](https://app.codecov.io/github/r-prof/proffer?branch=main)
The `proffer` package profiles R code to find bottlenecks. Visit <https://r-prof.github.io/proffer/> for documentation. <https://r-prof.github.io/proffer/reference/index.html> has a complete list of available functions in the package.
## Why use a profiler?
This data processing code is slow.
```{r, eval = FALSE}
system.time({
n <- 1e5
x <- data.frame(x = rnorm(n), y = rnorm(n))
for (i in seq_len(n)) {
x[i, ] <- x[i, ] + 1
}
x
})
#> user system elapsed
#> 82.060 28.440 110.582
```
Why exactly does it take so long? Is it because `for` loops are slow as a general rule? Let us find out empirically.
```{r, eval = FALSE}
library(proffer)
px <- pprof({
n <- 1e5
x <- data.frame(x = rnorm(n), y = rnorm(n))
for (i in seq_len(n)) {
x[i, ] <- x[i, ] + 1
}
x
})
#> ● url: http://localhost:57517
#> ● host: localhost
#> ● port: 57517
```
When we navigate to `http://localhost:57517` and look at the flame graph, we see `[<-.data.frame()` (i.e. `x[i, ] <- x[i, ] + 1`) is taking most of the runtime.
<center>
<a href="https://r-prof.github.io/proffer/reference/figures/flame.png">
<img src="https://r-prof.github.io/proffer/reference/figures/flame.png" alt="top" align="center" style = "border: none; float: center;">
</a>
</center>
So we refactor the code to avoid data frame row assignment. Much faster, even with a `for` loop!
```{r}
system.time({
n <- 1e5
x <- rnorm(n)
y <- rnorm(n)
for (i in seq_len(n)) {
x[i] <- x[i] + 1
y[i] <- y[i] + 1
}
x <- data.frame(x = x, y = y)
})
```
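The loop is no longer the bottleneck, but it is not required here either. As a minimal sketch in base R (not part of `proffer`), adding 1 to a whole vector at once gives the same result as the element-wise loop. The variable names are illustrative.

```{r}
n <- 1e5
x0 <- rnorm(n)
y0 <- rnorm(n)

# Element-wise loop, as in the refactored example above.
x_loop <- x0
y_loop <- y0
for (i in seq_len(n)) {
  x_loop[i] <- x_loop[i] + 1
  y_loop[i] <- y_loop[i] + 1
}
loop_result <- data.frame(x = x_loop, y = y_loop)

# Vectorized equivalent: arithmetic applies to every element at once.
vec_result <- data.frame(x = x0 + 1, y = y0 + 1)

# Both approaches should produce the same data frame.
identical(loop_result, vec_result)
```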
Moral of the story: before you optimize, throw away your assumptions and run your code through a profiler. That way, you can spend your time optimizing where it counts!
## Managing the pprof server
The `pprof` server is a background [`processx`](https://github.com/r-lib/processx) process, and you can manage it with the `processx` methods [described here](https://processx.r-lib.org/#managing-external-processes). Remember to terminate the process with `$kill()` when you are done with it.
```{r, eval = FALSE}
# px is a process handler.
px <- pprof({
n <- 1e4
x <- data.frame(x = rnorm(n), y = rnorm(n))
for (i in seq_len(n)) {
x[i, ] <- x[i, ] + 1
}
x
})
#> ● url: http://localhost:50195
#> ● host: localhost
#> ● port: 50195
# Summary of the background process.
px
#> PROCESS 'pprof', running, pid 10451.
px$is_alive()
#> [1] TRUE
# Error messages, some of which do not matter.
px$read_error()
#> [1] "Main binary filename not available.\n"
# Terminate the process when you are done.
px$kill()
```
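Because the final `$kill()` is easy to forget, one option is to wrap the cleanup in a small helper. The sketch below is an illustration, not part of `proffer` or `processx`: `stop_server()` is a hypothetical name, and it assumes only that the handle has the `$is_alive()` and `$kill()` methods shown above.

```{r, eval = FALSE}
# Hypothetical helper (not part of proffer or processx):
# kill a background process only if it is still running.
stop_server <- function(px) {
  was_alive <- px$is_alive()
  if (was_alive) {
    px$kill()
  }
  # Invisibly report whether there was anything to stop.
  invisible(was_alive)
}

# Usage with the handle returned by pprof():
# stop_server(px)
```

Calling the helper a second time on the same handle does nothing, so it is safe to use in cleanup code such as `on.exit()`.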
## Serving pprof remotely
As with Jupyter notebooks, you can serve `pprof` from one computer and use it from another computer on the same network. On the server, you must:
1. Find the server's host name or IP address in advance.
2. Supply `"0.0.0.0"` as the `host` argument.
```{r, eval = FALSE}
system2("hostname")
#> mycomputer
px <- pprof({
n <- 1e4
x <- data.frame(x = rnorm(n), y = rnorm(n))
for (i in seq_len(n)) {
x[i, ] <- x[i, ] + 1
}
x
}, host = "0.0.0.0")
#> ● url: http://localhost:61072
#> ● host: localhost
#> ● port: 61072
```
Then, on the client machine, navigate a web browser to the server's host name or IP address and use the port number printed above, e.g. `http://mycomputer:61072`.
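To avoid typing the address by hand, you can assemble the client URL on the server and share it. This is a rough sketch in base R: `client_url()` is a hypothetical helper, the port is the example value from the text above, and it assumes the server's node name resolves on your network.

```{r, eval = FALSE}
# Hypothetical helper (not part of proffer): build the address
# a client browser should visit. pprof prints an http:// URL,
# so the sketch uses the same scheme.
client_url <- function(host, port) {
  sprintf("http://%s:%d", host, as.integer(port))
}

# Node name of the current machine, as reported by base R.
host <- Sys.info()[["nodename"]]

# Example port (an assumption): use the one pprof() printed.
client_url(host, 61072)
```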
## Installation
For old versions of `proffer` (0.0.2 and below) refer to [these older installation instructions](https://github.com/r-prof/proffer/blob/f76bde56796396e83fee00f94430c94974f18303/README.md#installation) instead of the ones below.
### The R package
The latest release of `proffer` is available on [CRAN](https://CRAN.R-project.org).
```{r, eval = FALSE}
install.packages("proffer")
```
Alternatively, you can install the development version from GitHub.
```{r, eval = FALSE}
# install.packages("remotes")
remotes::install_github("r-prof/proffer")
```
The `proffer` package requires the `RProtoBuf` package, which may require installation of additional system dependencies on Linux. See its [installation instructions](https://github.com/eddelbuettel/rprotobuf#installation).
### Non-R dependencies
`proffer` requires the copy of `pprof` that comes pre-packaged with the Go language. You can install Go at <https://go.dev/doc/install>.^[One of the graph visualizations requires Graphviz, which you can download from <https://www.graphviz.org/download>, but this visualization is arguably not as useful as the flame graph.]
### Configuration
You can set the `PROFFER_GO_BIN` environment variable to a custom location for the Go binary. See [`usethis::edit_r_environ()`](https://usethis.r-lib.org/reference/edit.html) for directions on how to make this configuration permanent.
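For a single session, you can also set the variable from R before calling `proffer` functions. This is a minimal sketch: the path is a placeholder to replace with the location of your own Go binary, and it assumes `proffer` reads the variable when its functions are called.

```{r, eval = FALSE}
# Placeholder path (an assumption): replace with your Go binary.
go_bin <- "/usr/local/go/bin/go"

# Set the variable for the current R session only.
Sys.setenv(PROFFER_GO_BIN = go_bin)

# Confirm the value and check whether the file actually exists.
Sys.getenv("PROFFER_GO_BIN")
file.exists(go_bin)
```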
### Local testing
Run `pprof_sitrep()` to verify that everything is installed and configured correctly.
```{r}
library(proffer)
pprof_sitrep()
```
If all dependencies are accounted for, `proffer` should work. Test it out with `test_pprof()`. On a local machine, it should launch a browser window showing an instance of `pprof`.
```{r, eval = FALSE}
library(proffer)
process <- test_pprof()
```
When you are done testing, you can clean up the process to conserve resources.
```{r, eval = FALSE}
process$kill()
```
## Telemetry
Recent versions of Go implement telemetry by default. Functions in `proffer` such as `pprof()` turn off telemetry in order to comply with CRAN policies. Read <https://go.dev/doc/telemetry> to learn how to restore telemetry settings after using `proffer`.
## Contributing
We encourage participation through [issues](https://github.com/r-prof/proffer/issues) and [pull requests](https://github.com/r-prof/proffer/pulls). `proffer` has a [Contributor Code of Conduct](https://github.com/r-prof/proffer/blob/main/CODE_OF_CONDUCT.md). By contributing to this project, you agree to abide by its terms.
## Resources
Profilers identify bottlenecks, but they do not offer solutions. It helps to learn about fast code in general so you can think of efficient alternatives to try.
- <http://adv-r.had.co.nz/Performance.html>
- <https://www.r-bloggers.com/2016/01/strategies-to-speedup-r-code/>
- <https://www.r-bloggers.com/2013/04/faster-higher-stonger-a-guide-to-speeding-up-r-code-for-busy-people/>
- <https://cran.r-project.org/package=data.table/vignettes/datatable-intro.html>
## Similar work
### profvis
The [`profvis`](https://github.com/r-lib/profvis) package is easier to install than `proffer` and easy to invoke.
```{r, eval = FALSE}
library(profvis)
profvis({
n <- 1e5
x <- data.frame(x = rnorm(n), y = rnorm(n))
for (i in seq_len(n)) {
x[i, ] <- x[i, ] + 1
}
x
})
```
However, `profvis`-generated flame graphs can be [difficult to read](https://github.com/r-lib/profvis/issues/115) and [slow to respond to mouse clicks](https://github.com/r-lib/profvis/issues/104).
<center>
<a href="https://r-prof.github.io/proffer/reference/figures/profvis.png">
<img src="https://r-prof.github.io/proffer/reference/figures/profvis.png" alt="top" align="center" style = "border: none; float: center;">
</a>
</center>
`proffer` uses [`pprof`](https://github.com/google/pprof) to create friendlier, faster visualizations.