The proffer package profiles R code to find bottlenecks. Visit https://r-prof.github.io/proffer for documentation. https://r-prof.github.io/proffer/reference/index.html has a complete list of available functions in the package.

Why use a profiler?

This data processing code is slow.

system.time({
  n <- 1e5
  x <- data.frame(x = rnorm(n), y = rnorm(n))
  for (i in seq_len(n)) {
    x[i, ] <- x[i, ] + 1
  }
  x
})
#>   user  system elapsed 
#> 82.060  28.440 110.582 

Why exactly does it take so long? Is it because for loops are slow as a general rule? Let’s find out empirically.

library(proffer)
px <- pprof({
  n <- 1e5
  x <- data.frame(x = rnorm(n), y = rnorm(n))
  for (i in seq_len(n)) {
    x[i, ] <- x[i, ] + 1
  }
  x
})
#> http://localhost:64610

When we navigate to http://localhost:64610 and look at the flame graph, we see [<-.data.frame() (i.e. x[i, ] <- x[i, ] + 1) is taking most of the runtime.

So we refactor the code to avoid data frame row assignment. Much faster, even with a for loop!

system.time({
  n <- 1e5
  x <- rnorm(n)
  y <- rnorm(n)
  for (i in seq_len(n)) {
    x[i] <- x[i] + 1
    y[i] <- y[i] + 1
  }
  x <- data.frame(x = x, y = y)
})
#>    user  system elapsed 
#>   0.046   0.001   0.048

Moral of the story: before you optimize, throw away your assumptions and run your code through a profiler. That way, you can spend your time optimizing where it counts!

Managing the pprof server

Sometimes, your pprof server may not work right away. If that happens, take a look at the error logs.

px <- pprof({
  n <- 1e4
  x <- data.frame(x = rnorm(n), y = rnorm(n))
  for (i in seq_len(n)) {
    x[i, ] <- x[i, ] + 1
  }
  x
})
#> http://localhost:50195

px # How is my background process doing?
#> PROCESS 'R', finished.
px$is_alive()
#> [1] FALSE

px$read_error() # Why did it quit so soon?
#> [1] "sh: /user/local/bin/pprof: No such file or directory\nWarning message:\nIn system2(Sys.getenv(\"pprof_path\"), args) : error in running command\n"

# Can my system find pprof?
test_pprof()
#> Error: cannot find pprof executable. See the setup instructions at https://r-prof.github.io/proffer.
assert_pprof()
#> Error: cannot find pprof executable. See the setup instructions at https://r-prof.github.io/proffer.
pprof_path()
#> ""

# Maybe my system cannot find the pprof executable.
# Let me find out where I actually installed pprof.
system("which pprof", intern = TRUE)
#> [1] "/home/landau/alternative/path/pprof"

# I can put a line in my .Rprofile or .Renviron file
# to automatically tell new sessions where pprof lives.
Sys.setenv(pprof_path = "/home/landau/alternative/path/pprof")

# Now, pprof should work.
px <- pprof({
  n <- 1e4
  x <- data.frame(x = rnorm(n), y = rnorm(n))
  for (i in seq_len(n)) {
    x[i, ] <- x[i, ] + 1
  }
  x
})
#> http://localhost:64610

px
#> PROCESS 'R', running, pid 12361.
px$is_alive()
#> [1] TRUE

# Now a web browser should be able to open http://localhost:64610.

It is best to take down the pprof server when you are done with it.

px$kill()

px is the handle of a callr background process. To learn more about how to manage the process, have a look at the callr documentation, particularly the function r_bg().
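
As a minimal sketch of that handle's lifecycle, the example below starts a background R process with `callr::r_bg()`, checks it, and shuts it down. The sleeping function is only a stand-in for the server process that `pprof()` launches (an assumption for illustration), and the variable names are hypothetical.

```r
library(callr)

# Start a background R process that just sleeps. This stands in for the
# process that pprof() launches (an assumption for illustration only).
bg <- r_bg(function() Sys.sleep(60))

alive_before <- bg$is_alive()  # the process should still be running
bg$kill()                      # terminate the background process
alive_after <- bg$is_alive()   # the process should now be gone

alive_before
alive_after
```

The same `is_alive()` and `kill()` methods are the ones used on `px` above.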

Installation

The latest release of proffer is available on CRAN.

install.packages("proffer")

Alternatively, you can install the development version from GitHub.

# install.packages("remotes")
remotes::install_github("r-prof/proffer")

To use functions pprof() and serve_pprof(), pprof needs to be installed. Installing pprof is hard, so if you have trouble, please do not hesitate to open an issue and ask for help. And if you cannot install pprof, then profvis is an excellent alternative.

As you follow the installation instructions below, you can run test_pprof(), assert_pprof(), or pprof_path() at any time to see if proffer can find and use pprof. If these functions succeed early, you are already done.

  1. Install the RProtoBuf package. On Linux, you also need to install the supporting protocol buffer libraries, e.g. sudo apt-get install protobuf-compiler libprotobuf-dev libprotoc-dev on Ubuntu.
  2. Install Graphviz and ensure the Graphviz executables appear in your PATH environment variable (directions here).
  3. Install the Go programming language.
  4. Ensure your system can find the Go binaries. Open your command line interface of choice (e.g. Terminal or Command Prompt) and type go version. If you get an error, you may need to set the PATH environment variable as described here for Linux and here for Windows.
  5. Follow these instructions to set the GOPATH environment variable on your system. Type go env GOPATH in a new terminal session to verify that you set it correctly.
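  6. Enter go get -u github.com/google/pprof in your terminal to install pprof. On recent Go versions, where go get no longer installs executables, use go install github.com/google/pprof@latest instead.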
  7. Find the path to the pprof executable. It is usually in the bin subdirectory of GOPATH, e.g. /home/landau/go/bin/pprof.
  8. Add a line to your .Renviron file to set the pprof_path environment variable, e.g. pprof_path=/home/landau/go/bin/pprof. This variable tells proffer how to find pprof.
  9. Open a new R session and check that pprof is installed correctly.
Sys.getenv("pprof_path")
#> /home/landau/go/bin/pprof
file.exists(Sys.getenv("pprof_path"))
#> TRUE
system2(Sys.getenv("pprof_path")) # Shows the pprof help menu on Unix systems.
shell(Sys.getenv("pprof_path")) # Analogous for Windows.
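
The first two checks in step 9 can be bundled into a small base R helper, sketched below. `check_pprof_path()` is a hypothetical function written for this illustration; it is not part of proffer, and it only confirms that the `pprof_path` environment variable is set and points to an existing file, not that the executable actually runs.

```r
# Hypothetical helper (not part of proffer): report whether the
# pprof_path environment variable is set and points to an existing file.
check_pprof_path <- function() {
  path <- Sys.getenv("pprof_path")
  if (!nzchar(path)) {
    return("pprof_path is not set")
  }
  if (!file.exists(path)) {
    return(paste("pprof_path is set, but no file exists at", path))
  }
  paste("pprof found at", path)
}

check_pprof_path()
```

For a check that exercises proffer itself, test_pprof() and assert_pprof() remain the better choice.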

Contributing

We encourage participation through issues and pull requests. proffer has a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Resources

Profilers identify bottlenecks, but they do not offer solutions. It helps to learn about fast code in general so you can think of efficient alternatives to try.

Similar work

profvis

The profvis package is much easier to install than proffer and equally easy to invoke.

library(profvis)
profvis({
  n <- 1e5
  x <- data.frame(x = rnorm(n), y = rnorm(n))
  for (i in seq_len(n)) {
    x[i, ] <- x[i, ] + 1
  }
  x
})

However, profvis-generated flame graphs can be difficult to read and slow to respond to mouse clicks.

proffer uses pprof to create friendlier, faster visualizations.