Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integration with brms #12

Closed
6 tasks done
odaniel1 opened this issue Dec 17, 2020 · 12 comments
Closed
6 tasks done

Integration with brms #12

odaniel1 opened this issue Dec 17, 2020 · 12 comments

Comments

@odaniel1
Copy link

Prework

  • Read and agree to the code of conduct and contributing guidelines.
  • If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
  • Post a minimal reproducible example so the maintainer can troubleshoot the problems you identify. A reproducible example is:
    • Runnable: post enough R code and data so any onlooker can create the error on their own computer.
    • Minimal: reduce runtime wherever possible and remove complicated details that are irrelevant to the issue at hand.
    • Readable: format your code according to the tidyverse style guide.

Description

Hi Will - I've been a user of drake for half a year now, and am just starting to try out targets so was excited to see your post on the Stan discourse about stantargets.

I'm testing stantargets as a part of a Bayesian workflow [Gelman et al. 2020] - for which I was interested in the model specification being a part of the pipeline: specifically integrating brms ability to convert a formula into stan code (brms::make_stancode) and to appropriately shape the data (brms::make_standata).

Intuitively this seemed like it should fit well with the the targetopia/pipeline framework - but my attempt at an implementation had two conflicts with tar_stan_mcmc:

  • The stan_files argument does not appear to accept a target as an argument - in the implementation below I'm defining this file path as a target so that it is shared between defining the model (brms::make_stancode) and running the model (tar_stan_mcmc).

  • The data argument does not appear to support arguments of the class standata which are returned by calls to brms::make_standata.

I've indicated my current workarounds in the comments - I'd welcome your views as to their optimality, and any indications as to whether it'd be possible to support this type of workflow more intuitively, or alternatively views as to why this isn't desired!

Reproducible example

library(targets)

tar_script({
  library(targets)
  library(stantargets)
  
  options(crayon.enabled = FALSE)
  tar_option_set(
    packages = c("tibble", "brms"),
    memory = "transient",
    garbage_collection = TRUE
  )
  
  tar_pipeline(
    tar_target(
      data,
      tibble(x = rnorm(10), y=  0.5 * rnorm(10))
    ),
    tar_target(
      stan_model_path,
      "x.stan",
      format = "file"
    ),
    tar_target(
      stan_formula,
      brmsformula(formula = y ~ x)
    ),
    tar_target(
      stan_model,
      make_stancode(
        formula = stan_formula,
        data = data,
        save_model = stan_model_path)
    ),
    tar_target(
      stan_data,
      make_standata(
        formula = stan_formula,
        data = data
      )
    ),
    tar_stan_mcmc(
      example,
      # this fails - explicitly providing the path x.stan works, so long as the
      # stan_model target has run. But this incurs duplication/implicit dependency,
      # and  breaks the pipeline workflow.
      stan_files = stan_model_path,
      # this fails - the data argument expects the object returned to be of
      # type list(), whereas brms::make_standata returns type standata
      data = stan_data,
      refresh = 0,
      init = 1,
      show_messages = FALSE,
      chains = 4,
      parallel_chains = 4,
      iter_warmup = 200,
      iter_sampling = 400
    )
  )
})

Created on 2020-12-17 by the reprex package (v0.3.0)

@jgabry
Copy link

jgabry commented Dec 17, 2020

  • The data argument does not appear to support arguments of the class standata which are returned by calls to brms::make_standata.

I'm not sure about the other error you're getting but for this one I think you can solve it just by using data = unclass(standata) instead of data = standata.

@wlandau
Copy link
Member

wlandau commented Dec 18, 2020

You bring up good points, @odaniel1. And thanks for the Bayesian workflow paper, super helpful and relevant! @jgabry, if there is anything I can do to help, please let me know.

I'm not sure about the other error you're getting but for this one I think you can solve it just by using data = unclass(standata) instead of data = standata.

Thanks @jgabry! I confirmed that unclass(make_standata(...)) works.

The stan_files argument does not appear to accept a target as an argument - in the implementation below I'm defining this file path as a target so that it is shared between defining the model (brms::make_stancode) and running the model (tar_stan_mcmc).

I went back and forth on this one early in development, but I ultimately decided stan_files should be a fixed character of known Stan file paths. This approach allows you to include multiple models in the same tar_stan_mcmc() call, and the internal static branching scheme is user-friendly, clean, and consistent with other tar_stan_*() functions that also need to do dynamic branching for batched replication.

But as you saw, that means when you try to use brms, it requires a lot of custom code. At first glance it looked easy, but then I realized brms::make_stancode() requires the data too, which I did not see coming. And even when I apply the unclass(standata) to your reprex, the stan_model target is disconnected from the example_mcmc_x target (no edge in the graph) so the MCMC will not rerun if stan_model changes.

targets::tar_visnetwork()

Screen Shot 2020-12-17 at 9 34 57 PM

So the problem is:

  1. I think stantargets makes good assumptions about users who manually write static Stan model files up front and then call cmdstanr directly, and these assumptions translate into tangible benefits for a large fraction of practical use cases.
  2. brms makes good workflow assumptions too, but they are incompatible with (1).

Possible alternatives:

  1. Pursue user-level workarounds to (2) above. Unfortunately, the friction here seems prohibitive.
  2. Build brms functionality directly into stantargets. I gave it some thought, and I do not see this playing out well:
    • If we modify the existing tar_stan_() functions to accommodate with brms, that's two sets of conflicting assumptions we are trying to accommodate, and that could get messy, ambiguous, and limiting.
    • If we add new functions for brms, the package could get more cumbersome and more difficult to maintain, especially because including brms sets a precedent for including similar Stan-based packages that are likely to arise in the future.
  3. Develop a brand new brmstargets package, a targetopia package just for brms completely tailored to all the assumptions and use cases that brms prioritizes.

I propose we go with (3). This is how I am thinking about the targetopia anyway: one targetopia package for each methodology package. For example, I am just starting to develop a new jagstargets package separately from stantargets on top of R2jags. I feel this is a clean way to go and allows for a lot of flexibility in the right places.

@odaniel1 or @jgabry, is brmstargets something you might be interested in creating and owning? If not, I will start sometime after I finish jagstargets. It may be a while before I get to it. Also cc @paul-buerkner.

@wlandau
Copy link
Member

wlandau commented Dec 18, 2020

And just to be clear, if we go with (3), you could use brmstargets and stantargets in the same pipeline. They only generate target objects, not entire pipelines, and they should ideally follow similar enough patterns and conventions.

@odaniel1
Copy link
Author

Thanks @wlandau and @jgabry for your thoughts - and really insightful to understand the logic behind requiring stan_files to be pre-registered.

I've copied below a reprex with a hack that serves as a workaround to both issues (with thanks to @jgabry for the unclass solution). Its not particularly elegant - but should work for me, and hopefully others, as a stop-gap.

Re: a brmstargets package - I have to concede I'm not all that familiar with brms myself as in the past I've hand coded stan files (and paid the price in efficiency!). I'd be keen to help tackle some of the easier aspects, but probably don't have the familiarity required with the inner workings of either brms or targets. If any of you (@jgabry, @wlandau, @paul-buerkner, other) do decide to go ahead give me a shout and I'd be more than happy to chip-in where I can!

  library(targets)
  library(stantargets)
  library(targets)
  library(fs)

  # predefine the model filepath outside of the targets pipeline as
  # stantargets::tar_stan_mcmc requires the stan_files argument to be a fixed
  # character vector
  stan_model_path <- fs::file_create("x.stan")
  
  # wrapper around brms::make_standata:
  # - @jgabry's suggestion of returning unclass(standata) ensures the data generated
  #   by brms::make_standata can be used by tar_stan_mcmc
  # - the (unused) stan_model argument forces a pipeline dependency of tar_stan_mcmc
  #   on the stan_model target
  make_standata_target <- function(stan_model, ...){
    if(!("brmsmodel" %in% class(stan_model)))
      stop("Expecting an object of class brmsmodel in stan_model argument")
    
    standata <- make_standata(...)
    return(unclass(standata))
  }
  
  options(crayon.enabled = FALSE)
  tar_option_set(
    packages = c("tibble", "brms"),
    memory = "transient",
    garbage_collection = TRUE
  )
  
  tar_pipeline(
    tar_target(
      data,
      tibble(x = rnorm(10), y=  0.5 * rnorm(10))
    ),
    tar_target(
      stan_formula,
      brmsformula(formula = y ~ 1 + x)
    ),
    tar_target(
      stan_model,
      make_stancode(
        formula = stan_formula,
        data = data,
        save_model = stan_model_path)
    ),
    tar_target(
      stan_data,
      make_standata_target(
        stan_model,
        formula = stan_formula,
        data = data
      )
    ),
    tar_stan_mcmc(
      example,
      stan_files = stan_model_path,
      data = stan_data,
      refresh = 0,
      init = 1,
      show_messages = FALSE
    )
  )
#> <pipeline with 10 targets>

Created on 2020-12-18 by the reprex package (v0.3.0)

@wlandau
Copy link
Member

wlandau commented Dec 18, 2020

Your workaround looks pretty good. The only issue I see at a glance is the stan_model target.

tar_target(
  stan_model,
  make_stancode(
    formula = stan_formula,
    data = data,
    save_model = stan_model_path
  )
)

stan_model_path is a character string in the global environment, so changes to the file itself will not invalidate stan_model. So it looks like we'll need to define another tar_target() with format = "file". This is exactly the sort of confusing low-level detail I would hope to abstract away in brmstargets.

# _targets.R
library(targets)
library(stantargets)
library(targets)
library(fs)

stan_model_path <- fs::file_create("x.stan")

make_standata_target <- function(stan_model, ...){
  if(!("brmsmodel" %in% class(stan_model)))
    stop("Expecting an object of class brmsmodel in stan_model argument")
  
  standata <- make_standata(...)
  return(unclass(standata))
}

options(crayon.enabled = FALSE)
tar_option_set(
  packages = c("tibble", "brms"),
  memory = "transient",
  garbage_collection = TRUE
)

tar_pipeline(
  # New target to track the template model so the brms model stays up to date..
  tar_target(
    stan_model_path_target,
    stan_model_path,
    format = "file"
  ),
  tar_target(
    data,
    tibble(x = rnorm(10), y=  0.5 * rnorm(10))
  ),
  tar_target(
    stan_formula,
    brmsformula(formula = y ~ 1 + x)
  ),
  tar_target(
    stan_model,
    make_stancode(
      formula = stan_formula,
      data = data,
      save_model = stan_model_path_target
    )
  ),
  tar_target(
    stan_data,
    make_standata_target(
      stan_model,
      formula = stan_formula,
      data = data
    )
  ),
  tar_stan_mcmc(
    example,
    stan_files = stan_model_path,
    data = stan_data,
    refresh = 0,
    init = 1,
    show_messages = FALSE
  )
)

Screen Shot 2020-12-18 at 12 03 53 PM

@wlandau
Copy link
Member

wlandau commented Dec 18, 2020

Your workaround looks pretty good. The only issue I see at a glance is the stan_model target.

tar_target(
  stan_model,
  make_stancode(
    formula = stan_formula,
    data = data,
    save_model = stan_model_path
  )
)

stan_model_path is a character string in the global environment, so changes to the file itself will not invalidate stan_model. So it looks like we'll need to define another tar_target() with format = "file". This is exactly the sort of confusing low-level detail I would hope to abstract away in brmstargets.

# _targets.R
library(targets)
library(stantargets)
library(targets)
library(fs)

stan_model_path <- fs::file_create("x.stan")

make_standata_target <- function(stan_model, ...){
  if(!("brmsmodel" %in% class(stan_model)))
    stop("Expecting an object of class brmsmodel in stan_model argument")
  
  standata <- make_standata(...)
  return(unclass(standata))
}

options(crayon.enabled = FALSE)
tar_option_set(
  packages = c("tibble", "brms"),
  memory = "transient",
  garbage_collection = TRUE
)

tar_pipeline(
  # New target to track the template model so the brms model stays up to date.
  tar_target(
    stan_model_path_target,
    stan_model_path,
    format = "file"
  ),
  tar_target(
    data,
    tibble(x = rnorm(10), y=  0.5 * rnorm(10))
  ),
  tar_target(
    stan_formula,
    brmsformula(formula = y ~ 1 + x)
  ),
  tar_target(
    stan_model,
    make_stancode(
      formula = stan_formula,
      data = data,
      save_model = stan_model_path_target
    )
  ),
  tar_target(
    stan_data,
    make_standata_target(
      stan_model,
      formula = stan_formula,
      data = data
    )
  ),
  tar_stan_mcmc(
    example,
    stan_files = stan_model_path,
    data = stan_data,
    refresh = 0,
    init = 1,
    show_messages = FALSE
  )
)

Screen Shot 2020-12-18 at 12 03 53 PM

@wlandau
Copy link
Member

wlandau commented Dec 18, 2020

And of course, no pressure to take on brmstargets. If all goes well, there will be plenty of targetopia packages to go around, so I plan to make a habit of broadcasting opportunities like this.

@jgabry
Copy link

jgabry commented Dec 19, 2020

I'm certainly not opposed to a separate brmstargets package, although I don't think I have the bandwidth at the moment to take the lead on that. If someone else wants to do it soon that's great, or maybe it's something we can add down the line.

P.S. @wlandau these diagrams are really cool!

@wlandau
Copy link
Member

wlandau commented Dec 19, 2020

I'm certainly not opposed to a separate brmstargets package, although I don't think I have the bandwidth at the moment to take the lead on that.

I figured as much, you are already everywhere in the Stan dev space. I expect to create brmstargets myself unless someone else has a burning desire to do so, and that's totally fine. Just looking for ways to include more devs in the targetopia.

P.S. @wlandau these diagrams are really cool!

Thanks! If you like tar_visnetwork(), you might also like the new tar_watch() app for monitoring progress. Demo: https://wlandau.github.io/posts/2020-12-18-non-blocking-app/.

@wlandau
Copy link
Member

wlandau commented Dec 21, 2020

I mentioned brmstargets as a concept at https://wlandau.github.io/targetopia.html, which I think is sufficient to close the issue in stantargets.

@wlandau wlandau closed this as completed Dec 21, 2020
@paul-buerkner
Copy link

I am happy to hear that you consider adding brms to your list of supported workflows! If there is anything you want or need my input on, please let me know. I am also happy to have a call with those interested in building brmstargets if you think that would help you.

@wlandau
Copy link
Member

wlandau commented Dec 31, 2020

Fantastic, Paul, thank you for your support! I will let you know when I have a chance to start working on brmstargets.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants