Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible scenario comparison functionality #221

Closed
pratikunterwegs opened this issue Apr 24, 2024 · 3 comments
Closed

Possible scenario comparison functionality #221

pratikunterwegs opened this issue Apr 24, 2024 · 3 comments
Assignees
Labels
Scenarios Related to scenario comparison functionality

Comments

@pratikunterwegs
Copy link
Collaborator

This issue logs discussions and prototypes for proposed scenario comparison functionality. Please feel free to add to the conversation below.

Scenario comparison requirements

  1. User-friendly; should be possible to compare scenarios in 1 - 2 functions and < 10 lines of code;
  2. Inputs and outputs should be inter-operable with non-{epidemics} outputs; may have better functionality for {epidemics} outputs (?);
  3. Should be make like-for-like comparisons (matching parameter sets and replicates across scenarios);
  4. Flexibly select outcomes: recoveries or cases, and deaths where available (?);
  5. Return raw differences or a summary of differences.

Discussion points

  1. Should there be a way to prevent outputs from different model structures from being compared?
  2. How should scenarios be identified? e.g. as a simple numeric identifier based on position in a list, or is a unique identifier necessary?

A small reprex shows my thinking on this so far, please feel free to suggest changes.

library(epidemics)

population_size <- 14e6

# prepare initial conditions as proportions
initial_conditions <- c(
  S = population_size - 11, E = 10, I = 1, H = 0, F = 0, R = 0
) / population_size

# prepare a <population> object
guinea_population <- population(
  name = "Guinea",
  contact_matrix = matrix(1), # note dummy value
  demography_vector = 14e6, # 14 million, no age groups
  initial_conditions = matrix(
    initial_conditions,
    nrow = 1
  )
)

population = guinea_population

replicates = 50

# generate set of intervention + baseline scenarios to compare
intervention_set <- list(
  baseline = NULL,
  response = list(
    transmission_rate = intervention(
      "reduce_beta", "rate", 50, 100, 0.5
    )
  ),
  response2 = list(
    transmission_rate = intervention(
      "reduce_beta", "rate", 1, 100, 0.5
    )
  )
)

data = model_ebola(
  population,
  # transmission_rate = transmission_rate,
  intervention = intervention_set,
  replicates = replicates
)

outcomes_averted = function(baseline,
                            scenarios,
                            summarise = TRUE,
                            by_group = TRUE,
                            ...) {
  
  # collect arguments to `epidemic_size()`
  args = list(...) # not used currently
  
  # get epidemic size for baseline and response scenarios
  baseline_outcomes = epidemic_size(baseline, simplify = FALSE, by_group = by_group)
  scenario_outcomes = lapply(scenarios, epidemic_size, simplify = FALSE, by_group = by_group)
  
  # get differences between response and baseline in terms of epidemic size
  averted = vapply(scenario_outcomes, function(df) {
    df$value - baseline_outcomes$value
  }, FUN.VALUE = numeric(nrow(baseline_outcomes)))
  
  # return summarised values or raw differences
  if (summarise) {
    averted_summary = apply(averted, 2, function(x) {
      averted_median = mean(x)
      averted_lims = quantile(x, probs = c(0.05, 0.95))
      names(averted_lims) = sprintf("quantile_%s", c("05", "95"))
      c(median = averted_median, averted_lims)
    })
    
    averted_summary = as.data.frame(t(averted_summary))
    averted_summary$scenario = seq_len(nrow(averted_summary))
    
    # return difference summary
    averted_summary
  } else {
    averted = as.data.frame(averted)
    colnames(averted) = sprintf("scenario_%s", seq_along(averted))
    if (nrow(averted) > 1) {
      averted$replicate = seq_len(nrow(averted))
    }
    
    # return raw difference data
    averted
  }
}

outcomes_averted(
  baseline = data[scenario == 1]$data[[1]],
  scenarios = data[scenario != 1]$data
)
#>    median quantile_05 quantile_95 scenario
#> 1 -228.78     -436.95       47.95        1
#> 2 -359.30     -527.05     -151.70        2

Created on 2024-04-24 with reprex v2.0.2

@pratikunterwegs pratikunterwegs self-assigned this Apr 24, 2024
@pratikunterwegs pratikunterwegs added the Scenarios Related to scenario comparison functionality label Apr 24, 2024
@adamkucharski
Copy link
Member

Thanks for sharing. Some quick initial thoughts:

  • Reflecting on past projects comparing outputs, the big bottlenecks where a lot of code is wasted have been book-keeping (e.g. iterating over scenarios and keeping track of which one is which), storage and retreival (e.g. recording relevant model outputs so they can be analysed later) and trajectory matching and summarising (e.g. pulling the case incidence out of each stored object then comparing total cases averted – and in the case of a stochastic model, or simulations from a posterior with parameter uncertainty, making sure that the code is doing like-for-like comparisons of the same parameter/seed in each sample)
  • There may be a lot of different things a user wants to compare (too many to predict), so although outcomes_averted() is a useful starting point, making data[scenario == 1]$data[[1]] etc. much easier to handle would be useful, e.g. maybe with something intuitive like get_outcome(data,"cases",1)

@pratikunterwegs
Copy link
Collaborator Author

Thanks, would definitely be good to discuss this more before implementing something as it might take some restructuring of inputs etc.

(e.g. iterating over scenarios and keeping track of which one is which)

What would users expect in terms of keeping track of scenarios? The model output currently makes scenarios from the input NPI + vax combinations automatically. Is there a call for changing how that's done - e.g. passing a full intervention set (NPIs + vaccinations) as a single scenario?

making sure that the code is doing like-for-like comparisons of the same parameter/seed in each sample

A question here is whether discrete values of infection parameters, e.g. $R_0$ = 1.5, 2.0, 2.5, should count as separate scenarios, because currently they are treated similar to parameter uncertainty around single values. Might need to pass the parameters, and some way of specifying uncertainty around them, to the 'scenario' object from above.

@pratikunterwegs
Copy link
Collaborator Author

Partially addressed in the v0.4.0 release. Moving to Discussions page.

@epiverse-trace epiverse-trace locked and limited conversation to collaborators Jun 10, 2024
@pratikunterwegs pratikunterwegs converted this issue into discussion #245 Jun 10, 2024

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
Scenarios Related to scenario comparison functionality
Development

No branches or pull requests

2 participants