Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a day-of-week effect to simulated dataset #91

Merged
merged 6 commits into from
Dec 2, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -45,7 +45,7 @@ repos:
entry: Cannot commit .Rhistory, .RData, .Rds or .rds.
language: fail
files: '\.(Rhistory|RData|Rds|rds)$'
exclude: ^tests/testthat/data/
exclude: '(^tests/testthat/data/)|(^data-raw/)'
#####
# Secrets
- repo: https://github.com/Yelp/detect-secrets
Expand Down
5 changes: 5 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,8 @@
# RtGam v0.3.0

## Features
* Add day of week effect to simulated timeseries (#91)

# RtGam v0.2.0

This release introduces new functionality for working with RtGam models, including S3 methods for prediction and plotting. It also includes bug fixes, improved handling of input data, and enhanced developer tooling to streamline contributions and maintenance.
Expand Down
65 changes: 33 additions & 32 deletions R/data.R
Original file line number Diff line number Diff line change
@@ -1,16 +1,18 @@
# nolint start: line_length_linter
#' Synthetic dataset of stochastic SIR system with known Rt
#'
#' A dataset from Gostic, Katelyn M., et al. "Practical considerations for
#' measuring the effective reproductive number, Rt." PLoS Computational Biology
#' 16.12 (2020): e1008409. The data are simulated from a stochastic SEIR
#' compartmental model.
#' A simulated dataset derived from Gostic, Katelyn M., et al. "Practical
#' considerations for measuring the effective reproductive number, Rt."
#' PLoS Computational Biology 16.12 (2020): e1008409. The data are simulated
#' from a stochastic SEIR compartmental model. The original timeseries and Rt
#' are available in the `incidence` and `true_rt` columns and with additional
#' columns added or modified to increase noise and add a day-of-week effect.
#'
#' This synthetic dataset has a number of desirable properties:
#'
#' 1. The force of infection changes depending on the Rt, allowing for sudden
#' changes in the Rt. This allows for modeling of sudden changes in infection
#' dynamics, which might otherwise be difficult to capture. Rt estimation
#' framework
#' dynamics, which might otherwise be difficult to capture.
#'
#' 2. The realized Rt is known at each timepoint
#'
Expand All @@ -21,37 +23,32 @@
#' frameworks, providing practical guidance on how to use this dataset to
#' evaluate Rt estimates.
#'
#' In practice, we've found that the amount of observation noise in the
#' incidence and/or observed cases is often undesirably low for testing. Many
#' empirical datasets are much noisier. As a result, models built with these
#' settings in mind can perform poorly on this dataset or fail to converge. We
#' manually add observation noise with `rnbinom(299, mu =
#' stochastic_sir_rt[["obs_cases"]], size = 10)` and the random seed 123456 and
#' store it in the `obs_incidence` column.
#' In practice, we've found that the amount of observation noise is often
#' undesirably low for testing. Many empirical datasets are much noisier. As a
#' result, models built with these realistically noisy settings in mind can
#' perform poorly on this dataset or fail to converge. To better reflect realistic
#' settings, we manually add observation noise and a day-of-week reporting effect.
#'
#' @name stochastic_sir_rt
#' @format `stochastic_sir_rt` A data frame with 301 rows and 12 columns:
#' @format `stochastic_sir_rt` A data frame with 299 rows and 6 columns:
#' \describe{
#' \item{time}{Timestep of the discrete-time stochastic SEIR simulation}
#' \item{date}{Added from the original Gostic, 2020 dataset. A date
#' corresponding to the assigned `time`. Arbitrarily starts on January 1st,
#' 2023.}
#' \item{S, E, I, R}{The realized state of the stochastic SEIR system}
#' \item{dS, dEI, DIR}{The stochastic transition between compartments}
#' \item{incidence}{The true incidence in the `I` compartment at time t}
#' \item{obs_cases}{The observed number of cases at time t from
#' forward-convolved incidence.}
#' \item{obs_incidence}{Added from the original Gostic, 2020 dataset. The
#' `incidence` column with added negative-binomial observation noise.
#' Created with `set.seed(123456)` and the call
#' `rnbinom(299, mu = [["incidence"]], size = 10)` Useful for
#' testing.}
#' \item{true_r0}{The initial R0 of the system (i.e., 2)}
#' \item{true_rt}{The known, true Rt of the epidemic system}
#' \item{reference_date}{The date cases were observed.}
#' \item{true_rt}{The known, true Rt of the epidemic system.}
#' \item{dow}{The magnitude of the day-of-week effect added to the log of `incidence`.}
#' \item{true_cases}{The true number of cases occurring on the `date` in the
#' simulated system before observation noise or day-of-week effects.}
#' \item{obs_cases}{The observed number of cases on `date` after the day-of-week
#' reporting effect and observation noise have been applied to `true_cases`.}
#' \item{obs_cases_no_dow}{The observed number of cases on `date` after observation
#' noise has been applied to `true_cases`. It does not include a day of week effect.}
#' }
#' @source
#' <https://github.com/cobeylab/Rt_estimation/tree/d9d8977ba8492ac1a3b8287d2f470b313bfb9f1d> # nolint
#' <https://github.com/CDCgov/cfa-epinow2-pipeline/pull/17> # nolint
#' <https://github.com/cobeylab/Rt_estimation/tree/d9d8977ba8492ac1a3b8287d2f470b313bfb9f1d>
#' <https://github.com/CDCgov/cfa-epinow2-pipeline/pull/91>
#' <https://github.com/CDCgov/cfa-epinow2-pipeline/pull/17>
#' @references Gostic, Katelyn M., et al. "Practical considerations for measuring the
#' effective reproductive number, R t." PLoS computational biology 16.12
#' (2020): e1008409.
"stochastic_sir_rt"

#' Generation interval corresponding to the sample `stochastic_sir_rt` dataset
Expand All @@ -73,4 +70,8 @@
#' @name sir_gt_pmf
#' @format `sir_gt_pmf` A numeric vector of length 26 that sums to one within
#' numerical tolerance
#' @references Gostic, Katelyn M., et al. "Practical considerations for measuring the
#' effective reproductive number, R t." PLoS computational biology 16.12
#' (2020): e1008409.
# nolint end: line_length_linter
"sir_gt_pmf"
4 changes: 2 additions & 2 deletions R/plot.R
Original file line number Diff line number Diff line change
Expand Up @@ -13,8 +13,8 @@
#' @returns A ggplot2 object
#' @examples
#' fit <- RtGam(
#' stochastic_sir_rt[["obs_incidence"]],
#' stochastic_sir_rt[["date"]]
#' stochastic_sir_rt[["obs_cases"]],
#' stochastic_sir_rt[["reference_date"]]
#' )
#' # Plot draws from the fitted model against the data
#' plot(fit)
Expand Down
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,7 @@ This snippet uses the simulated dataset to demonstrate how to use the main model
library(RtGam)

fit <- RtGam(
cases = stochastic_sir_rt[["obs_incidence"]],
cases = stochastic_sir_rt[["obs_cases"]],
# Randomly chosen date
reference_date = stochastic_sir_rt[["date"]]
)
Expand Down
Binary file added data-raw/gostic2020.rds
Binary file not shown.
44 changes: 44 additions & 0 deletions data-raw/stochastic_sir_rt.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
# Start with the original Gostic, 2020 dataset
gostic2020 <- readRDS(file.path("data-raw", "gostic2020.rds"))

# Add a date col for package use, abitrarily starting from Jan 1, 2023
gostic2020["reference_date"] <- gostic2020[["time"]] + as.Date("2023-01-01") - 1

# Add an outcome col with a day of week effect

# Use a sine curve to make effect large and obvious
dow_effect <- 0.5 * cos(seq(from = 0, to = pi, length.out = 7))
nrep <- ceiling(nrow(gostic2020) / 7)
# Get dims to match
dow_col <- rep(dow_effect, nrep)[seq_len(nrow(gostic2020))]
gostic2020["dow"] <- dow_col

# Apply dow to log expected cases
gostic2020["incidence_dow"] <- exp(log(gostic2020[["incidence"]]) + dow_col)

# Apply outcome noise
withr::with_seed(12345, {
gostic2020["obs_cases"] <- as.integer(rnbinom(
nrow(gostic2020),
mu = gostic2020[["incidence_dow"]],
size = 10
))
gostic2020["obs_cases_no_dow"] <- as.integer(rnbinom(
nrow(gostic2020),
mu = gostic2020[["incidence"]],
size = 10
))
})

# Subset to cols of interest
cols <- c(
"reference_date",
"true_rt",
"dow",
"incidence",
"obs_cases",
"obs_cases_no_dow"
)

stochastic_sir_rt <- gostic2020[, cols]
usethis::use_data(stochastic_sir_rt, overwrite = TRUE)
Binary file modified data/stochastic_sir_rt.rda
Binary file not shown.
4 changes: 2 additions & 2 deletions man/plot.RtGam.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

5 changes: 5 additions & 0 deletions man/sir_gt_pmf.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

61 changes: 30 additions & 31 deletions man/stochastic_sir_rt.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

16 changes: 8 additions & 8 deletions vignettes/RtGam.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -38,22 +38,22 @@ library(ggplot2)
The `RtGam` package requires a time series of observed incident case counts.
For demonstration purposes, the package includes a simulated epidemic dataset, `stochastic_sir_rt`, adapted from @gostic2020practical.

In this example dataset, `obs_incidence` is the vector of observed case counts, and `date` is the corresponding vector of dates.
In this example dataset, `obs_cases` is the vector of observed case counts, and `reference_date` is the corresponding vector of dates.
The dataset has a timeseries of 299 days that we'll subset to 60 days to make plotting easier.


```{r plot data}
data <- stochastic_sir_rt[41:100, ]
ggplot(data) +
geom_point(aes(date, obs_incidence)) +
geom_point(aes(reference_date, obs_cases)) +
theme_bw() +
labs(x = "Reference date", y = "Observed cases")
```

Because this dataset was simulated, we know the true $R_t$ that generated the epidemic timeseries.
```{r plot true rt}
ggplot(data) +
geom_point(aes(date, true_rt)) +
geom_point(aes(reference_date, true_rt)) +
theme_bw() +
labs(x = "Reference date", y = "True Rt") +
scale_y_continuous(
Expand All @@ -71,8 +71,8 @@ The model can automatically select hyperparameters like the smoothing basis dime

```{r fit model}
fit <- RtGam(
cases = data[["obs_incidence"]],
reference_date = data[["date"]]
cases = data[["obs_cases"]],
reference_date = data[["reference_date"]]
)
print(fit)
```
Expand All @@ -84,8 +84,8 @@ See `smooth_dim_heuristic()` and `penalty_dim_heuristic()` for more on how these

```{r fit model again}
fit_non_adaptive <- RtGam(
cases = data[["obs_incidence"]],
reference_date = data[["date"]],
cases = data[["obs_cases"]],
reference_date = data[["reference_date"]],
m = 1
)
print(fit_non_adaptive)
Expand Down Expand Up @@ -147,7 +147,7 @@ plot(
mean_delay = 0,
gi_pmf = sir_gt_pmf
) +
geom_point(aes(date, true_rt), data = data)
geom_point(aes(reference_date, true_rt), data = data)
```

Future development effort will aim to improve model performance on challenging datasets like this example.
Expand Down
Loading