Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

datanames slot in teal_transform_module #1327

Closed
wants to merge 15 commits into from

Conversation

gogonzo
Copy link
Contributor

@gogonzo gogonzo commented Aug 21, 2024

Closes #1298

  • Added datanames slot to the teal_transform_module
  • Added tests

Now it works like this:

  1. filter panel displays (and thus filters) datasets which are in module$datanames and in module-transformers$datanames.
  2. summary displays row counts for datanames specified in module$datanames, module-transformers$datanames and everything which have been created in a transformer (for example ANL)
  3. teal_module$server receives always only module$datanames

With above rules, there are following cases possible:

  1. app developer specifies module$datanames = "ADSL" and adds transform which modifies ADSL (doesn't need anything else).
    • If module-transform$datanames = NULL then only ADSL will be displayed in the filter panel
    • if module-transform$datanames = c("ADTTE", "whatever") then "ADTTE" and "whatever" will also be displayed and filtered even if not used by either transform and teal_module. teal_module though will always receive only module$datanames (ADSL).
  2. app developer specifies module$datanames = "ADSL" and adds transform which modifies ADSL based on other datasets (does need other datasets to work)
    • If module-transform$datanames = NULL then only ADSL will be displayed in the filter panel and only ADSL will be filtered. Transform will still work but other datasets won't be filtered
    • If module-transform$datanames = c("ADTTE", "whatever") then both will be displayed, filtered and passed to transform.
  3. app developer specified module$datanames = "ANL" and specifies a transform which uses "ADSL" and "ADTTE" to make ANL.
    • if module-transform$datanames = NULL then nothing will be displayed in the filter panel as there is no information about "ADSL" and "ADTTE". Transform would still work and ANL would be created based on unfiltered ADSL and ADTTE
    • if module-transform$datanames = c("ADSL", "ADTTE") then everything is working as expected. ADSL and ADTTE is displayed in the filter panel and filtered are passed to transform module.
app example
options(
  teal.log_level = "TRACE",
  teal.show_js_log = TRUE,
  # teal.bs_theme = bslib::bs_theme(version = 5),
  shiny.bookmarkStore = "server"
)

# pkgload::load_all("teal.data")
pkgload::load_all("teal")

trans <- list(
  anl_w_datanames = teal_transform_module(
    label = "ANL with datanames",
    ui = function(id) {
      ns <- NS(id)
      tagList(
        div("This transformer adds ANL based on specified ADSL, ADTTE"),
        numericInput(ns("obs"), "Number of subjects", value = 400, min = 0, max = 400)
      )
    },
    server = function(id, data) {
      moduleServer(id, function(input, output, session) {
        reactive({
          within(data(),
            {
              ANL <- dplyr::inner_join(
                head(ADSL, nobs),
                ADTTE[c("STUDYID", "USUBJID", setdiff(colnames(ADTTE), colnames(ADSL)))],
                by = c("USUBJID", "STUDYID")
              )
            },
            nobs = input$obs
          )
        })
      })
    },
    datanames = "ADTTE"
  ),
  anl_wout_datanames = teal_transform_module(
    label = "ANL without datanames",
    ui = function(id) {
      ns <- NS(id)
      tagList(
        div("This transformer adds ANL based on unspecified ADSL, ADTTE"),
        numericInput(ns("obs"), "Number of subjects", value = 400, min = 0, max = 400)
      )
    },
    server = function(id, data) {
      moduleServer(id, function(input, output, session) {
        reactive({
          within(data(),
            {
              ANL <- dplyr::inner_join(
                head(ADSL, nobs),
                ADTTE[c("STUDYID", "USUBJID", setdiff(colnames(ADTTE), colnames(ADSL)))],
                by = c("USUBJID", "STUDYID")
              )
            },
            nobs = input$obs
          )
        })
      })
    }
  ),
  adsl_w_datanames = teal_transform_module(
    label = "modify ADSL",
    ui = function(id) {
      ns <- NS(id)
      tagList(
        div("This transformer modifies ADSL based on specified ADTTE")
      )
    },
    server = function(id, data) {
      moduleServer(id, function(input, output, session) {
        reactive({
          within(data(),
            {
              ADTTE_summary <- ADTTE |>
                dplyr::group_by(STUDYID, USUBJID) |>
                dplyr::summarize(PARAMCD_AVAL = paste(paste(PARAMCD, "-", AVAL), collapse = "; "))
              ADSL <- dplyr::left_join(ADSL, ADTTE_summary)
            },
            nobs = input$obs
          )
        })
      })
    },
    datanames = c("ADSL", "ADTTE")
  ),
  adsl_wout_datanames = teal_transform_module(
    label = "modify ADSL",
    ui = function(id) {
      ns <- NS(id)
      tagList(
        div("This transformer modifies ADSL based on unspecified ADTTE")
      )
    },
    server = function(id, data) {
      moduleServer(id, function(input, output, session) {
        reactive({
          within(data(),
            {
              ADTTE_summary <- ADTTE |>
                dplyr::group_by(STUDYID, USUBJID) |>
                dplyr::summarize(PARAMCD_AVAL = paste(sprintf("%s - %.2f", PARAMCD, AVAL), collapse = "; "))
              ADSL <- dplyr::left_join(ADSL, ADTTE_summary)
            },
            nobs = input$obs
          )
        })
      })
    },
    datanames = "ADSL"
  )
)

data <- teal_data_module(
  once = FALSE,
  ui = function(id) {
    ns <- NS(id)
    tagList(
      numericInput(ns("obs"), "Number of observations to show", 1000),
      actionButton(ns("submit"), label = "Submit")
    )
  },
  server = function(id, ...) {
    moduleServer(id, function(input, output, session) {
      logger::log_trace("example_module_transform2 initializing.")
      eventReactive(input$submit, {
        data <- teal_data(a = NULL) |>
          within(
            {
              logger::log_trace("Loading data")
              ADSL <- head(teal.data::rADSL, n = n)
              ADTTE <- teal.data::rADTTE
              iris <- iris
              iris_raw <- iris
              CO2 <- CO2
              factors <- names(Filter(isTRUE, vapply(CO2, is.factor, logical(1L))))
              CO2[factors] <- lapply(CO2[factors], as.character)
            },
            n = as.numeric(input$obs)
          )
        join_keys(data) <- default_cdisc_join_keys[c("ADSL", "ADTTE")]
        teal.data::datanames(data) <- c("CO2", "ADTTE", "iris_raw", "iris", "ADSL")
        data
      })
    })
  }
)

app <- teal::init(
  data = data,
  modules = list(
    example_module("anl - transform w/ datanames", dataname = "ANL", transformers = trans["anl_w_datanames"]),
    example_module("anl - transform w/o datanames", dataname = "ANL", transformers = trans["anl_wout_datanames"]),
    example_module("adsl - transform w/ datanames", dataname = "ADSL", transformers = trans["adsl_w_datanames"]),
    example_module("adsl - transform w/o datanames", dataname = "ADSL", transformers = trans["adsl_wout_datanames"])
  ),
  filter = teal_slices(
    teal_slice("ADSL", "SEX"),
    teal_slice("ADSL", "AGE", selected = c(18L, 65L)),
    teal_slice("ADTTE", "PARAMCD", selected = "CRSD"),
    include_varnames = list(
      ADSL = c("SEX", "AGE")
    )
  )
)

runApp(app)

@gogonzo gogonzo added the core label Aug 21, 2024
Copy link
Contributor

github-actions bot commented Aug 21, 2024

badge

Code Coverage Summary

Filename                          Stmts    Miss  Cover    Missing
------------------------------  -------  ------  -------  ----------------------------------------------------------------------------------------------------------------------------------------
R/checkmate.R                        24       0  100.00%
R/dummy_functions.R                  49      11  77.55%   30, 32, 44, 55-62
R/get_rcode_utils.R                  12       0  100.00%
R/include_css_js.R                   22      17  22.73%   12-38, 76-82
R/init.R                            112      52  53.57%   107-114, 163-172, 174, 186-207, 218, 236-239, 246-253, 256-257, 259
R/landing_popup_module.R             25      25  0.00%    61-87
R/module_bookmark_manager.R         158     127  19.62%   47-68, 88-138, 143-144, 156, 203, 238-315
R/module_data_summary.R             193      68  64.77%   24-52, 94, 192, 232-272
R/module_filter_data.R               52       2  96.15%   22-23
R/module_filter_manager.R           230      57  75.22%   56-62, 73-82, 90-95, 108-112, 117-118, 291-314, 340, 367, 379, 386-387
R/module_init_data.R                118      11  90.68%   44-53, 67
R/module_nested_tabs.R              166      70  57.83%   33-121, 150, 200, 253, 283
R/module_snapshot_manager.R         216     146  32.41%   89-95, 104-113, 121-133, 152-153, 170-180, 184-199, 201-208, 215-230, 234-238, 240-246, 249-262, 265-273, 304-318, 321-332, 335-341, 355
R/module_teal_data.R                114      21  81.58%   42-48, 87-90, 114-123
R/module_teal_with_splash.R          12      12  0.00%    22-38
R/module_teal.R                     141      92  34.75%   42-145, 182-183, 191
R/module_transform_data.R            56      32  42.86%   17-51
R/modules.R                         170      22  87.06%   148-152, 207-210, 245, 308-309, 360, 372-380
R/reporter_previewer_module.R        19       2  89.47%   30, 34
R/show_rcode_modal.R                 24      24  0.00%    17-42
R/tdata.R                            14      14  0.00%    19-61
R/teal_data_module-eval_code.R       24       0  100.00%
R/teal_data_module-within.R           7       0  100.00%
R/teal_data_module.R                 51       1  98.04%   81
R/teal_data_utils.R                  38       0  100.00%
R/teal_lockfile.R                    56      22  60.71%   34-38, 43-51, 72, 91, 94-102, 109
R/teal_reporter.R                    68       6  91.18%   69, 77, 125-126, 129, 146
R/teal_slices-store.R                29       0  100.00%
R/teal_slices.R                      63       0  100.00%
R/TealAppDriver.R                   353     353  0.00%    52-735
R/utils.R                           193       0  100.00%
R/validate_inputs.R                  32       0  100.00%
R/validations.R                      58      37  36.21%   110-377
R/zzz.R                              12       8  33.33%   3-15
TOTAL                              2911    1232  57.68%

Diff against main

Filename                  Stmts    Miss  Cover
----------------------  -------  ------  --------
R/module_init_data.R        +10       0  +0.86%
R/module_nested_tabs.R       -1       0  -0.25%
R/modules.R                  +8      +1  +0.02%
R/teal_data_module.R         +9      +1  -1.96%
R/teal_data_utils.R          -5       0  +100.00%
TOTAL                       +21      +2  +0.24%

Results for commit: 3bf87ac

Minimum allowed coverage is 80%

♻️ This comment has been updated with latest results

Copy link
Contributor

github-actions bot commented Aug 21, 2024

Unit Tests Summary

  1 files   25 suites   9m 11s ⏱️
258 tests 252 ✅ 6 💤 0 ❌
535 runs  529 ✅ 6 💤 0 ❌

Results for commit 3bf87ac.

♻️ This comment has been updated with latest results.

Copy link
Contributor

github-actions bot commented Aug 21, 2024

Unit Test Performance Difference

Test Suite $Status$ Time on main $±Time$ $±Tests$ $±Skipped$ $±Failures$ $±Errors$
shinytest2-landing_popup 💚 $43.42$ $-1.20$ $0$ $0$ $0$ $0$
shinytest2-module_bookmark_manager 💚 $34.53$ $-1.08$ $0$ $0$ $0$ $0$
shinytest2-reporter 💚 $70.29$ $-1.00$ $0$ $0$ $0$ $0$
Additional test case details
Test Suite $Status$ Time on main $±Time$ Test Case
module_teal 👶 $+0.48$ displays_transform_datanames_also_if_specified
module_teal 👶 $+0.39$ doesn_t_receive_extra_transform_datasets_if_module_datanames_all_and_datanames_empty
module_teal 💀 $0.40$ $-0.40$ doesn_t_receive_extra_transform_datasets_not_set_in_datanames_if_module_datanames_all_
module_teal 💀 $0.62$ $-0.62$ receives_all_objects_from_env_except_DATA._raw__when_DATA_is_present_in_the_env_and_module_datanames_all_and_datanames_is_empty
module_teal 👶 $+0.78$ receives_all_possible_objects_and_those_specified_in_transform_datanames_are_filtered
module_teal 👶 $+0.59$ receives_all_possible_objects_while_those_specified_in_module_datanames_are_filtered
module_teal 👶 $+0.61$ receives_datanames_derived_from_env_when_module_datanames_all_and_datanames_is_empty
module_teal 💀 $0.41$ $-0.41$ receives_extra_transform_datasets_if_module_datanames_all_and_datanames_empty
module_teal 👶 $+0.40$ receives_extra_transform_datasets_if_module_datanames_all_only_when_added_to_datanames_in_transform
modules 👶 $+0.00$ module_datanames_is_appended_by_its_transformers_datanames
modules 👶 $+0.00$ module_datanames_is_set_to_all_if_transformer_datanames_is_all_
modules 👶 $+0.00$ module_datanames_stays_all_regardless_of_transformers
utils 💀 $0.00$ $-0.00$ teal_data_datanames_returns_datanames_which_are_set_by_teal.data_datanames
utils 💀 $0.00$ $-0.00$ teal_data_datanames_returns_names_of_the_env_s_objects_when_datanames_not_set

Results for commit 3acef50

♻️ This comment has been updated with latest results.

Copy link
Contributor

@donyunardi donyunardi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great start!

I have a few comments:

module$datanames not provided

When modules$datanames is not provided (i.e. example_module("adsl - no transformer no datanames in module"), the default value for modules$datanames is all which means the module has access to all datasets.

However, I don’t see the ANL object and only datasets in teal_data_module available to the module. Shouldn't ANL be available in the module or does the user need to specify it explicitly in module$datanames?

teal_transform_module$datanames vs module$datanames

From my understanding of the current design, when teal_transform_module$datanames is provided, it surfaces the datasets to the filter panel, allowing users to subset the datasets (and their parents if they have one), triggering the transformation to be rerun.

teal_module though will always receive only module$datanames (ADSL).

Ultimately, only the datasets specified in module$datanames are available to the module (and its parents if they have one).

Since we’re using the same term datanames, I have a slight concern that this might cause confusion for users, who might think that teal_transform_module$datanames has the same scope or purpose as module$datanames. I may be overthinking this, but I’m curious to hear what everyone else thinks.

Data Summary UI

The format of the Data Summary UI differs when the module only needs the new dataset (e.g., missing a bold header). Might need adjustment for consistency?

Only ANL

With other datasets

@gogonzo
Copy link
Contributor Author

gogonzo commented Aug 22, 2024

When modules$datanames is not provided (i.e. example_module("adsl - no transformer no datanames in module"), the default value for modules$datanames is all which means the module has access to all datasets.

However, I don’t see the ANL object and only datasets in teal_data_module available to the module. Shouldn't ANL be available in the module or does the user need to specify it explicitly in module$datanames

If modules$datanames == "all" then ANL will also be available in summary and in teal_module. "adsl - no transformer no datanames in module" doesn't create ANL. In ("adsl - *") modules, ADSL is modified based on ADTTE.

From my understanding of the current design, when teal_transform_module$datanames is provided, it surfaces the datasets to the filter panel, allowing users to subset the datasets (and their parents if they have one), triggering the transformation to be rerun.

Exactly.

Ultimately, only the datasets specified in module$datanames are available to the module (and its parents if they have one).

Exactly.

Since we’re using the same term datanames, I have a slight concern that this might cause confusion for users, who might think that teal_transform_module$datanames has the same scope or purpose as module$datanames. I may be overthinking this, but I’m curious to hear what everyone else thinks.

I agree, this is why I didn't just @inheritParams module in the documentation of teal_transform_module. I think this is a relevant concern and I would focus on the documentation and maybe parameter naming. I'll think more about datanames in general.

The format of the Data Summary UI differs when the module only needs the new dataset (e.g., missing a bold header). Might need adjustment for consistency?

Good point. I'll fix it here 👍. Fixed, it was a problem of en example which had a failure for "anl - transform w/o datanames"

@donyunardi
Copy link
Contributor

donyunardi commented Aug 22, 2024

If modules$datanames == "all" then ANL will also be available in summary and in teal_module. "adsl - no transformer no datanames in module" doesn't create ANL. In ("adsl - *") modules, ADSL is modified based on ADTTE.

I don't see ANL available to the module when I don't provide the module$datanames argument or if I set module$datanames = 'all'

Example Code
options(
  teal.log_level = "TRACE",
  teal.show_js_log = TRUE,
  # teal.bs_theme = bslib::bs_theme(version = 5),
  shiny.bookmarkStore = "server"
)

pkgload::load_all("../teal.data")
pkgload::load_all("../teal")

anl <- teal_transform_module(
  label = "ANL with datanames",
  ui = function(id) {
    ns <- NS(id)
    tagList(
      div("This transformer adds ANL based on specified ADSL, ADTTE"),
      numericInput(ns("obs"), "Number of subjects", value = 400, min = 0, max = 400)
    )
  },
  server = function(id, data) {
    moduleServer(id, function(input, output, session) {
      reactive({
        within(data(),
          {
            ANL <- dplyr::inner_join(
              head(ADSL, nobs),
              ADTTE[c("STUDYID", "USUBJID", setdiff(colnames(ADTTE), colnames(ADSL)))],
              by = c("USUBJID", "STUDYID")
            )
          },
          nobs = input$obs
        )
      })
    })
  }
)

data <- teal_data_module(
  once = FALSE,
  ui = function(id) {
    ns <- NS(id)
    tagList(
      numericInput(ns("obs"), "Number of observations to show", 1000),
      actionButton(ns("submit"), label = "Submit")
    )
  },
  server = function(id, ...) {
    moduleServer(id, function(input, output, session) {
      logger::log_trace("example_module_transform2 initializing.")
      eventReactive(input$submit, {
        data <- teal_data(a = NULL) |>
          within(
            {
              logger::log_trace("Loading data")
              ADSL <- head(teal.data::rADSL, n = n)
              ADTTE <- teal.data::rADTTE
              iris <- iris
              iris_raw <- iris
              CO2 <- CO2
              factors <- names(Filter(isTRUE, vapply(CO2, is.factor, logical(1L))))
              CO2[factors] <- lapply(CO2[factors], as.character)
            },
            n = as.numeric(input$obs)
          )
        join_keys(data) <- default_cdisc_join_keys[c("ADSL", "ADTTE")]
        teal.data::datanames(data) <- c("CO2", "ADTTE", "iris_raw", "iris", "ADSL")
        data
      })
    })
  }
)

app <- teal::init(
  data = data,
  modules = list(
    example_module("anl - no module$datanames provided", transformers = list(anl)),
    example_module("anl - module$datanames set to all", datanames = "all", transformers = list(anl))
  ),
  filter = teal_slices(
    teal_slice("ADSL", "SEX"),
    teal_slice("ADSL", "AGE", selected = c(18L, 65L)),
    include_varnames = list(
      ADSL = c("SEX", "AGE")
    )
  )
)

runApp(app)

@gogonzo Do you get a different result?

SessionInfo
r$> sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: x86_64-apple-darwin20
Running under: macOS Ventura 13.6.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] teal_0.15.2.9056      teal.slice_0.5.1.9010 shiny_1.8.1.1         teal.data_0.6.0.9008  testthat_3.2.1.1      teal.code_0.5.0.9007 

loaded via a namespace (and not attached):
 [1] gtable_0.3.5             xfun_0.43                bslib_0.7.0              ggplot2_3.5.1            htmlwidgets_1.6.4        shinyjs_2.1.0           
 [7] teal.widgets_0.4.2.9017  vctrs_0.6.5              tools_4.4.1              generics_0.1.3           parallel_4.4.1           tibble_3.2.1            
[13] fansi_1.0.6              pkgconfig_2.0.3          data.table_1.15.4        checkmate_2.3.1          desc_1.4.3               lifecycle_1.0.4         
[19] compiler_4.4.1           brio_1.1.5               munsell_0.5.1            fontawesome_0.5.2        codetools_0.2-20         httpuv_1.6.15           
[25] shinyWidgets_0.8.6       htmltools_0.5.8.1        sass_0.4.9               lazyeval_0.2.2           yaml_2.3.8               plotly_4.10.4           
[31] tidyr_1.3.1              later_1.3.2              pillar_1.9.0             jquerylib_0.1.4          cachem_1.0.8             mime_0.12               
[37] parallelly_1.37.1        tidyselect_1.2.1         digest_0.6.35            future_1.33.2            purrr_1.0.2              dplyr_1.1.4             
[43] listenv_0.9.1            rprojroot_2.0.4          fastmap_1.1.1            grid_4.4.1               colorspace_2.1-0         cli_3.6.2               
[49] logger_0.3.0             magrittr_2.0.3           pkgbuild_1.4.4           utf8_1.2.4               withr_3.0.0              teal.reporter_0.3.1.9011
[55] scales_1.3.0             promises_1.3.0           backports_1.4.1          httr_1.4.7               rmarkdown_2.26           globals_0.16.3          
[61] memoise_2.0.1            evaluate_0.23            knitr_1.46               shinycssloaders_1.0.0    viridisLite_0.4.2        rlang_1.1.3             
[67] Rcpp_1.0.12              xtable_1.8-4             glue_1.7.0               formatR_1.14             renv_1.0.7               pkgload_1.3.4           
[73] jsonlite_1.8.8           teal.logger_0.2.0.9006   R6_2.5.1

@gogonzo
Copy link
Contributor Author

gogonzo commented Aug 23, 2024

If modules$datanames == "all" then ANL will also be available in summary and in teal_module. "adsl - no transformer no datanames in module" doesn't create ANL. In ("adsl - *") modules, ADSL is modified based on ADTTE.

I don't see ANL available to the module when I don't provide the module$datanames argument or if I set module$datanames = 'all'

Yeah, thanks for this example. I have something to fix here 👍

@kumamiao
Copy link
Contributor

kumamiao commented Aug 23, 2024

Thank you @gogonzo for the PoC! The logic listed makes sense.

Since we’re using the same term datanames, I have a slight concern that this might cause confusion for users, who might think that teal_transform_module$datanames has the same scope or purpose as module$datanames. I may be overthinking this, but I’m curious to hear what everyone else thinks.

I agree, this is why I didn't just @inheritParams module in the documentation of teal_transform_module. I think this is a relevant concern and I would focus on the documentation and maybe parameter naming. I'll think more about datanames in general.

Just wanted to echo with @donyunardi 's concern on the parameter name for datanames here, would be great to name it differently, thanks @gogonzo for trying to address this.

@gogonzo Do you get a different result?

I get the same result as you do, I can only get ANL when I set module$datanames to ANL.

@gogonzo
Copy link
Contributor Author

gogonzo commented Aug 23, 2024

@kumamiao @donyunardi
Dony's case reminded me what I forgot to mention. If module@datanames == "all" then module gets datanames specified in teal_data object. But app developer doesn't need to specifiy datanames(data) <- because if object doesn't contain any proxy variables then all bindings in teal_data are used. This means that the process of determining datanames for the module is multi-staged:

  • module$datanames if specified
  • else datanames(data) if specified
  • else ls(data@env)

Same applies to the module-transform$datanames when determining datanames for transform. This gets little complicated and I'll spend some time on simplifying this.

@gogonzo
Copy link
Contributor Author

gogonzo commented Aug 23, 2024

@donyunardi @kumamiao
The bug you've detected is marked with the red color on this diagram. This is how datanames are determined for teal_module and currently this (red) condition is not satisfied. I'll fix this but in the mean time we can discuss datanames topic in general.

image

Note: Sidebar gets datanames based on the same logic but replace "module gets:" with "sidebar gets:" and module$datanames with union(module$datanames, module-transform$datanames)(because sidebar needs to receive datanames needed by module and needed by transform).

In my opinion this is getting too complicated:

  1. I've raised already my opinion about datanames = NULL that it shouldn't be handled this way. I can't find the issue where we had the discussion about this
  2. Do we really need the whole logic when module$datanames = "all", maybe datanames(data) should be ignored and we should show all datasets (ls(<teal_data>@env)) when module$datanames = "all" or some when module$datanames = <some datasets>. This will require app developer always specify module$datanames when @env contains some temp variables created in the preprocessing. Diagram would look like this
image

@donyunardi
Copy link
Contributor

I've raised already my opinion about datanames = NULL that it shouldn't be handled this way. I can't find the issue where we had the discussion about this

We discussed it here.

maybe datanames(data) should be ignored and we should show all datasets (ls(<teal_data>@env)) when module$datanames = "all" or some when module$datanames = . This will require app developer always specify module$datanames when @env contains some temp variables created in the preprocessing.

I’m open to this approach. Syntactically, it makes sense to me. If a user specifies datanames = 'all', then teal will provide all the objects in teal_data. However, as you pointed out, this might include objects other than data.frame. Initially, I considered having teal filter these objects to show only data.frame objects, but I’ve reconsidered. It would be better for the module itself to handle this filtering. The teal module can be designed to display only data.frame objects (or other specific objects) as needed.

For now, I'm leaning toward just giving the user all objects.

If we decide to move forward with this behavior, we'll need to carefully assess how it will affect all the modules that currently have datanames = 'all' in both tmg and tmc.

Idea

Have you considered adding a new argument to the transformer to specify additional data objects that should be appended to the teal_data object?

Code can look something like this:

teal_transform_module <- function(label, ui, server, datanames, data_to_be_added_to_teal_data)

User can define the value(s) of the new_data_object to specify which objects they want to pull from the transform module. Behind the scene, we will then append this information: datanames(data) <- c(datanames(data), data_to_be_added_to_teal_data).

@gogonzo
Copy link
Contributor Author

gogonzo commented Aug 25, 2024

Have you considered adding a new argument to the transformer to specify additional data objects that should be appended to the teal_data object?

I don't think anything except module$datasets should decide what to include in teal_data which is passed to the module.
I previously (two or three weeks ago) thought that one can just "extend" datanames in the teal_transform_module$server by evaluating this

datanames(data) <- c(datanames(data), "added_in_transform")

And I thought this makes sense back then, but after a short period of time even I forgot about this, which confirms that it datanames logic might be too complex to remember. This is why I proposed simplest solution (with all consequences)


If we decide to ignore datanames(<teal data>) in teal then <teal_data>@datanames wouldn't make so much sense at this moment. I suggest to spend more time on discussing this. Currently we need following datanames in the app:

  1. Sidebar might need extra datanames if teal_transform_module needs other-datasets to modify target-dataset
  2. teal_module$server receives specific datanames when specified
  3. teal_module$server receives all datanames when datanames == "all" (this can include connection objects etc.). What "all" means is the key of the discussion. It definitely doesn't mean all-datasets but for example:
  • in tm_variable_browser means all data.frames
  • in tm_data_table means all data.frames
  • what else?

teal.data::datanames is a handy way to limit datanames but it doesn't address (3) entirely. For example if we have a tm_data_table which uses utilizes data.frames only but displays other datasets anyway. Since "all" is not specific enough, maybe we need datanames to be:

  • for eager selection: character vector for specific datanames
  • for dynamic selection: function which takes teal_data and returns datanames which satisfy the condition:
    • everything(): equivalent of today's "all"
    • all_off(class): returns datanames of specific class
    • all_if(condition): returns datanames which satisfy condition, for example all_if(nrow(.data) > 5)
    • matches(match): returns datanames which matches regexp

This should be adjusted as much as possible with tidyselect. We tried to do something similar in teal.transfrorm for dynamic columns selection. We liked idea of introducing more functions, like all_numeric(), all_factors()

@gogonzo
Copy link
Contributor Author

gogonzo commented Aug 26, 2024

Let me rephrase once again my thoughts. We need to know few things:

  1. module$datanames defines what teal_module needs.
  2. We need a way to specify teal_transform_module needs - this is a subject of this PR and I enabled it though transform$dataname
  3. I believe that we can't just base on module$datanames as teal_transform_module might need completely different datasets in order to create/modify datasets needed by the teal_module
  4. teal_transform_module might create a new dataset which should be included in the teal_module if its name matches module$datanames - matches either by being an element of module$datanames or when module$datanames == "all"

I think that above points define exactly what we need and we can discuss how provide simple logic for app-developer and not overcomplicate internal code. In this PR I've added $datanames to the teal_transform_module to optionally specify what datasets are needed for teal_transform_module to perform. Since we pass data in following order: filter-panel -> teal_transform_module -> teal_module, we need filter-panel to display and filter datanames = c(transform$datanames, module$datanames). This is why I think (1) (2) is unavoidable and shouldn't be discussed.


Issue: module$datanames == "all" and new datanames added in transform.

The real problem is what @donyunardi noticed that if module$datanames == "all" and if teal_data@datanames is specified, then dataset is ignored (because it is not specified in the teal_data object. This is the logic from the moment when @donyunardi made a comment:

image

Given above, I've made a fix and introduced an extra step in teal internals to figure out what is needed (automatically), let's call it a solution 1 - determine datanames automatically:

image

Then I realised that this gets pretty complicated and provoked discussion, by illustrating solution 2 - simply determine datanames from the teal_data environment

image

Do we really need the whole logic when module$datanames = "all", maybe datanames(data) should be ignored and we should show all datasets (ls(<teal_data>@env)) when module$datanames = "all" or some when module$datanames = . This will require app developer always specify module$datanames when @env contains some temp variables created in the preprocessing. Diagram would look like this

In Today's meeting @pawelru proposed a solution 3 - app developer needs to add extra items to @datanames, but IMO forcing app-developer to append datanames only when data@datanames is specified and something new is added is to specific condition, which would result that app-developer might not remember what to specify when (currently needs to understand relation between module$datanames, data@datanames)

image

So I said, instead of forcing app-developer to update datanames in specific scenario, we should force it always, so that @datanames is always up to date scenario 4 - allways specify @datanames

image

I personally like (4) and (2). I would like (2) the most if we find a simple way to exclude proxy datanames not needed for teal_module.

@kumamiao
Copy link
Contributor

Hi @gogonzo , thank you so much for the quick and comprehensive summary!!

Like you, I like (2) the most if we find a good way to only get needed data for teal_module, but otherwise I do not have strong preferences on either (2) or (4), both would make sense. To me personally, I feel (2) gives more flexibility to the user - if I'm understanding this, if user does not like all data@env objects to be displayed, user can choose to do what they need to do in option (4)?

I tested the current PR and things are working well, but could you please check if listed below are the expected behavior under option (2)?
a. In the last 2 modules, I specified module$datanames, and the subset of dataframes are valid on the left encoding panel, but the Active Filter Summary still displays both subset of dataframe + all newly added env objects
b. newly created data in transform are not listed under Active Filter Variables?

app example
options(
  teal.log_level = "TRACE",
  teal.show_js_log = TRUE,
  # teal.bs_theme = bslib::bs_theme(version = 5),
  shiny.bookmarkStore = "server"
)

pkgload::load_all("../teal.data")
pkgload::load_all("../teal")

anl <- teal_transform_module(
  label = "ANL with datanames",
  ui = function(id) {
    ns <- NS(id)
    tagList(
      div("This transformer adds ANL based on specified ADSL, ADTTE"),
      numericInput(ns("obs"), "Number of subjects", value = 400, min = 0, max = 400)
    )
  },
  server = function(id, data) {
    moduleServer(id, function(input, output, session) {
      reactive({
        within(data(),
               {
                 sub_iris <- head(iris)
                 ANL <- dplyr::inner_join(
                   head(ADSL, nobs),
                   ADTTE[c("STUDYID", "USUBJID", setdiff(colnames(ADTTE), colnames(ADSL)))],
                   by = c("USUBJID", "STUDYID")
                 )
                 tmp_obj1 <- 100
                 tmp_obj2 <- "test"
               },
               nobs = input$obs
        )
      })
    })
  }
)

data <- teal_data_module(
  once = FALSE,
  ui = function(id) {
    ns <- NS(id)
    tagList(
      numericInput(ns("obs"), "Number of observations to show", 1000),
      actionButton(ns("submit"), label = "Submit")
    )
  },
  server = function(id, ...) {
    moduleServer(id, function(input, output, session) {
      logger::log_trace("example_module_transform2 initializing.")
      eventReactive(input$submit, {
        data <- teal_data(a = NULL) |>
          within(
            {
              logger::log_trace("Loading data")
              ADSL <- head(teal.data::rADSL, n = n)
              ADTTE <- teal.data::rADTTE
              iris <- iris
              iris_raw <- iris
              CO2 <- CO2
              factors <- names(Filter(isTRUE, vapply(CO2, is.factor, logical(1L))))
              CO2[factors] <- lapply(CO2[factors], as.character)
            },
            n = as.numeric(input$obs)
          )
        join_keys(data) <- default_cdisc_join_keys[c("ADSL", "ADTTE")]
        teal.data::datanames(data) <- c("CO2", "ADTTE", "iris_raw", "iris", "ADSL")
        data
      })
    })
  }
)

app <- teal::init(
  data = data,
  modules = list(
    example_module("NULL", datanames = NULL, transformers = list(anl)),
    example_module("no call", transformers = list(anl)),
    example_module("all", datanames = "all", transformers = list(anl)),
    example_module("ANL", datanames = "ANL", transformers = list(anl)),
    example_module("ANL+iris", datanames = c("ANL", "iris"), transformers = list(anl))
  ),
  filter = teal_slices(
    teal_slice("ADSL", "SEX"),
    teal_slice("ADSL", "AGE", selected = c(18L, 65L)),
    include_varnames = list(
      ADSL = c("SEX", "AGE")
    )
  )
)

runApp(app)

@gogonzo
Copy link
Contributor Author

gogonzo commented Aug 28, 2024

In the last 2 modules, I specified module$datanames, and the subset of dataframes are valid on the left encoding panel, but the Active Filter Summary still displays both subset of dataframe + all newly added env objects

Good point @kumamiao I need to fix it

I feel (2) gives more flexibility to the user - if I'm understanding this, if user does not like all data@env objects to be displayed, user can choose to do what they need to do in option (4)?

If there is an extra conditions in (2) then it will be basically (4)


At this moment, it feels like (4) is the most relevant:

  • app developer doesn't need to specify data@datanames in initial data. All objects from initial data@env will determine data@datanames (automatically set in teal on app initialization)
  • app developer can specify data@datanames to limit datanames
  • in teal_transform_module app developer doesn't have to specify anything if transform uses datanames specified by module$datanames
  • app developer needs to specify teal_transform_module(datanames) if transform uses any datasets not included in module$datanames
  • app developer needs to add to data@datanames when module$datanames == "all" and when transform adds some new dataset which has to be passed to module.

@donyunardi
Copy link
Contributor

donyunardi commented Aug 28, 2024

Thanks for the laying down all the conditions nicely @gogonzo.

Could you help me understand this statement?

app developer needs to add to data@datanames when module$datanames == "all" and when transform adds some new dataset which has to be passed to module.

When/where would user add/update data@datanames? Would it be in the server of teal_transform_module?
if so, then if you look at @kumamiao example code, she has this line:

example_module("all", datanames = "all", transformers = list(anl)),

So here,module$datanames is all and there is no statement to add ANL object in data$datanames inside the server of anl teal_transform_module. However, I see ANL is being passed to the module.

From what I observed, it doesn't look like app developer needs to add/update data$datanames when module$datanames == "all".

@gogonzo
Copy link
Contributor Author

gogonzo commented Aug 29, 2024

From what I observed, it doesn't look like app developer needs to add/update data$datanames when module$datanames == "all"

Sorry @donyunardi my bad - I didn't remove lines which were always adding everything from teal_transform_module - now it works as described.

@gogonzo
Copy link
Contributor Author

gogonzo commented Sep 2, 2024

Closing this PR in favour of #1334

@gogonzo gogonzo closed this Sep 2, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Sep 2, 2024
@gogonzo gogonzo deleted the 1298_transform_datanames@main branch September 2, 2024 06:32
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: filter panel for related datanames of ANL are not shown
3 participants