add_traceability errors #19

kyleam · 2023-08-17T16:34:00Z

I started to run render_scorecard with add_traceability = TRUE and hit into many errors. Before aborting, I let it run for a bit (ended up being 79 packages) to hopefully capture a good representation of the different types of errors.

Here are the different classes of errors that I spot. The packages listed under each class are just what happened to be in that first set of 79 packages.

no testthat directory found
```
no `testthat` directory found at /tmp/Rtmp3DnehF/SCORECARD_1310e2ddecd/AER/tests
```
packages: AER (1.2-10), BB (2019.10-1), DEoptimR (1.0-14), DEoptim (2.2-8), DT (0.28), DistributionUtils (0.6-0), Formula (1.2-5), GPArotation (2023.3-1), KernSmooth (2.23-22), MASS (7.3-60), MatrixModels (0.5-2), Matrix (1.6-0), PKI (0.1-12), PerformanceAnalytics (2.0.4)
no testing directory found
```
no testing directory found at /tmp/Rtmp3DnehF/SCORECARD_131047538e42/ATE
```
packages: ATE (0.2.0), AUC (0.3.2), BiasedUrn (2.0.10), DiagrammeRsvg (0.1), DiceDesign (1.9), Epi (2.47.1), FME (1.3.6.3), FNN (1.1.3.2), GA (3.2.3), GenSA (1.1.9), HDInterval (0.2.4), JM (1.5-2), JMbayes (0.8-85), Lahman (11.0-0), MCMCpack (1.6-3), MatchIt (4.5.4), NMF (0.26), PKPDdatasets (0.1.0.1494604782), PKPDmodels (0.3.2), PK (1.3-5)
basename error
```
ℹ In argument: `code_file = paste0("R/", basename(.data$code_file))`.
Caused by error in `basename()`:
! a character vector argument expected
```
packages: ALEPlot (1.1), Cairo (1.6-0), CompQuadForm (1.4.3), Exact (3.2), KMsurv (0.1-5), Kendall (2.2.1), LearnBayes (2.15.1), NonCompart (0.6.0), PEIP (2.2-3), PFIM (5.0), PKreport (1.5)
.init error
```
Must supply `.init` when `.x` is empty.
```
packages: AsioHeaders (1.22.1-2), BH (1.81.0-1)

invalid regular expression

ℹ In index: 1.
Caused by error in `grep()`:
! invalid regular expression '^\s*%(]%\s*(<\-|=)\s*function\s*.*|^\s*setGeneric\s*\(\s*["|']%(]%["|'].*', reason 'Missing ')''

packages: DescTools (0.99.49), Hmisc (4.7-2)

I'm putting this in "First Release" milestone given that gh-7 has "[s]et add_traceability = TRUE to the default argument".

The text was updated successfully, but these errors were encountered:

seth127 · 2023-08-17T18:48:11Z

Good catches @kyleam, and yes I think we should address this before the first release. At the very least, we should capture these errors and print a "could not render Traceability Matrix" in the doc, instead of blowing up.

Let's chat about some of this and then document the outcome of that in this issue. Thanks.

barrettk · 2023-08-17T18:57:00Z

I think we made an incorrect assumption that the man files should also map to an R file as well. The MASS package triggers a lot of warnings for not finding R scripts, but the man files are all pointing to RDS files (they don't actually point to it, but I figured out that that's where they came from).

Cant decide if we should just ditch the warning, or have a verbose argument or something that we only use for our packages. I think these warnings could be helpful for catching issues with our packages (caught one for pmforest a while back), but it's definitely problematic for older R packages.

In package `MASS`, could not find an R script associated with man file: abbey.Rd
In package `MASS`, could not find an R script associated with man file: accdeaths.Rd
In package `MASS`, could not find an R script associated with man file: addterm.Rd
In package `MASS`, could not find an R script associated with man file: Aids2.Rd
In package `MASS`, could not find an R script associated with man file: Animals.Rd
In package `MASS`, could not find an R script associated with man file: anorexia.Rd
In package `MASS`, could not find an R script associated with man file: anova.negbin.Rd
In package `MASS`, could not find an R script associated with man file: area.Rd
In package `MASS`, could not find an R script associated with man file: bacteria.Rd
In package `MASS`, could not find an R script associated with man file: bandwidth.nrd.Rd
In package `MASS`, could not find an R script associated with man file: bcv.Rd
In package `MASS`, could not find an R script associated with man file: beav1.Rd
In package `MASS`, could not find an R script associated with man file: beav2.Rd
In package `MASS`, could not find an R script associated with man file: biopsy.Rd
In package `MASS`, could not find an R script associated with man file: birthwt.Rd
In package `MASS`, could not find an R script associated with man file: Boston.Rd
In package `MASS`, could not find an R script associated with man file: boxcox.Rd

... keeps going ...

In package `MASS`, the R scripts (R/add.R, R/boxcox.R, R/contr.sdif.R, R/corresp.R, R/cov.trob.R, R/dose.p.R, 
R/eqscplot.R, R/fitdistr.R, R/fractions.R, R/gamma.shape.R, R/glmmPQL.R, R/hist.scott.R, R/huber.R, R/hubers.R, 
R/isoMDS.R, R/kde2d.R, R/lda.R, R/lm.ridge.R, R/loglm.R, R/logtrans.R, R/lqs.R, R/mca.R, R/misc.R, R/neg.bin.R,
 R/negbin.R, R/negexp.R, R/parcoord.R, R/qda.R, R/rlm.R, R/sammon.R, R/stdres.R, R/ucv.R, R/write.matrix.R) 
are missing documentation for the following exports: 
dropterm
boxcox
contr.sdif

... keeps going ...

kyleam · 2023-08-17T18:58:59Z

incorrect assumption that the man files should also map to an R file as well.

Perhaps that's behind a lot (or even all?) of the noise I reported in gh-21.

barrettk · 2023-08-17T19:12:44Z

Perhaps that's behind a lot (or even all?) of the noise I reported in #21.

Yes it definitely is. I am currently fixing this with a verbose argument that defaults to FALSE.

This was set it Trac matrix #22

barrettk · 2023-08-18T14:53:17Z

Just FYI The first two issues were addressed in #22

Edit: The remainder of these issues, and some other edge cases are addressed in #31

…defined on the next line: - previously, functions defined as follows could not be captured because they were separated into multiple elements: hello <- function(x){} - by collapsing to a single string, the optional white space now works for new lines as well. This was the core issue for the `Exact` package basename error in #19. - additional testing of this is required.

…hat were written by hand, and not generated via roxygen. In these cases the man files did not reference the R script the function originated in. - We could just say these packages dont have documentation/were not found, but I added support for at least attempting to find the scripts. - In this commit, I look for the function name as an R file if none were found. In other words, man/my_function.Rd would lead to a search for R/my_function.R. We could further improve this searching, but Im sure we dont want to go crazy with it either. This is a minimal patch that works for the `Exact (3.2)` mentioned in #19. (basename error)

- was the issue for Cairo 1.6-0 in #19

kyleam · 2023-08-29T13:52:05Z

@barrettk I know you're leaning toward using an eval-based approach for a getting a set of top-level functions declared in a file. If we wanted to avoid evaluating, I think we could get pretty far without diving too deep into the parse data. In case it's a useful start, here's an example. This could miss some unusual metaprogramming-based declarations and might need to be extended for other constructs of interest, but it may be good enough for our purposes.

code

#' Return top-level functions and generics from files
#'
#' @param files Character vector of files to parse and search.
#'
#' @return A list the same length as `files`. Each element is the set of
#'   functions and generics found in the given file. If `files` is a named
#'   vector, carry those over to the return value. Otherwise set the names to
#'   the relative file names.
#' @noRd
get_toplevel_declarations <- function(files) {
  if (is.null(names(files))) {
    names(files) <- fs::path_rel(files)
  }
  purrr::map(
    files,
    function(f) {
      pd <- tryCatch(parse(file = f, keep.source = TRUE), error = identity)
      if (inherits(pd, "error")) {
        warning("Failed to parse ", f, ": ", conditionMessage(pd))
        return(character())
      }
      data <- utils::getParseData(pd)
      c(
        toplevel_functions(data),
        toplevel_generics(data)
      )
    }
  )
}

toplevel_functions <- function(data) {
  # Find top-level functions that have the following parse structure:
  #
  #   root (id = 0)
  #    `--expr                             -> node1
  #        ¦--expr                         -> node2
  #        ¦   `--SYMBOL: {name}           -> node3
  #        ¦--LEFT_ASSIGN or EQ_ASSIGN
  #        `--expr
  #            ¦--FUNCTION
  #            [...]
  extract_toplevel_names(
    data,
    select_fn = function(d) d$id[d$token == "FUNCTION"],
    depth = 3L,
    name_fn = function(d, id) {
      node1 <- node_parent(d, id, 2)
      n1_kids <- node_children(d, node1)
      if (length(n1_kids) != 3) {
        return(NA_character_)
      }

      node2 <- n1_kids[1]
      node3 <- node_children(d, node2)
      if (length(node3) != 1) {
        return(NA_character_)
      }

      row <- d[match(node3, d$id), ]
      if (!identical(row$token, "SYMBOL")) {
        return(NA_character_)
      }
      return(row$text)
    }
  )
}

toplevel_generics <- function(data) {
  # Find top-level setGeneric calls that have the following parse structure:
  #
  #   root (id = 0)
  #    `--expr                                      -> node1
  #        ¦--expr
  #        ¦   `--SYMBOL_FUNCTION_CALL (setGeneric)
  #        ¦--'('
  #        ¦--expr                                  -> node2
  #        ¦   `--STR_CONST: "{name}"               -> node3
  #        [...]
  #        `--')
  extract_toplevel_names(
    data,
    select_fn = function(d) {
      is_gen <- d$token == "SYMBOL_FUNCTION_CALL" & d$text == "setGeneric"
      data$id[is_gen]
    },
    depth = 3L,
    name_fn = function(d, id) {
      node1 <- node_parent(d, id, 2)
      n1_kids <- node_children(d, node1)
      if (length(n1_kids) < 4) {
        return(NA_character_)
      }

      node2 <- n1_kids[3]
      node3 <- node_children(d, node2)
      if (length(node3) != 1) {
        return(NA_character_)
      }

      row <- d[match(node3, d$id), ]
      if (!identical(row$token, "STR_CONST")) {
        return(NA_character_)
      }
      text <- row$text
      # For very long strings (1000 characters or more), `text` is replaced by
      # `[N chars quoted with '"']`, and `getParseText()` can be used to extract
      # it. In the context of this function, we're not expecting or interested
      # in anything that long.
      if (startsWith(text, "[")) {
        return(NA_character_)
      }
      # Drop outer quotes.
      return(substr(text, 2, nchar(text) - 1))
    }
  )
}

#' Extract top-level names
#'
#' @param data Data frame of parse data, as returned by `utils::getParseData()`.
#' @param select_fn Function to select nodes of interest from parse data. This
#'   is called with `data` and should return a character vector of node IDs.
#' @param depth Consider a node returned by `select_fn` to be "top-level" if it
#'   is at this depth relative to the root node.
#' @param name_fn Function that returns a name for the specified selected node.
#'   This function is called with two arguments, `data` and a top-level node ID
#'   returned by `select_fn`. If, on closer inspection, the passed node is not a
#'   node of interest, this function should return `NA_character_` to indicate
#'   that the value should be filtered out from the return value.
#'
#' @return A character vector of names.
#' @noRd
extract_toplevel_names <- function(data, select_fn, depth, name_fn) {
  nodes <- purrr::keep(
    select_fn(data),
    function(id) identical(node_parent(data, id, depth), 0L)
  )
  nms <- purrr::map_chr(nodes, function(id) name_fn(data, id))
  return(nms[!is.na(nms)])
}

node_parent <- function(data, id, n = 1) {
  while (n > 0) {
    id <- data$parent[match(id, data$id)]
    n <- n - 1
  }
  return(id)
}

node_children <- function(data, id) {
  data$id[data$parent == id]
}

kyleam added the bug Something isn't working label Aug 17, 2023

kyleam added this to the First Release milestone Aug 17, 2023

kyleam mentioned this issue Aug 17, 2023

Clean up before first release #7

Closed

7 tasks

This was referenced Aug 17, 2023

Release cleanup #20

Merged

Trac matrix #22

Merged

seth127 added the trace matrix refactor refactoring traceability matrix into its own function label Aug 17, 2023

seth127 modified the milestones: First Release, Next Aug 17, 2023

seth127 mentioned this issue Aug 18, 2023

Should creating traceability matrix be part of render? #14

Closed

barrettk added a commit that referenced this issue Aug 22, 2023

add handling for no exports found

287b373

- was the issue for Cairo 1.6-0 in #19

barrettk mentioned this issue Aug 24, 2023

Trac matrix fixes #31

Closed

seth127 mentioned this issue Aug 24, 2023

Traceability matrix refactor #30

Merged

kyleam mentioned this issue Aug 31, 2023

Get all functions #36

Closed

3 tasks

seth127 modified the milestones: Next, 0.2.0 release Sep 13, 2023

kyleam mentioned this issue Sep 13, 2023

traceability matrix fixes and extensions #42

Merged

kyleam closed this as completed in #42 Sep 14, 2023

kyleam mentioned this issue Sep 14, 2023

traceability matrix: remaining uncaught errors #46

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add_traceability errors #19

add_traceability errors #19

kyleam commented Aug 17, 2023 •

edited

Loading

seth127 commented Aug 17, 2023

barrettk commented Aug 17, 2023

kyleam commented Aug 17, 2023

barrettk commented Aug 17, 2023 •

edited

Loading

barrettk commented Aug 18, 2023 •

edited

Loading

kyleam commented Aug 29, 2023

add_traceability errors #19

add_traceability errors #19

Comments

kyleam commented Aug 17, 2023 • edited Loading

seth127 commented Aug 17, 2023

barrettk commented Aug 17, 2023

kyleam commented Aug 17, 2023

barrettk commented Aug 17, 2023 • edited Loading

barrettk commented Aug 18, 2023 • edited Loading

kyleam commented Aug 29, 2023

kyleam commented Aug 17, 2023 •

edited

Loading

barrettk commented Aug 17, 2023 •

edited

Loading

barrettk commented Aug 18, 2023 •

edited

Loading