Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add_traceability errors #19

Closed
kyleam opened this issue Aug 17, 2023 · 6 comments · Fixed by #42
Closed

add_traceability errors #19

kyleam opened this issue Aug 17, 2023 · 6 comments · Fixed by #42
Labels
bug Something isn't working trace matrix refactor refactoring traceability matrix into its own function
Milestone

Comments

@kyleam
Copy link
Collaborator

kyleam commented Aug 17, 2023

[ cc: @seth127 @barrettk ]

I started to run render_scorecard with add_traceability = TRUE and hit into many errors. Before aborting, I let it run for a bit (ended up being 79 packages) to hopefully capture a good representation of the different types of errors.

Here are the different classes of errors that I spot. The packages listed under each class are just what happened to be in that first set of 79 packages.

  • no testthat directory found

    no `testthat` directory found at /tmp/Rtmp3DnehF/SCORECARD_1310e2ddecd/AER/tests
    

    packages: AER (1.2-10), BB (2019.10-1), DEoptimR (1.0-14), DEoptim (2.2-8), DT (0.28), DistributionUtils (0.6-0), Formula (1.2-5), GPArotation (2023.3-1), KernSmooth (2.23-22), MASS (7.3-60), MatrixModels (0.5-2), Matrix (1.6-0), PKI (0.1-12), PerformanceAnalytics (2.0.4)

  • no testing directory found

    no testing directory found at /tmp/Rtmp3DnehF/SCORECARD_131047538e42/ATE
    

    packages: ATE (0.2.0), AUC (0.3.2), BiasedUrn (2.0.10), DiagrammeRsvg (0.1), DiceDesign (1.9), Epi (2.47.1), FME (1.3.6.3), FNN (1.1.3.2), GA (3.2.3), GenSA (1.1.9), HDInterval (0.2.4), JM (1.5-2), JMbayes (0.8-85), Lahman (11.0-0), MCMCpack (1.6-3), MatchIt (4.5.4), NMF (0.26), PKPDdatasets (0.1.0.1494604782), PKPDmodels (0.3.2), PK (1.3-5)

  • basename error

    ℹ In argument: `code_file = paste0("R/", basename(.data$code_file))`.
    Caused by error in `basename()`:
    ! a character vector argument expected
    

    packages: ALEPlot (1.1), Cairo (1.6-0), CompQuadForm (1.4.3), Exact (3.2), KMsurv (0.1-5), Kendall (2.2.1), LearnBayes (2.15.1), NonCompart (0.6.0), PEIP (2.2-3), PFIM (5.0), PKreport (1.5)

  • .init error

    Must supply `.init` when `.x` is empty.
    

    packages: AsioHeaders (1.22.1-2), BH (1.81.0-1)

  • invalid regular expression

    ℹ In index: 1.
    Caused by error in `grep()`:
    ! invalid regular expression '^\s*%(]%\s*(<\-|=)\s*function\s*.*|^\s*setGeneric\s*\(\s*["|']%(]%["|'].*', reason 'Missing ')''
    

    packages: DescTools (0.99.49), Hmisc (4.7-2)

I'm putting this in "First Release" milestone given that gh-7 has "[s]et add_traceability = TRUE to the default argument".

@kyleam kyleam added the bug Something isn't working label Aug 17, 2023
@kyleam kyleam added this to the First Release milestone Aug 17, 2023
@kyleam kyleam mentioned this issue Aug 17, 2023
7 tasks
This was referenced Aug 17, 2023
@seth127
Copy link
Collaborator

seth127 commented Aug 17, 2023

Good catches @kyleam, and yes I think we should address this before the first release. At the very least, we should capture these errors and print a "could not render Traceability Matrix" in the doc, instead of blowing up.

Let's chat about some of this and then document the outcome of that in this issue. Thanks.

@barrettk
Copy link
Collaborator

I think we made an incorrect assumption that the man files should also map to an R file as well. The MASS package triggers a lot of warnings for not finding R scripts, but the man files are all pointing to RDS files (they don't actually point to it, but I figured out that that's where they came from).

Cant decide if we should just ditch the warning, or have a verbose argument or something that we only use for our packages. I think these warnings could be helpful for catching issues with our packages (caught one for pmforest a while back), but it's definitely problematic for older R packages.

In package `MASS`, could not find an R script associated with man file: abbey.Rd
In package `MASS`, could not find an R script associated with man file: accdeaths.Rd
In package `MASS`, could not find an R script associated with man file: addterm.Rd
In package `MASS`, could not find an R script associated with man file: Aids2.Rd
In package `MASS`, could not find an R script associated with man file: Animals.Rd
In package `MASS`, could not find an R script associated with man file: anorexia.Rd
In package `MASS`, could not find an R script associated with man file: anova.negbin.Rd
In package `MASS`, could not find an R script associated with man file: area.Rd
In package `MASS`, could not find an R script associated with man file: bacteria.Rd
In package `MASS`, could not find an R script associated with man file: bandwidth.nrd.Rd
In package `MASS`, could not find an R script associated with man file: bcv.Rd
In package `MASS`, could not find an R script associated with man file: beav1.Rd
In package `MASS`, could not find an R script associated with man file: beav2.Rd
In package `MASS`, could not find an R script associated with man file: biopsy.Rd
In package `MASS`, could not find an R script associated with man file: birthwt.Rd
In package `MASS`, could not find an R script associated with man file: Boston.Rd
In package `MASS`, could not find an R script associated with man file: boxcox.Rd

... keeps going ...

In package `MASS`, the R scripts (R/add.R, R/boxcox.R, R/contr.sdif.R, R/corresp.R, R/cov.trob.R, R/dose.p.R, 
R/eqscplot.R, R/fitdistr.R, R/fractions.R, R/gamma.shape.R, R/glmmPQL.R, R/hist.scott.R, R/huber.R, R/hubers.R, 
R/isoMDS.R, R/kde2d.R, R/lda.R, R/lm.ridge.R, R/loglm.R, R/logtrans.R, R/lqs.R, R/mca.R, R/misc.R, R/neg.bin.R,
 R/negbin.R, R/negexp.R, R/parcoord.R, R/qda.R, R/rlm.R, R/sammon.R, R/stdres.R, R/ucv.R, R/write.matrix.R) 
are missing documentation for the following exports: 
dropterm
boxcox
contr.sdif

... keeps going ...

@kyleam
Copy link
Collaborator Author

kyleam commented Aug 17, 2023

incorrect assumption that the man files should also map to an R file as well.

Perhaps that's behind a lot (or even all?) of the noise I reported in gh-21.

@barrettk
Copy link
Collaborator

barrettk commented Aug 17, 2023

Perhaps that's behind a lot (or even all?) of the noise I reported in #21.

Yes it definitely is. I am currently fixing this with a verbose argument that defaults to FALSE.

@seth127 seth127 added the trace matrix refactor refactoring traceability matrix into its own function label Aug 17, 2023
@seth127 seth127 modified the milestones: First Release, Next Aug 17, 2023
@barrettk
Copy link
Collaborator

barrettk commented Aug 18, 2023

Just FYI The first two issues were addressed in #22

Edit: The remainder of these issues, and some other edge cases are addressed in #31

barrettk added a commit that referenced this issue Aug 22, 2023
…defined on the next line:

 - previously, functions defined as follows could not be captured because they were separated into multiple elements:
 hello <-
        function(x){}
 - by collapsing to a single string, the optional white space now works for new lines as well. This was the core issue for the `Exact` package basename error in #19.
 - additional testing of this is required.
barrettk added a commit that referenced this issue Aug 22, 2023
…hat were written by hand, and not generated via roxygen. In these cases the man files did not reference the R script the function originated in.

 - We could just say these packages dont have documentation/were not found, but I added support for at least attempting to find the scripts.

 - In this commit, I look for the function name as an R file if none were found. In other words, man/my_function.Rd would lead to a search for R/my_function.R. We could further improve this searching, but Im sure we dont want to go crazy with it either. This is a minimal patch that works for the `Exact (3.2)` mentioned in #19. (basename error)
barrettk added a commit that referenced this issue Aug 22, 2023
 - was the issue for Cairo 1.6-0 in #19
@kyleam
Copy link
Collaborator Author

kyleam commented Aug 29, 2023

@barrettk I know you're leaning toward using an eval-based approach for a getting a set of top-level functions declared in a file. If we wanted to avoid evaluating, I think we could get pretty far without diving too deep into the parse data. In case it's a useful start, here's an example. This could miss some unusual metaprogramming-based declarations and might need to be extended for other constructs of interest, but it may be good enough for our purposes.

code
#' Return top-level functions and generics from files
#'
#' @param files Character vector of files to parse and search.
#'
#' @return A list the same length as `files`. Each element is the set of
#'   functions and generics found in the given file. If `files` is a named
#'   vector, carry those over to the return value. Otherwise set the names to
#'   the relative file names.
#' @noRd
get_toplevel_declarations <- function(files) {
  if (is.null(names(files))) {
    names(files) <- fs::path_rel(files)
  }
  purrr::map(
    files,
    function(f) {
      pd <- tryCatch(parse(file = f, keep.source = TRUE), error = identity)
      if (inherits(pd, "error")) {
        warning("Failed to parse ", f, ": ", conditionMessage(pd))
        return(character())
      }
      data <- utils::getParseData(pd)
      c(
        toplevel_functions(data),
        toplevel_generics(data)
      )
    }
  )
}

toplevel_functions <- function(data) {
  # Find top-level functions that have the following parse structure:
  #
  #   root (id = 0)
  #    `--expr                             -> node1
  #        ¦--expr                         -> node2
  #        ¦   `--SYMBOL: {name}           -> node3
  #        ¦--LEFT_ASSIGN or EQ_ASSIGN
  #        `--expr
  #            ¦--FUNCTION
  #            [...]
  extract_toplevel_names(
    data,
    select_fn = function(d) d$id[d$token == "FUNCTION"],
    depth = 3L,
    name_fn = function(d, id) {
      node1 <- node_parent(d, id, 2)
      n1_kids <- node_children(d, node1)
      if (length(n1_kids) != 3) {
        return(NA_character_)
      }

      node2 <- n1_kids[1]
      node3 <- node_children(d, node2)
      if (length(node3) != 1) {
        return(NA_character_)
      }

      row <- d[match(node3, d$id), ]
      if (!identical(row$token, "SYMBOL")) {
        return(NA_character_)
      }
      return(row$text)
    }
  )
}

toplevel_generics <- function(data) {
  # Find top-level setGeneric calls that have the following parse structure:
  #
  #   root (id = 0)
  #    `--expr                                      -> node1
  #        ¦--expr
  #        ¦   `--SYMBOL_FUNCTION_CALL (setGeneric)
  #        ¦--'('
  #        ¦--expr                                  -> node2
  #        ¦   `--STR_CONST: "{name}"               -> node3
  #        [...]
  #        `--')
  extract_toplevel_names(
    data,
    select_fn = function(d) {
      is_gen <- d$token == "SYMBOL_FUNCTION_CALL" & d$text == "setGeneric"
      data$id[is_gen]
    },
    depth = 3L,
    name_fn = function(d, id) {
      node1 <- node_parent(d, id, 2)
      n1_kids <- node_children(d, node1)
      if (length(n1_kids) < 4) {
        return(NA_character_)
      }

      node2 <- n1_kids[3]
      node3 <- node_children(d, node2)
      if (length(node3) != 1) {
        return(NA_character_)
      }

      row <- d[match(node3, d$id), ]
      if (!identical(row$token, "STR_CONST")) {
        return(NA_character_)
      }
      text <- row$text
      # For very long strings (1000 characters or more), `text` is replaced by
      # `[N chars quoted with '"']`, and `getParseText()` can be used to extract
      # it. In the context of this function, we're not expecting or interested
      # in anything that long.
      if (startsWith(text, "[")) {
        return(NA_character_)
      }
      # Drop outer quotes.
      return(substr(text, 2, nchar(text) - 1))
    }
  )
}

#' Extract top-level names
#'
#' @param data Data frame of parse data, as returned by `utils::getParseData()`.
#' @param select_fn Function to select nodes of interest from parse data. This
#'   is called with `data` and should return a character vector of node IDs.
#' @param depth Consider a node returned by `select_fn` to be "top-level" if it
#'   is at this depth relative to the root node.
#' @param name_fn Function that returns a name for the specified selected node.
#'   This function is called with two arguments, `data` and a top-level node ID
#'   returned by `select_fn`. If, on closer inspection, the passed node is not a
#'   node of interest, this function should return `NA_character_` to indicate
#'   that the value should be filtered out from the return value.
#'
#' @return A character vector of names.
#' @noRd
extract_toplevel_names <- function(data, select_fn, depth, name_fn) {
  nodes <- purrr::keep(
    select_fn(data),
    function(id) identical(node_parent(data, id, depth), 0L)
  )
  nms <- purrr::map_chr(nodes, function(id) name_fn(data, id))
  return(nms[!is.na(nms)])
}

node_parent <- function(data, id, n = 1) {
  while (n > 0) {
    id <- data$parent[match(id, data$id)]
    n <- n - 1
  }
  return(id)
}

node_children <- function(data, id) {
  data$id[data$parent == id]
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working trace matrix refactor refactoring traceability matrix into its own function
Projects
None yet
3 participants