Get all functions #36

barrettk · 2023-08-31T19:38:33Z

Improved Readability

moved get_testing_dir down, and get_exports higher (in make-traceability-matrix.R) to improve the flow of building the trace matrix

Actual Changes

we now parse and load the functions in an isolated environment. Syntax errors that lead to parsing errors will trigger warnings if verbose is TRUE
add new tests for export patterns
refactor tests to better isolate "no docs" from syntax errors in the package
fix to tests/testthat/setup.R for testing export patterns (package wasn't being set up properly)

New Test Cases

get_all_functions() should catch multi-line function declarations
get_all_functions() should filter out nested functions
get_exports() should handle exportPattern syntax correctly

- this is needed for use with exportPatterns in the namespace file - Instead of regex or parsing file lines, source all functions into a new environment, and determine exports via ls(). This is the same pattern pkgload::load_all() uses for determining exports. - I picked apart a lot of the internal functions pkgload::load_all() uses, and only kept what we needed. - This code is also refactored to be in more desirable format and map which script the function came from

- this avoids running into dependency issues used outside of function declarations - fix return if there are no funciton lines

- moved get_testing_dir down, and get_exports higher to improve the flow of building the trace matrix - we now parse and load the functions in an isolated environment. Syntax errors that lead to parsing errors will trigger warnings if verbose is TRUE - add new tests for export patterns - refactor tests to better isolate "no docs" from syntax errors in the package

barrettk · 2023-08-31T20:25:16Z

I wasn't able to come up with a method for using getParseData to differentiate between nested vs non-nested functions in a reliable way. I think loading the functions in an isolated namespace is probably preferable anyways, as this mimics how devtools::load_all()/pkgload::load_all() work for determining exports.

kyleam · 2023-08-31T20:26:18Z

I wasn't able to come up with a method for using getParseData to differentiate between nested vs non-nested functions in a reliable way.

Did you review what I posted at #19 (comment)?

barrettk · 2023-08-31T20:30:16Z

Did you review what I posted at #19 (comment)?

Oh no I had not seen that; Ill give that a spin. That logic does look a bit complicated though: If I get the same results for the packages I test I would probably prefer the current method (thanks for taking a stab at it)

barrettk · 2023-09-01T14:43:49Z

FWIW I compared the two methods with the Exact package, where get_functions uses the method @kyleam developed.

> get_functions(pkg_source_path)
# A tibble: 50 × 2
   func                     code_file                   
   <chr>                    <chr>                       
 1 binom.CI                 R/binom.CI.R                
 2 binomialCode             R/binomialCode.R            
 3 checkPairedParam         R/checkPairedParam.R        
 4 checkParam               R/checkParam.R              
 5 chisq_TX                 R/chisq_TX.R                
 6 confInt                  R/confInt.R                 
 7 confIntPaired            R/confIntPaired.R           
 8 convertMethod            R/convertMethod.R           
 9 csmPairedTemp            R/csmPairedTemp.R           
10 csmPairedTemp2sidedDelta R/csmPairedTemp2sidedDelta.R
# ℹ 40 more rows
# ℹ Use `print(n = ...)` to see more rows
> get_all_functions(pkg_source_path)
# A tibble: 48 × 2
   func                     code_file                   
   <chr>                    <chr>                       
 1 binom.CI                 R/binom.CI.R                
 2 binomialCode             R/binomialCode.R            
 3 checkPairedParam         R/checkPairedParam.R        
 4 checkParam               R/checkParam.R              
 5 chisq_TX                 R/chisq_TX.R                
 6 confInt                  R/confInt.R                 
 7 confIntPaired            R/confIntPaired.R           
 8 convertMethod            R/convertMethod.R           
 9 csmPairedTemp            R/csmPairedTemp.R           
10 csmPairedTemp2sidedDelta R/csmPairedTemp2sidedDelta.R

> funcs1 <- get_all_functions(pkg_source_path)$func
> funcs2 <- get_functions(pkg_source_path)$func

> setdiff(funcs2, funcs1)
[1] ".onLoad"   ".onAttach"
> setdiff(funcs1, funcs2)
character(0)

The difference can be attributed to loading the functions in a separate namespace. I dont fully understand what's going on yet, but you can see the difference if you look at methods::setPackageName. This basically assigns the package name to the environment, and is a wrapper for assign(".packageName", name, envir = env). Interestingly, if I was to query this environment (ls(envir = env)), I wouldnt find .packageName in it. Functions that start with . seem to be handled differently, and assumed to be related to the package or something. Will look into it more, but it seems like it would be preferable to ignore these functions. Im hoping it's not all functions that start with ., but rather some more nuanced, recognized convention for ignoring these (but perhaps functions starting with . is reserved for a particular type of function/object?)

kyleam · 2023-09-01T14:46:43Z

if I was to query this environment (ls(envir = env)), I wouldnt find .packageName in it. Functions that start with . seem to be handled differently, and assumed to be related to the package or something

I just skimmed what you wrote, but you're likely overlooking the all.names argument of ls:

all.names: a logical value.  If ‘TRUE’, all object names are returned.
          If ‘FALSE’, names which begin with a ‘.’ are omitted.

barrettk · 2023-09-01T14:50:52Z

I just skimmed what you wrote, but you're likely overlooking the all.names argument of ls

Yup - good catch. It seems pkgload::load_all() uses the default of all.names = FALSE. Any thoughts on which is preferable (ignoring vs not ignoring)?

kyleam · 2023-09-01T14:59:50Z

Any thoughts on which is preferable (ignoring vs not ignoring)?

I haven't looked at this PR yet and don't have the context, but, assuming my understanding is correct that you're just getting a list of candidates to prune based on the export list (including patterns), I don't see why you wouldn't follow what loadNamespace does and use all.names = TRUE:

        for (p in nsInfo$exportPatterns)
            exports <- c(ls(env, pattern = p, all.names = TRUE), exports)

seth127 · 2023-09-01T17:55:03Z

I looked through this a bit today. I added the three "New Test Cases" bullets to the description. If there are other (from #19 or otherwise) feel free to add them there, and check things off as they get implemented/reviewed.

In terms of the approach, it seems like both this one and the one in #19 (comment) handle the cases we're concerned about. I don't have a strong feeling one way or the other, but I guess I lean towards this one, in part because it actually evaluates the code in the way the loading the package does. This just feels more likely to catch weird cases, but I may be off on that. Also, it has the fringe benefit of catching non-functions (some of which end up being exported) and maps them to the script where they're defined. To be honest, I'm not 100% sure we want to include these, but at this point I think it's the right move.

All of that said, @kyleam can you give this a real review at some point next week? If we can get it in good shape by a week or so from now, I think that gives us plenty of time to use it for the next MPN snapshot.

Let me know if there are other questions and we can connect on this later if needed.

seth127 · 2023-09-01T18:05:25Z

Also, it seems like this builds on top of (and replaces some of) what's in #31. @barrettk and @kyleam you may want to discuss whether it's easier to look at these changes, or just merge this into #31 and then review it there. I have no opinion, I just wanted to put it on your radar, before you get too deep into code review.

kyleam · 2023-09-01T19:47:51Z

R/make-traceability-matrix.R

+      if(isTRUE(verbose)) warning(warningCondition(w))
+      w
+    },
+    message = function(m) m


Please unlearn this pattern. It will only bring unfortunate bugs.

any warning halts covr run #11

f6b2efe (coverage: do not catch warnings when running covr, 2023-08-14)

That shouldnt impact covr though? Don't see how that issue is relevant unless we ran this package through covr and it triggered a warning

Don't see how that issue is relevant unless we ran this package through covr and it triggered a warning

covr isn't a relevant piece. See the demo in the issue I linked (gh-11). tryCatch(..., {message,warning} = ...) is a bad pattern because it halts the execution on the first warning or message received.

If you want to act on a message or warning, you can instead use withCallingHandlers and then resume execution. Depending on the case, you may find it more convenient to use purrr::quietly, which uses withCallingHandlers underneath to handle messages and warnings and also has logic to capture standard output.

…ct.org/doc/manuals/r-devel/R-ints.html) (meant to remove .__T__ and .__global__) - update docs

- Will only warn for parsing errors (eval errors/warnings will be coerced to messages when verbose = TRUE (otherwise swallowed)). - could be worth looking at the test `make_traceability_matrix - cant link exports integration test` - Not sure if this^ is how we want messages/warnings to display for various issues (with verbose = TRUE)

kyleam

[ At this point, I'm going to leave be any inline comments and stick to a high-level review. ]

As I understand it, the approach here is

parse each code file under R/
for each file, evaluate each top-level expression to get the top-level assignments

This is sys.source in spirit, with a key difference: if evaluating an expression leads to an error, the code here tries to evaluate the remaining expressions.
determine the list of assignments for a given file by comparing the names in the shared evaluation environment before and after evaluating a file's expressions

The main appeal of that approach is that you don't have to account for all the different ways that a top-level assignment could be made (e.g., it would handle an assignment tucked under an if and various function that could assign to the environment).

The downside is that you add the complexity of evaluation. If you get some details wrong, you miss out on the expressions that fail and downstream expressions that depend on the failing expression. But due to the 'evaluate each expression, catching failures' approach you can still get the bulk of the typical assignments.

That complexity affects the tests too. If you just work with the parse data, the logic is self-contained and the test cases are just input files/strings. If you add evaluation to the mix, you need to worry about the evaluation context.

Issues with the current state

There are two high-level issues with the current code:

it uses .BaseNamespaceEnv as the parent environment for evaluation. It's missing the imports:<package> environment that usually comes between for a real package namespace.

In the most common cases, that will just mean some expressions fail to evaluate, though it'd also be possible for an expression to lead to a different assignment due to different behavior between a name that's in .BaseNamespaceEnv and imports:<package>.
it doesn't consider Collate. For packages that specify that because they require the specified load order, some expressions will fail, missing some assignments.

Path forward

In both the cases above, the main consequence is missing somewhat tricky cases, while still catching plain function assignments of interest. However, if we're deciding to evaluate, I think we ought to get these details right. We could set up the environment to include imports and also honor Collate (both pkgload and core R can be referenced here for guidance).

Otherwise, it seems to me the parse data is good enough (and catches some of assignments that would be missed due to the issues above). Here's another example of how we could use the parse data without evaluation. Part of the simplification comes from changing the goal from "all top-level functions" to "all top-level assignments".

get_toplevel_assignments <- function(file) {
  exprs <- tryCatch(parse(file), error = identity)
  if (inherits(exprs, "error")) {
    warning("Failed to parse ", file, ": ", conditionMessage(exprs))
    return(character())
  }

  calls <- purrr::keep(as.list(exprs), function(e) {
    if (is.call(e)) {
      op <- as.character(e[[1]])
      return(length(op) == 1 && op %in% c("<-", "=", "setGeneric"))
    }
    return(FALSE)
  })
  lhs <- purrr::map(calls, function(e) {
    name <- as.character(e[[2]])
    if (length(name) == 1) {
      return(name)
    }
  })

  return(unlist(lhs) %||% character())
}

In summary, I'm fine evaluating or using the parse data, but I think the above issues should be resolved if we're going to evaluate.

@kyleam

…assignments. * This is based on the code and approach proposed by @kyleam in a comment on PR #36 * Uses parse() instead of trying to load the package namespace

seth127 · 2023-09-08T19:26:07Z

I'm coming around to the parse() approach here. Particularly because, per @kyleam comment above, it looks like the parse() approach gets simpler and the ls() approach gets more complicated, if we want achieve our desired behavior.

I started a branch off of this to try it out, and it seems to be working as desired. See draft PR #41 for details.

kyleam · 2023-09-14T20:11:09Z

This was superseded by #42, with the non-eval related changes incorporated there.

barrettk added 5 commits August 28, 2023 15:08

surround eval in tryCatch

f3edc05

- this avoids running into dependency issues used outside of function declarations - fix return if there are no funciton lines

subsitute assign for methods::setPackageName to avoid adding import

c9afcb4

substitute \(x) for function(x) in tests for R 4.0 in drone

2ce3573

barrettk requested a review from seth127 August 31, 2023 20:17

barrettk requested a review from kyleam August 31, 2023 20:25

improved message/error handling

0cdab3c

kyleam reviewed Sep 1, 2023

View reviewed changes

barrettk added 2 commits September 1, 2023 16:04

remove internal R objects. See additional [info](https://cran.r-proje…

a038227

…ct.org/doc/manuals/r-devel/R-ints.html) (meant to remove .__T__ and .__global__) - update docs

barrettk requested a review from kyleam September 1, 2023 21:06

kyleam requested changes Sep 6, 2023

View reviewed changes

seth127 mentioned this pull request Sep 8, 2023

get_toplevel_assignments - use parse approach instead of pkgload #41

Closed

kyleam mentioned this pull request Sep 13, 2023

traceability matrix fixes and extensions #42

Merged

kyleam closed this Sep 14, 2023

kyleam deleted the get_all_functions branch September 14, 2023 20:12

barrettk mentioned this pull request Oct 17, 2023

Release/0.2.0 #49

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Get all functions #36

Get all functions #36

barrettk commented Aug 31, 2023 •

edited

Loading

barrettk commented Aug 31, 2023 •

edited

Loading

kyleam commented Aug 31, 2023

barrettk commented Aug 31, 2023

barrettk commented Sep 1, 2023

kyleam commented Sep 1, 2023

barrettk commented Sep 1, 2023

kyleam commented Sep 1, 2023

seth127 commented Sep 1, 2023

seth127 commented Sep 1, 2023

kyleam Sep 1, 2023

barrettk Sep 1, 2023

kyleam Sep 6, 2023

kyleam left a comment

seth127 commented Sep 8, 2023

kyleam commented Sep 14, 2023

Get all functions #36

Get all functions #36

Conversation

barrettk commented Aug 31, 2023 • edited Loading

Improved Readability

Actual Changes

New Test Cases

barrettk commented Aug 31, 2023 • edited Loading

kyleam commented Aug 31, 2023

barrettk commented Aug 31, 2023

barrettk commented Sep 1, 2023

kyleam commented Sep 1, 2023

barrettk commented Sep 1, 2023

kyleam commented Sep 1, 2023

seth127 commented Sep 1, 2023

seth127 commented Sep 1, 2023

kyleam Sep 1, 2023

Choose a reason for hiding this comment

barrettk Sep 1, 2023

Choose a reason for hiding this comment

kyleam Sep 6, 2023

Choose a reason for hiding this comment

kyleam left a comment

Choose a reason for hiding this comment

Issues with the current state

Path forward

seth127 commented Sep 8, 2023

kyleam commented Sep 14, 2023

barrettk commented Aug 31, 2023 •

edited

Loading

barrettk commented Aug 31, 2023 •

edited

Loading