Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get all functions #36

Closed
wants to merge 8 commits into from
Closed

Get all functions #36

wants to merge 8 commits into from

Conversation

barrettk
Copy link
Collaborator

@barrettk barrettk commented Aug 31, 2023

Improved Readability

  • moved get_testing_dir down, and get_exports higher (in make-traceability-matrix.R) to improve the flow of building the trace matrix

Actual Changes

  • we now parse and load the functions in an isolated environment. Syntax errors that lead to parsing errors will trigger warnings if verbose is TRUE
  • add new tests for export patterns
  • refactor tests to better isolate "no docs" from syntax errors in the package
  • fix to tests/testthat/setup.R for testing export patterns (package wasn't being set up properly)

New Test Cases

  • get_all_functions() should catch multi-line function declarations
  • get_all_functions() should filter out nested functions
  • get_exports() should handle exportPattern syntax correctly

 - this is needed for use with exportPatterns in the namespace file
 - Instead of regex or parsing file lines, source all functions into a new environment, and determine exports via ls(). This is the same pattern pkgload::load_all() uses for determining exports.
 - I picked apart a lot of the internal functions pkgload::load_all() uses, and only kept what we needed.
 - This code is also refactored to be in more desirable format and map which script the function came from
 - this avoids running into dependency issues used outside of function declarations
 - fix return if there are no funciton lines
 - moved get_testing_dir down, and get_exports higher to improve the flow of building the trace matrix
 - we now parse and load the functions in an isolated environment. Syntax errors that lead to parsing errors will trigger warnings if verbose is TRUE
 - add new tests for export patterns
 - refactor tests to better isolate "no docs" from syntax errors in the package
@barrettk barrettk requested a review from seth127 August 31, 2023 20:17
@barrettk
Copy link
Collaborator Author

barrettk commented Aug 31, 2023

I wasn't able to come up with a method for using getParseData to differentiate between nested vs non-nested functions in a reliable way. I think loading the functions in an isolated namespace is probably preferable anyways, as this mimics how devtools::load_all()/pkgload::load_all() work for determining exports.

@barrettk barrettk requested a review from kyleam August 31, 2023 20:25
@kyleam
Copy link
Collaborator

kyleam commented Aug 31, 2023

I wasn't able to come up with a method for using getParseData to differentiate between nested vs non-nested functions in a reliable way.

Did you review what I posted at #19 (comment)?

@barrettk
Copy link
Collaborator Author

Did you review what I posted at #19 (comment)?

Oh no I had not seen that; Ill give that a spin. That logic does look a bit complicated though: If I get the same results for the packages I test I would probably prefer the current method (thanks for taking a stab at it)

@barrettk
Copy link
Collaborator Author

barrettk commented Sep 1, 2023

FWIW I compared the two methods with the Exact package, where get_functions uses the method @kyleam developed.

> get_functions(pkg_source_path)
# A tibble: 50 × 2
   func                     code_file                   
   <chr>                    <chr>                       
 1 binom.CI                 R/binom.CI.R                
 2 binomialCode             R/binomialCode.R            
 3 checkPairedParam         R/checkPairedParam.R        
 4 checkParam               R/checkParam.R              
 5 chisq_TX                 R/chisq_TX.R                
 6 confInt                  R/confInt.R                 
 7 confIntPaired            R/confIntPaired.R           
 8 convertMethod            R/convertMethod.R           
 9 csmPairedTemp            R/csmPairedTemp.R           
10 csmPairedTemp2sidedDelta R/csmPairedTemp2sidedDelta.R
# ℹ 40 more rows
# ℹ Use `print(n = ...)` to see more rows
> get_all_functions(pkg_source_path)
# A tibble: 48 × 2
   func                     code_file                   
   <chr>                    <chr>                       
 1 binom.CI                 R/binom.CI.R                
 2 binomialCode             R/binomialCode.R            
 3 checkPairedParam         R/checkPairedParam.R        
 4 checkParam               R/checkParam.R              
 5 chisq_TX                 R/chisq_TX.R                
 6 confInt                  R/confInt.R                 
 7 confIntPaired            R/confIntPaired.R           
 8 convertMethod            R/convertMethod.R           
 9 csmPairedTemp            R/csmPairedTemp.R           
10 csmPairedTemp2sidedDelta R/csmPairedTemp2sidedDelta.R
> funcs1 <- get_all_functions(pkg_source_path)$func
> funcs2 <- get_functions(pkg_source_path)$func

> setdiff(funcs2, funcs1)
[1] ".onLoad"   ".onAttach"
> setdiff(funcs1, funcs2)
character(0)

The difference can be attributed to loading the functions in a separate namespace. I dont fully understand what's going on yet, but you can see the difference if you look at methods::setPackageName. This basically assigns the package name to the environment, and is a wrapper for assign(".packageName", name, envir = env). Interestingly, if I was to query this environment (ls(envir = env)), I wouldnt find .packageName in it. Functions that start with . seem to be handled differently, and assumed to be related to the package or something. Will look into it more, but it seems like it would be preferable to ignore these functions. Im hoping it's not all functions that start with ., but rather some more nuanced, recognized convention for ignoring these (but perhaps functions starting with . is reserved for a particular type of function/object?)

@kyleam
Copy link
Collaborator

kyleam commented Sep 1, 2023

if I was to query this environment (ls(envir = env)), I wouldnt find .packageName in it. Functions that start with . seem to be handled differently, and assumed to be related to the package or something

I just skimmed what you wrote, but you're likely overlooking the all.names argument of ls:

all.names: a logical value.  If ‘TRUE’, all object names are returned.
          If ‘FALSE’, names which begin with a ‘.’ are omitted.

@barrettk
Copy link
Collaborator Author

barrettk commented Sep 1, 2023

I just skimmed what you wrote, but you're likely overlooking the all.names argument of ls

Yup - good catch. It seems pkgload::load_all() uses the default of all.names = FALSE. Any thoughts on which is preferable (ignoring vs not ignoring)?

@kyleam
Copy link
Collaborator

kyleam commented Sep 1, 2023

Any thoughts on which is preferable (ignoring vs not ignoring)?

I haven't looked at this PR yet and don't have the context, but, assuming my understanding is correct that you're just getting a list of candidates to prune based on the export list (including patterns), I don't see why you wouldn't follow what loadNamespace does and use all.names = TRUE:

        for (p in nsInfo$exportPatterns)
            exports <- c(ls(env, pattern = p, all.names = TRUE), exports)

@seth127
Copy link
Collaborator

seth127 commented Sep 1, 2023

I looked through this a bit today. I added the three "New Test Cases" bullets to the description. If there are other (from #19 or otherwise) feel free to add them there, and check things off as they get implemented/reviewed.

In terms of the approach, it seems like both this one and the one in #19 (comment) handle the cases we're concerned about. I don't have a strong feeling one way or the other, but I guess I lean towards this one, in part because it actually evaluates the code in the way the loading the package does. This just feels more likely to catch weird cases, but I may be off on that. Also, it has the fringe benefit of catching non-functions (some of which end up being exported) and maps them to the script where they're defined. To be honest, I'm not 100% sure we want to include these, but at this point I think it's the right move.

All of that said, @kyleam can you give this a real review at some point next week? If we can get it in good shape by a week or so from now, I think that gives us plenty of time to use it for the next MPN snapshot.

Let me know if there are other questions and we can connect on this later if needed.

@seth127
Copy link
Collaborator

seth127 commented Sep 1, 2023

Also, it seems like this builds on top of (and replaces some of) what's in #31. @barrettk and @kyleam you may want to discuss whether it's easier to look at these changes, or just merge this into #31 and then review it there. I have no opinion, I just wanted to put it on your radar, before you get too deep into code review.

if(isTRUE(verbose)) warning(warningCondition(w))
w
},
message = function(m) m
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please unlearn this pattern. It will only bring unfortunate bugs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That shouldnt impact covr though? Don't see how that issue is relevant unless we ran this package through covr and it triggered a warning

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't see how that issue is relevant unless we ran this package through covr and it triggered a warning

covr isn't a relevant piece. See the demo in the issue I linked (gh-11). tryCatch(..., {message,warning} = ...) is a bad pattern because it halts the execution on the first warning or message received.

If you want to act on a message or warning, you can instead use withCallingHandlers and then resume execution. Depending on the case, you may find it more convenient to use purrr::quietly, which uses withCallingHandlers underneath to handle messages and warnings and also has logic to capture standard output.

 - Will only warn for parsing errors (eval errors/warnings will be coerced to messages when verbose = TRUE (otherwise swallowed)).

 - could be worth looking at the test `make_traceability_matrix - cant link exports integration test`

 - Not sure if this^ is how we want messages/warnings to display for various issues (with verbose = TRUE)
@barrettk barrettk requested a review from kyleam September 1, 2023 21:06
Copy link
Collaborator

@kyleam kyleam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[ At this point, I'm going to leave be any inline comments and stick to a high-level review. ]

As I understand it, the approach here is

  • parse each code file under R/

  • for each file, evaluate each top-level expression to get the top-level assignments

    This is sys.source in spirit, with a key difference: if evaluating an expression leads to an error, the code here tries to evaluate the remaining expressions.

  • determine the list of assignments for a given file by comparing the names in the shared evaluation environment before and after evaluating a file's expressions

The main appeal of that approach is that you don't have to account for all the different ways that a top-level assignment could be made (e.g., it would handle an assignment tucked under an if and various function that could assign to the environment).

The downside is that you add the complexity of evaluation. If you get some details wrong, you miss out on the expressions that fail and downstream expressions that depend on the failing expression. But due to the 'evaluate each expression, catching failures' approach you can still get the bulk of the typical assignments.

That complexity affects the tests too. If you just work with the parse data, the logic is self-contained and the test cases are just input files/strings. If you add evaluation to the mix, you need to worry about the evaluation context.

Issues with the current state

There are two high-level issues with the current code:

  • it uses .BaseNamespaceEnv as the parent environment for evaluation. It's missing the imports:<package> environment that usually comes between for a real package namespace.

    In the most common cases, that will just mean some expressions fail to evaluate, though it'd also be possible for an expression to lead to a different assignment due to different behavior between a name that's in .BaseNamespaceEnv and imports:<package>.

  • it doesn't consider Collate. For packages that specify that because they require the specified load order, some expressions will fail, missing some assignments.

Path forward

In both the cases above, the main consequence is missing somewhat tricky cases, while still catching plain function assignments of interest. However, if we're deciding to evaluate, I think we ought to get these details right. We could set up the environment to include imports and also honor Collate (both pkgload and core R can be referenced here for guidance).

Otherwise, it seems to me the parse data is good enough (and catches some of assignments that would be missed due to the issues above). Here's another example of how we could use the parse data without evaluation. Part of the simplification comes from changing the goal from "all top-level functions" to "all top-level assignments".

get_toplevel_assignments <- function(file) {
  exprs <- tryCatch(parse(file), error = identity)
  if (inherits(exprs, "error")) {
    warning("Failed to parse ", file, ": ", conditionMessage(exprs))
    return(character())
  }

  calls <- purrr::keep(as.list(exprs), function(e) {
    if (is.call(e)) {
      op <- as.character(e[[1]])
      return(length(op) == 1 && op %in% c("<-", "=", "setGeneric"))
    }
    return(FALSE)
  })
  lhs <- purrr::map(calls, function(e) {
    name <- as.character(e[[2]])
    if (length(name) == 1) {
      return(name)
    }
  })

  return(unlist(lhs) %||% character())
}

In summary, I'm fine evaluating or using the parse data, but I think the above issues should be resolved if we're going to evaluate.

seth127 added a commit that referenced this pull request Sep 8, 2023
…assignments.

* This is based on the code and approach proposed by @kyleam in a comment on PR #36
* Uses parse() instead of trying to load the package namespace
@seth127
Copy link
Collaborator

seth127 commented Sep 8, 2023

I'm coming around to the parse() approach here. Particularly because, per @kyleam comment above, it looks like the parse() approach gets simpler and the ls() approach gets more complicated, if we want achieve our desired behavior.

I started a branch off of this to try it out, and it seems to be working as desired. See draft PR #41 for details.

@kyleam
Copy link
Collaborator

kyleam commented Sep 14, 2023

This was superseded by #42, with the non-eval related changes incorporated there.

@kyleam kyleam closed this Sep 14, 2023
@kyleam kyleam deleted the get_all_functions branch September 14, 2023 20:12
@barrettk barrettk mentioned this pull request Oct 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants