-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Get all functions #36
Conversation
- this is needed for use with exportPatterns in the namespace file - Instead of regex or parsing file lines, source all functions into a new environment, and determine exports via ls(). This is the same pattern pkgload::load_all() uses for determining exports. - I picked apart a lot of the internal functions pkgload::load_all() uses, and only kept what we needed. - This code is also refactored to be in more desirable format and map which script the function came from
- this avoids running into dependency issues used outside of function declarations - fix return if there are no funciton lines
- moved get_testing_dir down, and get_exports higher to improve the flow of building the trace matrix - we now parse and load the functions in an isolated environment. Syntax errors that lead to parsing errors will trigger warnings if verbose is TRUE - add new tests for export patterns - refactor tests to better isolate "no docs" from syntax errors in the package
I wasn't able to come up with a method for using |
Did you review what I posted at #19 (comment)? |
Oh no I had not seen that; Ill give that a spin. That logic does look a bit complicated though: If I get the same results for the packages I test I would probably prefer the current method (thanks for taking a stab at it) |
FWIW I compared the two methods with the Exact package, where > get_functions(pkg_source_path)
# A tibble: 50 × 2
func code_file
<chr> <chr>
1 binom.CI R/binom.CI.R
2 binomialCode R/binomialCode.R
3 checkPairedParam R/checkPairedParam.R
4 checkParam R/checkParam.R
5 chisq_TX R/chisq_TX.R
6 confInt R/confInt.R
7 confIntPaired R/confIntPaired.R
8 convertMethod R/convertMethod.R
9 csmPairedTemp R/csmPairedTemp.R
10 csmPairedTemp2sidedDelta R/csmPairedTemp2sidedDelta.R
# ℹ 40 more rows
# ℹ Use `print(n = ...)` to see more rows
> get_all_functions(pkg_source_path)
# A tibble: 48 × 2
func code_file
<chr> <chr>
1 binom.CI R/binom.CI.R
2 binomialCode R/binomialCode.R
3 checkPairedParam R/checkPairedParam.R
4 checkParam R/checkParam.R
5 chisq_TX R/chisq_TX.R
6 confInt R/confInt.R
7 confIntPaired R/confIntPaired.R
8 convertMethod R/convertMethod.R
9 csmPairedTemp R/csmPairedTemp.R
10 csmPairedTemp2sidedDelta R/csmPairedTemp2sidedDelta.R > funcs1 <- get_all_functions(pkg_source_path)$func
> funcs2 <- get_functions(pkg_source_path)$func
> setdiff(funcs2, funcs1)
[1] ".onLoad" ".onAttach"
> setdiff(funcs1, funcs2)
character(0) The difference can be attributed to loading the functions in a separate namespace. I dont fully understand what's going on yet, but you can see the difference if you look at |
I just skimmed what you wrote, but you're likely overlooking the
|
Yup - good catch. It seems |
I haven't looked at this PR yet and don't have the context, but, assuming my understanding is correct that you're just getting a list of candidates to prune based on the export list (including patterns), I don't see why you wouldn't follow what for (p in nsInfo$exportPatterns)
exports <- c(ls(env, pattern = p, all.names = TRUE), exports) |
I looked through this a bit today. I added the three "New Test Cases" bullets to the description. If there are other (from #19 or otherwise) feel free to add them there, and check things off as they get implemented/reviewed. In terms of the approach, it seems like both this one and the one in #19 (comment) handle the cases we're concerned about. I don't have a strong feeling one way or the other, but I guess I lean towards this one, in part because it actually evaluates the code in the way the loading the package does. This just feels more likely to catch weird cases, but I may be off on that. Also, it has the fringe benefit of catching non-functions (some of which end up being exported) and maps them to the script where they're defined. To be honest, I'm not 100% sure we want to include these, but at this point I think it's the right move. All of that said, @kyleam can you give this a real review at some point next week? If we can get it in good shape by a week or so from now, I think that gives us plenty of time to use it for the next MPN snapshot. Let me know if there are other questions and we can connect on this later if needed. |
Also, it seems like this builds on top of (and replaces some of) what's in #31. @barrettk and @kyleam you may want to discuss whether it's easier to look at these changes, or just merge this into #31 and then review it there. I have no opinion, I just wanted to put it on your radar, before you get too deep into code review. |
if(isTRUE(verbose)) warning(warningCondition(w)) | ||
w | ||
}, | ||
message = function(m) m |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please unlearn this pattern. It will only bring unfortunate bugs.
- any warning halts covr run #11
- f6b2efe (coverage: do not catch warnings when running covr, 2023-08-14)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That shouldnt impact covr
though? Don't see how that issue is relevant unless we ran this package through covr and it triggered a warning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't see how that issue is relevant unless we ran this package through covr and it triggered a warning
covr isn't a relevant piece. See the demo in the issue I linked (gh-11). tryCatch(..., {message,warning} = ...)
is a bad pattern because it halts the execution on the first warning or message received.
If you want to act on a message or warning, you can instead use withCallingHandlers
and then resume execution. Depending on the case, you may find it more convenient to use purrr::quietly
, which uses withCallingHandlers
underneath to handle messages and warnings and also has logic to capture standard output.
…ct.org/doc/manuals/r-devel/R-ints.html) (meant to remove .__T__ and .__global__) - update docs
- Will only warn for parsing errors (eval errors/warnings will be coerced to messages when verbose = TRUE (otherwise swallowed)). - could be worth looking at the test `make_traceability_matrix - cant link exports integration test` - Not sure if this^ is how we want messages/warnings to display for various issues (with verbose = TRUE)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[ At this point, I'm going to leave be any inline comments and stick to a high-level review. ]
As I understand it, the approach here is
-
parse each code file under
R/
-
for each file, evaluate each top-level expression to get the top-level assignments
This is
sys.source
in spirit, with a key difference: if evaluating an expression leads to an error, the code here tries to evaluate the remaining expressions. -
determine the list of assignments for a given file by comparing the names in the shared evaluation environment before and after evaluating a file's expressions
The main appeal of that approach is that you don't have to account for all the different ways that a top-level assignment could be made (e.g., it would handle an assignment tucked under an if
and various function that could assign to the environment).
The downside is that you add the complexity of evaluation. If you get some details wrong, you miss out on the expressions that fail and downstream expressions that depend on the failing expression. But due to the 'evaluate each expression, catching failures' approach you can still get the bulk of the typical assignments.
That complexity affects the tests too. If you just work with the parse data, the logic is self-contained and the test cases are just input files/strings. If you add evaluation to the mix, you need to worry about the evaluation context.
Issues with the current state
There are two high-level issues with the current code:
-
it uses
.BaseNamespaceEnv
as the parent environment for evaluation. It's missing theimports:<package>
environment that usually comes between for a real package namespace.In the most common cases, that will just mean some expressions fail to evaluate, though it'd also be possible for an expression to lead to a different assignment due to different behavior between a name that's in
.BaseNamespaceEnv
andimports:<package>
. -
it doesn't consider
Collate
. For packages that specify that because they require the specified load order, some expressions will fail, missing some assignments.
Path forward
In both the cases above, the main consequence is missing somewhat tricky cases, while still catching plain function assignments of interest. However, if we're deciding to evaluate, I think we ought to get these details right. We could set up the environment to include imports and also honor Collate
(both pkgload and core R can be referenced here for guidance).
Otherwise, it seems to me the parse data is good enough (and catches some of assignments that would be missed due to the issues above). Here's another example of how we could use the parse data without evaluation. Part of the simplification comes from changing the goal from "all top-level functions" to "all top-level assignments".
get_toplevel_assignments <- function(file) {
exprs <- tryCatch(parse(file), error = identity)
if (inherits(exprs, "error")) {
warning("Failed to parse ", file, ": ", conditionMessage(exprs))
return(character())
}
calls <- purrr::keep(as.list(exprs), function(e) {
if (is.call(e)) {
op <- as.character(e[[1]])
return(length(op) == 1 && op %in% c("<-", "=", "setGeneric"))
}
return(FALSE)
})
lhs <- purrr::map(calls, function(e) {
name <- as.character(e[[2]])
if (length(name) == 1) {
return(name)
}
})
return(unlist(lhs) %||% character())
}
In summary, I'm fine evaluating or using the parse data, but I think the above issues should be resolved if we're going to evaluate.
I'm coming around to the I started a branch off of this to try it out, and it seems to be working as desired. See draft PR #41 for details. |
This was superseded by #42, with the non-eval related changes incorporated there. |
Improved Readability
get_testing_dir
down, andget_exports
higher (inmake-traceability-matrix.R
) to improve the flow of building the trace matrixActual Changes
tests/testthat/setup.R
for testing export patterns (package wasn't being set up properly)New Test Cases
get_all_functions()
should catch multi-line function declarationsget_all_functions()
should filter out nested functionsget_exports()
should handleexportPattern
syntax correctly