Add experimental support for dry run #698

kemzeb · 2023-12-09T08:52:54Z

Works towards #303.

codecov-commenter · 2023-12-09T08:54:01Z

Codecov Report

Attention: 25 lines in your changes are missing coverage. Please review.

Comparison is base (dd26cdc) 78.86% compared to head (c7b9817) 78.50%.

Files	Patch %	Lines
pkg/osvscanner/osvscanner.go	23.33%	20 Missing and 3 partials ⚠️
pkg/osv/osv.go	0.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #698      +/-   ##
==========================================
- Coverage   78.86%   78.50%   -0.36%     
==========================================
  Files          84       84              
  Lines        5943     5965      +22     
==========================================
- Hits         4687     4683       -4     
- Misses       1054     1076      +22     
- Partials      202      206       +4

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

kemzeb · 2023-12-09T09:12:58Z

Decided to make it an experimental feature for now (I guess if user feedback has it change for some reason, though I don't believe users will throw this into scripts or anything like that).

Took some time trying to figure out how to implement this without making many non-DRY changes or having to resort to changing public API; my current approach is to "mock" the http.Client instance that performs these requests by just having an implementation of the http.RoundTripper interface print out data and return a fake HTTP response.

It seems to work fine for the v1/querybatch and v1expermiental/determineversion endpoints:

. . .
Scanning directory for vendored libs: /workspaces/osv-scanner/internal/thirdparty
Scanning potential vendored dir: /workspaces/osv-scanner/internal/thirdparty/ar
POST https://api.osv.dev/v1experimental/determineversion
POST https://api.osv.dev/v1/querybatch
vscode ➜ /workspaces/osv-scanner (support-dry-run) $

but I wish to have a discussion about the following questions that came up:

How should these API calls look when we print it out? Should I use the Reporter here? Also, should I print out the body for POST requests?
Consuming GET /v1/vulns/:id requires the batch request to give us a list of vulnerability IDs for each query, but since we're doing a dry run we can't get this information. What should we do here?
When I manually tested this change by running it against osv-scanner, it seems that the reporter calls a PrintError() somewhere that will resort in a non-zero exit code. Should we just exit early when we are done with dry run mode?
Not quite sure how to test this. Should I create unit tests for osvscanner.DoScan()?
Should I consider the API calls made in the experimental features (e.g. we perform a HEAD/GET request on zip db archive, use gRPC with the deps.dev API)?

Please let me know if I missed any API endpoints

another-rex · 2023-12-11T04:21:44Z

Thanks! I think the current implementation for dry run looks pretty good!

However this does add quite a bit of complexity, and the problem where there are dependencies in the requests like you mentioned in point 2.

That's why I was thinking maybe going with a verbose but not dry run option, which would be easier to implement, but still work for someone trying to learn what info is being sent when running osv-scanner. For users with very sensitive dependencies they will have to use the --local-db / --offline option anyway, so it seems like there isn't a big incentive for a --dry-run option.

How should these API calls look when we print it out? Should I use the Reporter here? Also, should I print out the body for POST requests?

Yes, reporter is the way to go. If we don't need to do the dry-run option, another way to implement this is to extend the reporter to print different verbosity levels (err, warn, info, verbose...) (which I want to do anyway in the future), this could make the implementation for a verbose mode quite straightforward - store the verbosity level in reporter, and add print statements to all the makeRequest functions. This will also make it easy to cover the gRPC requests

Yes I think POST bodies should be printed out, they are probably what someone trying to understand what information osv-scanner is sending out want to see.

Consuming GET /v1/vulns/:id requires the batch request to give us a list of vulnerability IDs for each query, but since we're doing a dry run we can't get this information. What should we do here?

When I manually tested this change by running it against osv-scanner, it seems that the reporter calls a PrintError() somewhere that will resort in a non-zero exit code. Should we just exit early when we are done with dry run mode?

See above about still logging the queries, just without a dry-run. I'm not sure the dry-run feature is worth it to have to cover these cases.

Not quite sure how to test this. Should I create unit tests for osvscanner.DoScan()?

The best way to test this is probably to add tests to cmd/osv-scanner/main_test.go using the fixtures there show the output.

Should I consider the API calls made in the experimental features (e.g. we perform a HEAD/GET request on zip db archive, use gRPC with the deps.dev API)?

Yes, I think we should also print out what data queries are sending too.

another-rex · 2023-12-11T04:38:22Z

By the way, do you also personally have a specific use case in mind for needing dry-run instead of just a verbosity flag as well?

kemzeb · 2023-12-11T07:06:26Z

Thanks for reviewing this @another-rex!

Yes, reporter is the way to go. If we don't need to do the dry-run option, another way to implement this is to extend the reporter to print different verbosity levels (err, warn, info, verbose...) (which I want to do anyway in the future), this could make the implementation for a verbose mode quite straightforward - store the verbosity level in reporter, and add print statements to all the makeRequest functions. This will also make it easy to cover the gRPC requests

I think defining verbosity levels is a great idea compared to the limitations I currently see with implementing a dry run mode (i.e. point 2). I agree with you that a dry run mode may not be useful given that users can now opt to perform their scan locally (it seems this feature was made long after the issue I'm trying to solve was made).

By the way, do you also personally have a specific use case in mind for needing dry-run instead of just a verbosity flag as well?

No use case for wanting a dry run mode, just saw an open issue and wanted to see if I could resolve it :^).

If there is an interest for a verbose option, I could try tackling this in a different PR. From what I understand what should be done here is 2-fold:

Modify the Reporter interface to allow callers to specify a verbosity level (maybe have PrintError() and PrintText() use an "Error" level and "Info" level respectfully; not sure if you would want to remove them since its public API). Expose a new CLI option argument for this ( maybe like --verbose-level info)
Just add print statements to the makeRequest functions and set each to the highest verbose level.

Edit: Actually I think these should be two separate PRs as I now think implementing verbosity levels maybe a tad more complicated then how I describe it here 😅

another-rex · 2023-12-13T06:03:09Z

Technically adding additional functions to the Reporter interface is a breaking change, but I believe there are no other implementations (at least public on github, from a quick code search) of this interface, and there are no good use cases for implementing this manually instead of using one of the preset implementations we provide.

Because of the additional complexity required to keep compatibility let's just go ahead and add those methods directly onto Reporter and update the various reporter presets we have to also implement the new methods. We probably should make the reporter interface itself internal as well to make sure no other implementations exist in the future, but I'll do that in a separate PR.

kemzeb · 2023-12-13T20:03:43Z

Gotcha, I'll close this PR and set my sights on implementing verbosity levels for the Reporter interface and update those who implement it when I get the time.

kemzeb · 2023-12-20T18:51:48Z

Hey @another-rex, FYI I haven't forgot about this but decided to wait until #706 is merged to start tackling this 👍

When I originally implemented `Reporter`, IDEs such as GoLand didn't support custom `Printf` functions so I stuck with plain methods and did the `fmt` formatting on the string; that's changed as of GoLand 2023.3 via [GO-5841](https://youtrack.jetbrains.com/issue/GO-5841) 🎉 Technically adding to `Reporter` is a breaking change but as covered [in this comment](#698 (comment)): > I believe there are no other implementations (at least public on github, from a quick code search) of this interface, and there are no good use cases for implementing this manually instead of using one of the preset implementations we provide. Either way I think it's better to land these ASAP to reduce the blast radius then to carry them around for possibly a lot longer - note that I'm not strictly against deprecating/removing `PrintText` and `PrintError` though I don't think there's a lot of value in keeping them. As penance, I've also added rich method comments for the interface. (having said that, since this is a breaking change already maybe we should just remove `PrintText` and `PrintError` right now)

Inspired by what was discussed in #698, this PR makes the necessary (breaking) changes to the `reporter` package such that the `Reporter` interface now provides methods for printing at certain levels (i.e. ErrorLevel, WarnLevel, InfoLevel, and VerboseLevel). It also adds a --verbosity option so that users/automated tools can take advantage of this too. --------- Co-authored-by: Rex P <[email protected]>

kemzeb added 2 commits December 9, 2023 00:30

Add --experimental-dry-run with batch and hydrate request support

4c2e77b

Support dry run when consuming /v1experimental/determineversion

c7b9817

kemzeb closed this Dec 13, 2023

G-Rath mentioned this pull request Dec 16, 2023

feat: support PrintTextf and PrintErrorf on Reporter #706

Merged

kemzeb deleted the support-dry-run branch December 20, 2023 18:52

kemzeb mentioned this pull request Dec 28, 2023

Add support for verbosity levels #727

Merged

kemzeb mentioned this pull request Jan 15, 2024

Add printouts for network calls #746

Closed

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add experimental support for dry run #698

Add experimental support for dry run #698

kemzeb commented Dec 9, 2023

codecov-commenter commented Dec 9, 2023 •

edited

Loading

kemzeb commented Dec 9, 2023 •

edited

Loading

another-rex commented Dec 11, 2023

another-rex commented Dec 11, 2023

kemzeb commented Dec 11, 2023 •

edited

Loading

another-rex commented Dec 13, 2023

kemzeb commented Dec 13, 2023

kemzeb commented Dec 20, 2023 •

edited

Loading

Add experimental support for dry run #698

Add experimental support for dry run #698

Conversation

kemzeb commented Dec 9, 2023

codecov-commenter commented Dec 9, 2023 • edited Loading

Codecov Report

kemzeb commented Dec 9, 2023 • edited Loading

another-rex commented Dec 11, 2023

another-rex commented Dec 11, 2023

kemzeb commented Dec 11, 2023 • edited Loading

another-rex commented Dec 13, 2023

kemzeb commented Dec 13, 2023

kemzeb commented Dec 20, 2023 • edited Loading

codecov-commenter commented Dec 9, 2023 •

edited

Loading

kemzeb commented Dec 9, 2023 •

edited

Loading

kemzeb commented Dec 11, 2023 •

edited

Loading

kemzeb commented Dec 20, 2023 •

edited

Loading