Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add experimental support for dry run #698

Closed
wants to merge 2 commits into from

Conversation

kemzeb
Copy link
Contributor

@kemzeb kemzeb commented Dec 9, 2023

Works towards #303.

@codecov-commenter
Copy link

codecov-commenter commented Dec 9, 2023

Codecov Report

Attention: 25 lines in your changes are missing coverage. Please review.

Comparison is base (dd26cdc) 78.86% compared to head (c7b9817) 78.50%.

Files Patch % Lines
pkg/osvscanner/osvscanner.go 23.33% 20 Missing and 3 partials ⚠️
pkg/osv/osv.go 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #698      +/-   ##
==========================================
- Coverage   78.86%   78.50%   -0.36%     
==========================================
  Files          84       84              
  Lines        5943     5965      +22     
==========================================
- Hits         4687     4683       -4     
- Misses       1054     1076      +22     
- Partials      202      206       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@kemzeb
Copy link
Contributor Author

kemzeb commented Dec 9, 2023

Decided to make it an experimental feature for now (I guess if user feedback has it change for some reason, though I don't believe users will throw this into scripts or anything like that).

Took some time trying to figure out how to implement this without making many non-DRY changes or having to resort to changing public API; my current approach is to "mock" the http.Client instance that performs these requests by just having an implementation of the http.RoundTripper interface print out data and return a fake HTTP response.

It seems to work fine for the v1/querybatch and v1expermiental/determineversion endpoints:

. . .
Scanning directory for vendored libs: /workspaces/osv-scanner/internal/thirdparty
Scanning potential vendored dir: /workspaces/osv-scanner/internal/thirdparty/ar
POST https://api.osv.dev/v1experimental/determineversion
POST https://api.osv.dev/v1/querybatch
vscode ➜ /workspaces/osv-scanner (support-dry-run) $ 

but I wish to have a discussion about the following questions that came up:

  1. How should these API calls look when we print it out? Should I use the Reporter here? Also, should I print out the body for POST requests?
  2. Consuming GET /v1/vulns/:id requires the batch request to give us a list of vulnerability IDs for each query, but since we're doing a dry run we can't get this information. What should we do here?
  3. When I manually tested this change by running it against osv-scanner, it seems that the reporter calls a PrintError() somewhere that will resort in a non-zero exit code. Should we just exit early when we are done with dry run mode?
  4. Not quite sure how to test this. Should I create unit tests for osvscanner.DoScan()?
  5. Should I consider the API calls made in the experimental features (e.g. we perform a HEAD/GET request on zip db archive, use gRPC with the deps.dev API)?

Please let me know if I missed any API endpoints

@another-rex
Copy link
Collaborator

Thanks! I think the current implementation for dry run looks pretty good!

However this does add quite a bit of complexity, and the problem where there are dependencies in the requests like you mentioned in point 2.

That's why I was thinking maybe going with a verbose but not dry run option, which would be easier to implement, but still work for someone trying to learn what info is being sent when running osv-scanner. For users with very sensitive dependencies they will have to use the --local-db / --offline option anyway, so it seems like there isn't a big incentive for a --dry-run option.

  1. How should these API calls look when we print it out? Should I use the Reporter here? Also, should I print out the body for POST requests?

Yes, reporter is the way to go. If we don't need to do the dry-run option, another way to implement this is to extend the reporter to print different verbosity levels (err, warn, info, verbose...) (which I want to do anyway in the future), this could make the implementation for a verbose mode quite straightforward - store the verbosity level in reporter, and add print statements to all the makeRequest functions. This will also make it easy to cover the gRPC requests

Yes I think POST bodies should be printed out, they are probably what someone trying to understand what information osv-scanner is sending out want to see.

  1. Consuming GET /v1/vulns/:id requires the batch request to give us a list of vulnerability IDs for each query, but since we're doing a dry run we can't get this information. What should we do here?
  2. When I manually tested this change by running it against osv-scanner, it seems that the reporter calls a PrintError() somewhere that will resort in a non-zero exit code. Should we just exit early when we are done with dry run mode?

See above about still logging the queries, just without a dry-run. I'm not sure the dry-run feature is worth it to have to cover these cases.

  1. Not quite sure how to test this. Should I create unit tests for osvscanner.DoScan()?

The best way to test this is probably to add tests to cmd/osv-scanner/main_test.go using the fixtures there show the output.

  1. Should I consider the API calls made in the experimental features (e.g. we perform a HEAD/GET request on zip db archive, use gRPC with the deps.dev API)?

Yes, I think we should also print out what data queries are sending too.

@another-rex
Copy link
Collaborator

By the way, do you also personally have a specific use case in mind for needing dry-run instead of just a verbosity flag as well?

@kemzeb
Copy link
Contributor Author

kemzeb commented Dec 11, 2023

Thanks for reviewing this @another-rex!

Yes, reporter is the way to go. If we don't need to do the dry-run option, another way to implement this is to extend the reporter to print different verbosity levels (err, warn, info, verbose...) (which I want to do anyway in the future), this could make the implementation for a verbose mode quite straightforward - store the verbosity level in reporter, and add print statements to all the makeRequest functions. This will also make it easy to cover the gRPC requests

I think defining verbosity levels is a great idea compared to the limitations I currently see with implementing a dry run mode (i.e. point 2). I agree with you that a dry run mode may not be useful given that users can now opt to perform their scan locally (it seems this feature was made long after the issue I'm trying to solve was made).

By the way, do you also personally have a specific use case in mind for needing dry-run instead of just a verbosity flag as well?

No use case for wanting a dry run mode, just saw an open issue and wanted to see if I could resolve it :^).

If there is an interest for a verbose option, I could try tackling this in a different PR. From what I understand what should be done here is 2-fold:

  1. Modify the Reporter interface to allow callers to specify a verbosity level (maybe have PrintError() and PrintText() use an "Error" level and "Info" level respectfully; not sure if you would want to remove them since its public API). Expose a new CLI option argument for this ( maybe like --verbose-level info)
  2. Just add print statements to the makeRequest functions and set each to the highest verbose level.

Edit: Actually I think these should be two separate PRs as I now think implementing verbosity levels maybe a tad more complicated then how I describe it here 😅

@another-rex
Copy link
Collaborator

Technically adding additional functions to the Reporter interface is a breaking change, but I believe there are no other implementations (at least public on github, from a quick code search) of this interface, and there are no good use cases for implementing this manually instead of using one of the preset implementations we provide.

Because of the additional complexity required to keep compatibility let's just go ahead and add those methods directly onto Reporter and update the various reporter presets we have to also implement the new methods. We probably should make the reporter interface itself internal as well to make sure no other implementations exist in the future, but I'll do that in a separate PR.

@kemzeb
Copy link
Contributor Author

kemzeb commented Dec 13, 2023

Gotcha, I'll close this PR and set my sights on implementing verbosity levels for the Reporter interface and update those who implement it when I get the time.

@kemzeb
Copy link
Contributor Author

kemzeb commented Dec 20, 2023

Hey @another-rex, FYI I haven't forgot about this but decided to wait until #706 is merged to start tackling this 👍

@kemzeb kemzeb deleted the support-dry-run branch December 20, 2023 18:52
another-rex pushed a commit that referenced this pull request Dec 27, 2023
When I originally implemented `Reporter`, IDEs such as GoLand didn't
support custom `Printf` functions so I stuck with plain methods and did
the `fmt` formatting on the string; that's changed as of GoLand 2023.3
via [GO-5841](https://youtrack.jetbrains.com/issue/GO-5841) 🎉

Technically adding to `Reporter` is a breaking change but as covered [in
this
comment](#698 (comment)):

> I believe there are no other implementations (at least public on
github, from a quick code search) of this interface, and there are no
good use cases for implementing this manually instead of using one of
the preset implementations we provide.

Either way I think it's better to land these ASAP to reduce the blast
radius then to carry them around for possibly a lot longer - note that
I'm not strictly against deprecating/removing `PrintText` and
`PrintError` though I don't think there's a lot of value in keeping
them.

As penance, I've also added rich method comments for the interface.

(having said that, since this is a breaking change already maybe we
should just remove `PrintText` and `PrintError` right now)
another-rex added a commit that referenced this pull request Jan 9, 2024
Inspired by what was discussed in #698, this PR makes the necessary
(breaking) changes to the `reporter` package such that the `Reporter`
interface now provides methods for printing at certain levels (i.e.
ErrorLevel, WarnLevel, InfoLevel, and VerboseLevel). It also adds a
--verbosity option so that users/automated tools can take advantage of
this too.

---------

Co-authored-by: Rex P <[email protected]>
@kemzeb kemzeb mentioned this pull request Jan 15, 2024
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants