-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add osv-scalibr as a dependency extraction method for PR remediation #4688
Conversation
Minder Vulnerability Report ✅Minder analyzed this PR and found it does not add any new vulnerable dependencies.
|
@oliverchang -- I'd appreciate your review of the usage of Thanks! |
@erikvarga, @another-rex are the actual OSV-Scalibr experts -- would you be able to help here? |
} | ||
|
||
func inventorySorter(a *extractor.Inventory, b *extractor.Inventory) int { | ||
return cmp.Or(cmp.Compare(a.Name, b.Name), cmp.Compare(a.Version, b.Version)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you get packages whose name and version are the same? (e.g. two identical lockfiles scanned from different files)
We have an existing comparison function for Inventory in SCALIBR that looks at all the relevant fields of the inventory: https://github.com/google/osv-scalibr/blob/main/scalibr.go#L293
If you think it'd beneficial to use for you we can make it public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh! Good point. Yes, that would be a useful function to expose, though you might benefit from using cmp.Or
and cmp.Compare
over the current implementation. 😁
I'm also wondering if you want to guarantee that extractor.Inventory.Locations
is sorted; as it is, it's possible to construct two Inventories which are equivalent but sort differently.
} | ||
|
||
desiredCaps := scalibr_plugin.Capabilities{ | ||
OS: scalibr_plugin.OSAny, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OSAny is supposed to be used for specifying plugin requirements only. When specifying the scan environment's capabilities you should pick a specific OS.
You can use the SCALIBR-prodivded platform.OS() function to determine the OS at runtime: https://github.com/google/osv-scalibr/blob/main/binary/cli/cli.go#L418
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this is kinda funny... we actually want to use the platform that the developers would be using for their builds (which is not easy to extract from a repo), but we're probably running on Linux, so I guess we can use that for now.
// All includes Ruby, Dotnet which we're not ready to test yet, so use the more limited Default set. | ||
FilesystemExtractors: list.FilterByCapabilities(list.Default, &desiredCaps), | ||
Capabilities: &desiredCaps, | ||
Stats: scalibr_stats.NoopCollector{}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can also be left empty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, done
zerolog.Ctx(context.Background()).Warn().Msg("nil ecosystem scanning diffs") | ||
return pbinternal.DepEcosystem_DEP_ECOSYSTEM_UNSPECIFIED | ||
} | ||
// This should be inventory.GetEcosystem()... but there isn't a convenience wrapper yet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about inventory.Ecosystem()?
https://github.com/google/osv-scalibr/blob/main/extractor/extractor.go#L86
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oop, I somehow missed these when I was reading the code (since godoc isn't published yet). Thanks!
} | ||
|
||
// Sometimes Scalibr uses the string "PyPI" instead of "pypi" when reporting the ecosystem. | ||
switch strings.ToLower(ecosystem) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're interested in the PURL-defined package type you should use converter.ToPURL(inventory).Type or inventory.Extractor.ToPURL(inventory).Type (https://github.com/google/osv-scalibr/blob/main/converter/converter.go#L45)
Ecosystem is specific to the OSV schema and sometimes use different strings than PURL package types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, we should probably provide an Inventory.PURL() member function similar to Inventory.Ecosystem() so people don't have to import converter.go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't find constants for the OSV Ecosystem types -- it would be great if there were some type hints in the interface that would point me towards the correct constants (and then enforced the use of those constants in the rest of the codebase). I'll switch to using ToPURL
for now -- is there a "preferred" set of values? (It seems like a shame to assemble an entire PURL just to extract the Type
parameter.)
I'm also wondering if it makes sense to remove the error
parameter from the Ecosystem
, ToPURL
and ToCPEs
interface methods -- the only place where it is used in is in the extractor_dummy.go
files, where the Extract
method that returns []*extractor.Inventory
already returns an error and a nil
list. Removing the error parameter would simplify a lot of consumer code and coverage checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, at the moment we don't really have any cases where we'd need to return an error. I'll discuss internally if we ever would want to do that or if we can just remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@oliverchang are you aware of any documentation about the list of OSV Ecosystem values apart from https://ossf.github.io/osv-schema/#affectedpackage-field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll discuss internally if we ever would want to do that or if we can just remove it.
Update: We ended up removing it: google/osv-scalibr@9f86538
So if you pin SCALIBR to after this commit you should be able to get rid of the error handling in your code too.
case "", pb.DiffTypeDep: | ||
allDiffs := make([]*pbinternal.PrDependencies_ContextualDependency, 0) | ||
for { | ||
if pr.Number > math.MaxInt { | ||
return nil, fmt.Errorf("pr number is too large") | ||
} | ||
prFiles, resp, err := di.cli.ListFiles(ctx, pr.RepoOwner, pr.RepoName, int(pr.Number), prFilesPerPage, page) | ||
if err != nil { | ||
return nil, fmt.Errorf("error getting pull request files: %w", err) | ||
} | ||
return di.getDepTypeDiff(ctx, prNumber, pr) | ||
|
||
for _, file := range prFiles { | ||
fileDiffs, err := di.ingestFileForDepDiff(file.GetFilename(), file.GetPatch(), file.GetRawURL(), logger) | ||
if err != nil { | ||
return nil, fmt.Errorf("error ingesting file %s: %w", file.GetFilename(), err) | ||
} | ||
allDiffs = append(allDiffs, fileDiffs...) | ||
} | ||
case pb.DiffTypeNewDeps: | ||
// TODO: once we've tested some, convert DiffTypeDep to use this algorithm. | ||
return di.getScalibrTypeDiff(ctx, prNumber, pr) | ||
|
||
if resp.NextPage == 0 { | ||
break | ||
} | ||
case pb.DiffTypeFull: | ||
return di.getFullTypeDiff(ctx, prNumber, pr) | ||
|
||
page = resp.NextPage | ||
} | ||
default: | ||
return nil, fmt.Errorf("unknown diff type") | ||
} | ||
} | ||
|
||
return &engif.Result{ | ||
Object: &pbinternal.PrDependencies{ | ||
Pr: pr, | ||
Deps: allDiffs, | ||
}, | ||
// NOTE: At this point we're only retrieving the timestamp as the checkpoint. | ||
Checkpoint: checkpoints.NewCheckpointV1(time.Now()), | ||
}, nil | ||
func (di *Diff) getDepTypeDiff(ctx context.Context, prNumber int, pr *pb.PullRequest) (*engif.Result, error) { | ||
deps := pbinternal.PrDependencies{Pr: pr} | ||
page := 0 | ||
|
||
case pb.DiffTypeFull: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method was getting "big" from a cyclomatic complexity point of view, so I moved the code for each switch case into a separate function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I can pull this out into a separate PR if it would help for reviewing)
func inventorySorter(a *extractor.Inventory, b *extractor.Inventory) int { | ||
return cmp.Or(cmp.Compare(a.Name, b.Name), cmp.Compare(a.Version, b.Version)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have this in the extractor
module, but it's not too hard to have it here.
} | ||
|
||
// Sometimes Scalibr uses the string "PyPI" instead of "pypi" when reporting the ecosystem. | ||
switch strings.ToLower(ecosystem) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't find constants for the OSV Ecosystem types -- it would be great if there were some type hints in the interface that would point me towards the correct constants (and then enforced the use of those constants in the rest of the codebase). I'll switch to using ToPURL
for now -- is there a "preferred" set of values? (It seems like a shame to assemble an entire PURL just to extract the Type
parameter.)
I'm also wondering if it makes sense to remove the error
parameter from the Ecosystem
, ToPURL
and ToCPEs
interface methods -- the only place where it is used in is in the extractor_dummy.go
files, where the Extract
method that returns []*extractor.Inventory
already returns an error and a nil
list. Removing the error parameter would simplify a lot of consumer code and coverage checks.
zerolog.Ctx(context.Background()).Warn().Msg("nil ecosystem scanning diffs") | ||
return pbinternal.DepEcosystem_DEP_ECOSYSTEM_UNSPECIFIED | ||
} | ||
// This should be inventory.GetEcosystem()... but there isn't a convenience wrapper yet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oop, I somehow missed these when I was reading the code (since godoc isn't published yet). Thanks!
} | ||
|
||
func inventorySorter(a *extractor.Inventory, b *extractor.Inventory) int { | ||
return cmp.Or(cmp.Compare(a.Name, b.Name), cmp.Compare(a.Version, b.Version)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh! Good point. Yes, that would be a useful function to expose, though you might benefit from using cmp.Or
and cmp.Compare
over the current implementation. 😁
I'm also wondering if you want to guarantee that extractor.Inventory.Locations
is sorted; as it is, it's possible to construct two Inventories which are equivalent but sort differently.
} | ||
|
||
desiredCaps := scalibr_plugin.Capabilities{ | ||
OS: scalibr_plugin.OSAny, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this is kinda funny... we actually want to use the platform that the developers would be using for their builds (which is not easy to extract from a repo), but we're probably running on Linux, so I guess we can use that for now.
// All includes Ruby, Dotnet which we're not ready to test yet, so use the more limited Default set. | ||
FilesystemExtractors: list.FilterByCapabilities(list.Default, &desiredCaps), | ||
Capabilities: &desiredCaps, | ||
Stats: scalibr_stats.NoopCollector{}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, done
osv-scalibr does have a 'package.json' extractor at: extractor/filesystem/language/javascript/packagejson/extractor.go |
Yes, but that extractor does not unpack the |
Similarly, there is an extractor for |
Adding or modifying extractors in SCALIBR is relatively simple (https://github.com/google/osv-scalibr/blob/main/docs/new_extractor.md) so we're happy to merge in PRs for this functionality. |
BTW, not Scalibr's fault, but it turns out that |
}, | ||
File: &pbinternal.PrDependencies_ContextualDependency_FilePatch{ | ||
Name: filename, | ||
PatchUrl: "", // TODO: do we need this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not to diff the deps, but we'd need this in case we wanted to provide the inline reviews. (Which in their current form where we just propose the version bump w/o proposing also any possible transitive deps is...demo quality)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I want to think through what the abstract interface is for proposing patch comments upstream, but I need to take a look at the GitHub PR and GitLab MR comment APIs before I propose something concrete. (In any case, I think this is a place where we want to be able to leverage the providers so this can be re-used across multiple repo providers.)
Summary
Switches from a home-rolled dependency parser to the (in-development) OSV-Scalibr library from Google, which already has parsers for our current set of dependencies.†
† Except that OSV-Scalibr doesn't seem to parse
dependencies
frompackage.json
, and only parsespackage-lock.json
. We'd prefer to get both dependency locations, so we can patch both of them.I have not documented the
new-dep
PR ingestion type, as I'd like to replace the existingdep
type (via experiment) over time, but wasn't quite ready to wire that up in the same PR.This also moves us to development branches of both
osv-scalibr
andgo-billy
to take advantage of the in-flight code.osv-scalibr
's development branch has a new API that the Google team recommended, andgo-billy
has an adapter from thebilly.Filesystem
type toio/fs.FS
native Golang (read-only) filesystem.Change Type
Mark the type of change your PR introduces:
Testing
I did a bunch of manual testing with
cmd/dev
on an actual in-flight PR, and then added unit tests to cover the languages we currently support.Review Checklist: