-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cleaned up ingestion #117
Cleaned up ingestion #117
Conversation
Caution Review failedThe pull request is closed. WalkthroughThe pull request introduces significant changes to the ingestion and processing of Software Bill of Materials (SBOM) and vulnerability data across multiple files. Key modifications include the restructuring of command functionalities, removal of obsolete files, and updates to test cases. The Changes
Possibly related PRs
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
f42dd74
to
a8b3fcc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caution
Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments.
Actionable comments posted: 12
🧹 Outside diff range and nitpick comments (22)
cmd/ingest/sbom/sbom.go (2)
21-25
: Enhance error handling and input validationThe error handling could be more specific, and input validation is missing.
Consider these improvements:
func (o *options) Run(_ *cobra.Command, args []string) error { sbomPath := args[0] + if o.storage == nil { + return fmt.Errorf("storage is not initialized") + } + if _, err := os.Stat(sbomPath); err != nil { + return fmt.Errorf("invalid SBOM path %q: %w", sbomPath, err) + } result, err := ingest.LoadDataFromPath(o.storage, sbomPath) if err != nil { - return fmt.Errorf("failed to ingest SBOM: %w", err) + return fmt.Errorf("failed to load SBOM data from %q: %w", sbomPath, err) }
32-32
: Enhance success message with statisticsThe success message could provide more value by including statistics about the ingestion.
- fmt.Println("\nSBOM ingested successfully") + fmt.Printf("\nSuccessfully ingested %d SBOMs from %s\n", len(result), tools.TruncateString(sbomPath, 50))pkg/tools/ingest/vuln_test.go (2)
36-46
: LGTM! Consider enhancing test coverage.The vulnerability ingestion implementation follows the same clean pattern as SBOM ingestion. However, consider adding test cases for:
- Empty/malformed vulnerability data
- Edge cases in the vulnerability structure
Would you like me to provide example test cases for these scenarios?
Line range hint
8-57
: Consider improving test isolation.The test currently combines SBOM and vulnerability ingestion in a single test function. Consider splitting these into separate test functions for better isolation and maintenance.
This would:
- Make test failures more specific
- Allow for independent testing of each ingestion type
- Improve test maintainability
pkg/tools/ingest/sbom_test.go (1)
40-52
: Refactor duplicate error handling logicThe error checking logic is duplicated between the LoadDataFromPath call and the SBOM processing loop. Consider refactoring to simplify the error handling.
result, err := LoadDataFromPath(storage, test.sbomPath) - if test.wantErr != (err != nil) { - t.Errorf("Sbom() error = %v, wantErr = %v", err, test.wantErr) - } + if err != nil { + if !test.wantErr { + t.Errorf("LoadDataFromPath() error = %v, test expects no error", err) + } + return + } for _, data := range result { if err := SBOM(storage, data.Data); err != nil { - if test.wantErr != (err != nil) { - t.Errorf("Sbom() error = %v, wantErr = %v", err, test.wantErr) - } + t.Errorf("SBOM() error = %v", err) + return } }pkg/storages/e2e_test.go (1)
23-29
: Consider enhancing error handling and progress tracking.The new SBOM ingestion using
LoadDataFromPath
is a good improvement, but consider these enhancements:
- Add batch processing with progress tracking for large datasets
- Consider collecting errors instead of failing fast, to report all issues at once
Here's a suggested improvement:
result, err := ingest.LoadDataFromPath(redisStorage, sbomPath) assert.NoError(t, err) +var errors []error for _, data := range result { if err := ingest.SBOM(redisStorage, data.Data); err != nil { - t.Fatalf("Failed to load SBOM from data: %v", err) + errors = append(errors, fmt.Errorf("failed to load SBOM: %w", err)) } } +if len(errors) > 0 { + t.Fatalf("Failed to load %d SBOMs:\n%v", len(errors), errors) +}api/v1/service_test.go (2)
75-80
: Consider enhancing error handling for batch processing.The new approach using
LoadDataFromPath
and iterating over results is a good improvement for structured data handling. However, consider collecting errors from all iterations instead of stopping at the first error to provide more comprehensive test feedback.Consider this improvement:
result, err := ingest.LoadDataFromPath(s.storage, "../../testdata/osv-sboms/google_agi.sbom.json") require.NoError(t, err) +var errors []error for _, data := range result { - err = ingest.SBOM(s.storage, data.Data) - require.NoError(t, err) + if err := ingest.SBOM(s.storage, data.Data); err != nil { + errors = append(errors, err) + } } +require.Empty(t, errors, "encountered errors while processing SBOM data")
84-85
: Consider validating vulnerability data before processing.The simplified approach to vulnerability ingestion is good, but consider adding validation of the file content before processing to make the test more robust.
Consider this improvement:
content, err := os.ReadFile("../../testdata/osv-vulns/GHSA-cx63-2mw6-8hw5.json") require.NoError(t, err) +require.NotEmpty(t, content, "vulnerability file should not be empty") +require.Contains(t, string(content), "GHSA-cx63-2mw6-8hw5", "vulnerability file should contain expected data") err = ingest.Vulnerabilities(s.storage, content) require.NoError(t, err)cmd/ingest/scorecard/scorecard.go (4)
16-17
: Consider removing or documenting the emptyAddFlags
methodThe
AddFlags
method is currently empty but is called in theNew
function at line 51 (o.AddFlags(cmd)
). If there are no flags to add, you might consider removing both theAddFlags
method and its invocation. If it's intended as a placeholder for future options, please add a comment to clarify its purpose.Also applies to: 51-51
21-22
: Remove or update the outdated commentThe comment
// Define the progress callback function
appears to be outdated or irrelevant since there is no progress callback function defined in the code. Consider removing this comment or updating it to accurately reflect the current implementation.
36-36
: Align the success message with the ingested dataThe success message
fmt.Println("\nSBOM ingested successfully")
may be misleading if the command primarily ingests Scorecard data. To avoid confusion, update the message to accurately reflect the data that has been ingested.
56-58
: Handle ANSI escape codes for cross-platform compatibilityThe
printProgress
function uses ANSI escape codes to colorize output. While this enhances readability on Unix-like terminals, it may not display correctly in environments like Windows Command Prompt or when output is redirected to a file or logging system.Consider using a cross-platform library, such as
github.com/fatih/color
, which handles colorized output more gracefully. Alternatively, detect if the output supports ANSI codes and disable colorization when it does not.cmd/ingest/osv/osv.go (2)
29-30
: Optimize progress output by utilizing the unusedprintProgress
functionThe
printProgress
function defined on lines 50-52 is currently unused. Instead of formatting the progress message directly within the loop, you can use this helper function to improve code readability and maintainability.Apply this diff to utilize the
printProgress
function:for index, data := range result { if err := ingest.Vulnerabilities(o.storage, data.Data); err != nil { fmt.Printf("Error ingesting vulnerabilities from %s: %v\n", data.Path, err) continue } - fmt.Printf("\r\033[K\033[1;36mGraphed %d vulnerabilities\033[0m | \033[1;34mCurrent: %s\033[0m", index+1, tools.TruncateString(data.Path, 50)) + fmt.Printf("\r\033[K%s", printProgress(index+1, data.Path)) }
40-41
: Update command usage and description to reflect functionality accuratelyThe command's
Use
is set to"graph"
, and theShort
description mentions "Graph vuln data". Since the command specifically handles OSV vulnerability data, consider making the command usage and description more specific for clarity.Consider updating as follows:
-Use: "graph", -Short: "Graph vuln data into the graph, and connect it to existing library nodes", +Use: "osv-graph", +Short: "Graph OSV vulnerability data into the graph and connect it to existing library nodes",pkg/tools/ingest/sbom.go (1)
22-22
: Consistent error messaging improves clarityEnsure all error messages accurately reflect that the function processes data rather than files. This will prevent confusion for users and maintain consistency throughout the codebase.
pkg/tools/ingest/loader.go (1)
67-67
: Enhance error aggregation for clearer reportingAggregating errors using
%v
may not format them clearly. To improve readability, consider usingerrors.Join
(Go 1.20+) to combine errors or iterate over them to format each error individually.Example using
errors.Join
:if len(errors) > 0 { return nil, fmt.Errorf("errors occurred during data ingestion: %w", errors.Join(errors...)) }pkg/tools/ingest/scorecard.go (1)
45-46
: Align function comment with function nameThe function is named
Scorecards
, but the comment refers toScorecard
. Consider updating the comment to match the function name for consistency.pkg/tools/ingest/vuln.go (5)
77-81
: Unused Type Definition:PairedVuln
The type
PairedVuln
is declared but not used anywhere in the code. Unused code can lead to confusion and should be removed to improve code readability and maintainability.Consider removing the unused type:
-type PairedVuln struct { - ID string - Vuln Vulnerability -}
90-90
: Typo in Error MessageThe error message refers to "vulnerabilityType data," but the correct term should be "vulnerability data." This will improve clarity when debugging.
Apply this fix:
-return fmt.Errorf("failed to unmarshal vulnerabilityType data: %w", err) +return fmt.Errorf("failed to unmarshal vulnerability data: %w", err)
101-103
: Redundant Error CheckThe
errors
slice is checked for length, but no errors are appended to it before this point. This condition will always be false, making the check redundant.Consider removing the unnecessary error check:
-if len(errors) > 0 { - return fmt.Errorf("errors occurred during vulnerabilities ingestion: %v", errors) -}
Line range hint
203-210
: Constants Should Be Defined at Package LevelDefining constants inside the
compareVersions
function limits their scope unnecessarily and can affect readability.Move the constants outside the function:
-const ( - EventTypeSEMVER = "SEMVER" - EventTypeECOSYSTEM = "ECOSYSTEM" - EventTypeGIT = "GIT" -)Place them at the package level:
const ( EventTypeSEMVER = "SEMVER" EventTypeECOSYSTEM = "ECOSYSTEM" EventTypeGIT = "GIT" )
Line range hint
232-236
: Implement Ecosystem-Specific Version ComparisonThe
compareEcosystemVersions
function contains a placeholder implementation. This may lead to incorrect version comparisons for different ecosystems.Would you like assistance in implementing the ecosystem-specific version comparison logic? I can help draft the necessary code based on the target ecosystems.
🛑 Comments failed to post (12)
cmd/ingest/sbom/sbom.go (1)
26-31: 🛠️ Refactor suggestion
Improve progress display and error handling in ingestion loop
The current implementation has several areas for improvement:
- Progress display doesn't show total count
- Error messages don't identify which SBOM failed
- No batch processing for better performance
Consider this improved implementation:
+ total := len(result) for index, data := range result { if err := ingest.SBOM(o.storage, data.Data); err != nil { - return fmt.Errorf("failed to ingest SBOM: %w", err) + return fmt.Errorf("failed to ingest SBOM %q (%d/%d): %w", data.Path, index+1, total, err) } - fmt.Printf("\r\033[1;36mIngested %d SBOMs\033[0m | \033[1;34m%s\033[0m", index+1, tools.TruncateString(data.Path, 50)) + fmt.Printf("\r\033[1;36mIngested %d/%d SBOMs\033[0m | \033[1;34m%s\033[0m", index+1, total, tools.TruncateString(data.Path, 50)) } + fmt.Println()Consider implementing batch processing for better performance:
const batchSize = 100 // Process SBOMs in batches for i := 0; i < len(result); i += batchSize { end := i + batchSize if end > len(result) { end = len(result) } batch := result[i:end] // Process batch concurrently }pkg/storages/e2e_test.go (1)
30-36: 🛠️ Refactor suggestion
Consider extracting a helper function to reduce code duplication.
The vulnerability ingestion code is nearly identical to the SBOM ingestion code. Consider extracting a helper function to handle both cases.
Here's a suggested refactor:
+func ingestData(t *testing.T, storage *RedisStorage, path string, processor func(*RedisStorage, []byte) error, dataType string) { + result, err := ingest.LoadDataFromPath(storage, path) + assert.NoError(t, err) + var errors []error + for _, data := range result { + if err := processor(storage, data.Data); err != nil { + errors = append(errors, fmt.Errorf("failed to load %s: %w", dataType, err)) + } + } + if len(errors) > 0 { + t.Fatalf("Failed to load %d %s:\n%v", len(errors), dataType, errors) + } +} // Then use it like this: -result, err := ingest.LoadDataFromPath(redisStorage, sbomPath) -assert.NoError(t, err) -for _, data := range result { - if err := ingest.SBOM(redisStorage, data.Data); err != nil { - t.Fatalf("Failed to load SBOM from data: %v", err) - } -} +ingestData(t, redisStorage, sbomPath, ingest.SBOM, "SBOM") -result, err = ingest.LoadDataFromPath(redisStorage, vulnsPath) -assert.NoError(t, err) -for _, data := range result { - if err := ingest.Vulnerabilities(redisStorage, data.Data); err != nil { - t.Fatalf("Failed to load vulnerabilities from data: %v", err) - } -} +ingestData(t, redisStorage, vulnsPath, ingest.Vulnerabilities, "vulnerability")Committable suggestion skipped: line range outside the PR's diff.
cmd/ingest/scorecard/scorecard.go (1)
29-34:
⚠️ Potential issueClarify the data being ingested: SBOMs vs. Scorecards
In the loop starting at line 29, the code calls
ingest.Scorecards(o.storage, data.Data)
, which suggests it's ingesting Scorecard data. However, the command usage is"sbom [sbomPath]"
, and earlier code references SBOM ingestion. This inconsistency could cause confusion.Please clarify whether the command is intended to ingest SBOMs, Scorecards, or both. If it processes Scorecards derived from SBOMs, consider updating function names, variable names, and user messages to reflect this. Ensuring consistency will help users understand the command's purpose.
cmd/ingest/osv/osv.go (3)
18-19:
⚠️ Potential issueValidate command-line arguments to prevent potential runtime errors
The
args
slice is accessed without checking its length, which could lead to anindex out of range
panic if no arguments are provided.Add a check to ensure that the required argument is provided:
func (o *options) Run(_ *cobra.Command, args []string) error { + if len(args) == 0 { + return fmt.Errorf("vulnerabilities directory path is required") + } vulnsDir := args[0]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.func (o *options) Run(_ *cobra.Command, args []string) error { if len(args) == 0 { return fmt.Errorf("vulnerabilities directory path is required") } vulnsDir := args[0]
50-52: 🛠️ Refactor suggestion
Remove or repurpose the unused
printProgress
functionThe
printProgress
function is defined but not used in the current code. Unused code can lead to confusion and should be removed or repurposed to enhance the program.If you decide not to use the
printProgress
function, remove it to clean up the codebase:-func printProgress(count int, path string) string { - return fmt.Sprintf("\033[1;36mGraphed %d vulnerabilities\033[0m | \033[1;34mCurrent: %s\033[0m", count, tools.TruncateString(path, 50)) -}Alternatively, if you plan to use it (as suggested in the previous comment), ensure it's correctly integrated.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
🧰 Tools
🪛 golangci-lint
50-50: func
printProgress
is unused(unused)
21-29:
⚠️ Potential issueHandle errors during vulnerability ingestion more robustly
Currently, if an error occurs while ingesting vulnerabilities in the loop, the function returns immediately. This could halt the entire ingestion process upon encountering a single faulty data entry, potentially missing out on ingesting subsequent valid data.
Consider modifying the loop to continue processing subsequent data even if an error occurs for a particular entry. You can collect errors and report them after the loop or log them accordingly. Here's a possible refactor:
for index, data := range result { if err := ingest.Vulnerabilities(o.storage, data.Data); err != nil { - return fmt.Errorf("failed to graph vuln data: %w", err) + fmt.Printf("Error ingesting vulnerabilities from %s: %v\n", data.Path, err) + continue } fmt.Printf("\r\033[K\033[1;36mGraphed %d vulnerabilities\033[0m | \033[1;34mCurrent: %s\033[0m", index+1, tools.TruncateString(data.Path, 50)) }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.result, err := ingest.LoadDataFromPath(o.storage, vulnsDir) if err != nil { return fmt.Errorf("failed to load vulnerabilities: %w", err) } for index, data := range result { if err := ingest.Vulnerabilities(o.storage, data.Data); err != nil { fmt.Printf("Error ingesting vulnerabilities from %s: %v\n", data.Path, err) continue } fmt.Printf("\r\033[K\033[1;36mGraphed %d vulnerabilities\033[0m | \033[1;34mCurrent: %s\033[0m", index + 1, tools.TruncateString(data.Path, 50))
pkg/tools/ingest/loader.go (3)
86-98:
⚠️ Potential issueSanitize file paths to prevent directory traversal attacks
When extracting files from a zip archive, malicious file names could lead to files being written outside the intended directory (Zip Slip vulnerability). To safeguard against this, validate the extracted file paths.
Implement path sanitization before extracting files:
extractedFilePath := filepath.Join(tempDir, f.Name) // Prevent Zip Slip by ensuring the path is within tempDir if !strings.HasPrefix(filepath.Clean(extractedFilePath), filepath.Clean(tempDir)+string(os.PathSeparator)) { return nil, fmt.Errorf("invalid file path in zip: %s", f.Name) }Make sure to import the
"strings"
package.
55-55:
⚠️ Potential issueAppend results instead of overwriting to prevent data loss
In the
.zip
case, you are assigningresult = subResult
, which overwrites any existing data inresult
. This can lead to data loss when processing multiple files. To ensure all data is captured, append thesubResult
toresult
.Apply this diff to append the results:
- result = subResult + result = append(result, subResult...)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.result = append(result, subResult...)
89-101: 🛠️ Refactor suggestion
Avoid deferring closures in loops to prevent resource exhaustion
Using
defer
inside loops delays the closure of resources until the function exits, which can exhaust system resources when processing many files. Close the resources immediately after use.Refactor the code to close files explicitly:
rc, err := f.Open() if err != nil { return nil, fmt.Errorf("failed to open file %s in zip: %w", f.Name, err) } - defer rc.Close() extractedFilePath := filepath.Join(tempDir, f.Name) // ... (sanitization code) outFile, err := os.Create(extractedFilePath) if err != nil { return nil, fmt.Errorf("failed to create file %s: %w", extractedFilePath, err) } - defer outFile.Close() _, err = io.Copy(outFile, rc) if err != nil { return nil, fmt.Errorf("failed to copy file %s: %w", extractedFilePath, err) } + rc.Close() + outFile.Close()Alternatively, encapsulate the processing logic in a separate function where defers are acceptable.
Committable suggestion skipped: line range outside the PR's diff.
pkg/tools/ingest/scorecard.go (1)
82-83:
⚠️ Potential issuePotential mismatch in key usage when accessing
scorecardResults
You're using
purl.Name
as the key to accessscorecardResults
, but when populating the map, you usedresult.PURL
as the key. Ensure thatpurl.Name
corresponds correctly toresult.PURL
, or consider usingpurl.String()
if necessary, to avoid missing or incorrect data retrieval.pkg/tools/ingest/vuln.go (2)
130-134:
⚠️ Potential issueAccumulated Errors Not Handled
Errors appended to the
errors
slice within the loop are not handled after the loop concludes. This may cause silent failures and make debugging difficult.Consider returning the accumulated errors at the end of the function:
+if len(errors) > 0 { + return fmt.Errorf("errors occurred during vulnerabilities ingestion: %v", errors) +}Committable suggestion skipped: line range outside the PR's diff.
115-117:
⚠️ Potential issueUndefined Variable
pkg
The variable
pkg
used instrings.HasPrefix(node.Name, pkg)
is not defined within the scope of theVulnerabilities
function. This will result in a compilation error.Define the
pkg
variable or adjust the condition:-if node.Type == tools.LibraryType && strings.HasPrefix(node.Name, pkg) { +if node.Type == tools.LibraryType {If
pkg
is intended to be a package prefix, ensure it is defined correctly.Committable suggestion skipped: line range outside the PR's diff.
Codecov ReportAttention: Patch coverage is
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 11
🧹 Outside diff range and nitpick comments (18)
cmd/ingest/sbom/sbom.go (3)
21-22
: Consider enhancing error handling and validation.The error handling could be more robust:
- Validate that storage is properly initialized
- Add context to the error for better debugging
- Consider using context.Context for potential timeout/cancellation
func (o *options) Run(_ *cobra.Command, args []string) error { sbomPath := args[0] + if o.storage == nil { + return fmt.Errorf("storage is not initialized") + } - result, err := ingest.LoadDataFromPath(o.storage, sbomPath) + result, err := ingest.LoadDataFromPath(o.storage, sbomPath) if err != nil { - return fmt.Errorf("failed to ingest SBOM: %w", err) + return fmt.Errorf("failed to load SBOM from %s: %w", sbomPath, err) }
26-31
: Consider implementing batch processing for better performance.The current implementation processes SBOMs one at a time. For large datasets, batch processing could improve performance.
Consider implementing a batch processor that:
- Processes multiple SBOMs concurrently
- Uses worker pools for parallel processing
- Implements backpressure mechanisms
30-30
: Improve progress display robustness.The current progress display uses raw ANSI codes and doesn't handle terminal width limitations.
Consider using a progress bar library like
github.com/schollz/progressbar/v3
which would handle:
- Terminal width adaptation
- Clean ANSI handling
- Better progress visualization
- fmt.Printf("\r\033[1;36mIngested %d SBOMs\033[0m | \033[1;34m%s\033[0m", index+1, tools.TruncateString(data.Path, 50)) + // Initialize progress bar in the beginning of the loop + // Update progress bar herepkg/tools/ingest/vuln_test.go (1)
13-13
: Improve error message clarity and consider using test helpers for paths.The error message could be more descriptive to help with debugging. Also, consider extracting the test data paths to constants or test helpers.
- sbomDir := "../../../testdata/osv-sboms" + const testDataPath = "../../../testdata" + sbomDir := filepath.Join(testDataPath, "osv-sboms") - t.Fatalf("Expected SBOM to be ingested, got %d", len(result)) + t.Fatalf("Expected at least one SBOM to be ingested from %s, got %d", sbomDir, len(result))Also applies to: 15-21
cmd/ingest/scorecard/scorecard.go (4)
12-14
: Good use of dependency injection pattern.The options struct cleanly encapsulates the storage dependency, making the code more testable and maintainable.
Consider adding validation methods if more fields are added to the options struct in the future.
16-16
: Document the empty AddFlags method.Add a comment explaining that this method is intentionally empty and may be used for future flag additions.
+// AddFlags adds command-specific flags. Currently no flags are required. func (o *options) AddFlags(_ *cobra.Command) {}
18-38
: Enhance error handling and input validation.While the implementation is functional, consider these improvements:
- Validate the SBOM path exists before processing
- Add more context to error messages
- Consider adding a progress bar for large SBOMs
Here's a suggested enhancement:
func (o *options) Run(_ *cobra.Command, args []string) error { sbomPath := args[0] + + // Validate SBOM path + if _, err := os.Stat(sbomPath); err != nil { + return fmt.Errorf("invalid SBOM path %q: %w", sbomPath, err) + } // Ingest SBOM result, err := ingest.LoadDataFromPath(o.storage, sbomPath) if err != nil { - return fmt.Errorf("failed to ingest SBOM: %w", err) + return fmt.Errorf("failed to ingest SBOM from %q: %w", sbomPath, err) } for index, data := range result { if err := ingest.Scorecards(o.storage, data.Data); err != nil { - return fmt.Errorf("failed to ingest Scorecard: %w", err) + return fmt.Errorf("failed to ingest Scorecard for %q: %w", data.Path, err) } fmt.Printf("\r\033[K%s", printProgress(index+1, data.Path)) } fmt.Println("\nSBOM ingested successfully") return nil }
56-58
: LGTM: Good progress feedback implementation.The progress printing function provides clear, color-coded feedback with appropriate path truncation.
Consider extracting the ANSI color codes into constants for better maintainability:
+const ( + colorCyan = "\033[1;36m" + colorBlue = "\033[1;34m" + colorReset = "\033[0m" +) + func printProgress(count int, path string) string { - return fmt.Sprintf("\033[1;36mIngested %d SBOMs\033[0m | \033[1;34m%s\033[0m", count, tools.TruncateString(path, 50)) + return fmt.Sprintf("%sIngested %d SBOMs%s | %s%s%s", + colorCyan, count, colorReset, + colorBlue, tools.TruncateString(path, 50), colorReset) }cmd/ingest/osv/osv.go (2)
16-16
: Consider adding configuration flags.The empty AddFlags method suggests no configurable options. Consider adding flags for customization (e.g., batch size, concurrency limits, output format).
21-24
: Enhance error message with context.The error message could be more specific by including the directory path.
- return fmt.Errorf("failed to load vulnerabilities: %w", err) + return fmt.Errorf("failed to load vulnerabilities from %s: %w", vulnsDir, err)pkg/tools/ingest/sbom.go (3)
13-14
: Add Test Coverage for Empty Data ValidationThe empty data validation lacks test coverage according to codecov report.
Would you like me to help generate test cases for this validation check?
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 14-14: pkg/tools/ingest/sbom.go#L14
Added line #L14 was not covered by tests
20-22
: Enhance Error Message ContextThe error message could be more descriptive to aid in debugging.
Consider this improvement:
- return fmt.Errorf("failed to parse SBOM file: %w", err) + return fmt.Errorf("failed to parse SBOM data (size: %d bytes): %w", len(data), err)🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 22-22: pkg/tools/ingest/sbom.go#L22
Added line #L22 was not covered by tests
Line range hint
34-67
: Consider Refactoring Error Handling LogicThe nested error handling blocks could be simplified to improve readability and maintainability.
Consider extracting the edge processing logic into a separate helper function:
func processEdge(storage graph.Storage, edge *Edge, nameToId map[string]uint32) error { fromNode, err := storage.GetNode(nameToId[edge.From]) if err != nil { return fmt.Errorf("failed to get from node %s: %w", edge.From, err) } for _, to := range edge.To { if err := processEdgeTarget(storage, fromNode, to, nameToId); err != nil { return err } } return nil } func processEdgeTarget(storage graph.Storage, fromNode *Node, to string, nameToId map[string]uint32) error { toNode, err := storage.GetNode(nameToId[to]) if err != nil { return fmt.Errorf("failed to get to node %s: %w", to, err) } if fromNode.ID != toNode.ID { if err := fromNode.SetDependency(storage, toNode); err != nil { return fmt.Errorf("failed to add edge %s -> %s: %w", fromNode.ID, to, err) } } return nil }🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 14-14: pkg/tools/ingest/sbom.go#L14
Added line #L14 was not covered by tests
[warning] 22-22: pkg/tools/ingest/sbom.go#L22
Added line #L22 was not covered by testspkg/tools/ingest/loader.go (1)
21-24
: Consider using errors.Join for error aggregationThe function uses a slice to collect errors, but Go 1.20's
errors.Join
would be more idiomatic for combining multiple errors.pkg/tools/ingest/scorecard.go (2)
57-64
: Consider adding debug logging for skipped results.While filtering out unsuccessful results is correct, adding debug logging would help with troubleshooting ingestion issues.
for _, result := range results { if !result.Success { + log.Debugf("Skipping unsuccessful scorecard result for PURL: %s, error: %s", result.PURL, result.Error) continue } scorecardResults[result.PURL] = append(scorecardResults[result.PURL], result) }
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 57-61: pkg/tools/ingest/scorecard.go#L57-L61
Added lines #L57 - L61 were not covered by tests
[warning] 63-63: pkg/tools/ingest/scorecard.go#L63
Added line #L63 was not covered by tests
83-84
: Consider potential concurrency and performance implications.A few considerations for this section:
- There's a potential race condition if the storage is modified during iteration. Consider adding a comment about thread safety requirements.
- The nested loops (nodes → scorecardData → scorecardResult) could be optimized by pre-filtering scorecard results by version.
Example optimization:
scorecardData, ok := scorecardResults[purl.Name] if !ok { continue } +// Pre-filter matching versions to avoid nested iteration +matchingResults := make([]ScorecardResult, 0) +for _, result := range scorecardData { + if result.Success { + scorecardPurl, err := PURLToPackage(result.PURL) + if err == nil && scorecardPurl.Version == purl.Version { + matchingResults = append(matchingResults, result) + } + } +} -for _, scorecardResult := range scorecardData { +for _, scorecardResult := range matchingResults { - if scorecardResult.Success { - scorecardPurl, err := PURLToPackage(scorecardResult.PURL) - if err != nil { - continue - } - if scorecardPurl.Version == purl.Version {Also applies to: 88-88
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 83-84: pkg/tools/ingest/scorecard.go#L83-L84
Added lines #L83 - L84 were not covered by testspkg/tools/ingest/vuln.go (2)
121-133
: LGTM with a minor optimization suggestion.The vulnerability lookup and processing logic is well-structured. Consider pre-marshaling the vulnerabilities when building the vulnMap to avoid repeated marshaling operations.
type PairedVuln struct { ID string - Vuln Vulnerability + Data []byte } vulnMap := map[string][]Vulnerability{} + marshaledVulnMap := map[string][]PairedVuln{} for _, affected := range vuln.Affected { - vulnMap[affected.Package.Name] = append(vulnMap[affected.Package.Name], vuln) + vulnData, err := json.Marshal(vuln) + if err != nil { + errors = append(errors, err) + continue + } + marshaledVulnMap[affected.Package.Name] = append( + marshaledVulnMap[affected.Package.Name], + PairedVuln{ID: vuln.ID, Data: vulnData}, + ) }🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 132-133: pkg/tools/ingest/vuln.go#L132-L133
Added lines #L132 - L133 were not covered by tests
Line range hint
82-143
: Add test coverage for error scenarios.Several error paths lack test coverage according to codecov:
- Empty data validation
- JSON unmarshaling errors
- Error collection during processing
Consider adding test cases for these scenarios to ensure robust error handling.
Would you like me to help generate comprehensive test cases for these scenarios?
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 132-133: pkg/tools/ingest/vuln.go#L132-L133
Added lines #L132 - L133 were not covered by tests
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
gen/api/v1/service.pb.go
is excluded by!**/*.pb.go
,!**/gen/**
📒 Files selected for processing (15)
api/v1/service_test.go
(1 hunks)cmd/ingest/osv/graph/graph.go
(0 hunks)cmd/ingest/osv/load/load.go
(0 hunks)cmd/ingest/osv/osv.go
(1 hunks)cmd/ingest/sbom/sbom.go
(1 hunks)cmd/ingest/scorecard/graph/graph.go
(0 hunks)cmd/ingest/scorecard/load/load.go
(0 hunks)cmd/ingest/scorecard/scorecard.go
(1 hunks)pkg/storages/e2e_test.go
(1 hunks)pkg/tools/ingest/loader.go
(1 hunks)pkg/tools/ingest/sbom.go
(1 hunks)pkg/tools/ingest/sbom_test.go
(2 hunks)pkg/tools/ingest/scorecard.go
(3 hunks)pkg/tools/ingest/vuln.go
(3 hunks)pkg/tools/ingest/vuln_test.go
(2 hunks)
💤 Files with no reviewable changes (4)
- cmd/ingest/osv/graph/graph.go
- cmd/ingest/osv/load/load.go
- cmd/ingest/scorecard/graph/graph.go
- cmd/ingest/scorecard/load/load.go
🚧 Files skipped from review as they are similar to previous changes (3)
- api/v1/service_test.go
- pkg/storages/e2e_test.go
- pkg/tools/ingest/sbom_test.go
🧰 Additional context used
🪛 golangci-lint
cmd/ingest/osv/osv.go
50-50: func printProgress
is unused
(unused)
🪛 GitHub Check: codecov/patch
pkg/tools/ingest/loader.go
[warning] 34-34: pkg/tools/ingest/loader.go#L34
Added line #L34 was not covered by tests
[warning] 40-40: pkg/tools/ingest/loader.go#L40
Added line #L40 was not covered by tests
[warning] 50-51: pkg/tools/ingest/loader.go#L50-L51
Added lines #L50 - L51 were not covered by tests
[warning] 55-55: pkg/tools/ingest/loader.go#L55
Added line #L55 was not covered by tests
[warning] 60-60: pkg/tools/ingest/loader.go#L60
Added line #L60 was not covered by tests
[warning] 67-67: pkg/tools/ingest/loader.go#L67
Added line #L67 was not covered by tests
[warning] 73-73: pkg/tools/ingest/loader.go#L73
Added line #L73 was not covered by tests
[warning] 76-76: pkg/tools/ingest/loader.go#L76
Added line #L76 was not covered by tests
[warning] 82-82: pkg/tools/ingest/loader.go#L82
Added line #L82 was not covered by tests
[warning] 89-89: pkg/tools/ingest/loader.go#L89
Added line #L89 was not covered by tests
[warning] 96-96: pkg/tools/ingest/loader.go#L96
Added line #L96 was not covered by tests
[warning] 101-101: pkg/tools/ingest/loader.go#L101
Added line #L101 was not covered by tests
[warning] 105-105: pkg/tools/ingest/loader.go#L105
Added line #L105 was not covered by tests
pkg/tools/ingest/sbom.go
[warning] 14-14: pkg/tools/ingest/sbom.go#L14
Added line #L14 was not covered by tests
[warning] 22-22: pkg/tools/ingest/sbom.go#L22
Added line #L22 was not covered by tests
pkg/tools/ingest/scorecard.go
[warning] 47-50: pkg/tools/ingest/scorecard.go#L47-L50
Added lines #L47 - L50 were not covered by tests
[warning] 52-55: pkg/tools/ingest/scorecard.go#L52-L55
Added lines #L52 - L55 were not covered by tests
[warning] 57-61: pkg/tools/ingest/scorecard.go#L57-L61
Added lines #L57 - L61 were not covered by tests
[warning] 63-63: pkg/tools/ingest/scorecard.go#L63
Added line #L63 was not covered by tests
[warning] 83-84: pkg/tools/ingest/scorecard.go#L83-L84
Added lines #L83 - L84 were not covered by tests
[warning] 88-88: pkg/tools/ingest/scorecard.go#L88
Added line #L88 was not covered by tests
pkg/tools/ingest/vuln.go
[warning] 85-86: pkg/tools/ingest/vuln.go#L85-L86
Added lines #L85 - L86 were not covered by tests
[warning] 90-91: pkg/tools/ingest/vuln.go#L90-L91
Added lines #L90 - L91 were not covered by tests
[warning] 102-103: pkg/tools/ingest/vuln.go#L102-L103
Added lines #L102 - L103 were not covered by tests
[warning] 132-133: pkg/tools/ingest/vuln.go#L132-L133
Added lines #L132 - L133 were not covered by tests
🔇 Additional comments (10)
cmd/ingest/sbom/sbom.go (1)
21-31
: Verify all callers are updated for breaking changes.
The PR indicates this is a breaking change. Let's verify all callers are properly updated.
pkg/tools/ingest/vuln_test.go (2)
35-35
: LGTM!
Good spacing improvement for better code readability.
36-46
: 🛠️ Refactor suggestion
Consider making the vulnerability test more maintainable.
The vulnerability ingestion logic looks good, but there are a few improvements that could make the test more maintainable:
- Document or extract the expected node count increment (3) as a named constant with an explanation
- Consider making the test more resilient to changes in test data
- Use consistent path handling with the SBOM section
+ // Expected number of new nodes when ingesting test vulnerabilities
+ const expectedNewNodes = 3 // Represents: [explain what these 3 nodes represent]
+
result, err = LoadDataFromPath(storage, vulnsDir)
if err != nil {
- t.Fatalf("Failed to load vulnerabilities from directory %s: %v", vulnsDir, err)
+ t.Fatalf("Failed to load vulnerabilities from %s: %v", vulnsDir, err)
}
if len(result) == 0 {
- t.Fatalf("Expected vulnerabilities to be ingested, got %d", len(result))
+ t.Fatalf("Expected at least one vulnerability to be ingested from %s, got %d", vulnsDir, len(result))
}
- if len(keys) != numberOfNodes+3 {
- t.Fatalf("Expected number of nodes to be %d, got %d", numberOfNodes+3, len(keys))
+ if len(keys) != numberOfNodes+expectedNewNodes {
+ t.Fatalf("Expected %d new nodes after vulnerability ingestion (total: %d), got %d",
+ expectedNewNodes, numberOfNodes+expectedNewNodes, len(keys))
}
Additionally, verify that the test data is consistent with our expectations:
cmd/ingest/scorecard/scorecard.go (2)
4-8
: LGTM: Imports are appropriate for the functionality.
The new imports support the enhanced SBOM ingestion functionality and progress reporting.
40-53
: LGTM: Well-structured command configuration.
The command is properly configured with clear usage instructions and argument validation.
cmd/ingest/osv/osv.go (2)
4-14
: LGTM! Clean structure and appropriate imports.
The options struct with graph.Storage is a good pattern for command configuration.
40-41
: Verify command name change impact.
The command name has changed from "osv" to "graph". This is a breaking change that needs to be verified.
✅ Verification successful
Based on the search results, I can now provide a final response:
Command name change from "osv" to "graph" appears safe
The command is part of the osv
package but was previously not using "osv" as the command name in the CLI interface. The search results show:
- No documentation references to "osv" as a command
- No CLI usage examples mentioning the "osv" command
- The command is properly integrated under the main "ingest" command (
cmd/ingest/ingest.go
adds it viaosv.New(storage)
) - Other commands like "sbom" and "scorecard" follow similar patterns
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for any references to the old command name and verify documentation is updated
# Check for references to the old command name
echo "Checking for references to 'osv' command..."
rg -l "osv\s+command|command\s+osv" --type md
# Check if README or documentation is updated
echo "Checking documentation files..."
fd -e md -x grep -l "osv" {}
Length of output: 254
Script:
#!/bin/bash
# Let's try a broader search to understand the command usage and documentation
# Search for any mentions of 'osv' in the codebase
echo "Searching for 'osv' mentions..."
rg -i "osv" -A 3 -B 3
# Search for cobra command definitions to understand command structure
echo -e "\nSearching for cobra command definitions..."
ast-grep --pattern 'cobra.Command {
$$$
}'
# Check for any CLI usage examples or help text
echo -e "\nChecking for CLI usage or help text..."
rg -i "usage:|example:|cli" --type md
Length of output: 10308
pkg/tools/ingest/sbom.go (1)
12-14
: Breaking API Change: Review Impact on Consumers
The function signature has changed significantly:
- Removed
sbomPath
parameter and progress callback - Now accepts raw byte data instead of file path
Let's check for potential impacts:
Consider:
- Adding a wrapper function with the old signature for backward compatibility
- Documenting the migration path for existing consumers
✅ Verification successful
Based on the search results, I can now generate the final response since I have sufficient information about all the SBOM function calls in the codebase.
Breaking API Change: No Migration Required
All existing callers have already been updated to use the new signature with (storage, data)
parameters. The changes are consistent across:
- Test files:
pkg/storages/e2e_test.go
,pkg/tools/ingest/vuln_test.go
,pkg/tools/ingest/sbom_test.go
- API handlers:
api/v1/service_test.go
- CLI commands:
cmd/ingest/sbom/sbom.go
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Find all calls to SBOM function to verify migration
rg -l "SBOM\(" --type go | while read -r file; do
echo "=== $file ==="
rg -A 2 "SBOM\(" "$file"
done
Length of output: 1639
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 14-14: pkg/tools/ingest/sbom.go#L14
Added line #L14 was not covered by tests
pkg/tools/ingest/vuln.go (2)
95-99
: LGTM! Good optimization.
The new approach of mapping vulnerabilities by package name improves lookup efficiency during node processing.
82-91
: Improve error messages and add test coverage.
The empty data check and JSON unmarshaling error handling are good additions, but:
- The error messages could be more descriptive (e.g., include context about the vulnerability data)
- These code paths aren't covered by tests according to codecov
Consider improving the error messages:
- return fmt.Errorf("data is empty")
+ return fmt.Errorf("vulnerability data is empty")
- return fmt.Errorf("failed to unmarshal vulnerabilityType data: %w", err)
+ return fmt.Errorf("failed to unmarshal vulnerability data (schema version %s): %w", vuln.SchemaVersion, err)
#!/bin/bash
# Check for existing test cases covering these scenarios
rg -A 5 "TestVulnerabilities.*empty|TestVulnerabilities.*invalid.*json" --type go
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 85-86: pkg/tools/ingest/vuln.go#L85-L86
Added lines #L85 - L86 were not covered by tests
[warning] 90-91: pkg/tools/ingest/vuln.go#L90-L91
Added lines #L90 - L91 were not covered by tests
- Breaking change : United the graph and load ingestions for osv and scorecard - Changed the sbom ingestion to use LoadDataFromPath for filedata fetching Signed-off-by: neilnaveen <[email protected]>
a8b3fcc
to
4f2aef8
Compare
This reverts commit df68d54.
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Refactor
Tests