Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleaned up ingestion #117

Merged
merged 1 commit into from
Nov 11, 2024
Merged

Cleaned up ingestion #117

merged 1 commit into from
Nov 11, 2024

Conversation

neilnaveen
Copy link
Member

@neilnaveen neilnaveen commented Nov 10, 2024

  • Breaking change : United the graph and load ingestions for osv and scorecard
  • Changed the sbom ingestion to use LoadDataFromPath for filedata fetching

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced a streamlined process for loading and ingesting SBOM and vulnerability data.
    • Added a new command for SBOM ingestion with improved error handling and user feedback.
  • Bug Fixes

    • Enhanced error handling during data loading processes to provide clearer messages.
  • Refactor

    • Refactored ingestion methods to focus on structured data handling rather than counts and progress callbacks.
    • Removed deprecated commands and streamlined command usage for better clarity.
    • Updated the logic for processing vulnerabilities and scorecards to improve organization and efficiency.
  • Tests

    • Updated test cases to reflect changes in data loading and ingestion processes, improving error handling.

Copy link
Contributor

coderabbitai bot commented Nov 10, 2024

Caution

Review failed

The pull request is closed.

Walkthrough

The pull request introduces significant changes to the ingestion and processing of Software Bill of Materials (SBOM) and vulnerability data across multiple files. Key modifications include the restructuring of command functionalities, removal of obsolete files, and updates to test cases. The LoadDataFromPath function has been enhanced to return structured data instead of counts, while the SBOM and vulnerability ingestion processes have been streamlined. The changes aim to improve error handling, simplify the data loading workflow, and ensure consistency in how data is processed and tested.

Changes

File Path Change Summary
api/v1/service_test.go Updated TestQueriesAndCache to use ingest.LoadDataFromPath for SBOM and vulnerability data.
cmd/ingest/osv/graph/graph.go Deleted file; functionality for graphing scorecard data removed.
cmd/ingest/osv/load/load.go Deleted file; functionality for loading scorecard data removed.
cmd/ingest/osv/osv.go Restructured command to focus on graphing vulnerability data; added new options struct.
cmd/ingest/sbom/sbom.go Refactored SBOM ingestion to use ingest.LoadDataFromPath; removed progress callback.
cmd/ingest/scorecard/graph/graph.go Deleted file; functionality for graphing scorecard data removed.
cmd/ingest/scorecard/load/load.go Deleted file; functionality for loading scorecard data removed.
cmd/ingest/scorecard/scorecard.go Introduced new command for SBOM ingestion; improved error handling.
pkg/storages/e2e_test.go Updated TestParseAndExecute_E2E to streamline ingestion process and error handling.
pkg/tools/ingest/loader.go Updated LoadDataFromPath to return structured data; removed progress reporting.
pkg/tools/ingest/sbom.go Refactored SBOM function to process data directly from byte slice; removed file path handling.
pkg/tools/ingest/sbom_test.go Updated TestIngestSBOM for new ingestion logic; removed invalid SBOM test case.
pkg/tools/ingest/scorecard.go Modified Scorecards function to focus on direct data handling; removed unnecessary complexity.
pkg/tools/ingest/vuln.go Updated Vulnerabilities function to process data directly; introduced PairedVuln type.
pkg/tools/ingest/vuln_test.go Removed TestVulnerabilitiesToStorage; refactored TestVulnerabilities for new ingestion logic.

Possibly related PRs

  • Cleaned up ingestion #117: The changes in this PR directly relate to the main PR as both involve modifications to the SBOM ingestion process, specifically transitioning to using ingest.LoadDataFromPath for loading data, which aligns with the updates made in the TestQueriesAndCache function in the main PR.

🐰 "In the fields where data flows,
We’ve streamlined how the SBOM grows.
With vulnerabilities in sight,
We now process them just right.
So hop along, let’s celebrate,
For clearer paths and less of fate!" 🐇


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@neilnaveen neilnaveen force-pushed the neil/cleanupingestion branch from f42dd74 to a8b3fcc Compare November 10, 2024 23:37
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments.

Actionable comments posted: 12

🧹 Outside diff range and nitpick comments (22)
cmd/ingest/sbom/sbom.go (2)

21-25: Enhance error handling and input validation

The error handling could be more specific, and input validation is missing.

Consider these improvements:

 func (o *options) Run(_ *cobra.Command, args []string) error {
     sbomPath := args[0]
+    if o.storage == nil {
+        return fmt.Errorf("storage is not initialized")
+    }
+    if _, err := os.Stat(sbomPath); err != nil {
+        return fmt.Errorf("invalid SBOM path %q: %w", sbomPath, err)
+    }
     result, err := ingest.LoadDataFromPath(o.storage, sbomPath)
     if err != nil {
-        return fmt.Errorf("failed to ingest SBOM: %w", err)
+        return fmt.Errorf("failed to load SBOM data from %q: %w", sbomPath, err)
     }

32-32: Enhance success message with statistics

The success message could provide more value by including statistics about the ingestion.

-    fmt.Println("\nSBOM ingested successfully")
+    fmt.Printf("\nSuccessfully ingested %d SBOMs from %s\n", len(result), tools.TruncateString(sbomPath, 50))
pkg/tools/ingest/vuln_test.go (2)

36-46: LGTM! Consider enhancing test coverage.

The vulnerability ingestion implementation follows the same clean pattern as SBOM ingestion. However, consider adding test cases for:

  • Empty/malformed vulnerability data
  • Edge cases in the vulnerability structure

Would you like me to provide example test cases for these scenarios?


Line range hint 8-57: Consider improving test isolation.

The test currently combines SBOM and vulnerability ingestion in a single test function. Consider splitting these into separate test functions for better isolation and maintenance.

This would:

  1. Make test failures more specific
  2. Allow for independent testing of each ingestion type
  3. Improve test maintainability
pkg/tools/ingest/sbom_test.go (1)

40-52: Refactor duplicate error handling logic

The error checking logic is duplicated between the LoadDataFromPath call and the SBOM processing loop. Consider refactoring to simplify the error handling.

 			result, err := LoadDataFromPath(storage, test.sbomPath)
-			if test.wantErr != (err != nil) {
-				t.Errorf("Sbom() error = %v, wantErr = %v", err, test.wantErr)
-			}
+			if err != nil {
+				if !test.wantErr {
+					t.Errorf("LoadDataFromPath() error = %v, test expects no error", err)
+				}
+				return
+			}
 
 			for _, data := range result {
 				if err := SBOM(storage, data.Data); err != nil {
-					if test.wantErr != (err != nil) {
-						t.Errorf("Sbom() error = %v, wantErr = %v", err, test.wantErr)
-					}				
+					t.Errorf("SBOM() error = %v", err)
+					return			
 				}
 			}
pkg/storages/e2e_test.go (1)

23-29: Consider enhancing error handling and progress tracking.

The new SBOM ingestion using LoadDataFromPath is a good improvement, but consider these enhancements:

  1. Add batch processing with progress tracking for large datasets
  2. Consider collecting errors instead of failing fast, to report all issues at once

Here's a suggested improvement:

 result, err := ingest.LoadDataFromPath(redisStorage, sbomPath)
 assert.NoError(t, err)
+var errors []error
 for _, data := range result {
   if err := ingest.SBOM(redisStorage, data.Data); err != nil {
-    t.Fatalf("Failed to load SBOM from data: %v", err)
+    errors = append(errors, fmt.Errorf("failed to load SBOM: %w", err))
   }
 }
+if len(errors) > 0 {
+  t.Fatalf("Failed to load %d SBOMs:\n%v", len(errors), errors)
+}
api/v1/service_test.go (2)

75-80: Consider enhancing error handling for batch processing.

The new approach using LoadDataFromPath and iterating over results is a good improvement for structured data handling. However, consider collecting errors from all iterations instead of stopping at the first error to provide more comprehensive test feedback.

Consider this improvement:

 result, err := ingest.LoadDataFromPath(s.storage, "../../testdata/osv-sboms/google_agi.sbom.json")
 require.NoError(t, err)
+var errors []error
 for _, data := range result {
-    err = ingest.SBOM(s.storage, data.Data)
-    require.NoError(t, err)
+    if err := ingest.SBOM(s.storage, data.Data); err != nil {
+        errors = append(errors, err)
+    }
 }
+require.Empty(t, errors, "encountered errors while processing SBOM data")

84-85: Consider validating vulnerability data before processing.

The simplified approach to vulnerability ingestion is good, but consider adding validation of the file content before processing to make the test more robust.

Consider this improvement:

 content, err := os.ReadFile("../../testdata/osv-vulns/GHSA-cx63-2mw6-8hw5.json")
 require.NoError(t, err)
+require.NotEmpty(t, content, "vulnerability file should not be empty")
+require.Contains(t, string(content), "GHSA-cx63-2mw6-8hw5", "vulnerability file should contain expected data")
 err = ingest.Vulnerabilities(s.storage, content)
 require.NoError(t, err)
cmd/ingest/scorecard/scorecard.go (4)

16-17: Consider removing or documenting the empty AddFlags method

The AddFlags method is currently empty but is called in the New function at line 51 (o.AddFlags(cmd)). If there are no flags to add, you might consider removing both the AddFlags method and its invocation. If it's intended as a placeholder for future options, please add a comment to clarify its purpose.

Also applies to: 51-51


21-22: Remove or update the outdated comment

The comment // Define the progress callback function appears to be outdated or irrelevant since there is no progress callback function defined in the code. Consider removing this comment or updating it to accurately reflect the current implementation.


36-36: Align the success message with the ingested data

The success message fmt.Println("\nSBOM ingested successfully") may be misleading if the command primarily ingests Scorecard data. To avoid confusion, update the message to accurately reflect the data that has been ingested.


56-58: Handle ANSI escape codes for cross-platform compatibility

The printProgress function uses ANSI escape codes to colorize output. While this enhances readability on Unix-like terminals, it may not display correctly in environments like Windows Command Prompt or when output is redirected to a file or logging system.

Consider using a cross-platform library, such as github.com/fatih/color, which handles colorized output more gracefully. Alternatively, detect if the output supports ANSI codes and disable colorization when it does not.

cmd/ingest/osv/osv.go (2)

29-30: Optimize progress output by utilizing the unused printProgress function

The printProgress function defined on lines 50-52 is currently unused. Instead of formatting the progress message directly within the loop, you can use this helper function to improve code readability and maintainability.

Apply this diff to utilize the printProgress function:

 for index, data := range result {
     if err := ingest.Vulnerabilities(o.storage, data.Data); err != nil {
         fmt.Printf("Error ingesting vulnerabilities from %s: %v\n", data.Path, err)
         continue
     }
-    fmt.Printf("\r\033[K\033[1;36mGraphed %d vulnerabilities\033[0m | \033[1;34mCurrent: %s\033[0m", index+1, tools.TruncateString(data.Path, 50))
+    fmt.Printf("\r\033[K%s", printProgress(index+1, data.Path))
 }

40-41: Update command usage and description to reflect functionality accurately

The command's Use is set to "graph", and the Short description mentions "Graph vuln data". Since the command specifically handles OSV vulnerability data, consider making the command usage and description more specific for clarity.

Consider updating as follows:

-Use:               "graph",
-Short:             "Graph vuln data into the graph, and connect it to existing library nodes",
+Use:               "osv-graph",
+Short:             "Graph OSV vulnerability data into the graph and connect it to existing library nodes",
pkg/tools/ingest/sbom.go (1)

22-22: Consistent error messaging improves clarity

Ensure all error messages accurately reflect that the function processes data rather than files. This will prevent confusion for users and maintain consistency throughout the codebase.

pkg/tools/ingest/loader.go (1)

67-67: Enhance error aggregation for clearer reporting

Aggregating errors using %v may not format them clearly. To improve readability, consider using errors.Join (Go 1.20+) to combine errors or iterate over them to format each error individually.

Example using errors.Join:

if len(errors) > 0 {
	return nil, fmt.Errorf("errors occurred during data ingestion: %w", errors.Join(errors...))
}
pkg/tools/ingest/scorecard.go (1)

45-46: Align function comment with function name

The function is named Scorecards, but the comment refers to Scorecard. Consider updating the comment to match the function name for consistency.

pkg/tools/ingest/vuln.go (5)

77-81: Unused Type Definition: PairedVuln

The type PairedVuln is declared but not used anywhere in the code. Unused code can lead to confusion and should be removed to improve code readability and maintainability.

Consider removing the unused type:

-type PairedVuln struct {
-	ID   string
-	Vuln Vulnerability
-}

90-90: Typo in Error Message

The error message refers to "vulnerabilityType data," but the correct term should be "vulnerability data." This will improve clarity when debugging.

Apply this fix:

-return fmt.Errorf("failed to unmarshal vulnerabilityType data: %w", err)
+return fmt.Errorf("failed to unmarshal vulnerability data: %w", err)

101-103: Redundant Error Check

The errors slice is checked for length, but no errors are appended to it before this point. This condition will always be false, making the check redundant.

Consider removing the unnecessary error check:

-if len(errors) > 0 {
-	return fmt.Errorf("errors occurred during vulnerabilities ingestion: %v", errors)
-}

Line range hint 203-210: Constants Should Be Defined at Package Level

Defining constants inside the compareVersions function limits their scope unnecessarily and can affect readability.

Move the constants outside the function:

-const (
-	EventTypeSEMVER    = "SEMVER"
-	EventTypeECOSYSTEM = "ECOSYSTEM"
-	EventTypeGIT       = "GIT"
-)

Place them at the package level:

const (
	EventTypeSEMVER    = "SEMVER"
	EventTypeECOSYSTEM = "ECOSYSTEM"
	EventTypeGIT       = "GIT"
)

Line range hint 232-236: Implement Ecosystem-Specific Version Comparison

The compareEcosystemVersions function contains a placeholder implementation. This may lead to incorrect version comparisons for different ecosystems.

Would you like assistance in implementing the ecosystem-specific version comparison logic? I can help draft the necessary code based on the target ecosystems.

🛑 Comments failed to post (12)
cmd/ingest/sbom/sbom.go (1)

26-31: 🛠️ Refactor suggestion

Improve progress display and error handling in ingestion loop

The current implementation has several areas for improvement:

  1. Progress display doesn't show total count
  2. Error messages don't identify which SBOM failed
  3. No batch processing for better performance

Consider this improved implementation:

+    total := len(result)
     for index, data := range result {
         if err := ingest.SBOM(o.storage, data.Data); err != nil {
-            return fmt.Errorf("failed to ingest SBOM: %w", err)
+            return fmt.Errorf("failed to ingest SBOM %q (%d/%d): %w", data.Path, index+1, total, err)
         }
-        fmt.Printf("\r\033[1;36mIngested %d SBOMs\033[0m | \033[1;34m%s\033[0m", index+1, tools.TruncateString(data.Path, 50))
+        fmt.Printf("\r\033[1;36mIngested %d/%d SBOMs\033[0m | \033[1;34m%s\033[0m", index+1, total, tools.TruncateString(data.Path, 50))
     }
+    fmt.Println()

Consider implementing batch processing for better performance:

const batchSize = 100
// Process SBOMs in batches
for i := 0; i < len(result); i += batchSize {
    end := i + batchSize
    if end > len(result) {
        end = len(result)
    }
    batch := result[i:end]
    // Process batch concurrently
}
pkg/storages/e2e_test.go (1)

30-36: 🛠️ Refactor suggestion

Consider extracting a helper function to reduce code duplication.

The vulnerability ingestion code is nearly identical to the SBOM ingestion code. Consider extracting a helper function to handle both cases.

Here's a suggested refactor:

+func ingestData(t *testing.T, storage *RedisStorage, path string, processor func(*RedisStorage, []byte) error, dataType string) {
+  result, err := ingest.LoadDataFromPath(storage, path)
+  assert.NoError(t, err)
+  var errors []error
+  for _, data := range result {
+    if err := processor(storage, data.Data); err != nil {
+      errors = append(errors, fmt.Errorf("failed to load %s: %w", dataType, err))
+    }
+  }
+  if len(errors) > 0 {
+    t.Fatalf("Failed to load %d %s:\n%v", len(errors), dataType, errors)
+  }
+}

// Then use it like this:
-result, err := ingest.LoadDataFromPath(redisStorage, sbomPath)
-assert.NoError(t, err)
-for _, data := range result {
-  if err := ingest.SBOM(redisStorage, data.Data); err != nil {
-    t.Fatalf("Failed to load SBOM from data: %v", err)
-  }
-}
+ingestData(t, redisStorage, sbomPath, ingest.SBOM, "SBOM")

-result, err = ingest.LoadDataFromPath(redisStorage, vulnsPath)
-assert.NoError(t, err)
-for _, data := range result {
-  if err := ingest.Vulnerabilities(redisStorage, data.Data); err != nil {
-    t.Fatalf("Failed to load vulnerabilities from data: %v", err)
-  }
-}
+ingestData(t, redisStorage, vulnsPath, ingest.Vulnerabilities, "vulnerability")

Committable suggestion skipped: line range outside the PR's diff.

cmd/ingest/scorecard/scorecard.go (1)

29-34: ⚠️ Potential issue

Clarify the data being ingested: SBOMs vs. Scorecards

In the loop starting at line 29, the code calls ingest.Scorecards(o.storage, data.Data), which suggests it's ingesting Scorecard data. However, the command usage is "sbom [sbomPath]", and earlier code references SBOM ingestion. This inconsistency could cause confusion.

Please clarify whether the command is intended to ingest SBOMs, Scorecards, or both. If it processes Scorecards derived from SBOMs, consider updating function names, variable names, and user messages to reflect this. Ensuring consistency will help users understand the command's purpose.

cmd/ingest/osv/osv.go (3)

18-19: ⚠️ Potential issue

Validate command-line arguments to prevent potential runtime errors

The args slice is accessed without checking its length, which could lead to an index out of range panic if no arguments are provided.

Add a check to ensure that the required argument is provided:

func (o *options) Run(_ *cobra.Command, args []string) error {
+    if len(args) == 0 {
+        return fmt.Errorf("vulnerabilities directory path is required")
+    }
     vulnsDir := args[0]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

func (o *options) Run(_ *cobra.Command, args []string) error {
    if len(args) == 0 {
        return fmt.Errorf("vulnerabilities directory path is required")
    }
	vulnsDir := args[0]

50-52: 🛠️ Refactor suggestion

Remove or repurpose the unused printProgress function

The printProgress function is defined but not used in the current code. Unused code can lead to confusion and should be removed or repurposed to enhance the program.

If you decide not to use the printProgress function, remove it to clean up the codebase:

-func printProgress(count int, path string) string {
-	return fmt.Sprintf("\033[1;36mGraphed %d vulnerabilities\033[0m | \033[1;34mCurrent: %s\033[0m", count, tools.TruncateString(path, 50))
-}

Alternatively, if you plan to use it (as suggested in the previous comment), ensure it's correctly integrated.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.


🧰 Tools
🪛 golangci-lint

50-50: func printProgress is unused

(unused)


21-29: ⚠️ Potential issue

Handle errors during vulnerability ingestion more robustly

Currently, if an error occurs while ingesting vulnerabilities in the loop, the function returns immediately. This could halt the entire ingestion process upon encountering a single faulty data entry, potentially missing out on ingesting subsequent valid data.

Consider modifying the loop to continue processing subsequent data even if an error occurs for a particular entry. You can collect errors and report them after the loop or log them accordingly. Here's a possible refactor:

 for index, data := range result {
     if err := ingest.Vulnerabilities(o.storage, data.Data); err != nil {
-        return fmt.Errorf("failed to graph vuln data: %w", err)
+        fmt.Printf("Error ingesting vulnerabilities from %s: %v\n", data.Path, err)
+        continue
     }
     fmt.Printf("\r\033[K\033[1;36mGraphed %d vulnerabilities\033[0m | \033[1;34mCurrent: %s\033[0m", index+1, tools.TruncateString(data.Path, 50))
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

	result, err := ingest.LoadDataFromPath(o.storage, vulnsDir)
	if err != nil {
		return fmt.Errorf("failed to load vulnerabilities: %w", err)
	}
	for index, data := range result {
		if err := ingest.Vulnerabilities(o.storage, data.Data); err != nil {
			fmt.Printf("Error ingesting vulnerabilities from %s: %v\n", data.Path, err)
			continue
		}
		fmt.Printf("\r\033[K\033[1;36mGraphed %d vulnerabilities\033[0m | \033[1;34mCurrent: %s\033[0m", index + 1, tools.TruncateString(data.Path, 50))
pkg/tools/ingest/loader.go (3)

86-98: ⚠️ Potential issue

Sanitize file paths to prevent directory traversal attacks

When extracting files from a zip archive, malicious file names could lead to files being written outside the intended directory (Zip Slip vulnerability). To safeguard against this, validate the extracted file paths.

Implement path sanitization before extracting files:

extractedFilePath := filepath.Join(tempDir, f.Name)

// Prevent Zip Slip by ensuring the path is within tempDir
if !strings.HasPrefix(filepath.Clean(extractedFilePath), filepath.Clean(tempDir)+string(os.PathSeparator)) {
	return nil, fmt.Errorf("invalid file path in zip: %s", f.Name)
}

Make sure to import the "strings" package.


55-55: ⚠️ Potential issue

Append results instead of overwriting to prevent data loss

In the .zip case, you are assigning result = subResult, which overwrites any existing data in result. This can lead to data loss when processing multiple files. To ensure all data is captured, append the subResult to result.

Apply this diff to append the results:

-				result = subResult
+				result = append(result, subResult...)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

				result = append(result, subResult...)

89-101: 🛠️ Refactor suggestion

Avoid deferring closures in loops to prevent resource exhaustion

Using defer inside loops delays the closure of resources until the function exits, which can exhaust system resources when processing many files. Close the resources immediately after use.

Refactor the code to close files explicitly:

		rc, err := f.Open()
		if err != nil {
			return nil, fmt.Errorf("failed to open file %s in zip: %w", f.Name, err)
		}
-		defer rc.Close()

		extractedFilePath := filepath.Join(tempDir, f.Name)
		// ... (sanitization code)

		outFile, err := os.Create(extractedFilePath)
		if err != nil {
			return nil, fmt.Errorf("failed to create file %s: %w", extractedFilePath, err)
		}
-		defer outFile.Close()

		_, err = io.Copy(outFile, rc)
		if err != nil {
			return nil, fmt.Errorf("failed to copy file %s: %w", extractedFilePath, err)
		}

+		rc.Close()
+		outFile.Close()

Alternatively, encapsulate the processing logic in a separate function where defers are acceptable.

Committable suggestion skipped: line range outside the PR's diff.

pkg/tools/ingest/scorecard.go (1)

82-83: ⚠️ Potential issue

Potential mismatch in key usage when accessing scorecardResults

You're using purl.Name as the key to access scorecardResults, but when populating the map, you used result.PURL as the key. Ensure that purl.Name corresponds correctly to result.PURL, or consider using purl.String() if necessary, to avoid missing or incorrect data retrieval.

pkg/tools/ingest/vuln.go (2)

130-134: ⚠️ Potential issue

Accumulated Errors Not Handled

Errors appended to the errors slice within the loop are not handled after the loop concludes. This may cause silent failures and make debugging difficult.

Consider returning the accumulated errors at the end of the function:

+if len(errors) > 0 {
+	return fmt.Errorf("errors occurred during vulnerabilities ingestion: %v", errors)
+}

Committable suggestion skipped: line range outside the PR's diff.


115-117: ⚠️ Potential issue

Undefined Variable pkg

The variable pkg used in strings.HasPrefix(node.Name, pkg) is not defined within the scope of the Vulnerabilities function. This will result in a compilation error.

Define the pkg variable or adjust the condition:

-if node.Type == tools.LibraryType && strings.HasPrefix(node.Name, pkg) {
+if node.Type == tools.LibraryType {

If pkg is intended to be a package prefix, ensure it is defined correctly.

Committable suggestion skipped: line range outside the PR's diff.

Copy link

codecov bot commented Nov 10, 2024

Codecov Report

Attention: Patch coverage is 36.36364% with 49 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
pkg/tools/ingest/loader.go 39.28% 14 Missing and 3 partials ⚠️
pkg/tools/ingest/scorecard.go 0.00% 17 Missing ⚠️
pkg/tools/ingest/vuln.go 55.55% 8 Missing and 4 partials ⚠️
pkg/tools/ingest/sbom.go 40.00% 2 Missing and 1 partial ⚠️
Files with missing lines Coverage Δ
pkg/tools/ingest/sbom.go 47.82% <40.00%> (-11.44%) ⬇️
pkg/tools/ingest/vuln.go 63.84% <55.55%> (+4.66%) ⬆️
pkg/tools/ingest/loader.go 29.33% <39.28%> (+29.33%) ⬆️
pkg/tools/ingest/scorecard.go 0.00% <0.00%> (ø)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

🧹 Outside diff range and nitpick comments (18)
cmd/ingest/sbom/sbom.go (3)

21-22: Consider enhancing error handling and validation.

The error handling could be more robust:

  1. Validate that storage is properly initialized
  2. Add context to the error for better debugging
  3. Consider using context.Context for potential timeout/cancellation
 func (o *options) Run(_ *cobra.Command, args []string) error {
 	sbomPath := args[0]
+	if o.storage == nil {
+		return fmt.Errorf("storage is not initialized")
+	}
-	result, err := ingest.LoadDataFromPath(o.storage, sbomPath)
+	result, err := ingest.LoadDataFromPath(o.storage, sbomPath)
 	if err != nil {
-		return fmt.Errorf("failed to ingest SBOM: %w", err)
+		return fmt.Errorf("failed to load SBOM from %s: %w", sbomPath, err)
 	}

26-31: Consider implementing batch processing for better performance.

The current implementation processes SBOMs one at a time. For large datasets, batch processing could improve performance.

Consider implementing a batch processor that:

  1. Processes multiple SBOMs concurrently
  2. Uses worker pools for parallel processing
  3. Implements backpressure mechanisms

30-30: Improve progress display robustness.

The current progress display uses raw ANSI codes and doesn't handle terminal width limitations.

Consider using a progress bar library like github.com/schollz/progressbar/v3 which would handle:

  • Terminal width adaptation
  • Clean ANSI handling
  • Better progress visualization
-		fmt.Printf("\r\033[1;36mIngested %d SBOMs\033[0m | \033[1;34m%s\033[0m", index+1, tools.TruncateString(data.Path, 50))
+		// Initialize progress bar in the beginning of the loop
+		// Update progress bar here
pkg/tools/ingest/vuln_test.go (1)

13-13: Improve error message clarity and consider using test helpers for paths.

The error message could be more descriptive to help with debugging. Also, consider extracting the test data paths to constants or test helpers.

-	sbomDir := "../../../testdata/osv-sboms"
+	const testDataPath = "../../../testdata"
+	sbomDir := filepath.Join(testDataPath, "osv-sboms")

-		t.Fatalf("Expected SBOM to be ingested, got %d", len(result))
+		t.Fatalf("Expected at least one SBOM to be ingested from %s, got %d", sbomDir, len(result))

Also applies to: 15-21

cmd/ingest/scorecard/scorecard.go (4)

12-14: Good use of dependency injection pattern.

The options struct cleanly encapsulates the storage dependency, making the code more testable and maintainable.

Consider adding validation methods if more fields are added to the options struct in the future.


16-16: Document the empty AddFlags method.

Add a comment explaining that this method is intentionally empty and may be used for future flag additions.

+// AddFlags adds command-specific flags. Currently no flags are required.
 func (o *options) AddFlags(_ *cobra.Command) {}

18-38: Enhance error handling and input validation.

While the implementation is functional, consider these improvements:

  1. Validate the SBOM path exists before processing
  2. Add more context to error messages
  3. Consider adding a progress bar for large SBOMs

Here's a suggested enhancement:

 func (o *options) Run(_ *cobra.Command, args []string) error {
     sbomPath := args[0]
+
+    // Validate SBOM path
+    if _, err := os.Stat(sbomPath); err != nil {
+        return fmt.Errorf("invalid SBOM path %q: %w", sbomPath, err)
+    }

     // Ingest SBOM
     result, err := ingest.LoadDataFromPath(o.storage, sbomPath)
     if err != nil {
-        return fmt.Errorf("failed to ingest SBOM: %w", err)
+        return fmt.Errorf("failed to ingest SBOM from %q: %w", sbomPath, err)
     }

     for index, data := range result {
         if err := ingest.Scorecards(o.storage, data.Data); err != nil {
-            return fmt.Errorf("failed to ingest Scorecard: %w", err)
+            return fmt.Errorf("failed to ingest Scorecard for %q: %w", data.Path, err)
         }
         fmt.Printf("\r\033[K%s", printProgress(index+1, data.Path))
     }

     fmt.Println("\nSBOM ingested successfully")
     return nil
 }

56-58: LGTM: Good progress feedback implementation.

The progress printing function provides clear, color-coded feedback with appropriate path truncation.

Consider extracting the ANSI color codes into constants for better maintainability:

+const (
+    colorCyan  = "\033[1;36m"
+    colorBlue  = "\033[1;34m"
+    colorReset = "\033[0m"
+)
+
 func printProgress(count int, path string) string {
-    return fmt.Sprintf("\033[1;36mIngested %d SBOMs\033[0m | \033[1;34m%s\033[0m", count, tools.TruncateString(path, 50))
+    return fmt.Sprintf("%sIngested %d SBOMs%s | %s%s%s", 
+        colorCyan, count, colorReset,
+        colorBlue, tools.TruncateString(path, 50), colorReset)
 }
cmd/ingest/osv/osv.go (2)

16-16: Consider adding configuration flags.

The empty AddFlags method suggests no configurable options. Consider adding flags for customization (e.g., batch size, concurrency limits, output format).


21-24: Enhance error message with context.

The error message could be more specific by including the directory path.

-		return fmt.Errorf("failed to load vulnerabilities: %w", err)
+		return fmt.Errorf("failed to load vulnerabilities from %s: %w", vulnsDir, err)
pkg/tools/ingest/sbom.go (3)

13-14: Add Test Coverage for Empty Data Validation

The empty data validation lacks test coverage according to codecov report.

Would you like me to help generate test cases for this validation check?

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 14-14: pkg/tools/ingest/sbom.go#L14
Added line #L14 was not covered by tests


20-22: Enhance Error Message Context

The error message could be more descriptive to aid in debugging.

Consider this improvement:

-		return fmt.Errorf("failed to parse SBOM file: %w", err)
+		return fmt.Errorf("failed to parse SBOM data (size: %d bytes): %w", len(data), err)
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 22-22: pkg/tools/ingest/sbom.go#L22
Added line #L22 was not covered by tests


Line range hint 34-67: Consider Refactoring Error Handling Logic

The nested error handling blocks could be simplified to improve readability and maintainability.

Consider extracting the edge processing logic into a separate helper function:

func processEdge(storage graph.Storage, edge *Edge, nameToId map[string]uint32) error {
    fromNode, err := storage.GetNode(nameToId[edge.From])
    if err != nil {
        return fmt.Errorf("failed to get from node %s: %w", edge.From, err)
    }

    for _, to := range edge.To {
        if err := processEdgeTarget(storage, fromNode, to, nameToId); err != nil {
            return err
        }
    }
    return nil
}

func processEdgeTarget(storage graph.Storage, fromNode *Node, to string, nameToId map[string]uint32) error {
    toNode, err := storage.GetNode(nameToId[to])
    if err != nil {
        return fmt.Errorf("failed to get to node %s: %w", to, err)
    }

    if fromNode.ID != toNode.ID {
        if err := fromNode.SetDependency(storage, toNode); err != nil {
            return fmt.Errorf("failed to add edge %s -> %s: %w", fromNode.ID, to, err)
        }
    }
    return nil
}
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 14-14: pkg/tools/ingest/sbom.go#L14
Added line #L14 was not covered by tests


[warning] 22-22: pkg/tools/ingest/sbom.go#L22
Added line #L22 was not covered by tests

pkg/tools/ingest/loader.go (1)

21-24: Consider using errors.Join for error aggregation

The function uses a slice to collect errors, but Go 1.20's errors.Join would be more idiomatic for combining multiple errors.

pkg/tools/ingest/scorecard.go (2)

57-64: Consider adding debug logging for skipped results.

While filtering out unsuccessful results is correct, adding debug logging would help with troubleshooting ingestion issues.

 for _, result := range results {
   if !result.Success {
+    log.Debugf("Skipping unsuccessful scorecard result for PURL: %s, error: %s", result.PURL, result.Error)
     continue
   }
   scorecardResults[result.PURL] = append(scorecardResults[result.PURL], result)
 }
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 57-61: pkg/tools/ingest/scorecard.go#L57-L61
Added lines #L57 - L61 were not covered by tests


[warning] 63-63: pkg/tools/ingest/scorecard.go#L63
Added line #L63 was not covered by tests


83-84: Consider potential concurrency and performance implications.

A few considerations for this section:

  1. There's a potential race condition if the storage is modified during iteration. Consider adding a comment about thread safety requirements.
  2. The nested loops (nodes → scorecardData → scorecardResult) could be optimized by pre-filtering scorecard results by version.

Example optimization:

 scorecardData, ok := scorecardResults[purl.Name]
 if !ok {
   continue
 }
+// Pre-filter matching versions to avoid nested iteration
+matchingResults := make([]ScorecardResult, 0)
+for _, result := range scorecardData {
+  if result.Success {
+    scorecardPurl, err := PURLToPackage(result.PURL)
+    if err == nil && scorecardPurl.Version == purl.Version {
+      matchingResults = append(matchingResults, result)
+    }
+  }
+}
-for _, scorecardResult := range scorecardData {
+for _, scorecardResult := range matchingResults {
-  if scorecardResult.Success {
-    scorecardPurl, err := PURLToPackage(scorecardResult.PURL)
-    if err != nil {
-      continue
-    }
-    if scorecardPurl.Version == purl.Version {

Also applies to: 88-88

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 83-84: pkg/tools/ingest/scorecard.go#L83-L84
Added lines #L83 - L84 were not covered by tests

pkg/tools/ingest/vuln.go (2)

121-133: LGTM with a minor optimization suggestion.

The vulnerability lookup and processing logic is well-structured. Consider pre-marshaling the vulnerabilities when building the vulnMap to avoid repeated marshaling operations.

 type PairedVuln struct {
 	ID   string
-	Vuln Vulnerability
+	Data []byte
 }

 	vulnMap := map[string][]Vulnerability{}
+	marshaledVulnMap := map[string][]PairedVuln{}

 	for _, affected := range vuln.Affected {
-		vulnMap[affected.Package.Name] = append(vulnMap[affected.Package.Name], vuln)
+		vulnData, err := json.Marshal(vuln)
+		if err != nil {
+			errors = append(errors, err)
+			continue
+		}
+		marshaledVulnMap[affected.Package.Name] = append(
+			marshaledVulnMap[affected.Package.Name],
+			PairedVuln{ID: vuln.ID, Data: vulnData},
+		)
 	}
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 132-133: pkg/tools/ingest/vuln.go#L132-L133
Added lines #L132 - L133 were not covered by tests


Line range hint 82-143: Add test coverage for error scenarios.

Several error paths lack test coverage according to codecov:

  • Empty data validation
  • JSON unmarshaling errors
  • Error collection during processing

Consider adding test cases for these scenarios to ensure robust error handling.

Would you like me to help generate comprehensive test cases for these scenarios?

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 132-133: pkg/tools/ingest/vuln.go#L132-L133
Added lines #L132 - L133 were not covered by tests

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between f42dd74 and a8b3fcc.

⛔ Files ignored due to path filters (1)
  • gen/api/v1/service.pb.go is excluded by !**/*.pb.go, !**/gen/**
📒 Files selected for processing (15)
  • api/v1/service_test.go (1 hunks)
  • cmd/ingest/osv/graph/graph.go (0 hunks)
  • cmd/ingest/osv/load/load.go (0 hunks)
  • cmd/ingest/osv/osv.go (1 hunks)
  • cmd/ingest/sbom/sbom.go (1 hunks)
  • cmd/ingest/scorecard/graph/graph.go (0 hunks)
  • cmd/ingest/scorecard/load/load.go (0 hunks)
  • cmd/ingest/scorecard/scorecard.go (1 hunks)
  • pkg/storages/e2e_test.go (1 hunks)
  • pkg/tools/ingest/loader.go (1 hunks)
  • pkg/tools/ingest/sbom.go (1 hunks)
  • pkg/tools/ingest/sbom_test.go (2 hunks)
  • pkg/tools/ingest/scorecard.go (3 hunks)
  • pkg/tools/ingest/vuln.go (3 hunks)
  • pkg/tools/ingest/vuln_test.go (2 hunks)
💤 Files with no reviewable changes (4)
  • cmd/ingest/osv/graph/graph.go
  • cmd/ingest/osv/load/load.go
  • cmd/ingest/scorecard/graph/graph.go
  • cmd/ingest/scorecard/load/load.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • api/v1/service_test.go
  • pkg/storages/e2e_test.go
  • pkg/tools/ingest/sbom_test.go
🧰 Additional context used
🪛 golangci-lint
cmd/ingest/osv/osv.go

50-50: func printProgress is unused

(unused)

🪛 GitHub Check: codecov/patch
pkg/tools/ingest/loader.go

[warning] 34-34: pkg/tools/ingest/loader.go#L34
Added line #L34 was not covered by tests


[warning] 40-40: pkg/tools/ingest/loader.go#L40
Added line #L40 was not covered by tests


[warning] 50-51: pkg/tools/ingest/loader.go#L50-L51
Added lines #L50 - L51 were not covered by tests


[warning] 55-55: pkg/tools/ingest/loader.go#L55
Added line #L55 was not covered by tests


[warning] 60-60: pkg/tools/ingest/loader.go#L60
Added line #L60 was not covered by tests


[warning] 67-67: pkg/tools/ingest/loader.go#L67
Added line #L67 was not covered by tests


[warning] 73-73: pkg/tools/ingest/loader.go#L73
Added line #L73 was not covered by tests


[warning] 76-76: pkg/tools/ingest/loader.go#L76
Added line #L76 was not covered by tests


[warning] 82-82: pkg/tools/ingest/loader.go#L82
Added line #L82 was not covered by tests


[warning] 89-89: pkg/tools/ingest/loader.go#L89
Added line #L89 was not covered by tests


[warning] 96-96: pkg/tools/ingest/loader.go#L96
Added line #L96 was not covered by tests


[warning] 101-101: pkg/tools/ingest/loader.go#L101
Added line #L101 was not covered by tests


[warning] 105-105: pkg/tools/ingest/loader.go#L105
Added line #L105 was not covered by tests

pkg/tools/ingest/sbom.go

[warning] 14-14: pkg/tools/ingest/sbom.go#L14
Added line #L14 was not covered by tests


[warning] 22-22: pkg/tools/ingest/sbom.go#L22
Added line #L22 was not covered by tests

pkg/tools/ingest/scorecard.go

[warning] 47-50: pkg/tools/ingest/scorecard.go#L47-L50
Added lines #L47 - L50 were not covered by tests


[warning] 52-55: pkg/tools/ingest/scorecard.go#L52-L55
Added lines #L52 - L55 were not covered by tests


[warning] 57-61: pkg/tools/ingest/scorecard.go#L57-L61
Added lines #L57 - L61 were not covered by tests


[warning] 63-63: pkg/tools/ingest/scorecard.go#L63
Added line #L63 was not covered by tests


[warning] 83-84: pkg/tools/ingest/scorecard.go#L83-L84
Added lines #L83 - L84 were not covered by tests


[warning] 88-88: pkg/tools/ingest/scorecard.go#L88
Added line #L88 was not covered by tests

pkg/tools/ingest/vuln.go

[warning] 85-86: pkg/tools/ingest/vuln.go#L85-L86
Added lines #L85 - L86 were not covered by tests


[warning] 90-91: pkg/tools/ingest/vuln.go#L90-L91
Added lines #L90 - L91 were not covered by tests


[warning] 102-103: pkg/tools/ingest/vuln.go#L102-L103
Added lines #L102 - L103 were not covered by tests


[warning] 132-133: pkg/tools/ingest/vuln.go#L132-L133
Added lines #L132 - L133 were not covered by tests

🔇 Additional comments (10)
cmd/ingest/sbom/sbom.go (1)

21-31: Verify all callers are updated for breaking changes.

The PR indicates this is a breaking change. Let's verify all callers are properly updated.

pkg/tools/ingest/vuln_test.go (2)

35-35: LGTM!

Good spacing improvement for better code readability.


36-46: 🛠️ Refactor suggestion

Consider making the vulnerability test more maintainable.

The vulnerability ingestion logic looks good, but there are a few improvements that could make the test more maintainable:

  1. Document or extract the expected node count increment (3) as a named constant with an explanation
  2. Consider making the test more resilient to changes in test data
  3. Use consistent path handling with the SBOM section
+	// Expected number of new nodes when ingesting test vulnerabilities
+	const expectedNewNodes = 3 // Represents: [explain what these 3 nodes represent]
+
 	result, err = LoadDataFromPath(storage, vulnsDir)
 	if err != nil {
-		t.Fatalf("Failed to load vulnerabilities from directory %s: %v", vulnsDir, err)
+		t.Fatalf("Failed to load vulnerabilities from %s: %v", vulnsDir, err)
 	}
 	if len(result) == 0 {
-		t.Fatalf("Expected vulnerabilities to be ingested, got %d", len(result))
+		t.Fatalf("Expected at least one vulnerability to be ingested from %s, got %d", vulnsDir, len(result))
 	}

-	if len(keys) != numberOfNodes+3 {
-		t.Fatalf("Expected number of nodes to be %d, got %d", numberOfNodes+3, len(keys))
+	if len(keys) != numberOfNodes+expectedNewNodes {
+		t.Fatalf("Expected %d new nodes after vulnerability ingestion (total: %d), got %d", 
+			expectedNewNodes, numberOfNodes+expectedNewNodes, len(keys))
 	}

Additionally, verify that the test data is consistent with our expectations:

cmd/ingest/scorecard/scorecard.go (2)

4-8: LGTM: Imports are appropriate for the functionality.

The new imports support the enhanced SBOM ingestion functionality and progress reporting.


40-53: LGTM: Well-structured command configuration.

The command is properly configured with clear usage instructions and argument validation.

cmd/ingest/osv/osv.go (2)

4-14: LGTM! Clean structure and appropriate imports.

The options struct with graph.Storage is a good pattern for command configuration.


40-41: Verify command name change impact.

The command name has changed from "osv" to "graph". This is a breaking change that needs to be verified.

✅ Verification successful

Based on the search results, I can now provide a final response:

Command name change from "osv" to "graph" appears safe

The command is part of the osv package but was previously not using "osv" as the command name in the CLI interface. The search results show:

  • No documentation references to "osv" as a command
  • No CLI usage examples mentioning the "osv" command
  • The command is properly integrated under the main "ingest" command (cmd/ingest/ingest.go adds it via osv.New(storage))
  • Other commands like "sbom" and "scorecard" follow similar patterns
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any references to the old command name and verify documentation is updated

# Check for references to the old command name
echo "Checking for references to 'osv' command..."
rg -l "osv\s+command|command\s+osv" --type md

# Check if README or documentation is updated
echo "Checking documentation files..."
fd -e md -x grep -l "osv" {}

Length of output: 254


Script:

#!/bin/bash
# Let's try a broader search to understand the command usage and documentation

# Search for any mentions of 'osv' in the codebase
echo "Searching for 'osv' mentions..."
rg -i "osv" -A 3 -B 3

# Search for cobra command definitions to understand command structure
echo -e "\nSearching for cobra command definitions..."
ast-grep --pattern 'cobra.Command {
  $$$
}'

# Check for any CLI usage examples or help text
echo -e "\nChecking for CLI usage or help text..."
rg -i "usage:|example:|cli" --type md

Length of output: 10308

pkg/tools/ingest/sbom.go (1)

12-14: Breaking API Change: Review Impact on Consumers

The function signature has changed significantly:

  • Removed sbomPath parameter and progress callback
  • Now accepts raw byte data instead of file path

Let's check for potential impacts:

Consider:

  1. Adding a wrapper function with the old signature for backward compatibility
  2. Documenting the migration path for existing consumers
✅ Verification successful

Based on the search results, I can now generate the final response since I have sufficient information about all the SBOM function calls in the codebase.

Breaking API Change: No Migration Required

All existing callers have already been updated to use the new signature with (storage, data) parameters. The changes are consistent across:

  • Test files: pkg/storages/e2e_test.go, pkg/tools/ingest/vuln_test.go, pkg/tools/ingest/sbom_test.go
  • API handlers: api/v1/service_test.go
  • CLI commands: cmd/ingest/sbom/sbom.go
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Find all calls to SBOM function to verify migration
rg -l "SBOM\(" --type go | while read -r file; do
  echo "=== $file ==="
  rg -A 2 "SBOM\(" "$file"
done

Length of output: 1639

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 14-14: pkg/tools/ingest/sbom.go#L14
Added line #L14 was not covered by tests

pkg/tools/ingest/vuln.go (2)

95-99: LGTM! Good optimization.

The new approach of mapping vulnerabilities by package name improves lookup efficiency during node processing.


82-91: Improve error messages and add test coverage.

The empty data check and JSON unmarshaling error handling are good additions, but:

  1. The error messages could be more descriptive (e.g., include context about the vulnerability data)
  2. These code paths aren't covered by tests according to codecov

Consider improving the error messages:

-		return fmt.Errorf("data is empty")
+		return fmt.Errorf("vulnerability data is empty")
-		return fmt.Errorf("failed to unmarshal vulnerabilityType data: %w", err)
+		return fmt.Errorf("failed to unmarshal vulnerability data (schema version %s): %w", vuln.SchemaVersion, err)
#!/bin/bash
# Check for existing test cases covering these scenarios
rg -A 5 "TestVulnerabilities.*empty|TestVulnerabilities.*invalid.*json" --type go
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 85-86: pkg/tools/ingest/vuln.go#L85-L86
Added lines #L85 - L86 were not covered by tests


[warning] 90-91: pkg/tools/ingest/vuln.go#L90-L91
Added lines #L90 - L91 were not covered by tests

cmd/ingest/sbom/sbom.go Show resolved Hide resolved
cmd/ingest/osv/osv.go Show resolved Hide resolved
cmd/ingest/osv/osv.go Show resolved Hide resolved
pkg/tools/ingest/loader.go Show resolved Hide resolved
pkg/tools/ingest/loader.go Show resolved Hide resolved
pkg/tools/ingest/loader.go Show resolved Hide resolved
pkg/tools/ingest/scorecard.go Show resolved Hide resolved
pkg/tools/ingest/scorecard.go Show resolved Hide resolved
pkg/tools/ingest/vuln.go Show resolved Hide resolved
pkg/tools/ingest/vuln.go Show resolved Hide resolved
- Breaking change : United the graph and load ingestions for osv and scorecard
- Changed the sbom ingestion to use LoadDataFromPath for filedata
  fetching

Signed-off-by: neilnaveen <[email protected]>
@naveensrinivasan naveensrinivasan merged commit df68d54 into main Nov 11, 2024
6 checks passed
@naveensrinivasan naveensrinivasan deleted the neil/cleanupingestion branch November 11, 2024 02:20
@coderabbitai coderabbitai bot mentioned this pull request Nov 11, 2024
@neilnaveen neilnaveen restored the neil/cleanupingestion branch November 11, 2024 16:03
neilnaveen added a commit that referenced this pull request Nov 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants