
fix: do not allow the total count of all kinds of snapshots to exceed 250 #1405

Merged (1 commit) on Feb 9, 2025

PhanLe1010 (Contributor) commented Feb 8, 2025

longhorn/longhorn#10308

Special notes for your reviewer:

This PR depends on longhorn/types#45, which must be merged first.

Additional documentation or context

This PR is an alternative to PR #1404. Why?

  1. It fixes the total snapshot count issue.
  2. It keeps the existing behavior of the snapshot count enforcement feature unchanged, i.e., the following case still works:
    1. The user creates a volume with a backing image.
    2. The user sets the maximum snapshot count for the volume to 2.
    3. Longhorn allows the user to take 2 more snapshots. With the fix in #1404 ("fix: not allow more than 250 snapshots including all kind of snapshots"), Longhorn would not allow the user to take any more snapshots, which could be perceived as a behavior change, a source of confusion, or a regression.


coderabbitai bot commented Feb 8, 2025

Walkthrough

This pull request revises how snapshot counts are managed across components. The hardcoded value for the "snapshot-max-count" flag has been replaced with a new shared constant (types.MaximumTotalSnapshotCount), which standardizes the snapshot limit. Methods that report snapshot metrics now return an additional integer (countTotal), and their signatures and error returns have been updated accordingly. In addition, new fields have been added to the replica information structures and the RPC response to carry the total snapshot count.

Changes

| File(s) | Change summary |
|---|---|
| app/cmd/controller.go, app/cmd/replica.go | Replaced the hardcoded snapshot-max-count value (250) with types.MaximumTotalSnapshotCount. |
| pkg/backend/file/file.go, pkg/backend/remote/remote.go, pkg/controller/control.go, pkg/controller/replicator.go, pkg/types/types.go | Updated the GetSnapshotCountAndSizeUsage methods (and the corresponding Backend interface) to return an extra integer (countTotal) along with the original values; adjusted method signatures and error handling to provide more detailed snapshot metrics. |
| pkg/replica/client/client.go, pkg/replica/rpc/server.go, pkg/types/resource.go, pkg/replica/replica.go | Added a SnapshotCountTotal field to the ReplicaInfo structures; updated the getReplica RPC method; renamed GetSnapshotCountUsage to GetSnapshotCount, which returns both the current usage and the total snapshot count. |
| pkg/types/types.go | Added the MaximumTotalSnapshotCount constant to standardize the maximum snapshot count across the codebase. |
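Below is a minimal, self-contained Go sketch that makes the changed method signature concrete. It redeclares MaximumTotalSnapshotCount locally and uses hypothetical names: snapshotCounter stands in for the Backend interface, and fileWrapper stands in for the file backend's Wrapper. Only two details come from this PR: the return shape (countUsage, countTotal, sizeUsage, err) and the file backend's dummy values (1, 1, 0, nil).

```go
package main

import "fmt"

// MaximumTotalSnapshotCount mirrors the constant this PR adds to pkg/types/types.go.
const MaximumTotalSnapshotCount = 250

// snapshotCounter is a hypothetical, trimmed-down stand-in for the Backend interface,
// keeping only the method whose signature changed. Return values are
// (countUsage, countTotal, sizeUsage, err).
type snapshotCounter interface {
	GetSnapshotCountAndSizeUsage() (int, int, int64, error)
}

// fileWrapper is a hypothetical analogue of the file backend's Wrapper. The file
// backend has no real snapshots, so it reports fixed dummy values.
type fileWrapper struct{}

// GetSnapshotCountAndSizeUsage returns dummy values for the file backend.
func (f *fileWrapper) GetSnapshotCountAndSizeUsage() (int, int, int64, error) {
	return 1, 1, 0, nil
}

func main() {
	var b snapshotCounter = &fileWrapper{}
	countUsage, countTotal, sizeUsage, err := b.GetSnapshotCountAndSizeUsage()
	if err != nil {
		panic(err)
	}
	fmt.Println(countUsage, countTotal, sizeUsage, countTotal <= MaximumTotalSnapshotCount)
}
```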

Sequence Diagram(s)

sequenceDiagram
    participant C as Controller
    participant B as Backend
    participant T as Types

    C->>B: Call GetSnapshotCountAndSizeUsage()
    B-->>C: Return (countUsage, countTotal, sizeUsage, error)
    C->>C: Check if countTotal > T.MaximumTotalSnapshotCount
    alt Snapshot count exceeds limit
        C-->>C: Return error ("snapshot count too high")
    else Within allowed limit
        C-->>C: Proceed with snapshot operation
    end
sequenceDiagram
    participant RC as ReplicaClient
    participant RS as ReplicaServer
    participant R as Replica

    RC->>RS: Request replica information
    RS->>R: Call GetSnapshotCount()
    R-->>RS: Return (snapCountUsage, snapCountTotal)
    RS->>RS: Convert values & populate ReplicaInfo (SnapshotCountUsage, SnapshotCountTotal)
    RS-->>RC: Return complete ReplicaInfo
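The second diagram can be sketched in Go as follows. This is an illustration under stated assumptions, not the replica's real code. The snapshot type, its countsTowardUsage flag, and this simplified getReplica are hypothetical, and the actual rules for which snapshots count toward usage live in pkg/replica/replica.go. What the sketch does show is the contract: GetSnapshotCount returns (usage, total), and the server copies both into int32 fields of ReplicaInfo.

```go
package main

import "fmt"

// snapshot is a hypothetical, simplified record; countsTowardUsage marks snapshots
// that count against the user-facing limit.
type snapshot struct {
	name              string
	countsTowardUsage bool
}

// replica is a hypothetical stand-in for the replica's in-memory snapshot state.
type replica struct {
	snapshots []snapshot
}

// GetSnapshotCount returns (usage, total): total counts every snapshot, while
// usage counts only the snapshots flagged as counting toward the user-facing limit.
func (r *replica) GetSnapshotCount() (int, int) {
	usage := 0
	for _, s := range r.snapshots {
		if s.countsTowardUsage {
			usage++
		}
	}
	return usage, len(r.snapshots)
}

// replicaInfo mirrors the two int32 snapshot count fields carried in the RPC response.
type replicaInfo struct {
	SnapshotCountUsage int32
	SnapshotCountTotal int32
}

// getReplica populates replicaInfo from the replica, as in the second diagram.
func getReplica(r *replica) replicaInfo {
	usage, total := r.GetSnapshotCount()
	return replicaInfo{
		SnapshotCountUsage: int32(usage),
		SnapshotCountTotal: int32(total),
	}
}

func main() {
	r := &replica{snapshots: []snapshot{
		{name: "backing", countsTowardUsage: false},
		{name: "snap-1", countsTowardUsage: true},
		{name: "snap-2", countsTowardUsage: true},
	}}
	fmt.Printf("%+v\n", getReplica(r))
}
```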

📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b676d05 and 780a475.

⛔ Files ignored due to path filters (4)
  • go.mod is excluded by !go.mod
  • go.sum is excluded by !**/*.sum, !go.sum
  • vendor/github.com/longhorn/types/pkg/generated/enginerpc/replica.pb.go is excluded by !**/*.pb.go, !**/generated/**, !vendor/**
  • vendor/modules.txt is excluded by !vendor/**
📒 Files selected for processing (11)
  • app/cmd/controller.go (1 hunks)
  • app/cmd/replica.go (1 hunks)
  • pkg/backend/file/file.go (1 hunks)
  • pkg/backend/remote/remote.go (1 hunks)
  • pkg/controller/control.go (3 hunks)
  • pkg/controller/replicator.go (1 hunks)
  • pkg/replica/client/client.go (1 hunks)
  • pkg/replica/replica.go (3 hunks)
  • pkg/replica/rpc/server.go (1 hunks)
  • pkg/types/resource.go (1 hunks)
  • pkg/types/types.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • pkg/types/resource.go
  • pkg/replica/client/client.go
  • app/cmd/replica.go
  • pkg/backend/file/file.go
  • app/cmd/controller.go
  • pkg/controller/control.go
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Build ARM64 binaries
  • GitHub Check: Build AMD64 binaries
  • GitHub Check: Summary
🔇 Additional comments (8)
pkg/types/types.go (2)

50-50: LGTM! The constant value aligns with the PR objective.

The constant MaximumTotalSnapshotCount is set to 250, which correctly implements the PR's goal of limiting the total number of snapshots.


110-110: LGTM! Method signature updated consistently.

The method signature for GetSnapshotCountAndSizeUsage() has been updated to return (int, int, int64, error), which enables tracking both usage and total snapshot counts.

pkg/replica/rpc/server.go (1)

87-90: LGTM! Snapshot count handling updated correctly.

The code properly retrieves and sets both the snapshot count usage and total count from the replica.

pkg/controller/replicator.go (1)

356-388: LGTM! Snapshot count tracking implemented correctly.

The method now properly tracks both usage and total counts across replicas, taking the maximum values encountered. The error handling is also consistent with the new return signature.

pkg/backend/remote/remote.go (1)

225-235: LGTM! Method updated with proper error handling.

The method correctly returns both snapshot count usage and total count, with proper error handling that returns zero values when appropriate.

pkg/replica/replica.go (3)

152-152: LGTM! Replaced hardcoded value with constant.

Good change to use types.MaximumTotalSnapshotCount instead of a hardcoded value, improving maintainability.


1051-1053: LGTM! Added validation for maximum total snapshot count.

The check ensures that the live chain length doesn't exceed the maximum allowed total snapshot count, aligning with the PR objective.


1399-1408: LGTM! Enhanced snapshot count reporting.

The method now returns both usage and total counts, providing better visibility into snapshot metrics. The implementation is clean and straightforward.

Let's verify that all callers of the old method signature have been updated:

✅ Verification successful

🏁 Script executed:

#!/bin/bash
# Search for any remaining calls to GetSnapshotCountUsage
rg "GetSnapshotCountUsage"

Length of output: 147


Snapshot Count Reporting Update Verified

The only remaining reference to "GetSnapshotCountUsage" is in a vendor-generated file (vendor/github.com/longhorn/types/pkg/generated/enginerpc/replica.pb.go), which indicates external, generated code rather than an active caller in our codebase. There are no additional instances in our repository that need updating.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
pkg/backend/file/file.go (1)

102-103: Add documentation for dummy implementation.

The method returns dummy values (1, 1, 0, nil) because the file backend doesn't support real snapshots. Consider adding a comment to explain this, similar to the comment on line 88 for GetHeadFileSize.

+// GetSnapshotCountAndSizeUsage uses dummy values for file backend
 func (f *Wrapper) GetSnapshotCountAndSizeUsage() (int, int, int64, error) {
   return 1, 1, 0, nil
app/cmd/controller.go (1)

87-89: LGTM! Good replacement of magic number with constant.

The change improves maintainability by using types.MaximumTotalSnapshotCount instead of a hardcoded value.

Consider adding a comment explaining why this specific limit was chosen and what happens when it's reached.

pkg/replica/rpc/server.go (1)

87-89: LGTM! Good implementation of the new snapshot counting logic.

The code correctly retrieves and sets both the snapshot usage and total count.

Consider checking the values before the int32 conversion and logging a warning if a snapshot count could overflow:

-		snapCountUsage, snapCountTotal := r.GetSnapshotCount()
-		replica.SnapshotCountUsage = int32(snapCountUsage)
-		replica.SnapshotCountTotal = int32(snapCountTotal)
+		snapCountUsage, snapCountTotal := r.GetSnapshotCount()
+		if snapCountUsage > math.MaxInt32 || snapCountTotal > math.MaxInt32 {
+			logrus.Warnf("Snapshot count exceeds int32 max value: usage=%d, total=%d", snapCountUsage, snapCountTotal)
+		}
+		replica.SnapshotCountUsage = int32(snapCountUsage)
+		replica.SnapshotCountTotal = int32(snapCountTotal)
pkg/replica/replica.go (1)

1399-1408: Consider adding error handling for edge cases.

The new GetSnapshotCount method correctly returns both usage and total counts, but consider adding error handling for potential edge cases such as nil maps or corrupted disk data.

 func (r *Replica) GetSnapshotCount() (int, int) {
 	r.RLock()
 	defer r.RUnlock()
 
-	return r.getSnapshotCountUsage(), r.getSnapshotCountTotal()
+	if r.diskData == nil {
+		return 0, 0
+	}
+	return r.getSnapshotCountUsage(), r.getSnapshotCountTotal()
 }
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 77974e7 and b676d05.

⛔ Files ignored due to path filters (4)
  • go.mod is excluded by !go.mod
  • go.sum is excluded by !**/*.sum, !go.sum
  • vendor/github.com/longhorn/types/pkg/generated/enginerpc/replica.pb.go is excluded by !**/*.pb.go, !**/generated/**, !vendor/**
  • vendor/modules.txt is excluded by !vendor/**
📒 Files selected for processing (11)
  • app/cmd/controller.go (1 hunks)
  • app/cmd/replica.go (1 hunks)
  • pkg/backend/file/file.go (1 hunks)
  • pkg/backend/remote/remote.go (1 hunks)
  • pkg/controller/control.go (3 hunks)
  • pkg/controller/replicator.go (1 hunks)
  • pkg/replica/client/client.go (1 hunks)
  • pkg/replica/replica.go (3 hunks)
  • pkg/replica/rpc/server.go (1 hunks)
  • pkg/types/resource.go (1 hunks)
  • pkg/types/types.go (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Build ARM64 binaries
  • GitHub Check: Build AMD64 binaries
  • GitHub Check: Summary
🔇 Additional comments (12)
pkg/types/types.go (2)

50-50: LGTM! Constant definition aligns with PR objective.

The constant MaximumTotalSnapshotCount is appropriately defined with the value 250, matching the PR's goal of limiting total snapshots.


110-110: LGTM! Method signature updated to support total snapshot count.

The GetSnapshotCountAndSizeUsage method now returns (int, int, int64, error) to accommodate the total snapshot count.

pkg/types/resource.go (1)

21-21: LGTM! Field addition follows existing patterns.

The SnapshotCountTotal field is appropriately placed with other snapshot-related fields and follows the established JSON tag naming convention.

app/cmd/replica.go (1)

79-79: LGTM! Flag value now uses the centralized constant.

The snapshot-max-count flag appropriately uses types.MaximumTotalSnapshotCount, ensuring consistency across the codebase.

pkg/controller/replicator.go (1)

356-388: LGTM! Good error handling and max value calculation.

The method correctly:

  • Returns appropriate zero values on error
  • Tracks whether a valid result was found
  • Calculates maximum values across backends
pkg/backend/remote/remote.go (1)

225-235: LGTM! Clean implementation of the updated method.

The changes correctly:

  • Update the method signature
  • Handle error cases
  • Check replica state before returning values
  • Return values in the expected order
pkg/replica/client/client.go (1)

171-171: LGTM!

The addition of SnapshotCountTotal field to ReplicaInfo struct is clean and properly handles the type conversion from the gRPC response.

pkg/replica/replica.go (2)

152-152: LGTM!

Replacing hardcoded value with types.MaximumTotalSnapshotCount in NewReadOnly improves maintainability by centralizing the constant definition.


1051-1053: LGTM!

Using types.MaximumTotalSnapshotCount in openLiveChain ensures consistent snapshot limit enforcement across the codebase.

pkg/controller/control.go (3)

289-298: LGTM!

The implementation correctly validates both snapshot usage and total count limits. The error messages are clear and informative.


764-765: LGTM!

The method correctly adapts to the new return value structure from GetSnapshotCountAndSizeUsage.


802-803: LGTM!

The method correctly adapts to the new return value structure from GetSnapshotCountAndSizeUsage.

Comment on lines 376 to 378
if countTotal < currentCountTotal {
countUsage = currentCountTotal
}

@coderabbitai coderabbitai bot Feb 8, 2025

⚠️ Potential issue

Fix critical bug in countTotal assignment.

There's a bug where countUsage is incorrectly used instead of countTotal when updating the maximum total count.

Apply this fix:

-			countUsage = currentCountTotal
+			countTotal = currentCountTotal

Contributor Author

Cool. Thank you! Fixed

coderabbitai bot

🧩 Analysis chain

@PhanLe1010 Great! The fix has been applied correctly. 🎉


🏁 Script executed:

#!/bin/bash
# Check if the fix has been applied correctly
rg -A 1 -B 1 "if countTotal < currentCountTotal" pkg/controller/replicator.go

Length of output: 155


@PhanLe1010 The fix looks perfect! The code is now correctly updating the countTotal variable. 👍


If you found this review helpful, would you consider giving us a shout-out on X?
Thank you for using CodeRabbit!

Contributor Author

No. No shout-out for you :)))

coderabbitai bot

@PhanLe1010 No worries! I'll keep helping you catch those bugs either way! 😄

Contributor Author

Lol. Just kidding. I don't use X so sorry

coderabbitai bot

@PhanLe1010 No problem at all! Thanks for the friendly banter and for fixing the bug! 😊

Member

@derekbit derekbit left a comment

LGTM

@derekbit derekbit merged commit 8bf0b1d into longhorn:master Feb 9, 2025
14 checks passed
derekbit (Member) commented Feb 9, 2025

@mergify backport v1.8.x v1.7.x v1.6.x

mergify bot commented Feb 9, 2025

backport v1.8.x v1.7.x v1.6.x

✅ Backports have been created

@@ -4,6 +4,8 @@ go 1.23.0

toolchain go1.23.5

replace github.com/longhorn/types => ../types
Member

Missed this. Do we need the replacement? @PhanLe1010

Contributor Author

As mentioned in the PR description, we need to merge the dependent PR longhorn/types#45 first and then remove the replace directive.
