Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix mqbc::IncoreCSL: Rollover fixes and improvements #595

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

@kaikulimu kaikulimu requested a review from a team as a code owner February 3, 2025 23:58
@kaikulimu kaikulimu requested a review from dorjesinpo February 3, 2025 23:58
Copy link
Collaborator

@dorjesinpo dorjesinpo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. Some questions inline.


bsl::shared_ptr<bdlbb::Blob> record = d_blobSpPool_p->getObject();
ClusterStateRecordType::Enum recordType =
info.d_clusterMessage.choice().isLeaderAdvisoryValue()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the case of uncommitted LeaderAdvisory?

Copy link
Collaborator Author

@kaikulimu kaikulimu Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally, when we apply a LeaderAdvisory as a new leader, we always wait until CSL_CMT_SUCCESS before declaring ourselves healed. Thus, other advisories that might cause rollover will be halted.

However, there is one special case where if a follower node joins late, the leader will apply another snapshot LeaderAdvisory to inform that node of latest cluster state. In this specific case, other advisories can come afterwards, and it's possible to have a rollover while that LeaderAdvisory is still uncommitted.

advisories.erase(iter);
<< ": Applying a recovered record from IncoreCSL: "
<< clusterMessage << ".";
apply(state, clusterMessage, clusterData);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs clarification (and comment).
We see an uncommitted advisory and apply it as if it is committed. What is the reasoning? If we are Primary and do_applyCSLSelf, we can assume all advisories are synchronized and that serves as the "commit"?
Is there a case when we apply uncommitted advisory as Replica?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider the case where the leader applies an advisory, receives enough acks, and commits the advisory, but then crashes before the followers have a chance to write the commit. One of the followers becomes the new leader. The new leader and the remaining followers, will see this as an uncommitted advisory; they must carry out the last wish of the previous leader and commit this advisory. That is why upon ClusterUtil::load, we apply the uncommitted advisories knowing that they are about to be committed. Note that the three callers of ClusterUtil::load: validateClusterStateLedger, do_applyCSLSelf, and do_sendFollowerClusterStateResponse all load into a temporary state, and that temporary state is either used to validate the ledger or to construct the snapshot advisory to be applied by the new leader. The new leader's snapshot advisory will soon be committed.

We apply uncommitted advisories as replica in validateClusterStateLedger and do_sendFollowerClusterStateResponse. validateClusterStateLedger is okay because it's just for validation. do_sendFollowerClusterStateResponse is okay because the FollowerClusterStateResponse is sent from the highest LSN follower to the leader to tell the leader what to put in its snapshot advisory.

Will add some comments in the code too to help clarify.

@@ -471,6 +472,7 @@
<element name='maxDataFileSize' type='unsignedLong'/>
<element name='maxJournalFileSize' type='unsignedLong'/>
<element name='maxQlistFileSize' type='unsignedLong'/>
<element name='maxCSLFileSize' type='unsignedLong' default='67108864'/>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is maxCSLFileSize used / planned to be used?

@@ -282,7 +273,14 @@ class ClusterStateLedger : public ElectorInfoObserver {
virtual int
apply(const bmqp_ctrlmsg::QueueUnassignedAdvisory& advisory) = 0;
virtual int apply(const bmqp_ctrlmsg::QueueUpdateAdvisory& advisory) = 0;
virtual int apply(const bmqp_ctrlmsg::LeaderAdvisory& advisory) = 0;

/// Apply the specified `advisory` to self and replicate to followers.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is ClusterUtil::apply, same name, different meaning. That one gets called on commit, is that correct?

bmqp_ctrlmsg::LeaderAdvisory leaderAdvisory;
leaderAdvisory.sequenceNumber() = snapshotLSN;

// The snapshot will have the same sequence number as the record which
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the comment.
We insert this leaderAdvisory before the first uncommitted advisory thus breaking the monotonically increasing order.
Is it guaranteed that after this leaderAdvisory there are no other LeaderAdvisory records?

// caused rollover. We do not want to bump up leader sequence number
// because the snapshot will not be broadcasted, Note that since we write
// the snapshot before the uncommitted records, the records won't be in
// monotonically increasing order. That is okay because we will make a
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you point out, where is the "special case for rollover records"?

@dorjesinpo dorjesinpo assigned kaikulimu and unassigned dorjesinpo Feb 5, 2025
@kaikulimu kaikulimu changed the title Fix mqbc::IncoreCSL: Rollover fixes and improvements Fix mqbc::IncoreCSL: Rollover fixes and improvements [178057288] Feb 5, 2025
@kaikulimu kaikulimu changed the title Fix mqbc::IncoreCSL: Rollover fixes and improvements [178057288] Fix mqbc::IncoreCSL: Rollover fixes and improvements Feb 5, 2025
@kaikulimu
Copy link
Collaborator Author

I just realize part of IncoreClusterStateLedger::onLogRolloverCb can be refactored with ClusterUtil::loadPartitionsInfo and ClusterUtil::loadQueuesInfo. Will make a commit for that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants