Fix mqbc::IncoreCSL: Rollover fixes and improvements #595

kaikulimu · 2025-02-03T23:57:59Z

Key changes:

Signed-off-by: Yuan Jing Vincent Yan <[email protected]>

dorjesinpo

Thank you. Some questions inline.

dorjesinpo · 2025-02-05T02:11:03Z

src/groups/mqb/mqbc/mqbc_incoreclusterstateledger.cpp

+
+        bsl::shared_ptr<bdlbb::Blob> record = d_blobSpPool_p->getObject();
+        ClusterStateRecordType::Enum recordType =
+            info.d_clusterMessage.choice().isLeaderAdvisoryValue()


What is the case of uncommitted LeaderAdvisory?

Normally, when we apply a LeaderAdvisory as a new leader, we always wait until CSL_CMT_SUCCESS before declaring ourselves healed. Thus, other advisories that might cause rollover will be halted.

However, there is one special case: if a follower node joins late, the leader applies another snapshot LeaderAdvisory to inform that node of the latest cluster state. In this case, other advisories can follow, so a rollover can happen while that LeaderAdvisory is still uncommitted.
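
To make the resulting log layout concrete, here is a minimal sketch of what a rolled-over log could contain in that case. It is not the real `IncoreClusterStateLedger` code: `Advisory`, `Record`, `RecordType` and `rolloverRecords` are hypothetical stand-ins, and the leader sequence number is simplified to a single integer. The rule that the snapshot reuses the sequence number of the record that caused the rollover comes from the code comment in this PR. The mapping of an uncommitted LeaderAdvisory to a snapshot-type record is an assumption, since the ternary in the diff excerpt above is cut off.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for the real mqbc types.
enum class RecordType { e_SNAPSHOT, e_UPDATE };

struct Advisory {
    std::uint64_t d_lsn;               // sequence number, simplified to one integer
    bool          d_isLeaderAdvisory;  // true for a snapshot-style LeaderAdvisory
};

struct Record {
    RecordType    d_type;
    std::uint64_t d_lsn;
};

// Return the records written to the new log on rollover: first a cluster
// state snapshot, then every advisory that is still uncommitted.
std::vector<Record> rolloverRecords(std::uint64_t                triggerLsn,
                                    const std::vector<Advisory>& uncommitted)
{
    std::vector<Record> out;

    // The snapshot reuses the sequence number of the record that caused
    // the rollover; it is not broadcast, so the sequence number is not bumped.
    out.push_back({RecordType::e_SNAPSHOT, triggerLsn});

    for (const Advisory& adv : uncommitted) {
        // Every uncommitted advisory was applied before the trigger record.
        assert(adv.d_lsn <= triggerLsn);

        // ASSUMPTION: an uncommitted LeaderAdvisory is rewritten as a
        // snapshot-type record, everything else as an update-type record.
        const RecordType type = adv.d_isLeaderAdvisory
                                    ? RecordType::e_SNAPSHOT
                                    : RecordType::e_UPDATE;
        out.push_back({type, adv.d_lsn});
    }
    return out;
}
```

With uncommitted advisories at sequence numbers 5 (a LeaderAdvisory) and 6, and a rollover triggered by record 7, the new log would hold records with sequence numbers 7, 5, 6. This is the non-monotonic order discussed elsewhere in this review.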

dorjesinpo · 2025-02-05T14:17:31Z

src/groups/mqb/mqbc/mqbc_clusterutil.cpp

-            advisories.erase(iter);
+                          << ": Applying a recovered record from IncoreCSL: "
+                          << clusterMessage << ".";
+            apply(state, clusterMessage, clusterData);


This needs clarification (and a comment).
We see an uncommitted advisory and apply it as if it were committed. What is the reasoning? If we are Primary and in do_applyCSLSelf, can we assume all advisories are synchronized, and that this serves as the "commit"?
Is there a case where we apply an uncommitted advisory as Replica?

Consider the case where the leader applies an advisory, receives enough acks, and commits the advisory, but then crashes before the followers have a chance to write the commit. One of the followers becomes the new leader. The new leader and the remaining followers will see this as an uncommitted advisory; they must carry out the last wish of the previous leader and commit it. That is why, upon ClusterUtil::load, we apply the uncommitted advisories, knowing that they are about to be committed. Note that the three callers of ClusterUtil::load (validateClusterStateLedger, do_applyCSLSelf, and do_sendFollowerClusterStateResponse) all load into a temporary state, and that temporary state is used either to validate the ledger or to construct the snapshot advisory to be applied by the new leader. The new leader's snapshot advisory will soon be committed.

We apply uncommitted advisories as a replica in validateClusterStateLedger and do_sendFollowerClusterStateResponse. validateClusterStateLedger is okay because it is used only for validation. do_sendFollowerClusterStateResponse is okay because the FollowerClusterStateResponse is sent from the highest-LSN follower to the leader to tell the leader what to put in its snapshot advisory.

Will add some comments in the code too to help clarify.
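
As a rough illustration of the loading rule described above, here is a minimal sketch, not the actual `ClusterUtil::load`. All names in it (`LedgerRecord`, `Kind`, `TempState`, `loadTempState`) are hypothetical, the cluster state is reduced to a set of queue names, and sequence numbers are plain integers. It only shows the idea that advisories without a matching commit record are still applied to the temporary state.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins; not the real mqbc types.
enum class Kind { e_ADVISORY, e_COMMIT };

struct LedgerRecord {
    Kind        d_kind;
    int         d_lsn;    // for a commit: the sequence number being committed
    std::string d_queue;  // queue assigned by the advisory (unused for commits)
};

typedef std::set<std::string> TempState;

// Replay `records` into a temporary state.  Advisories that never saw a
// commit record are applied as well, and their sequence numbers are
// reported through the optional `uncommittedLsns`.
TempState loadTempState(const std::vector<LedgerRecord>& records,
                        std::vector<int>*                uncommittedLsns)
{
    std::map<int, std::string> pending;  // advisories awaiting a commit
    TempState                  state;

    for (const LedgerRecord& rec : records) {
        if (rec.d_kind == Kind::e_ADVISORY) {
            pending[rec.d_lsn] = rec.d_queue;
            continue;
        }
        // Commit record: apply the matching advisory, if we have it.
        std::map<int, std::string>::iterator it = pending.find(rec.d_lsn);
        if (it != pending.end()) {
            state.insert(it->second);
            pending.erase(it);
        }
    }

    // Whatever is left was never committed on this node.  Apply it anyway:
    // the result is only a temporary state, used for validation or for
    // building the snapshot advisory that the new leader will commit.
    for (const auto& entry : pending) {
        state.insert(entry.second);
        if (uncommittedLsns) {
            uncommittedLsns->push_back(entry.first);
        }
    }
    return state;
}
```

Because the function returns a fresh `TempState` and never touches live state, applying the leftover advisories here is safe under the reasoning given above.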

dorjesinpo · 2025-02-05T14:23:21Z

src/groups/mqb/mqbcfg/mqbcfg.xsd

@@ -471,6 +472,7 @@
      <element name='maxDataFileSize'     type='unsignedLong'/>
      <element name='maxJournalFileSize'  type='unsignedLong'/>
      <element name='maxQlistFileSize'    type='unsignedLong'/>
+      <element name='maxCSLFileSize'      type='unsignedLong' default='67108864'/>


Where is maxCSLFileSize used / planned to be used?

dorjesinpo · 2025-02-05T16:04:40Z

src/groups/mqb/mqbc/mqbc_clusterstateledger.h

@@ -282,7 +273,14 @@ class ClusterStateLedger : public ElectorInfoObserver {
    virtual int
    apply(const bmqp_ctrlmsg::QueueUnassignedAdvisory& advisory)         = 0;
    virtual int apply(const bmqp_ctrlmsg::QueueUpdateAdvisory& advisory) = 0;
-    virtual int apply(const bmqp_ctrlmsg::LeaderAdvisory& advisory)      = 0;
+
+    /// Apply the specified `advisory` to self and replicate to followers.


There is ClusterUtil::apply, same name, different meaning. That one gets called on commit, is that correct?

dorjesinpo · 2025-02-05T16:08:00Z

src/groups/mqb/mqbc/mqbc_incoreclusterstateledger.cpp

    bmqp_ctrlmsg::LeaderAdvisory leaderAdvisory;
-    leaderAdvisory.sequenceNumber() = snapshotLSN;
+
+    // The snapshot will have the same sequence number as the record which


Thank you for the comment.
We insert this leaderAdvisory before the first uncommitted advisory thus breaking the monotonically increasing order.
Is it guaranteed that after this leaderAdvisory there are no other LeaderAdvisory records?

dorjesinpo · 2025-02-05T16:18:02Z

src/groups/mqb/mqbc/mqbc_incoreclusterstateledger.cpp

+    // caused rollover.  We do not want to bump up leader sequence number
+    // because the snapshot will not be broadcasted,  Note that since we write
+    // the snapshot before the uncommitted records, the records won't be in
+    // monotonically increasing order.  That is okay because we will make a


Can you point out where the "special case for rollover records" is?

kaikulimu · 2025-02-05T21:06:56Z

I just realized that part of IncoreClusterStateLedger::onLogRolloverCb can be refactored to use ClusterUtil::loadPartitionsInfo and ClusterUtil::loadQueuesInfo. Will make a commit for that.

kaikulimu requested a review from a team as a code owner February 3, 2025 23:58

kaikulimu requested a review from dorjesinpo February 3, 2025 23:58

kaikulimu added 8 commits February 3, 2025 19:00

mqbcfg.xsd: Add maxCSLFileSize to PartitionConfig

5c9fa2b

Signed-off-by: Yuan Jing Vincent Yan <[email protected]>

mqbc::IncoreCSL: Use maxCSLFileSize instead of maxQlistFileSize

b67493c

Signed-off-by: Yuan Jing Vincent Yan <[email protected]>

mqbc::IncoreCSL: Populate appIds during rollover

8581fb0

Signed-off-by: Yuan Jing Vincent Yan <[email protected]>

mqbsl::Ledger: Handle rollover callback failure

b270821

Signed-off-by: Yuan Jing Vincent Yan <[email protected]>

mqbc::IncoreCSL: Always populate cluster state snapshot as first record

baed418

Signed-off-by: Yuan Jing Vincent Yan <[email protected]>

mqbc::ClusterUtil: Apply uncommitted advisories when loading from CSL

3720fe2

Signed-off-by: Yuan Jing Vincent Yan <[email protected]>

mqbc::IncoreCSL.t: Properly test persistence across rollover

32f9f53

Signed-off-by: Yuan Jing Vincent Yan <[email protected]>

test_rollover_csl.py [new]

0ada70c

Signed-off-by: Yuan Jing Vincent Yan <[email protected]>

kaikulimu force-pushed the it-csl-rollover branch from b81564b to 0ada70c February 4, 2025

kaikulimu assigned dorjesinpo Feb 4, 2025

dorjesinpo reviewed Feb 5, 2025

dorjesinpo assigned kaikulimu and unassigned dorjesinpo Feb 5, 2025

kaikulimu changed the title ~~Fix mqbc::IncoreCSL: Rollover fixes and improvements~~ Fix mqbc::IncoreCSL: Rollover fixes and improvements [178057288] Feb 5, 2025

kaikulimu changed the title ~~Fix mqbc::IncoreCSL: Rollover fixes and improvements [178057288]~~ Fix mqbc::IncoreCSL: Rollover fixes and improvements Feb 5, 2025
