-
Notifications
You must be signed in to change notification settings - Fork 144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix mqbc::IncoreCSL: Rollover fixes and improvements #595
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Yuan Jing Vincent Yan <[email protected]>
Signed-off-by: Yuan Jing Vincent Yan <[email protected]>
Signed-off-by: Yuan Jing Vincent Yan <[email protected]>
Signed-off-by: Yuan Jing Vincent Yan <[email protected]>
Signed-off-by: Yuan Jing Vincent Yan <[email protected]>
Signed-off-by: Yuan Jing Vincent Yan <[email protected]>
Signed-off-by: Yuan Jing Vincent Yan <[email protected]>
Signed-off-by: Yuan Jing Vincent Yan <[email protected]>
b81564b
to
0ada70c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. Some questions inline.
|
||
bsl::shared_ptr<bdlbb::Blob> record = d_blobSpPool_p->getObject(); | ||
ClusterStateRecordType::Enum recordType = | ||
info.d_clusterMessage.choice().isLeaderAdvisoryValue() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the case of uncommitted LeaderAdvisory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normally, when we apply a LeaderAdvisory as a new leader, we always wait until CSL_CMT_SUCCESS
before declaring ourselves healed. Thus, other advisories that might cause rollover will be halted.
However, there is one special case where if a follower node joins late, the leader will apply another snapshot LeaderAdvisory to inform that node of latest cluster state. In this specific case, other advisories can come afterwards, and it's possible to have a rollover while that LeaderAdvisory is still uncommitted.
advisories.erase(iter); | ||
<< ": Applying a recovered record from IncoreCSL: " | ||
<< clusterMessage << "."; | ||
apply(state, clusterMessage, clusterData); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs clarification (and comment).
We see an uncommitted advisory and apply
it as if it is committed. What is the reasoning? If we are Primary and do_applyCSLSelf
, we can assume all advisories are synchronized and that serves as the "commit"?
Is there a case when we apply
uncommitted advisory as Replica?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider the case where the leader applies an advisory, receives enough acks, and commits the advisory, but then crashes before the followers have a chance to write the commit. One of the followers becomes the new leader. The new leader and the remaining followers, will see this as an uncommitted advisory; they must carry out the last wish of the previous leader and commit this advisory. That is why upon ClusterUtil::load
, we apply the uncommitted advisories knowing that they are about to be committed. Note that the three callers of ClusterUtil::load
: validateClusterStateLedger
, do_applyCSLSelf
, and do_sendFollowerClusterStateResponse
all load into a temporary state, and that temporary state is either used to validate the ledger or to construct the snapshot advisory to be applied by the new leader. The new leader's snapshot advisory will soon be committed.
We apply uncommitted advisories as replica in validateClusterStateLedger
and do_sendFollowerClusterStateResponse
. validateClusterStateLedger
is okay because it's just for validation. do_sendFollowerClusterStateResponse
is okay because the FollowerClusterStateResponse
is sent from the highest LSN follower to the leader to tell the leader what to put in its snapshot advisory.
Will add some comments in the code too to help clarify.
@@ -471,6 +472,7 @@ | |||
<element name='maxDataFileSize' type='unsignedLong'/> | |||
<element name='maxJournalFileSize' type='unsignedLong'/> | |||
<element name='maxQlistFileSize' type='unsignedLong'/> | |||
<element name='maxCSLFileSize' type='unsignedLong' default='67108864'/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is maxCSLFileSize
used / planned to be used?
@@ -282,7 +273,14 @@ class ClusterStateLedger : public ElectorInfoObserver { | |||
virtual int | |||
apply(const bmqp_ctrlmsg::QueueUnassignedAdvisory& advisory) = 0; | |||
virtual int apply(const bmqp_ctrlmsg::QueueUpdateAdvisory& advisory) = 0; | |||
virtual int apply(const bmqp_ctrlmsg::LeaderAdvisory& advisory) = 0; | |||
|
|||
/// Apply the specified `advisory` to self and replicate to followers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is ClusterUtil::apply
, same name, different meaning. That one gets called on commit, is that correct?
bmqp_ctrlmsg::LeaderAdvisory leaderAdvisory; | ||
leaderAdvisory.sequenceNumber() = snapshotLSN; | ||
|
||
// The snapshot will have the same sequence number as the record which |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the comment.
We insert this leaderAdvisory
before the first uncommitted advisory thus breaking the monotonically increasing order.
Is it guaranteed that after this leaderAdvisory
there are no other LeaderAdvisory
records?
// caused rollover. We do not want to bump up leader sequence number | ||
// because the snapshot will not be broadcasted, Note that since we write | ||
// the snapshot before the uncommitted records, the records won't be in | ||
// monotonically increasing order. That is okay because we will make a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you point out, where is the "special case for rollover records"?
I just realize part of |
Key changes:
mqbc::IncoreCSL: Populate appIds during rollover
mqbc::IncoreCSL: Always populate cluster state snapshot as first record
mqbc::ClusterUtil: Apply uncommitted advisories when loading from CSL