
feat: Meta fencing #7022

Merged
merged 47 commits into main on Dec 29, 2022

Conversation

@CAJan93 (Contributor) commented Dec 22, 2022

I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.

What's changed and what's your intention?

#6786

Please only merge this after #6937 is merged

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Documentation

If your pull request contains user-facing changes, please specify the types of the changes, and create a release note. Otherwise, please feel free to remove this section.

Types of user-facing changes

Please keep the types that apply to your changes, and remove those that do not apply.

  • Installation and deployment
  • Connector (sources & sinks)
  • SQL commands, functions, and operators
  • RisingWave cluster configuration changes
  • Other (please specify in the release note below)

Release note

Please create a release note for your changes. In the release note, focus on the impact on users, and mention the environment or conditions where the impact may occur.

Refer to a related PR or issue link (optional)

@CAJan93 CAJan93 marked this pull request as ready for review December 22, 2022 16:36
@CAJan93 CAJan93 requested a review from yezizp2012 December 22, 2022 16:36
@yezizp2012 (Member) commented:

I will review it after #6937. Currently most of the code in this PR comes from that one, which makes it a bit difficult to review.

@CAJan93 (Contributor, Author) commented Dec 23, 2022

> I will review it after #6937. Currently most of the code in this PR comes from that one, which makes it a bit difficult to review.

For sure. I think this PR will be quick to review once #6937 is merged.


#[tokio::test]
async fn test_fencing_5() {
    for params in vec![(true, true), (true, false), (false, true)] {
@CAJan93 (Contributor, Author) commented:

Let me know if the sequential approach is too slow. As an alternative, I can also introduce three separate functions, one per combination, as sketched below.
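
For reference, a minimal sketch of that alternative (the function names are illustrative, not taken from this PR): one #[tokio::test] per parameter combination, all delegating to a shared helper, so the cases can run in parallel instead of sequentially.

// Hypothetical sketch: a shared helper plus one test per parameter combination.
async fn run_fencing_test(kill_leader: bool, kill_follower: bool) {
    // The shared fencing test body would go here.
    let _ = (kill_leader, kill_follower);
}

#[tokio::test]
async fn test_fencing_leader_and_follower() {
    run_fencing_test(true, true).await;
}

#[tokio::test]
async fn test_fencing_leader_only() {
    run_fencing_test(true, false).await;
}

#[tokio::test]
async fn test_fencing_follower_only() {
    run_fencing_test(false, true).await;
}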

Comment on lines 215 to 219
// TODO: Enabling this causes the recovery test to fail
// See https://buildkite.com/risingwavelabs/pull-request/builds/14686#01855869-b9d7-432b-9ea8-b835830f1a8e
// tracing::info!("Waiting, to give former leaders fencing mechanism time to trigger");
// sleep(Duration::from_millis(lease_interval_secs * 1000 + 500));

@CAJan93 (Contributor, Author) commented:

I would have liked to introduce a pause here to give the fencing mechanism time to kill the old leader before the new leader takes over. Unfortunately, this seems to break the recovery test. Should we still try to introduce a wait time here, @yezizp2012?

@yezizp2012 (Member) commented:

The recovery test failed due to a timeout: the sleep here increases the overall test time.

I thought about it for a while. Since the ticker for manage_term (renew_lease) is set to Duration::from_millis(lease_time_sec * 500 + rng.gen_range(1..500)), and lease_time_sec is also used as the lease expiry time, a follower only gets elected if the old leader failed to renew the lease within the past lease_time_sec, which it would have done if it were alive. This works when there is no network isolation, but not when the old leader is network-isolated from etcd while using the wrapped transactional interface: that wrapper contains retry logic, which significantly increases the duration of the renew request. During that window there is a split-brain situation with two leaders.
Anyway, when we integrate the official API we should drop the retry logic and use a reasonable timeout to avoid this. Cc @shanicky. For this PR, let's remove it for now.
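
To make the timing concrete, here is a rough sketch (not the PR's actual code; renew_lease is the function referenced above and is only stubbed out here) of a renewal loop with the period described: roughly lease_time_sec / 2 plus a small random jitter, so a live leader normally renews well before the lease_time_sec expiry.

use rand::Rng;
use std::time::Duration;

async fn renewal_loop(lease_time_sec: u64) {
    // Tick roughly every half lease period, plus up to 500 ms of jitter.
    let jitter_ms: u64 = rand::thread_rng().gen_range(1..500);
    let mut ticker =
        tokio::time::interval(Duration::from_millis(lease_time_sec * 500 + jitter_ms));
    loop {
        ticker.tick().await;
        // renew_lease() may retry internally when etcd is unreachable, which is
        // what can push renewal past the expiry time and open the split-brain
        // window mentioned above.
        // renew_lease().await;
    }
}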

@CAJan93 (Contributor, Author) commented:

Small correction to the statement above: we do not elect a new leader directly after the lease expires; we give the old leader a little extra time. The current implementation looks like this:

// delete lease and run new election if lease is expired for some time
let some_time = lease_time_sec / 2;
let (_, lease_info) = leader_lease_result.unwrap();
if lease_info.get_lease_expire_time() + some_time < since_epoch().as_secs() {
    tracing::warn!("Detected that leader is down");

// Current implementation panics if the leader loses its lease.
// Alternative implementation that panics only if the leader is unable to re-acquire the lease:
// if was_leader && !is_leader {
if was_leader {
@yezizp2012 (Member) commented Dec 29, 2022:

Why handle the exit in the print-status loop? That turns this part of the code from pure display logic into mixed display and control logic. I suggest exiting directly inside the election.

@CAJan93 (Contributor, Author) commented:

Alright, I will go for that. As a note: this gives us slightly different behaviour, since a node can now fail over from leader to follower when the change is caused by election re-runs rather than happening during the term.

@yezizp2012 (Member) commented:

Doing a failover from leader to follower is too dangerous, because the time needed to gracefully shut down the leader's services and sub-tasks is uncertain and likely to exceed the lease expiry time.

@CAJan93 (Contributor, Author) commented:

I will introduce another green thread that calls process::exit if it notices that the node lost leadership. Kind of like before, but separate from the node-status green thread.
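
A minimal sketch of what such a dedicated task could look like, assuming leadership state is exposed through a tokio watch channel (the channel and function names here are illustrative, not taken from this PR):

use std::process;
use tokio::sync::watch;

// Spawn a task that only watches the leadership flag and fences the node by
// exiting the process as soon as leadership is lost, independently of the
// node-status reporting task.
fn spawn_fencing_task(mut is_leader_rx: watch::Receiver<bool>) {
    tokio::spawn(async move {
        while is_leader_rx.changed().await.is_ok() {
            if !*is_leader_rx.borrow() {
                tracing::error!("lost leadership, exiting to avoid split-brain");
                process::exit(1);
            }
        }
    });
}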

@CAJan93 (Contributor, Author) commented Dec 29, 2022:

I pushed it in 671a5dd. The only downside now is that we have two places in the code that handle fencing. If we only want a single place to handle fencing, we would need to go back to the previous approach of sending a dummy update, or use your suggestion. Let me know which way you prefer.

A contributor commented:

If the leadership is taken over, we should exit the leader logic as soon as possible; it's better to exit as soon as leadership is lost.

@yezizp2012 (Member) left a review:

Rest LGTM

@yezizp2012 (Member) commented:

Don't worry, the scaling test failure is a known issue that is currently being fixed. FYI.

@mergify mergify bot merged commit 7ea4a82 into main Dec 29, 2022
@mergify mergify bot deleted the fencing2 branch December 29, 2022 13:00