Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Raft] make me crazy #3172

Merged
merged 15 commits into from
Nov 18, 2021
Merged

[Raft] make me crazy #3172

merged 15 commits into from
Nov 18, 2021

Conversation

critical27
Copy link
Contributor

@critical27 critical27 commented Oct 21, 2021

Origins from and rely on #3031, some code may be conflicted as well, wait that merge first.

Change notes:

  1. Cherry-pick Raft problem #2483 and apply to 2.0
  2. Since Raft problem #2483, we could remove the weird code in Host (magic code), for example:
    if (logId <= lastLogIdSent_) {
      LOG(INFO) << idStr_ << "The log " << logId << " has been sended"
                << ", lastLogIdSent " << lastLogIdSent_;
      cpp2::AppendLogResponse r;
      r.set_error_code(cpp2::ErrorCode::SUCCEEDED);
      return r;
    }

and many

              } else if (self->lastLogIdSent_ == resp.get_last_log_id()) {
                VLOG(2) << self->idStr_ << "We send nothing in the last request"
                        << ", so we don't send the same logs again";
                self->lastLogIdSent_ = resp.get_last_log_id();
                self->lastLogTermSent_ = resp.get_last_log_term();
                self->followerCommittedLogId_ = resp.get_committed_log_id();
                cpp2::AppendLogResponse r;
                r.set_error_code(cpp2::ErrorCode::SUCCEEDED);
                self->setResponse(r);
  1. We can't send the wals after snapshot, because if all the wals after snapshot has been sent, and we reboot during the snapshot is sending, the partition could be elected as leader (only check lastLogId). Just delete these logic in the code, follower only can append more logs after snapshot is received.
  2. Fix a partition is reset way more than expected times during waiting for snapshot.
  3. Many logic in Host has been unified. I will describe the invariant later.

Todo:

  1. Follow-up of separate thenValue/thenError when rpc fail to handle timeout #3031, add a back-off time when leader send logs timeout, to avoid make thing worse.
  2. I tried to use different eventbase for each Host when appendLogs, but it seems failed (crashed a lot), still working on it.

@critical27 critical27 changed the title Fix [Raft] make me crazy Oct 21, 2021
@critical27 critical27 added the ready-for-testing PR: ready for the CI test label Oct 22, 2021
@critical27 critical27 force-pushed the fix branch 2 times, most recently from 08cd286 to cef4526 Compare October 26, 2021 03:18
@critical27 critical27 removed the ready-for-testing PR: ready for the CI test label Oct 26, 2021
@critical27 critical27 force-pushed the fix branch 2 times, most recently from e37c4f9 to 65c2997 Compare October 29, 2021 05:54
@critical27 critical27 added the ready-for-testing PR: ready for the CI test label Nov 8, 2021
@critical27
Copy link
Contributor Author

Close #3018, #3019, #3147

@kikimo
Copy link
Contributor

kikimo commented Nov 16, 2021

LGTM

Copy link
Contributor

@kikimo kikimo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@codecov-commenter
Copy link

Codecov Report

Merging #3172 (b170bf9) into master (7f711bb) will increase coverage by 0.02%.
The diff coverage is 78.73%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3172      +/-   ##
==========================================
+ Coverage   85.20%   85.23%   +0.02%     
==========================================
  Files        1304     1288      -16     
  Lines      119897   119876      -21     
==========================================
+ Hits       102164   102173       +9     
+ Misses      17733    17703      -30     
Impacted Files Coverage Δ
src/clients/meta/MetaClient.cpp 76.40% <ø> (+1.13%) ⬆️
src/clients/meta/MetaClient.h 95.65% <ø> (ø)
src/common/datatypes/Edge.h 100.00% <ø> (ø)
src/common/expression/PathBuildExpression.h 100.00% <ø> (ø)
...c/common/expression/test/ExpressionContextMock.cpp 100.00% <ø> (ø)
src/common/utils/Types.h 100.00% <ø> (ø)
src/graph/context/Iterator.h 73.79% <ø> (-0.69%) ⬇️
src/graph/executor/admin/SubmitJobExecutor.h 100.00% <ø> (ø)
src/graph/planner/match/MatchClausePlanner.h 100.00% <ø> (ø)
src/graph/planner/match/MatchSolver.cpp 47.05% <0.00%> (-50.00%) ⬇️
... and 149 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 07d5e21...b170bf9. Read the comment docs.

@critical27 critical27 merged commit 9ebc49d into vesoft-inc:master Nov 18, 2021
@critical27 critical27 deleted the fix branch November 18, 2021 01:41
@Sophie-Xie Sophie-Xie added the doc affected PR: improvements or additions to documentation label Dec 6, 2021
@cooper-lzy cooper-lzy requested review from cooper-lzy and removed request for bright-starry-sky January 5, 2022 03:50
@foesa-yang foesa-yang removed the doc affected PR: improvements or additions to documentation label Jan 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-for-testing PR: ready for the CI test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants