Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raft: optimize storage access for term when leader tries to commit #139609

Conversation

hakuuww
Copy link
Contributor

@hakuuww hakuuww commented Jan 22, 2025

When the raft leader wants to commit an entry(advance commit index), it must check the term of the entry it wants to commit and ensure the term number is the same as the current leader term(term of itself).
Previously, this process may involve a storage access for the term number if the entry has been persisted on stable storage and no longer in the unstable log of the leader node.

In fact this above scenario happens quite often. Which makes an optimization improving leader commit latency fruitful.

This PR introduces an optimization that utilizes an invariant that is logically equivalent to checking whether the want-to-be-committed entry has the same term as the leader doing the commit.
This is done by keeping an entry index in leader node's memory that signifies the point in time where the current leader became leader. When the leader wants to commit an entry, the leader compares the want-to-be-committed entry's index(available in memory) with the newly added index in this PR to determine whether the term matches.
Thus completely eliminating storage access on this code path.

A corresponding unit test is also modified to account for the elimination of storage access when leader tries to commit.

A previous related PR(merged): #139907
Jira issue: CRDB-45763
Fixes: #137826

Release note (performance improvement): removed a potential storage read from raft commit pipeline. This reduces the worst-case KV write latency.

@hakuuww hakuuww added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-performance Perf of queries or internals. Solution not expected to change functional behavior. A-kv-replication Relating to Raft, consensus, and coordination. T-kv KV Team T-kv-replication labels Jan 22, 2025
@hakuuww hakuuww self-assigned this Jan 22, 2025
@hakuuww hakuuww requested a review from a team as a code owner January 22, 2025 22:53
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@pav-kv pav-kv self-requested a review January 22, 2025 23:24
pkg/raft/raft.go Outdated Show resolved Hide resolved
pkg/raft/rawnode_test.go Outdated Show resolved Hide resolved
pkg/raft/rawnode_test.go Outdated Show resolved Hide resolved
pkg/raft/rawnode_test.go Outdated Show resolved Hide resolved
pkg/raft/rawnode_test.go Outdated Show resolved Hide resolved
pkg/raft/rawnode_test.go Outdated Show resolved Hide resolved
pkg/raft/rawnode_test.go Outdated Show resolved Hide resolved
pkg/raft/rawnode_test.go Outdated Show resolved Hide resolved
pkg/raft/raft.go Outdated Show resolved Hide resolved
pkg/raft/raft.go Outdated Show resolved Hide resolved
@hakuuww hakuuww marked this pull request as draft January 23, 2025 17:07
@hakuuww hakuuww force-pushed the optimizeLeaderCommitTermByAvoidingStorageAccess branch 2 times, most recently from bd7dc5b to 3c697a7 Compare January 28, 2025 20:51
@hakuuww hakuuww requested a review from pav-kv January 28, 2025 20:53
@hakuuww hakuuww force-pushed the optimizeLeaderCommitTermByAvoidingStorageAccess branch 2 times, most recently from e172a3c to d51426d Compare January 29, 2025 15:36
@hakuuww hakuuww marked this pull request as ready for review January 29, 2025 15:38
@hakuuww hakuuww changed the title raft: optimize leader commit term check with unit test raft: optimize leader commit term check Jan 29, 2025
@hakuuww hakuuww changed the title raft: optimize leader commit term check raft: optimize storage access for term when leader tries to commit Jan 29, 2025
Copy link
Collaborator

@pav-kv pav-kv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % nits. Also, please squash the PR into one commit.

pkg/raft/raft.go Outdated Show resolved Hide resolved
pkg/raft/raft.go Outdated Show resolved Hide resolved
pkg/raft/raft.go Outdated Show resolved Hide resolved
@hakuuww hakuuww force-pushed the optimizeLeaderCommitTermByAvoidingStorageAccess branch 2 times, most recently from ac9be23 to 2f92236 Compare January 29, 2025 16:33
@pav-kv
Copy link
Collaborator

pav-kv commented Jan 29, 2025

The code LGTM, but the release note seems off:

Release note (performance improvement): improved the internal raft commit logic. This slightly reduces the latency of a new client write request if a raft leader crash occurred after the write request was submitted

Please either remove it, or note something like:

Release note (performance improvement): removed a potential storage read from raft
commit pipeline. This reduces the worst-case KV write latency.

(both in the commit and PR description)

Copy link
Collaborator

@pav-kv pav-kv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to go once release notes and nits are fixed.

pkg/raft/raft.go Outdated Show resolved Hide resolved
When the raft leader wants to commit an entry(advance commit index), it must check the term of the entry it wants to commit and ensure the term number is the same as the current leader term(term of itself).
Previously, this process may involve a storage access for the term number if the entry has been persisted on stable storage and no longer in the unstable log of the leader node.

In fact this above scenario happens quite often. Which makes an optimization improving leader commit latency fruitful.

This PR introduces an optimization that utilizes an invariant that is logically equivalent to checking whether the want-to-be-committed entry has the same term as the leader doing the commit.
This is done by keeping an entry index in leader node's memory that signifies the point in time where the current leader became leader. When the leader wants to commit an entry, the leader compares the want-to-be-committed entry's index(available in memory) with the newly added index in this PR to determine whether the term matches.
Thus completely eliminating storage access on this code path.

A corresponding unit test is also modified to account for the elimination of storage access when leader tries to commit.

Jira issue: CRDB-45763
Fixes: cockroachdb#137826

Release note (performance improvement): Removed a potential storage read from raft commit pipeline. This reduces the worst-case KV write latency.
@hakuuww hakuuww force-pushed the optimizeLeaderCommitTermByAvoidingStorageAccess branch from 2f92236 to 465e772 Compare January 29, 2025 20:49
@hakuuww
Copy link
Contributor Author

hakuuww commented Jan 29, 2025

TYFTR!

bors r=pav-kv

craig bot pushed a commit that referenced this pull request Jan 29, 2025
139609: raft: optimize storage access for term when leader tries to commit r=pav-kv a=hakuuww

When the raft leader wants to commit an entry(advance commit index), it must check the term of the entry it wants to commit and ensure the term number is the same as the current leader term(term of itself).
Previously, this process may involve a storage access for the term number if the entry has been persisted on stable storage and no longer in the unstable log of the leader node.

In fact this above scenario happens quite often. Which makes an optimization improving leader commit latency fruitful.

This PR introduces an optimization that utilizes an invariant that is logically equivalent to checking whether the want-to-be-committed entry has the same term as the leader doing the commit.
This is done by keeping an entry index in leader node's memory that signifies the point in time where the current leader became leader. When the leader wants to commit an entry, the leader compares the want-to-be-committed entry's index(available in memory) with the newly added index in this PR to determine whether the term matches.
Thus completely eliminating storage access on this code path.

A corresponding unit test is also modified to account for the elimination of storage access when leader tries to commit.

A previous related PR(merged): #139907
Jira issue: [CRDB-45763](https://cockroachlabs.atlassian.net/browse/CRDB-45763)
Fixes: #137826

Release note (performance improvement): removed a potential storage read from raft commit pipeline. This reduces the worst-case KV write latency.

Co-authored-by: Anthony Xu <[email protected]>
@craig
Copy link
Contributor

craig bot commented Jan 29, 2025

Build failed:

@hakuuww
Copy link
Contributor Author

hakuuww commented Jan 30, 2025

Retry

bors r=pav-kv

@hakuuww
Copy link
Contributor Author

hakuuww commented Jan 30, 2025

bors retry

@craig
Copy link
Contributor

craig bot commented Jan 30, 2025

Already running a review

@craig craig bot merged commit 893ca74 into cockroachdb:master Jan 30, 2025
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-replication Relating to Raft, consensus, and coordination. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-performance Perf of queries or internals. Solution not expected to change functional behavior. T-kv KV Team T-kv-replication
Projects
None yet
Development

Successfully merging this pull request may close these issues.

raft: optimize the leader commit term check
3 participants