-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvclient: stop reads on followers #114205
kvclient: stop reads on followers #114205
Conversation
93f9755
to
e3bdff2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 12 of 12 files at r1, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @mgartner)
pkg/kv/kvclient/kvcoord/dist_sender.go
line 278 at r1 (raw file):
"kv.dist_sender.follower_reads_unhealthy.enabled", "send follower reads to unhealthy nodes", true,
We want to start with this set to false, right? Until we've mitigated the performance impact. We can then set it to true in the tests which want to exercise it.
I guess the downside of that is that we get less passive testing of the logic. We don't want certain customers running with logic that we only test in rare cases.
Ok, I guess it seems ok to keep this set to true on master to ensure it gets full test coverage, and then set it to false in the backport.
EDIT: I was actually misinterpreting this setting. I thought "true" meant "opt into the new behavior". Landing this on master defaulted to off (true) seems like the right approach. I think we may still want to get it turned on by default on master before backporting the off-by-default version to the release branch, so we know that it's at least getting full test coverage on some branch.
Does that seem like a reasonable plan?
pkg/ccl/kvccl/kvfollowerreadsccl/followerreads_test.go
line 1184 at r1 (raw file):
Settings: settings, Locality: localities[i], DefaultTestTenant: base.TestIsSpecificToStorageLayerAndNeedsASystemTenant,
When we backport, we'll just not include this, right? Are there any downsides to that, other than the test taking a little longer?
Stop follower reads on draining, decommissioning or unhealthy nodes. Epic: none Fixes: cockroachdb#112351 Release note (performance improvement): This change prevents failed requests from being issued on followers that are draining, decommissioning or unhealthy which prevents latency spikes if those nodes later go offline.
e3bdff2
to
3111a55
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR - Running one more time with the removal of the TestArgs so it can be an "exact" backport. It should finish shortly.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @nvanbenschoten)
pkg/kv/kvclient/kvcoord/dist_sender.go
line 278 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
We want to start with this set to false, right? Until we've mitigated the performance impact. We can then set it to true in the tests which want to exercise it.
I guess the downside of that is that we get less passive testing of the logic. We don't want certain customers running with logic that we only test in rare cases.
Ok, I guess it seems ok to keep this set to true on master to ensure it gets full test coverage, and then set it to false in the backport.
EDIT: I was actually misinterpreting this setting. I thought "true" meant "opt into the new behavior". Landing this on master defaulted to off (true) seems like the right approach. I think we may still want to get it turned on by default on master before backporting the off-by-default version to the release branch, so we know that it's at least getting full test coverage on some branch.
Does that seem like a reasonable plan?
Yes - this is exactly what I was thinking. Leaving this as "disabled" (reading from unhealthy) no both 23.2 and master and then enabling once the other PRs land.
pkg/ccl/kvccl/kvfollowerreadsccl/followerreads_test.go
line 1184 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
When we backport, we'll just not include this, right? Are there any downsides to that, other than the test taking a little longer?
I'm removing these settings so I don't need to wait for the other PR. I'll reenable these later on master in a different PR. It does just take a little longer, although I've seen tests flake on stress+race without these.
We have a somewhat annoying situation that tests only run StressRace after a change has been made to them, and I believe a number of tests (maybe a majority) will fail the first time any change is made. We probably should figure out a better way to handle this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @mgartner)
bors r=nvanbenschoten |
Build succeeded: |
blathers backport 23.2 |
Stop follower reads on draining, decommissioning or unhealthy nodes.
Epic: none
Fixes: #112351
Release note (performance improvement): This change prevents failed requests from being issued on followers that are draining, decommissioning or unhealthy which prevents latency spikes if those nodes later go offline.