Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvclient: stop reads on followers #114205

Merged

Conversation

andrewbaptist
Copy link
Contributor

Stop follower reads on draining, decommissioning or unhealthy nodes.

Epic: none
Fixes: #112351

Release note (performance improvement): This change prevents failed requests from being issued on followers that are draining, decommissioning or unhealthy which prevents latency spikes if those nodes later go offline.

@andrewbaptist andrewbaptist requested a review from a team as a code owner November 10, 2023 01:38
@andrewbaptist andrewbaptist requested a review from a team November 10, 2023 01:38
@andrewbaptist andrewbaptist requested a review from a team as a code owner November 10, 2023 01:38
@andrewbaptist andrewbaptist requested review from mgartner and removed request for a team November 10, 2023 01:38
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 12 of 12 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @mgartner)


pkg/kv/kvclient/kvcoord/dist_sender.go line 278 at r1 (raw file):

	"kv.dist_sender.follower_reads_unhealthy.enabled",
	"send follower reads to unhealthy nodes",
	true,

We want to start with this set to false, right? Until we've mitigated the performance impact. We can then set it to true in the tests which want to exercise it.

I guess the downside of that is that we get less passive testing of the logic. We don't want certain customers running with logic that we only test in rare cases.

Ok, I guess it seems ok to keep this set to true on master to ensure it gets full test coverage, and then set it to false in the backport.


EDIT: I was actually misinterpreting this setting. I thought "true" meant "opt into the new behavior". Landing this on master defaulted to off (true) seems like the right approach. I think we may still want to get it turned on by default on master before backporting the off-by-default version to the release branch, so we know that it's at least getting full test coverage on some branch.

Does that seem like a reasonable plan?


pkg/ccl/kvccl/kvfollowerreadsccl/followerreads_test.go line 1184 at r1 (raw file):

			Settings:          settings,
			Locality:          localities[i],
			DefaultTestTenant: base.TestIsSpecificToStorageLayerAndNeedsASystemTenant,

When we backport, we'll just not include this, right? Are there any downsides to that, other than the test taking a little longer?

Stop follower reads on draining, decommissioning or unhealthy nodes.

Epic: none
Fixes: cockroachdb#112351

Release note (performance improvement): This change prevents failed
requests from being issued on followers that are draining,
decommissioning or unhealthy which prevents latency spikes if those
nodes later go offline.
@andrewbaptist andrewbaptist force-pushed the 2023-11-09-follower-reads-simple branch from e3bdff2 to 3111a55 Compare November 10, 2023 17:07
Copy link
Contributor Author

@andrewbaptist andrewbaptist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR - Running one more time with the removal of the TestArgs so it can be an "exact" backport. It should finish shortly.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @nvanbenschoten)


pkg/kv/kvclient/kvcoord/dist_sender.go line 278 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We want to start with this set to false, right? Until we've mitigated the performance impact. We can then set it to true in the tests which want to exercise it.

I guess the downside of that is that we get less passive testing of the logic. We don't want certain customers running with logic that we only test in rare cases.

Ok, I guess it seems ok to keep this set to true on master to ensure it gets full test coverage, and then set it to false in the backport.


EDIT: I was actually misinterpreting this setting. I thought "true" meant "opt into the new behavior". Landing this on master defaulted to off (true) seems like the right approach. I think we may still want to get it turned on by default on master before backporting the off-by-default version to the release branch, so we know that it's at least getting full test coverage on some branch.

Does that seem like a reasonable plan?

Yes - this is exactly what I was thinking. Leaving this as "disabled" (reading from unhealthy) no both 23.2 and master and then enabling once the other PRs land.


pkg/ccl/kvccl/kvfollowerreadsccl/followerreads_test.go line 1184 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

When we backport, we'll just not include this, right? Are there any downsides to that, other than the test taking a little longer?

I'm removing these settings so I don't need to wait for the other PR. I'll reenable these later on master in a different PR. It does just take a little longer, although I've seen tests flake on stress+race without these.

We have a somewhat annoying situation that tests only run StressRace after a change has been made to them, and I believe a number of tests (maybe a majority) will fail the first time any change is made. We probably should figure out a better way to handle this.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner)

@andrewbaptist
Copy link
Contributor Author

bors r=nvanbenschoten

@craig
Copy link
Contributor

craig bot commented Nov 10, 2023

Build succeeded:

@craig craig bot merged commit 0d5057d into cockroachdb:master Nov 10, 2023
@andrewbaptist andrewbaptist added the backport-23.2.x Flags PRs that need to be backported to 23.2. label Nov 10, 2023
@andrewbaptist
Copy link
Contributor Author

blathers backport 23.2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.2.x Flags PRs that need to be backported to 23.2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kvclient: don't send follower reads to unhealthy nodes
3 participants