kvclient: fix an infinite loop in the DistSender #51085
cc @knz enjoy vacay
Add a PR comment that this is a likely contributor to recent test failures in which TestClusters deadlocked during shutdown with a stack in sendToReplicas.
Reviewed 5 of 5 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei)
pkg/kv/kvclient/kvcoord/transport.go, line 87 at r1 (raw file):
// - the replica that NextReplica() would return is skipped.
//
// Returns true if, after skipping the transport is not exhausted. Returns
The return value seems useless - we have IsExhausted() already. You're also not using it in production code, so I would prefer not returning anything here to keep things clean and simple.
pkg/kv/kvclient/kvcoord/transport.go, line 397 at r1 (raw file):
func (s *senderTransport) SkipReplica() bool {
	s.called = true
Can you add a comment on this field about when it is set? And then a comment here why we set it here.
This patch fixes some silly code which deals with the situation in which sendToReplicas() needs to try another replica, but some of the replicas with which it started are known to be stale. The code was meant to skip the stale replicas, but instead of skipping anything it just looped endlessly.

This should fix recent timeouts of tests with stacktraces in DistSender.sendToReplicas().

Fixes cockroachdb#51061

Release note: None
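For illustration, the failure mode described in the commit message can be sketched roughly as follows. This is a hypothetical, self-contained model and not the actual DistSender code: the `transport` type, its fields, the `skipStale` helper, and the `stale` map are assumptions made for the example. Only the method names NextReplica(), SkipReplica(), and IsExhausted() come from the review discussion, and SkipReplica() is shown without a return value, as requested in the review.

```go
package main

import "fmt"

// transport is a hypothetical stand-in for a replica iterator.
// It is NOT the real kvcoord transport; it only models the three
// methods named in the review.
type transport struct {
	replicas []string
	next     int
}

// IsExhausted reports whether every replica has been consumed.
func (t *transport) IsExhausted() bool { return t.next >= len(t.replicas) }

// NextReplica peeks at the replica that would be tried next.
// It returns "" when the transport is exhausted.
func (t *transport) NextReplica() string {
	if t.IsExhausted() {
		return ""
	}
	return t.replicas[t.next]
}

// SkipReplica advances past the replica that NextReplica would return.
// It returns nothing; callers check IsExhausted instead.
func (t *transport) SkipReplica() {
	if !t.IsExhausted() {
		t.next++
	}
}

// skipStale (hypothetical helper) skips leading replicas that are marked
// stale and returns how many were skipped.
func skipStale(t *transport, stale map[string]bool) int {
	skipped := 0
	for !t.IsExhausted() && stale[t.NextReplica()] {
		// Without this call the loop condition never changes and the
		// loop spins forever, which is the class of bug described above.
		t.SkipReplica()
		skipped++
	}
	return skipped
}

func main() {
	t := &transport{replicas: []string{"n1", "n2", "n3"}}
	n := skipStale(t, map[string]bool{"n1": true, "n2": true})
	fmt.Println(n, t.NextReplica())
}
```

The loop terminates because each iteration either advances the transport or fails the condition; a loop that only peeks with NextReplica() and never advances would make no progress.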
f7a7ef5 to 8aff78b
Add a PR comment that this is a likely contributor to recent test failures in which TestClusters deadlocked during shutdown with a stack in sendToReplicas.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @tbg)
pkg/kv/kvclient/kvcoord/transport.go, line 87 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
The return value seems useless - we have IsExhausted() already. You're also not using it in production code, so I would prefer not returning anything here to keep things clean and simple.
done
pkg/kv/kvclient/kvcoord/transport.go, line 397 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Can you add a comment on this field about when it is set? And then a comment here why we set it here.
added some words
Add a PR comment that this is a likely contributor to recent test failures in which TestClusters deadlocked during shutdown with a stack in sendToReplicas.
done
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @tbg)
bors r+
Build succeeded