-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvcoord: Correctly handle stuck rangefeeds #92582
Conversation
@@ -71,11 +72,11 @@ func (w *stuckRangeFeedCanceler) stop() { | |||
|
|||
// ping notifies the canceler that the rangefeed has received an | |||
// event, i.e. is making progress. | |||
func (w *stuckRangeFeedCanceler) ping() { | |||
func (w *stuckRangeFeedCanceler) do(fn func() error) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way this is currently structured, we don't need to return error
from either of these functions. The closure can close over err
and we can check it outside of do()
.
There are a couple of other approaches we can consider:
-
Have
do()
also do error handling, returningerrRestartStuckRange
when the rangefeed is stuck. -
Return
true
if the rangefeed is stuck when the function returns.
These would allow us to get rid of the stuck()
method and simplify the caller logic. I think I have a preference for returning errRestartStuckRange
and encapsulating as much of the logic as possible inside do()
, but it's not a strongly held view. The current approach is also fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack -- I was thinking of doing that. Still, I think returning an error is better -- I prefer not to close over errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I can get rid of stuck() -- we call it in few places , and I think it just reads better when error handling is explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I'm not sure why we bother with returning errStuckRangefeed
from do()
in that case, since none of the callers use it anyway. You call though, nbd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted return of errStuckRange.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if I can get rid of stuck() -- but other comments addressed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some minor comments, the core logic seems fine. Preemptively approving once addressed.
We should also have a test, either for the DistSender or as a CDC roachtest, which makes sure a stalled sink won't cause undue restarts.
@@ -71,11 +72,11 @@ func (w *stuckRangeFeedCanceler) stop() { | |||
|
|||
// ping notifies the canceler that the rangefeed has received an | |||
// event, i.e. is making progress. | |||
func (w *stuckRangeFeedCanceler) ping() { | |||
func (w *stuckRangeFeedCanceler) do(fn func() error) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I'm not sure why we bother with returning errStuckRangefeed
from do()
in that case, since none of the callers use it anyway. You call though, nbd.
Added canceller test. |
Fixes cockroachdb#92570 A watcher responsible for restarting stuck range feeds may incorrectly cancel rangefeed if the downstream event consumer is slow. Release note (bug fix): Fix incorrect cancellation logic when attempting to detect stuck range feeds.
bors r+ |
1 similar comment
bors r+ |
Already running a review |
Build succeeded: |
Fixes #92570
A watcher responsible for restarting stuck range feeds may incorrectly cancel rangefeed if the downstream event consumer is slow.
Release note (bug fix): Fix incorrect cancellation logic when attempting to detect stuck range feeds.