Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dnm] kvcoord: re-work bazel+gomock implementation #66975

Closed

Conversation

irfansharif
Copy link
Contributor

Exploring an alternative to #66910.

erikgrinaker and others added 3 commits June 25, 2021 20:46
In cockroachdb#66199, the gRPC error `FailedPrecondition` (returned when contacting
a decommissioned node) was incorrectly considered a permanent error
during range descriptor lookups, via `IsRangeLookupErrorRetryable()`.
These lookups are all trying to reach a meta-range (addressed by key),
so when encountering a decommissioned node they should evict the cache
token and retry the lookup.

This patch reverses that change, making only authentication errors
permanent (i.e. when the local node is decommissioned or otherwise not
part of the cluster).

Release note: None
`DistSender.RangeFeed()` was meant to retry transport errors after
refreshing the range descriptor (invalidating the cached entry).
However, due to an incorrect error type check (`*sendError` vs
`sendError`), these errors failed the range feed without invalidating
the cached range descriptor.

This was particularly severe in cases where a large number of nodes had
been decommissioned, where some stale range descriptors on some nodes
contained only decommissioned nodes. Since change feeds set up range
feeds across many nodes and ranges in the cluster, they are likely to
encounter these decommissioned nodes and return an error -- and since
the descriptor cache wasn't invalidated they would keep erroring until
the nodes were restarted such that the caches were flushed (often
requiring a full cluster restart).

Release note (bug fix): Change feeds now properly invalidate cached
range descriptors and retry when encountering decommissioned nodes.
Exploring an alternative to cockroachdb#66910.

Release note: None
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@irfansharif irfansharif deleted the 210628.gomock-bazel branch June 28, 2021 18:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants