Online DDL workflows stall permanently with a "wrong tablet type" error #16646
Labels
Component: Online DDL
Online DDL (vitess/native/gh-ost/pt-osc)
Component: VReplication
Type: Feature
Feature Description
There have been multiple reports of failed Online DDL migrations that return the following error:
"error": "vttablet: rpc error: code = FailedPrecondition desc = wrong tablet type: PRIMARY, want: REPLICA or []"
The VReplication workflows treat this error as non-recoverable. When the migration is restarted manually, the workflow proceeds, because the picker now gets the updated tablet record (or selects a different tablet).
Use Case(s)
On investigation, it appears that the tablet picker chooses a REPLICA tablet to stream from. However, there is a race: if that tablet is promoted by a PlannedReparentShard (PRS) before streaming starts, the subsequent VStream RPC fails because the tablet type in the request no longer matches the tablet's current type.