
Incorrect computation of initial-cluster-state during single member restoration which can lead to cluster ID mismatch errors #847

Open
unmarshall opened this issue Feb 21, 2025 · 1 comment
Labels: area/control-plane (Control plane related), kind/bug (Bug)

unmarshall (Contributor) commented Feb 21, 2025

How to categorize this issue?

/area control-plane
/kind bug

What happened:
A specific gardener e2e kind test fails frequently: Shoot Tests Hibernated Shoot [It] Create, Migrate and Delete [Shoot, control-plane-migration, hibernated]

The creation, migration and hibernation steps succeed. To delete the migrated shoot, which is currently hibernated, the etcd cluster must first be woken up. At this stage the etcd cluster does not become ready.

In one such occurrence we see the following logs in etcd-events-2 (backup-restore container):

2025-02-17T12:45:52.969873914Z stderr F 2025-02-17 12:45:52.968607 E | rafthttp: request sent was ignored (cluster ID mismatch: peer[6fdaf30df04c0245]=4ffa550a92b87675, local=39b1e34c77b1db7a)
2025-02-17T12:45:52.970531124Z stderr F 2025-02-17 12:45:52.970317 E | rafthttp: request sent was ignored (cluster ID mismatch: peer[6fdaf30df04c0245]=4ffa550a92b87675, local=39b1e34c77b1db7a)
2025-02-17T12:45:53.055124837Z stderr F 2025-02-17 12:45:53.054945 E | rafthttp: request sent was ignored (cluster ID mismatch: peer[6fdaf30df04c0245]=4ffa550a92b87675, local=39b1e34c77b1db7a)
2025-02-17T12:45:53.062374513Z stderr F 2025-02-17 12:45:53.062106 E | rafthttp: request sent was ignored (cluster ID mismatch: peer[6fdaf30df04c0245]=4ffa550a92b87675, local=39b1e34c77b1db7a)
2025-02-17T12:45:53.153435731Z stderr F 2025-02-17 12:45:53.153314 E | rafthttp: request sent was ignored (cluster ID mismatch: peer[6fdaf30df04c0245]=4ffa550a92b87675, local=39b1e34c77b1db7a)
2025-02-17T12:45:53.160917167Z stderr F 2025-02-17 12:45:53.160807 E | rafthttp: request sent was ignored (cluster ID mismatch: peer[6fdaf30df04c0245]=4ffa550a92b87675, local=39b1e34c77b1db7a)
2025-02-17T12:45:53.251792044Z stderr F 2025-02-17 12:45:53.251680 E | rafthttp: request sent was ignored (cluster ID mismatch: peer[6fdaf30df04c0245]=4ffa550a92b87675, local=39b1e34c77b1db7a)
2025-02-17T12:45:53.264667024Z stderr F 2025-02-17 12:45:53.264552 E | rafthttp: request sent was ignored (cluster ID mismatch: peer[6fdaf30df04c0245]=4ffa550a92b87675, local=39b1e34c77b1db7a)

For complete logs see: etcd-events-2-backup-restore.log

A cluster ID mismatch is typically seen in the three scenarios documented here.

Prior to starting the embedded etcd process, etcd-wrapper triggers initialization. Once initialization succeeds, etcd-wrapper requests the etcd config, which etcd-backup-restore computes here. One of the key parameters in this config is initial-cluster-state, determined here, which distinguishes whether this member bootstraps a new cluster or joins an existing one.

If the member-list API call fails for any reason (see IsLearnerPresent), this function correctly returns an error, but the error is swallowed by the calling function (see here), which then assumes initial-cluster-state=new. This is intentional for the 0->3 replicas bootstrap case: while bootstrapping a new cluster, etcd Member API calls can never succeed, so even on error the config must be served with initial-cluster-state=new for the bootstrap to proceed.
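The error-swallowing flow described above can be sketched as follows. This is a simplified model in Python purely for illustration; the function and variable names are not the actual etcd-backup-restore identifiers (the real implementation is in Go):

```python
# Simplified model of how etcd-backup-restore decides initial-cluster-state.
# Names are illustrative; the real implementation is in Go.

class MemberListError(Exception):
    """Raised when the etcd Member API call fails (e.g. transient quorum loss)."""

def fetch_member_list(cluster):
    """Stand-in for the etcd MemberList API call."""
    if cluster is None:  # no reachable quorum
        raise MemberListError("member list call failed")
    return cluster  # list of member names known to the cluster

def initial_cluster_state(cluster, member_name):
    """Current (buggy) decision: any Member API error falls back to 'new'.

    This is correct for the 0->3 bootstrap case, where the Member API can
    never succeed, but wrong for a learner added during single-member
    restoration, which must join with initial-cluster-state=existing.
    """
    try:
        members = fetch_member_list(cluster)
    except MemberListError:
        return "new"  # error swallowed: bootstrap is assumed
    return "existing" if member_name in members else "new"
```

During 0->3 bootstrap this fallback is exactly what lets the first member come up; during single-member restoration the same fallback mislabels a learner as a founding member.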

However, this code flow also has a negative consequence. Consider the following case:

  • The data directory of one of the etcd members gets corrupted while bringing up the cluster from 0->3 replicas.
  • etcd-backup-restore validates the data directory and finds it corrupt, so it triggers single-member restoration (see this for more information).
  • As part of single-member restoration, it adds this member as a learner and then triggers initialization. Once initialization succeeds, it serves an etcd config.
  • While computing initial-cluster-state, if the etcd Member API call fails (e.g. due to transient quorum loss, which is possible after a VPA eviction), the code assumes initial-cluster-state=new. This is not the correct initial-cluster-state for a learner and leads to a cluster ID mismatch.
  • The member is thereby forced to create a new member ID, which will never match the member IDs known to the other two members of the etcd cluster. When it dials the other two members, they reject the call with a cluster ID mismatch response.
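The resulting rejection can be modelled with a minimal sketch of the cluster-ID handshake. This is illustrative only: real etcd derives the cluster ID from the initial member set, not from a random UUID, but the acceptance rule is the same:

```python
import uuid

def start_member(state, existing_cluster_id=None):
    """Illustrative model: 'new' mints a fresh cluster identity, while
    'existing' adopts the identity of the cluster being joined."""
    if state == "new":
        return uuid.uuid4().hex
    return existing_cluster_id

def peer_accepts(sender_cluster_id, local_cluster_id):
    """rafthttp drops requests whose cluster ID differs from the local one."""
    return sender_cluster_id == local_cluster_id

cluster_id = start_member("new")                # original 3-member cluster
healthy = start_member("existing", cluster_id)  # healthy member: same ID
restored = start_member("new")                  # learner wrongly told 'new'

assert peer_accepts(healthy, cluster_id)
assert not peer_accepts(restored, cluster_id)   # "cluster ID mismatch"
```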

What you expected to happen:
initial-cluster-state should always be computed correctly.
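One possible hardening is sketched below. This is only a sketch, not the actual fix; the `added_as_learner` flag is hypothetical and assumes the restoration flow records that this member was added as a learner:

```python
def safer_initial_cluster_state(member_api_ok, is_member_known, added_as_learner):
    """Sketch of a safer decision (hypothetical, not the actual fix):
    a member added as a learner during single-member restoration must
    always join with 'existing', even if the Member API call fails."""
    if added_as_learner:
        return "existing"  # a learner always joins an existing cluster
    if not member_api_ok:
        return "new"       # 0->3 bootstrap: the Member API can never succeed
    return "existing" if is_member_known else "new"
```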

@gardener-robot gardener-robot added area/control-plane Control plane related kind/bug Bug labels Feb 21, 2025
@unmarshall unmarshall changed the title Incorrect computation of initial-cluster-state Incorrect computation of initial-cluster-state which can lead to cluster ID mismatch errors Feb 21, 2025
@unmarshall unmarshall changed the title Incorrect computation of initial-cluster-state which can lead to cluster ID mismatch errors Incorrect computation of initial-cluster-state during single member restoration which can lead to cluster ID mismatch errors Feb 21, 2025
ishan16696 (Member) commented:
To reproduce this issue locally please follow these steps:

  1. Start an etcd cluster with 3 members.
  2. Remove one of the etcd members from the cluster using the API call: etcdctl member remove <memberID>
  3. Add a learner/member to the same etcd cluster.
  4. Start the learner/member, but with initial-cluster-state set to new instead of existing.
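The steps above map to roughly the following commands. These assume a running local 3-member cluster and etcd/etcdctl v3.4+; the member names, ports and URLs are placeholders to adapt to your setup, and they must be run against a live cluster:

```shell
# 2. Remove one member from the running cluster.
etcdctl --endpoints=http://127.0.0.1:2379 member list
etcdctl --endpoints=http://127.0.0.1:2379 member remove <memberID>

# 3. Add it back as a learner.
etcdctl --endpoints=http://127.0.0.1:2379 member add member-3 \
  --learner --peer-urls=http://127.0.0.1:2382

# 4. Start the learner with the WRONG initial-cluster-state ('new'
#    instead of 'existing') to trigger the cluster ID mismatch.
etcd --name member-3 \
  --initial-cluster 'member-1=http://127.0.0.1:2380,member-2=http://127.0.0.1:2381,member-3=http://127.0.0.1:2382' \
  --initial-cluster-state new \
  --listen-peer-urls http://127.0.0.1:2382 \
  --initial-advertise-peer-urls http://127.0.0.1:2382 \
  --listen-client-urls http://127.0.0.1:22379 \
  --advertise-client-urls http://127.0.0.1:22379
```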

You will then see logs like:

{"level":"warn","ts":"2025-02-21T14:52:29.42254+0530","caller":"rafthttp/stream.go:653","msg":"request sent was ignored by remote peer due to cluster ID mismatch","remote-peer-id":"3e9662d4914e445d","remote-peer-cluster-id":"7fa825e3d560ad6f","local-member-id":"91bc3c398fb3c146","local-member-cluster-id":"6e9fdbc6edbe620","error":"cluster ID mismatch"}
{"level":"warn","ts":"2025-02-21T14:52:29.495654+0530","caller":"rafthttp/stream.go:653","msg":"request sent was ignored by remote peer due to cluster ID mismatch","remote-peer-id":"3e9662d4914e445d","remote-peer-cluster-id":"7fa825e3d560ad6f","local-member-id":"91bc3c398fb3c146","local-member-cluster-id":"6e9fdbc6edbe620","error":"cluster ID mismatch"}
{"level":"warn","ts":"2025-02-21T14:52:29.495668+0530","caller":"rafthttp/stream.go:653","msg":"request sent was ignored by remote peer due to cluster ID mismatch","remote-peer-id":"3e9662d4914e445d","remote-peer-cluster-id":"7fa825e3d560ad6f","local-member-id":"91bc3c398fb3c146","local-member-cluster-id":"6e9fdbc6edbe620","error":"cluster ID mismatch"}
