
Incorrect computation of initial-cluster-state during single member restoration which can lead to cluster ID mismatch errors #847

Open
unmarshall opened this issue Feb 21, 2025 · 1 comment
Labels: area/control-plane (Control plane related), kind/bug (Bug)

unmarshall (Contributor) commented Feb 21, 2025

How to categorize this issue?

/area control-plane
/kind bug

What happened:
A specific gardener e2e kind test fails frequently: Shoot Tests Hibernated Shoot [It] Create, Migrate and Delete [Shoot, control-plane-migration, hibernated]

The creation, migration and hibernation steps succeed. To delete the migrated shoot, which is currently hibernated, the etcd cluster must first be woken up. At this stage the etcd cluster does not become ready.

In one such occurrence we see the following logs in etcd-events-2 (backup-restore container):

2025-02-17T12:45:52.969873914Z stderr F 2025-02-17 12:45:52.968607 E | rafthttp: request sent was ignored (cluster ID mismatch: peer[6fdaf30df04c0245]=4ffa550a92b87675, local=39b1e34c77b1db7a)
2025-02-17T12:45:52.970531124Z stderr F 2025-02-17 12:45:52.970317 E | rafthttp: request sent was ignored (cluster ID mismatch: peer[6fdaf30df04c0245]=4ffa550a92b87675, local=39b1e34c77b1db7a)
2025-02-17T12:45:53.055124837Z stderr F 2025-02-17 12:45:53.054945 E | rafthttp: request sent was ignored (cluster ID mismatch: peer[6fdaf30df04c0245]=4ffa550a92b87675, local=39b1e34c77b1db7a)
2025-02-17T12:45:53.062374513Z stderr F 2025-02-17 12:45:53.062106 E | rafthttp: request sent was ignored (cluster ID mismatch: peer[6fdaf30df04c0245]=4ffa550a92b87675, local=39b1e34c77b1db7a)
2025-02-17T12:45:53.153435731Z stderr F 2025-02-17 12:45:53.153314 E | rafthttp: request sent was ignored (cluster ID mismatch: peer[6fdaf30df04c0245]=4ffa550a92b87675, local=39b1e34c77b1db7a)
2025-02-17T12:45:53.160917167Z stderr F 2025-02-17 12:45:53.160807 E | rafthttp: request sent was ignored (cluster ID mismatch: peer[6fdaf30df04c0245]=4ffa550a92b87675, local=39b1e34c77b1db7a)
2025-02-17T12:45:53.251792044Z stderr F 2025-02-17 12:45:53.251680 E | rafthttp: request sent was ignored (cluster ID mismatch: peer[6fdaf30df04c0245]=4ffa550a92b87675, local=39b1e34c77b1db7a)
2025-02-17T12:45:53.264667024Z stderr F 2025-02-17 12:45:53.264552 E | rafthttp: request sent was ignored (cluster ID mismatch: peer[6fdaf30df04c0245]=4ffa550a92b87675, local=39b1e34c77b1db7a)

For complete logs see: etcd-events-2-backup-restore.log

A cluster ID mismatch is typically seen in the three scenarios documented here.

Prior to starting the embedded etcd process, etcd-wrapper triggers initialization. Once initialization succeeds, etcd-wrapper requests the etcd config, which etcd-backup-restore computes here. One of the key parameters in this config is initial-cluster-state, determined here, which distinguishes whether this member bootstraps a new cluster or joins an existing one.

If the member-list API call fails for any reason (see IsLearnerPresent), this function correctly returns an error, but the error is swallowed by the calling function (see here), which then assumes initial-cluster-state=new. This is intentional for the 0->3 replicas bootstrap case: while bootstrapping a new cluster, etcd Member API calls can never succeed, so even on error the config must be served with initial-cluster-state=new for the bootstrap to proceed.
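The error-swallowing flow described above can be sketched as follows. This is a simplified model in Python purely for illustration; the function and variable names are not the actual etcd-backup-restore identifiers (the real implementation is in Go):

```python
# Simplified model of how etcd-backup-restore decides initial-cluster-state.
# Names are illustrative; the real implementation is in Go.

class MemberListError(Exception):
    """Raised when the etcd Member API call fails (e.g. transient quorum loss)."""

def fetch_member_list(cluster):
    """Stand-in for the etcd MemberList API call."""
    if cluster is None:  # no reachable quorum
        raise MemberListError("member list call failed")
    return cluster  # list of member names known to the cluster

def initial_cluster_state(cluster, member_name):
    """Current (buggy) decision: any Member API error falls back to 'new'.

    This is correct for the 0->3 bootstrap case, where the Member API can
    never succeed, but wrong for a learner added during single-member
    restoration, which must join with initial-cluster-state=existing.
    """
    try:
        members = fetch_member_list(cluster)
    except MemberListError:
        return "new"  # error swallowed: bootstrap is assumed
    return "existing" if member_name in members else "new"
```

During 0->3 bootstrap this fallback is exactly what lets the first member come up; during single-member restoration the same fallback mislabels a learner as a founding member.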

However, this code flow also has a negative consequence. Consider the following case:

  • The data directory of one of the etcd members gets corrupted while bringing up the cluster from 0->3 replicas.
  • etcd-backup-restore validates the data directory and finds it corrupt, so it triggers single-member restoration (see this for more information).
  • As part of single-member restoration, it adds this member as a learner and then triggers initialization. Once initialization succeeds, it serves an etcd config.
  • While computing initial-cluster-state, if the etcd Member API call fails (e.g. due to transient quorum loss, which is possible after a VPA eviction), the code assumes initial-cluster-state=new. This is not the correct initial-cluster-state for a learner and leads to a cluster ID mismatch.
  • The member is thereby forced to create a new member ID, which will never match the member IDs known to the other two members of the etcd cluster. When it dials the other two members, they reject the call with a cluster ID mismatch response.
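The resulting rejection can be modelled with a minimal sketch of the cluster-ID handshake. This is illustrative only: real etcd derives the cluster ID from the initial member set, not from a random UUID, but the acceptance rule is the same:

```python
import uuid

def start_member(state, existing_cluster_id=None):
    """Illustrative model: 'new' mints a fresh cluster identity, while
    'existing' adopts the identity of the cluster being joined."""
    if state == "new":
        return uuid.uuid4().hex
    return existing_cluster_id

def peer_accepts(sender_cluster_id, local_cluster_id):
    """rafthttp drops requests whose cluster ID differs from the local one."""
    return sender_cluster_id == local_cluster_id

cluster_id = start_member("new")                # original 3-member cluster
healthy = start_member("existing", cluster_id)  # healthy member: same ID
restored = start_member("new")                  # learner wrongly told 'new'

assert peer_accepts(healthy, cluster_id)
assert not peer_accepts(restored, cluster_id)   # "cluster ID mismatch"
```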

What you expected to happen:
initial-cluster-state should always be computed correctly.
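One possible hardening is sketched below. This is only a sketch, not the actual fix; the `added_as_learner` flag is hypothetical and assumes the restoration flow records that this member was added as a learner:

```python
def safer_initial_cluster_state(member_api_ok, is_member_known, added_as_learner):
    """Sketch of a safer decision (hypothetical, not the actual fix):
    a member added as a learner during single-member restoration must
    always join with 'existing', even if the Member API call fails."""
    if added_as_learner:
        return "existing"  # a learner always joins an existing cluster
    if not member_api_ok:
        return "new"       # 0->3 bootstrap: the Member API can never succeed
    return "existing" if is_member_known else "new"
```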

@gardener-robot gardener-robot added area/control-plane Control plane related kind/bug Bug labels Feb 21, 2025
@unmarshall unmarshall changed the title Incorrect computation of initial-cluster-state Incorrect computation of initial-cluster-state which can lead to cluster ID mismatch errors Feb 21, 2025
@unmarshall unmarshall changed the title Incorrect computation of initial-cluster-state which can lead to cluster ID mismatch errors Incorrect computation of initial-cluster-state during single member restoration which can lead to cluster ID mismatch errors Feb 21, 2025
ishan16696 (Member) commented:
To reproduce this issue locally please follow these steps:

  1. Start an etcd cluster with 3 members.
  2. Remove one of the etcd members from the cluster using the API call: etcdctl member remove <memberID>
  3. Add a learner/member to the same etcd cluster.
  4. Start the learner/member, but with initial-cluster-state set to new instead of existing.
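The steps above map to roughly the following commands. These assume a running local 3-member cluster and etcd/etcdctl v3.4+; the member names, ports and URLs are placeholders to adapt to your setup, and they must be run against a live cluster:

```shell
# 2. Remove one member from the running cluster.
etcdctl --endpoints=http://127.0.0.1:2379 member list
etcdctl --endpoints=http://127.0.0.1:2379 member remove <memberID>

# 3. Add it back as a learner.
etcdctl --endpoints=http://127.0.0.1:2379 member add member-3 \
  --learner --peer-urls=http://127.0.0.1:2382

# 4. Start the learner with the WRONG initial-cluster-state ('new'
#    instead of 'existing') to trigger the cluster ID mismatch.
etcd --name member-3 \
  --initial-cluster 'member-1=http://127.0.0.1:2380,member-2=http://127.0.0.1:2381,member-3=http://127.0.0.1:2382' \
  --initial-cluster-state new \
  --listen-peer-urls http://127.0.0.1:2382 \
  --initial-advertise-peer-urls http://127.0.0.1:2382 \
  --listen-client-urls http://127.0.0.1:22379 \
  --advertise-client-urls http://127.0.0.1:22379
```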

You will then see logs like:

{"level":"warn","ts":"2025-02-21T14:52:29.42254+0530","caller":"rafthttp/stream.go:653","msg":"request sent was ignored by remote peer due to cluster ID mismatch","remote-peer-id":"3e9662d4914e445d","remote-peer-cluster-id":"7fa825e3d560ad6f","local-member-id":"91bc3c398fb3c146","local-member-cluster-id":"6e9fdbc6edbe620","error":"cluster ID mismatch"}
{"level":"warn","ts":"2025-02-21T14:52:29.495654+0530","caller":"rafthttp/stream.go:653","msg":"request sent was ignored by remote peer due to cluster ID mismatch","remote-peer-id":"3e9662d4914e445d","remote-peer-cluster-id":"7fa825e3d560ad6f","local-member-id":"91bc3c398fb3c146","local-member-cluster-id":"6e9fdbc6edbe620","error":"cluster ID mismatch"}
{"level":"warn","ts":"2025-02-21T14:52:29.495668+0530","caller":"rafthttp/stream.go:653","msg":"request sent was ignored by remote peer due to cluster ID mismatch","remote-peer-id":"3e9662d4914e445d","remote-peer-cluster-id":"7fa825e3d560ad6f","local-member-id":"91bc3c398fb3c146","local-member-cluster-id":"6e9fdbc6edbe620","error":"cluster ID mismatch"}
