Manager controller recreates clusters when manager cluster ID is missing from status #1902

rzetelskik · 2024-04-22T11:50:27Z

What happened?

Currently, the manager cluster ID is saved in ScyllaCluster's status when the cluster is created in manager state. If the controller fails to update ScyllaCluster's status, the ID is otherwise missing from status, or an older generation of the object is reconciled, the controller deletes the existing cluster from the manager state and creates it again.
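To make the failure mode concrete, here is a minimal Go sketch under stated assumptions. `ManagerState`, `ScyllaClusterStatus` and `reconcileByStatusID` are hypothetical stand-ins, not the operator's or Scylla Manager's actual types or API. Manager state is modelled as an in-memory map, and when the status carries no ID, a manager cluster with the same name is assumed to be treated as stale.

```go
package main

import "fmt"

// ManagerState is a hypothetical in-memory stand-in for Scylla Manager's
// cluster registry, keyed by manager cluster ID.
type ManagerState struct {
	clusters map[string]string // manager cluster ID -> cluster name
	nextID   int
}

func NewManagerState() *ManagerState {
	return &ManagerState{clusters: map[string]string{}}
}

// Create registers a cluster and returns a freshly generated ID.
func (m *ManagerState) Create(name string) string {
	m.nextID++
	id := fmt.Sprintf("id-%d", m.nextID)
	m.clusters[id] = name
	return id
}

func (m *ManagerState) Delete(id string) { delete(m.clusters, id) }

// ScyllaClusterStatus is a hypothetical, trimmed-down status holding only the manager ID.
type ScyllaClusterStatus struct {
	ManagerID *string
}

// reconcileByStatusID sketches ID-only reconciliation: the status ID is the
// only link between the K8s object and manager state, so without it any
// same-named manager cluster is deleted and a new one is created.
func reconcileByStatusID(m *ManagerState, name string, status ScyllaClusterStatus) string {
	if status.ManagerID != nil {
		if _, ok := m.clusters[*status.ManagerID]; ok {
			return *status.ManagerID
		}
	}
	for id, n := range m.clusters {
		if n == name {
			m.Delete(id) // deleting map entries while ranging is safe in Go
		}
	}
	return m.Create(name)
}

func main() {
	m := NewManagerState()
	id := reconcileByStatusID(m, "default/basic", ScyllaClusterStatus{})
	// Suppose the status update that would have stored id never lands.
	newID := reconcileByStatusID(m, "default/basic", ScyllaClusterStatus{})
	fmt.Println(id, newID, len(m.clusters)) // id-1 id-2 1
}
```

The cluster survives in name only: its ID changes on every reconciliation without a status ID, so anything attached to the old ID (e.g. tasks) is lost.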

The issue and its root cause are similar to #1752.

This not only adds superfluous workload, but may also cause incorrect behavior, e.g. with respect to task retention.

/priority important-soon
/assign

What did you expect to happen?

Clusters in manager state should not be deleted once they've been created successfully.

How can we reproduce it (as minimally and precisely as possible)?

n/a

Scylla Operator version

master

Kubernetes platform name and version

n/a

Please attach the must-gather archive.

n/a

Anything else we need to know?

Unfortunately, without the ID we currently have no reliable way of telling whether a cluster existing in manager state corresponds to a K8s object.
This should be easy to fix with scylladb/scylla-manager#3219: once we can save metadata in manager state, we'll be able to "reclaim" the cluster even without its ID.
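A minimal Go sketch of the "reclaim" idea, assuming manager state can store arbitrary key/value labels per cluster, as scylladb/scylla-manager#3219 proposes. The label key `example.com/owner-uid`, the `managedCluster` type and the `reconcile` function are hypothetical illustrations, not the operator's or Scylla Manager's actual API.

```go
package main

import "fmt"

// managedCluster is a hypothetical manager-side record carrying metadata labels.
type managedCluster struct {
	Name   string
	Labels map[string]string
}

// ManagerState is a hypothetical in-memory stand-in for manager state.
type ManagerState struct {
	clusters map[string]*managedCluster // manager cluster ID -> record
	nextID   int
}

// ownerUIDLabel is a hypothetical label key identifying the owning K8s object.
const ownerUIDLabel = "example.com/owner-uid"

func (m *ManagerState) Create(name, ownerUID string) string {
	m.nextID++
	id := fmt.Sprintf("id-%d", m.nextID)
	m.clusters[id] = &managedCluster{Name: name, Labels: map[string]string{ownerUIDLabel: ownerUID}}
	return id
}

// findByOwner returns the ID of the manager cluster labelled with ownerUID, if any.
func (m *ManagerState) findByOwner(ownerUID string) (string, bool) {
	for id, c := range m.clusters {
		if c.Labels[ownerUIDLabel] == ownerUID {
			return id, true
		}
	}
	return "", false
}

// reconcile prefers the ID from status, falls back to reclaiming by label,
// and only creates a new manager cluster when neither matches.
func reconcile(m *ManagerState, name, ownerUID string, statusID *string) string {
	if statusID != nil {
		if c, ok := m.clusters[*statusID]; ok && c.Labels[ownerUIDLabel] == ownerUID {
			return *statusID
		}
	}
	if id, ok := m.findByOwner(ownerUID); ok {
		return id
	}
	return m.Create(name, ownerUID)
}

func main() {
	m := &ManagerState{clusters: map[string]*managedCluster{}}
	id := reconcile(m, "default/basic", "uid-1234", nil)
	// The status update is lost: reconcile again without an ID.
	again := reconcile(m, "default/basic", "uid-1234", nil)
	fmt.Println(id, again, len(m.clusters)) // id-1 id-1 1
}
```

Keying on the object's UID rather than its name means a deleted and re-created ScyllaCluster with the same name would not silently adopt the old manager cluster; whether that is the desired semantics is a design choice, not something this issue settles.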

scylla-operator-bot · 2024-07-08T10:43:34Z

The Scylla Operator project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 30d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out

/lifecycle stale

scylla-operator-bot · 2024-08-08T10:35:28Z

The Scylla Operator project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 30d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle rotten
Close this issue with /close
Offer to help out

/lifecycle rotten

rzetelskik · 2024-08-08T10:37:10Z

/remove-lifecycle rotten
/triage accepted

rzetelskik added the kind/bug Categorizes issue or PR as related to a bug. label Apr 22, 2024

scylla-operator-bot bot assigned rzetelskik Apr 22, 2024

scylla-operator-bot bot added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Apr 22, 2024

rzetelskik mentioned this issue Apr 22, 2024

Manager integration stability and correctness [Part 1] #1897

Closed

rzetelskik changed the title ~~Manager controller recreates clusters when manager cluster ID is lost from status~~ Manager controller recreates clusters when manager cluster ID is missing from status Apr 22, 2024

rzetelskik mentioned this issue May 24, 2024

Manager integration stability and correctness [Part 2] #1939

Open

scylla-operator-bot bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 8, 2024

scylla-operator-bot bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 8, 2024

scylla-operator-bot bot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Aug 8, 2024

rzetelskik mentioned this issue Sep 10, 2024

ScyllaClusters connected into a multi-datacenter cluster with externalSeeds are registered as separate clusters with Scylla Manager #2119

Open

rzetelskik mentioned this issue Oct 15, 2024

Use Scylla Manager cluster labels for cluster reconciliation #2156

Merged

scylla-operator-bot bot closed this as completed in #2156 Oct 21, 2024
