Manager controller recreates clusters when manager cluster ID is missing from status #1902

Closed
Tracked by #1939
rzetelskik opened this issue Apr 22, 2024 · 3 comments · Fixed by #2156
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@rzetelskik
Member

What happened?

Currently, the manager cluster ID is saved in ScyllaCluster's status on cluster creation. If the controller fails to update ScyllaCluster's status, the ID is lost, or an older generation of the object is reconciled, the controller will delete the existing cluster from the manager state and create it again.

The issue and its root cause are similar to #1752.

This not only adds superfluous work, but may also introduce incorrect behavior, e.g. around task retention.
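
To make the failure mode concrete, here is a minimal sketch in Go. It is not the operator's actual code: `managerState`, `syncNaive`, and the ID format are hypothetical stand-ins, and the manager state is modelled as a plain in-memory map. It only illustrates how a missing status ID leads to a delete followed by a create, so the cluster ends up with a different manager ID.

```go
package main

import "fmt"

// managerState is a hypothetical in-memory stand-in for the manager state:
// it maps a cluster name to the manager cluster ID assigned on creation.
type managerState struct {
	nextID   int
	clusters map[string]string
}

// create registers a cluster under a fresh ID and returns that ID.
func (m *managerState) create(name string) string {
	m.nextID++
	id := fmt.Sprintf("id-%d", m.nextID)
	m.clusters[name] = id
	return id
}

// deleteByName removes the cluster registered under name, if any.
func (m *managerState) deleteByName(name string) {
	delete(m.clusters, name)
}

// syncNaive mimics the behaviour described in this issue (an assumption about
// its shape, not the real controller): the registered cluster is trusted only
// if its ID matches the one recorded in status. Otherwise it is deleted and
// created again.
func syncNaive(m *managerState, name, statusID string) string {
	if id, ok := m.clusters[name]; ok && id == statusID {
		return id
	}
	m.deleteByName(name)
	return m.create(name)
}

func main() {
	m := &managerState{clusters: map[string]string{}}

	// First sync creates the cluster in the manager state.
	first := syncNaive(m, "scylla/basic", "")

	// The status update failed, so the next sync still sees an empty ID
	// and recreates a cluster that already existed.
	second := syncNaive(m, "scylla/basic", "")

	fmt.Println("first:", first, "second:", second, "recreated:", first != second)
}
```

In this toy model the second sync returns a different ID than the first, which is the recreation this issue is about.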

/priority important-soon
/assign

What did you expect to happen?

Clusters in manager state should not be deleted once they've been created successfully.

How can we reproduce it (as minimally and precisely as possible)?

n/a

Scylla Operator version

master

Kubernetes platform name and version

n/a

Please attach the must-gather archive.

n/a

Anything else we need to know?

Unfortunately, without the ID we currently have no reliable way of telling whether a cluster existing in manager state corresponds to a Kubernetes object.
This should be easy to fix with scylladb/scylla-manager#3219, since we'll be able to save metadata in manager state, and so we'll be able to "reclaim" the cluster despite not having its ID.
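
A rough sketch of what "reclaiming" could look like, assuming the manager state can store arbitrary key/value metadata per cluster. Everything here is hypothetical: the `managedCluster` type, the `ownerUIDKey` key, and the `reclaimOrCreate` helper are made up for illustration and are not part of Scylla Manager's or the operator's API.

```go
package main

import "fmt"

// ownerUIDKey is a hypothetical metadata key used to tie a manager cluster
// to the Kubernetes object that owns it. It is not a real constant.
const ownerUIDKey = "example.invalid/owner-uid"

// managedCluster is a hypothetical view of a cluster kept in manager state.
type managedCluster struct {
	ID       string
	Name     string
	Metadata map[string]string
}

// reclaimOrCreate looks for a cluster whose metadata points at ownerUID.
// If one exists, its ID is returned and nothing is deleted or recreated,
// even when the ID was never saved in status. Only if no such cluster
// exists is a new one appended to the state.
func reclaimOrCreate(state []managedCluster, name, ownerUID string) ([]managedCluster, string, bool) {
	for _, c := range state {
		if uid, ok := c.Metadata[ownerUIDKey]; ok && uid == ownerUID {
			return state, c.ID, false
		}
	}

	created := managedCluster{
		ID:       fmt.Sprintf("id-%d", len(state)+1), // toy ID scheme
		Name:     name,
		Metadata: map[string]string{ownerUIDKey: ownerUID},
	}
	return append(state, created), created.ID, true
}

func main() {
	var state []managedCluster

	// First sync: nothing to reclaim, so the cluster is created.
	state, id, created := reclaimOrCreate(state, "scylla/basic", "uid-1")
	fmt.Println(id, created)

	// Later sync with no ID in status: the cluster is found via metadata.
	state, id, created = reclaimOrCreate(state, "scylla/basic", "uid-1")
	fmt.Println(id, created, len(state))
}
```

Keying the lookup on the owning object's UID rather than its name is a design choice of this sketch: a cluster deleted and recreated in Kubernetes under the same name would then not reclaim the old manager entry.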

@rzetelskik rzetelskik added the kind/bug Categorizes issue or PR as related to a bug. label Apr 22, 2024
@scylla-operator-bot scylla-operator-bot bot added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Apr 22, 2024
@rzetelskik rzetelskik changed the title Manager controller recreates clusters when manager cluster ID is lost from status Manager controller recreates clusters when manager cluster ID is missing from status Apr 22, 2024
Contributor

The Scylla Operator project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 30d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out

/lifecycle stale

@scylla-operator-bot scylla-operator-bot bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 8, 2024
Contributor

The Scylla Operator project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 30d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out

/lifecycle rotten

@scylla-operator-bot scylla-operator-bot bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 8, 2024
@rzetelskik
Member Author

/remove-lifecycle rotten
/triage accepted
