
[Feature] Unpeer should delete remote resource gracefully rather than force #1846

Closed
Sharathmk99 opened this issue May 31, 2023 · 1 comment
Labels
workaround This issue or pull request contains a workaround

Comments

@Sharathmk99
Contributor

Is your feature request related to a problem? Please describe.
With the current liqoctl unpeer .... command, remote cluster resources are forcefully deleted, which means any pods running in the remote cluster are deleted without notice. In our case, we run machine learning training in the remote cluster for weeks or months. If someone deletes the FC CRD or executes liqoctl unpeer by mistake, the ML training is deleted, which is not good.

Describe the solution you'd like
When using liqoctl,

When the liqoctl unpeer command is executed, just disable OutgoingPeering, maybe cordon the node so future pods will not be scheduled, and delete the FC CRD (which will delete the virtual kubelet deployment) only if there is no resource reflection.
If the user executes liqoctl unpeer --force, then delete the remote resources forcefully and delete the FC CRD.

When FC CRD is deleted,

Block the deletion of the FC CRD until all the reflected resources have completed, but cordon the node so no new jobs will be scheduled.
If the FC CRD was deleted forcefully (not sure if we can differentiate this in the controller), then forcefully delete the reflected resources.

This helps to avoid accidental command executions.
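
The decision logic proposed above can be sketched as a small pure function. This is only an illustration of the requested behavior, not Liqo's controller code: the `Action` names and the `plan_unpeer` function are hypothetical, and the count of reflected resources is assumed to be known by the caller.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical steps, named after the behavior requested in this issue."""
    DISABLE_OUTGOING_PEERING = "disable-outgoing-peering"
    CORDON_NODE = "cordon-node"
    WAIT_FOR_REFLECTED_RESOURCES = "wait-for-reflected-resources"
    DELETE_REFLECTED_RESOURCES = "delete-reflected-resources"
    DELETE_FC = "delete-fc"


def plan_unpeer(force: bool, reflected_resources: int) -> list[Action]:
    """Return the ordered steps for an unpeer request.

    force: True when the user explicitly asked for a forced unpeer.
    reflected_resources: number of resources still reflected to the remote cluster.
    """
    if force:
        # Forced path: remove remote resources immediately, then the FC.
        return [Action.DELETE_REFLECTED_RESOURCES, Action.DELETE_FC]

    # Graceful path: stop new scheduling first.
    steps = [Action.DISABLE_OUTGOING_PEERING, Action.CORDON_NODE]
    if reflected_resources == 0:
        # Nothing is running remotely, so the FC can go away right now.
        steps.append(Action.DELETE_FC)
    else:
        # Block the FC deletion until the running workloads complete.
        steps.append(Action.WAIT_FOR_REFLECTED_RESOURCES)
    return steps
```

With this shape, an accidental unpeer while training jobs are running ends in a wait step rather than a deletion, and only an explicit force request removes running workloads.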

Describe alternatives you've considered

Additional context
I'm open to discussion and to implementing this once we agree on an approach.

@Sharathmk99 Sharathmk99 changed the title [Feature] Unpeer should delete remote resource gracefully rather than forcefully [Feature] Unpeer should delete remote resource gracefully rather than force May 31, 2023
@aleoli aleoli added the workaround This issue or pull request contains a workaround label Dec 23, 2024
@aleoli
Member

aleoli commented Dec 23, 2024

The suggested workflow is as follows:

  1. cordon the nodes related to the cluster to drain
  2. wait for offloaded pods to complete
  3. unpeer the cluster
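
The three steps above could be scripted roughly as follows. This is a minimal sketch, not an official Liqo tool. The helper names are hypothetical. It assumes `kubectl` is on the PATH and pointed at the local cluster, and that offloaded pods appear there bound to the virtual node, so they can be selected with `spec.nodeName`. The arguments of `liqoctl unpeer` are left to the caller, since they depend on the Liqo version and on how the peering was created.

```python
import subprocess
import time
from typing import Callable, Sequence


def cordon_command(node: str) -> list[str]:
    """kubectl command that marks a node unschedulable."""
    return ["kubectl", "cordon", node]


def list_pods_command(node: str) -> list[str]:
    """kubectl command listing pods on a node that have not finished yet."""
    selector = f"spec.nodeName={node},status.phase!=Succeeded,status.phase!=Failed"
    return ["kubectl", "get", "pods", "--all-namespaces",
            "--field-selector", selector, "--no-headers"]


def run_checked(cmd: Sequence[str]) -> None:
    """Run a command and raise if it exits with a non-zero status."""
    subprocess.run(list(cmd), check=True)


def count_pods_on_node(node: str) -> int:
    """Count unfinished pods bound to the node (one output line per pod)."""
    result = subprocess.run(list_pods_command(node), check=True,
                            capture_output=True, text=True)
    return sum(1 for line in result.stdout.splitlines() if line.strip())


def graceful_unpeer(
    nodes: Sequence[str],
    unpeer_command: Sequence[str],
    run: Callable[[Sequence[str]], None] = run_checked,
    count_pods: Callable[[str], int] = count_pods_on_node,
    poll_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    # 1. Cordon the virtual nodes so no new pods are offloaded.
    for node in nodes:
        run(cordon_command(node))
    # 2. Wait until every offloaded pod has completed.
    while sum(count_pods(node) for node in nodes) > 0:
        sleep(poll_seconds)
    # 3. Unpeer, using the exact command supplied by the caller.
    run(list(unpeer_command))
```

Pods owned by long-running controllers such as Deployments never reach a completed phase, so with such workloads the wait loop would not end on its own. A timeout or a manual scale-down would be needed in that case.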

@aleoli aleoli closed this as completed Dec 23, 2024