
Implement Leader Election for HA Mode #135

Open

aminmr opened this issue Oct 5, 2024 · 3 comments

aminmr (Contributor) commented Oct 5, 2024

Description

Currently, the k8s-cleaner operator is deployed with a single replica by default and does not support high availability (HA). If the number of replicas is increased, multiple pods may attempt to take actions simultaneously, which could lead to conflicts or redundant operations on the cluster.

To address this, I suggest implementing leader election for HA mode. Many open-source projects utilize Kubernetes' Lease mechanism for this purpose, allowing only one pod to act as the leader at any given time. This would prevent multiple instances from interfering with each other when multiple replicas are running.

Here are the relevant Kubernetes docs: Kubernetes Lease Mechanism.

Proposed Solution:

- Introduce leader election logic using Kubernetes Leases.
- Ensure that only the pod holding the active lease performs actions on the cluster, while the other replicas remain in standby mode (a minimal sketch follows below).
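
A minimal sketch of what Lease-based leader election could look like with client-go; the lease name, namespace, and timings below are placeholders rather than anything taken from k8s-cleaner:

```go
package main

import (
	"context"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
	"k8s.io/klog/v2"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		klog.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Each replica identifies itself by its pod name (hostname).
	id, err := os.Hostname()
	if err != nil {
		klog.Fatal(err)
	}

	// Lease-backed lock; name and namespace are placeholders.
	lock := &resourcelock.LeaseLock{
		LeaseMeta: metav1.ObjectMeta{
			Name:      "k8s-cleaner-leader-election",
			Namespace: "k8s-cleaner-system",
		},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: id},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:            lock,
		ReleaseOnCancel: true,
		LeaseDuration:   15 * time.Second,
		RenewDeadline:   10 * time.Second,
		RetryPeriod:     2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// Only the elected leader starts the cleanup work.
				klog.Info("became leader, starting work")
			},
			OnStoppedLeading: func() {
				// Lost the lease: stop working so two pods never act at once.
				klog.Info("lost leadership, exiting")
				os.Exit(0)
			},
			OnNewLeader: func(identity string) {
				if identity != id {
					klog.Infof("current leader is %s", identity)
				}
			},
		},
	})
}
```

If the operator is built on controller-runtime, the same effect can usually be achieved by setting LeaderElection and LeaderElectionID in the manager options instead of wiring client-go directly.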

I am happy to volunteer to implement this feature for the project.

I am looking forward to your thoughts! Thanks!

gianlucam76 (Owner) commented

Thanks @aminmr. I am not sure about this. k8s-cleaner originally had leader election, but I removed it. The reason is that I believe with Cleaner we are more likely to hit scaling limits than availability limits (and, in the end, it does not have to respond to other services). When we do hit scaling limits, my plan is to introduce sharding, so that different Cleaner pods can process different Cleaner instances in parallel based on some annotation.
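
For illustration, a minimal sketch of what such annotation-based sharding could look like as a controller-runtime predicate; the annotation key, environment variable, and shard values are hypothetical, not an existing k8s-cleaner or Sveltos convention:

```go
package sharding

import (
	"os"

	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
)

// shardAnnotation is a hypothetical annotation key used to assign a Cleaner
// instance to a shard; the real key would follow whatever convention the
// project settles on.
const shardAnnotation = "cleaner.example.io/shard"

// ShardMatchPredicate makes a cleaner pod reconcile only the Cleaner objects
// annotated with its own shard value, configured manually per deployment
// (here via a hypothetical CLEANER_SHARD environment variable).
func ShardMatchPredicate() predicate.Predicate {
	myShard := os.Getenv("CLEANER_SHARD")
	return predicate.NewPredicateFuncs(func(obj client.Object) bool {
		return obj.GetAnnotations()[shardAnnotation] == myShard
	})
}
```

Each shard would then be a separate deployment that registers the predicate with WithEventFilter when the Cleaner controller is built.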

Let's keep this open though. If we don't take that path, we will add leader election back.

Thank you again!

aminmr (Contributor) commented Oct 6, 2024

Thanks, @gianlucam76!

I appreciate your explanation, but I have a couple of questions regarding the future implementation and design decisions for the Cleaner.

1. What technologies are you planning to use for implementing sharding in the Cleaner? I'm curious how you plan to manage sharding so that multiple Cleaner instances can process resources in parallel.

2. Could you clarify why leader election isn't a good fit for this project? I understand your point about scaling being more of a concern than availability, but I'm still unclear on why leader election is considered less effective in this case. Is there a specific issue with leader election that conflicts with the overall goals of the project? And do you know of any other operator with a sharding feature?

Thanks again for your time, and I look forward to your thoughts on this!

gianlucam76 (Owner) commented Oct 7, 2024

Hi @aminmr. Regarding sharding, I am planning on using the same approach I used in Sveltos (here). It will require some manual configuration (as I don't have a shard controller like in Sveltos), but that is the idea.

In general, leader election is great (though it comes at the cost of running 3 pods instead of 1, with 2 pods doing nothing most of the time). But I see it as more valuable for a service that needs to respond to other services, where you cannot afford having it down for 30 seconds or so.
Cleaner has a configurable jitter window, so if Cleaner gets stuck it won't miss processing Cleaner instances that are due. This makes leader election less necessary.
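
A rough illustration of the catch-up behaviour such a jitter window implies; the names and logic are illustrative, not taken from k8s-cleaner's code:

```go
package schedule

import "time"

// isDue treats a Cleaner run as still due if "now" falls anywhere inside
// [nextRun, nextRun+jitterWindow], so a pod that was restarted or briefly
// stuck can still pick up a run it nominally missed.
func isDue(nextRun, now time.Time, jitterWindow time.Duration) bool {
	return !now.Before(nextRun) && now.Before(nextRun.Add(jitterWindow))
}
```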
