Control pod routing in service objects with label #181
Merged
Conversation
roypaulin approved these changes on Mar 21, 2022:
looks good.
spilchen pushed a commit to spilchen/vertica-kubernetes that referenced this pull request on Mar 21, 2022:
This adds a new label to pods to control when they are visible to Service objects. This is separate from the ready state. A label gives us the ability to control routing during scaling events. For instance, it lets us drain nodes before they are scaled down, or add them to the routing only after an add node, once we have verified that Vertica has finished rebalancing.

The label isn't part of the StatefulSet spec. Rather, we add or remove it using a new reconcile actor.

This should close the timing windows occasionally seen during scaling in the e2e tests (e.g. step 40 in the k-safety-0-scaling test).

The online upgrade reconciler got a bit of an overhaul. Adding the pending delete state to pod facts broke the handling of transient subclusters, so I am now adding transient subclusters to the VerticaDB during online upgrade. I think this is a cleaner solution anyway; it allowed me to clean up transient handling in sc_finder.go, for instance.
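As a rough illustration of the mechanism described above, here is a minimal sketch of a reconcile actor that patches the routing label directly onto a pod, assuming controller-runtime. The label key vertica.com/client-routing and the ClientRoutingReconciler name are made up for this example and are not necessarily the identifiers the operator uses.

```go
package example

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// clientRoutingLabel is a hypothetical label key; the operator's real key may differ.
const clientRoutingLabel = "vertica.com/client-routing"

// ClientRoutingReconciler is a sketch of a reconcile actor that adds or removes
// the routing label on pods, independent of the StatefulSet pod template.
type ClientRoutingReconciler struct {
	Client client.Client
}

// setRoutingLabel adds the label when accept is true (the pod should receive
// client traffic) and removes it otherwise (e.g. the pod is pending delete, or
// Vertica has not finished rebalancing after an add node).
func (r *ClientRoutingReconciler) setRoutingLabel(ctx context.Context, pod *corev1.Pod, accept bool) error {
	patch := client.MergeFrom(pod.DeepCopy())
	if pod.Labels == nil {
		pod.Labels = map[string]string{}
	}
	if accept {
		pod.Labels[clientRoutingLabel] = "true"
	} else {
		delete(pod.Labels, clientRoutingLabel)
	}
	// Patch only this pod's metadata; because the label is not part of the
	// StatefulSet spec, the StatefulSet controller will not put it back.
	return r.Client.Patch(ctx, pod, patch)
}
```

A Service whose selector includes this label, in addition to the usual subcluster selectors, would then only route new connections to pods that currently carry it.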
spilchen pushed a commit that referenced this pull request on Mar 23, 2022:
This will add drain logic when we do a scale down so that we don't kill nodes that have active connections. It builds on the work done in #181: the pending delete state, to know whether a pod is going to be scaled down, and the client routing change, so that the Service object does not route new connections to the pod being drained.
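A minimal sketch of how that drain flow could fit together, assuming a pod-facts-like structure with a pending-delete flag; podInfo, removeRoutingLabel, and hasActiveConnections are placeholders for illustration, not the operator's actual API.

```go
package example

import "context"

// podInfo is a stand-in for the operator's pod facts; only the fields needed
// for the sketch are shown.
type podInfo struct {
	name          string
	pendingDelete bool // true when this pod will be removed by the scale down
}

// Placeholder for the client routing change from #181: pull the pod out of the
// Service object so it stops receiving new connections.
func removeRoutingLabel(ctx context.Context, p *podInfo) error { return nil }

// Placeholder for a check of active client sessions on the node.
func hasActiveConnections(ctx context.Context, p *podInfo) (bool, error) { return false, nil }

// drainPendingDeletePods sketches the scale-down drain: stop routing new
// connections to pods that are about to be removed, then requeue until their
// existing sessions have finished, instead of killing nodes that still have
// active connections.
func drainPendingDeletePods(ctx context.Context, pods []*podInfo) (requeue bool, err error) {
	for _, p := range pods {
		if !p.pendingDelete {
			continue
		}
		if err := removeRoutingLabel(ctx, p); err != nil {
			return false, err
		}
		active, err := hasActiveConnections(ctx, p)
		if err != nil {
			return false, err
		}
		if active {
			// Come back later rather than proceeding with the scale down.
			requeue = true
		}
	}
	return requeue, nil
}
```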