Control pod routing in service objects with label #181

Merged: 9 commits, Mar 21, 2022

Conversation

spilchen
Collaborator

This adds a new label to pods that controls when they are visible to Service objects. The label is separate from the ready state and lets us control routing during scaling events. For instance, we can drain nodes before they are scaled down, or start routing to new nodes only after an add node has completed and we have verified that Vertica has finished rebalancing.

The label isn't part of the statefulset spec. Rather, we add (or remove) it using a new reconcile actor.
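
To illustrate the idea, here is a minimal sketch of how a reconcile step could decide whether a pod carries the routing label, and how an equality-based Service selector would then include or skip the pod. It is not the code from this PR: the label key `example.com/client-routing`, the `podState` fields, and all function names are hypothetical and only stand in for whatever the operator actually uses.

```go
package main

import "fmt"

// routingLabel is a hypothetical label key used only for this sketch.
const routingLabel = "example.com/client-routing"

// podState holds the facts this sketch assumes the reconciler knows about a pod.
type podState struct {
	upAndRebalanced bool // node was added and rebalancing has been verified
	pendingDelete   bool // pod is about to be removed by a scale down
}

// wantRouting reports whether the pod should be visible to the Service.
func wantRouting(s podState) bool {
	return s.upAndRebalanced && !s.pendingDelete
}

// reconcileRoutingLabel returns a copy of labels with the routing label added
// or removed as needed, plus a flag saying whether anything changed.
func reconcileRoutingLabel(labels map[string]string, s podState) (map[string]string, bool) {
	out := make(map[string]string, len(labels)+1)
	for k, v := range labels {
		out[k] = v
	}
	_, has := out[routingLabel]
	want := wantRouting(s)
	switch {
	case want && !has:
		out[routingLabel] = "true"
		return out, true
	case !want && has:
		delete(out, routingLabel)
		return out, true
	}
	return out, false
}

// selectorMatches mimics an equality-based Service selector: every selector
// key/value pair must be present in the pod's labels.
func selectorMatches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

func main() {
	selector := map[string]string{"app": "db", routingLabel: "true"}
	labels := map[string]string{"app": "db"}

	// Pod is up and rebalanced: the label is added and the Service routes to it.
	labels, changed := reconcileRoutingLabel(labels, podState{upAndRebalanced: true})
	fmt.Println(changed, selectorMatches(selector, labels))

	// Pod is now pending delete: the label is removed and routing stops.
	labels, changed = reconcileRoutingLabel(labels, podState{upAndRebalanced: true, pendingDelete: true})
	fmt.Println(changed, selectorMatches(selector, labels))
}
```

Because the label lives on the pod and not in the statefulset template, flipping it does not trigger a rollout; only the Service's endpoint set changes.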

This should close the timing windows during scaling that occasionally show up in the e2e tests (e.g. step 40 in the k-safety-0-scaling test).

The online upgrade reconciler got a bit of an overhaul. Adding the pending delete state to pod facts broke the handling of transient subclusters, so I am now adding transient subclusters to the VerticaDB during online upgrade. I think this is a cleaner solution anyway; for instance, it allowed me to clean up the transient handling in sc_finder.go.

@spilchen spilchen self-assigned this Mar 18, 2022
@spilchen spilchen requested a review from roypaulin March 21, 2022 11:24
Collaborator

@roypaulin roypaulin left a comment

looks good.

@spilchen spilchen merged commit 7882e1b into vertica:main Mar 21, 2022
@spilchen spilchen deleted the routing-labeler branch March 21, 2022 17:06
spilchen pushed a commit to spilchen/vertica-kubernetes that referenced this pull request Mar 21, 2022
spilchen pushed a commit that referenced this pull request Mar 23, 2022
This adds drain logic to scale down so that we don't kill nodes that have
active connections. It builds on the work done in #181: the pending delete
state, which tells us whether a pod is going to be scaled down, and the client
routing change, which stops the Service object from routing new connections to
the pod we are draining.