
Refactor LB #31

Merged: 2 commits merged on Aug 29, 2024

Conversation

@w13915984028 (Member) commented Jul 3, 2024

The optimizations:

  1. Remove the strict limitation that at least one VM must exist when creating an LB

  2. Refactor the controllers
    2.1 VMI changes only enqueue the related LB, instead of adding/removing backend servers directly
    2.2 The LB turns Not Ready on errors such as IP allocation failure, Service failure, or EndpointSlice failure
    2.3 Remove the controller's looping wait for the external IP and use enqueueing instead (see the sketch after this list)

  3. Refactor the IP allocation (TBD); the current code has a few bugs: [Bug] Load Balancer Deployment Fails in Both Guest and Harvester Cluster Scenarios harvester#5033 (comment)
    ...
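
Below is a minimal sketch of the pattern behind items 2.2 and 2.3, not the PR's actual code; the type and helper names are stand-ins. Instead of looping inside the controller until the external IP shows up, the handler records a Not-Ready state with a message and returns, relying on the next enqueue (e.g. triggered by a Service status change) to retry.

	package servicelb // hypothetical placement

	import corev1 "k8s.io/api/core/v1"

	// loadBalancer is a stand-in for the real LoadBalancer status fields.
	type loadBalancer struct {
		Address string
		Ready   bool
		Message string
	}

	// syncExternalIP marks the LB Not Ready while its Service has no external
	// IP yet, instead of blocking in a wait loop; a later enqueue retries it.
	func syncExternalIP(lb *loadBalancer, svc *corev1.Service) {
		ingress := svc.Status.LoadBalancer.Ingress
		if len(ingress) == 0 {
			lb.Ready, lb.Message = false, "waiting for external IP"
			return
		}
		lb.Address, lb.Ready, lb.Message = ingress[0].IP, true, ""
	}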

Related issues:
harvester/harvester#5316
harvester/harvester#4821
harvester/harvester#4972
harvester/harvester#5033

Test plan:
(1) An LB can be created alone when no VM exists; its Ready status is False
(2) VMs can be added to and removed from the LB freely
(3) The LB can allocate and free IPs from/to the pool robustly; test IP-pool exhaustion, even with no VM instances
(4) If the health-check probe is enabled and all probes fail while there is at least one active VM behind this LB, the Ready status is False; if the probe is disabled, the LB turns Ready

@w13915984028 (Member, Author) commented Jul 3, 2024

Local test:
(1) Various Not-Ready statuses of the LB

image

(2) Delete lb3; its IP is released and then lb4 gets it

image

(3) When the prober fails on all backend servers:

image

(3.1) Disable the health check; it became Ready

image

(3.2) Change the health-check port (from the non-working 80 to the working 22)

image

Then it became Ready

image

Added on 2024.08.09:
4. The dummy endpoint

image

A dummy endpoint is appended to prevent the LB traffic from accidentally being routed to the local host. It is added when there is no backend server, or when every backend server is detected as not ready (a sketch follows the YAML below).

endpointslice
NAME         ADDRESSTYPE   PORTS   ENDPOINTS                AGE

lb1          IPv4          443     10.52.0.15,10.52.0.255   5h42m  // VM 10.52.0.15 is not ready, then 10.52.0.255 is added
lb3          IPv4          443     10.52.0.255              3h43m
lb5          IPv4          443     10.52.0.255              3h40m
harv31:/home/rancher # kk get endpointslice -n test1
NAME        ADDRESSTYPE   PORTS   ENDPOINTS     AGE
test1-lb1   IPv4          443     10.52.0.255   4m16s

endpointslice lb1 -oyaml
addressType: IPv4
apiVersion: discovery.k8s.io/v1
endpoints:
- addresses:
  - 10.52.0.15
  conditions:
    ready: false
  targetRef:
    name: vm1
    namespace: default
    uid: a8ebb12c-bea8-4feb-8079-3c675d7dd00e
- addresses:
  - 10.52.0.255
  conditions:
    ready: true
  targetRef:
    name: lb1
    namespace: default
    uid: dummy347-546a-4642-9da6-endpoint5608
kind: EndpointSlice
metadata:
  creationTimestamp: "2024-08-09T07:58:57Z"
  generation: 28
  labels:
    kubernetes.io/service-name: lb1
    loadbalancer.harvesterhci.io/servicelb: "true"
  name: lb1
  namespace: default
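
For illustration, a minimal sketch of how such a dummy endpoint could be appended when no endpoint in the slice is ready; this is not the PR's actual manager code, and the constant and helper names are assumptions based on the output above.

	package servicelb // hypothetical placement

	import (
		corev1 "k8s.io/api/core/v1"
		discoveryv1 "k8s.io/api/discovery/v1"
	)

	// dummyAddress mirrors the placeholder address seen in the output above.
	const dummyAddress = "10.52.0.255"

	// ensureDummyEndpoint appends a ready dummy endpoint when the slice has no
	// ready endpoints, so the LB traffic is never routed to the local host.
	func ensureDummyEndpoint(eps *discoveryv1.EndpointSlice) {
		for _, ep := range eps.Endpoints {
			if ep.Conditions.Ready != nil && *ep.Conditions.Ready {
				return // at least one ready endpoint; no dummy needed
			}
		}
		ready := true
		eps.Endpoints = append(eps.Endpoints, discoveryv1.Endpoint{
			Addresses:  []string{dummyAddress},
			Conditions: discoveryv1.EndpointConditions{Ready: &ready},
			TargetRef: &corev1.ObjectReference{
				Name:      eps.Name, // points back at the LB itself, as in the YAML above
				Namespace: eps.Namespace,
			},
		})
	}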

@w13915984028 force-pushed the fixwebhook branch 10 times, most recently from e8db723 to cc6d434 on July 9, 2024 16:38
@w13915984028 force-pushed the fixwebhook branch 5 times, most recently from 2321047 to 015c0b2 on July 11, 2024 20:06
@w13915984028 changed the title from [WIP] Refactor LB to Refactor LB on Jul 11, 2024
  Allow a placeholder LB without any existing VM instance
  Fix duplicated IP allocation or release
  Simplify the VMI controller
  Enhance the health check probe
  Refactor the LB controller; move the LB to not-ready status in various error cases
@starbops (Member) left a comment:

The refactor generally LGTM, thank you for the great work.

However, letting users create dummy LBs (LBs without backend servers) introduces a problem: if the listening port of an LB is 22, 80, 443, or anything else that Harvester exposes on the nodes, users will be able to access the Harvester services unexpectedly. The same situation also happens to LBs whose backend servers all fail the health check. This could be an existing issue; I just want to point out that letting users create dummy LBs will widen the hole.

pkg/lb/servicelb/manager.go (outdated review thread)
// add the existing endpoint
endpoints = append(endpoints, ep)
break
}
}
// add the non-existing endpoint
-if !flag {
+if !existing {
Member:

Can we, by default, set the ready condition of the newly added endpoint to false? In the current implementation, if an LB with a health checker defined is created, the backend server endpoints' ready condition will be true from the beginning. This behavior seems a bit unexpected from the POV of a user who intentionally configured a health checker. Though this PR does not introduce the behavior, I suggest we make it this way. WDYT?

@w13915984028 (Member, Author) commented Aug 9, 2024:

A few considerations:

  1. Users may enable healthCheck on the fly, and defaulting to false could cause a short interruption; if it is frequently enabled/disabled, the default true seems a bit better:
    if the backend server works well, there is almost no interruption; if not, the traffic itself is interrupted anyway.
  2. Backward compatibility.
  3. Sure, it has drawbacks: the UI may show the LB as not-ready a bit late. If everything starts from the beginning, the LB transitions through: created, LB IP acquired, no backend server, VM is up and its IP is detected, LB becomes Ready, health check fails and the LB becomes not-ready.
    Someone watching continuously would see the LB Ready for a short while and then not-ready; there are other similar cases.

At the moment, I would like to keep the initial value as true and will document this in the Harvester docs. What do you think? Thanks.

Member:

Maybe we could use the handy relatedresource.Watch as follows to replace the entire VMI controller here:

	relatedresource.Watch(ctx, "lb-trigger", func(namespace, name string, obj runtime.Object) ([]relatedresource.Key, error) {
		var keys []relatedresource.Key

		vmi, ok := obj.(*kubevirtv1.VirtualMachineInstance)
		if !ok {
			return keys, nil
		}

		lbs, err := handler.lbCache.List(vmi.Namespace, labels.Everything())
		if err != nil {
			return nil, fmt.Errorf("fail to list load balancers, error: %w", err)
		}

		for _, lb := range lbs {
			// skip the cluster LB or the LB whose server selector is empty
			if lb.DeletionTimestamp != nil || lb.Spec.WorkloadType == lbv1.Cluster || len(lb.Spec.BackendServerSelector) == 0 {
				continue
			}
			// notify LB
			selector, err := utils.NewSelector(lb.Spec.BackendServerSelector)
			if err != nil {
				return nil, fmt.Errorf("fail to parse selector %+v, error: %w", lb.Spec.BackendServerSelector, err)
			}

			if selector.Matches(labels.Set(vmi.Labels)) {
				logrus.Debugf("VMI %s/%s notify lb %s/%s", vmi.Namespace, vmi.Name, lb.Namespace, lb.Name)
				handler.lbController.Enqueue(lb.Namespace, lb.Name)
			}
		}
		return keys, nil
	}, lbc, vmis)

@w13915984028 (Member, Author):

Will test this as the last step, thanks.

pkg/prober/worker.go (review thread)
@w13915984028 force-pushed the fixwebhook branch 3 times, most recently from 035a24e to b938e34 on August 9, 2024 18:39
@w13915984028 (Member, Author) commented Aug 9, 2024

@starbops @FrankYang0529 Thanks for your thorough review; please take another look.

Besides the minor fixes, the second commit brings:
(1) A dummy endpoint is added to the LB if it has no endpoints or all of its endpoints are not Ready, so that the traffic will not accidentally be routed to another service or the local host.
(2) A validator and a mutator for potentially invalid port numbers and health-check parameters.

The validator returns errors such as the following (a sketch of the corresponding checks follows the error list):


error: loadbalancers.loadbalancer.harvesterhci.io "lb1" could not be patched: admission webhook "validator.harvester-system.harvester-load-balancer-webhook" denied the request: 
Internal error occurred: update loadbalancer default/lb1 failed: listener backend port 443333 must <= 65535

..update loadbalancer default/lb1 failed with healthyCheck: healthcheck port 4435 is not in listener backend port list

..update loadbalancer default/lb1 failed with healthyCheck: healthcheck port 443 can only be a TCP backend port

..update loadbalancer default/lb1 failed: listener port 0 must >= 1

..update loadbalancer default/lb1 failed: the loadbalancer needs to have at least one listener
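
For reference, a minimal sketch of the kind of checks behind these messages, written against plain parameters because the exact CRD field names are not shown here; the helper names are hypothetical.

	package webhook // hypothetical placement

	import "fmt"

	// validateListener sketches the port-range checks reported above.
	func validateListener(port, backendPort int32) error {
		if port < 1 || port > 65535 {
			return fmt.Errorf("listener port %d must be >= 1 and <= 65535", port)
		}
		if backendPort < 1 || backendPort > 65535 {
			return fmt.Errorf("listener backend port %d must be >= 1 and <= 65535", backendPort)
		}
		return nil
	}

	// validateHealthCheckPort sketches the membership check: the health-check
	// port must be one of the listeners' TCP backend ports, collected by the
	// caller into tcpBackendPorts.
	func validateHealthCheckPort(port int32, tcpBackendPorts map[int32]bool) error {
		if !tcpBackendPorts[port] {
			return fmt.Errorf("healthcheck port %d can only be a TCP backend port", port)
		}
		return nil
	}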

The mutator makes sure:

  healthCheck:
    failureThreshold: 2 // min value 2
    periodSeconds: 1  // min value 1
    port: 443
    successThreshold: 2 // min value 2
    timeoutSeconds: 3 // min value 1

When the user enables healthCheck but does not fill in the time/threshold parameters, they are mutated as shown below (a sketch of this defaulting follows the screenshot):
image
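
A minimal sketch of that defaulting, using a stand-in struct instead of the real health-check spec; unset (zero) parameters are filled with the minimum values listed above, though the actual defaults applied by the mutator may differ.

	package webhook // hypothetical placement

	// healthCheck is a stand-in for the real health-check spec fields.
	type healthCheck struct {
		Port             int32
		SuccessThreshold int32
		FailureThreshold int32
		PeriodSeconds    int32
		TimeoutSeconds   int32
	}

	// applyHealthCheckDefaults fills unset (zero) parameters with the minimum
	// values shown above and leaves user-provided values untouched.
	func applyHealthCheckDefaults(hc *healthCheck) {
		if hc == nil || hc.Port == 0 {
			return // health check not enabled
		}
		if hc.SuccessThreshold == 0 {
			hc.SuccessThreshold = 2
		}
		if hc.FailureThreshold == 0 {
			hc.FailureThreshold = 2
		}
		if hc.PeriodSeconds == 0 {
			hc.PeriodSeconds = 1
		}
		if hc.TimeoutSeconds == 0 {
			hc.TimeoutSeconds = 1
		}
	}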

@FrankYang0529 (Member) left a comment:

LGTM. Thanks for the PR.

@starbops (Member) left a comment:

LGTM, thank you!

@starbops merged commit df4c506 into harvester:master on Aug 29, 2024
5 checks passed