Try to release IP if it has been allocated to current LB #47

Merged

2 commits merged into harvester:master on Feb 19, 2025

Conversation

@w13915984028 (Member) commented Jan 27, 2025:

This PR solves the duplicated-allocation error that occurs when an LB is frequently created and deleted.

When an LB is repeatedly created and deleted, the onRemove deletion handler may not be called for the old object, because the UID changes while the namespace/name stays the same.

Solution:

  1. Check the error returned by IP allocation; if it contains the duplicate-allocation keyword, try to release the IP (see the sketch below).
  2. Add an annotation to manually release a dangling IP allocation record.
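
A minimal Go sketch of idea 1 (not the PR's actual code: the allocator interface and function names are illustrative assumptions, and only the keyword string is taken from the controller's error message shown in the test plan below):

package lb

import (
	"errors"
	"fmt"
	"strings"
)

// Keyword from the duplicate-allocation error message; the real constant in
// the repo may be named differently.
const duplicateAllocationKeyword = "duplicate allocation is not allowed"

// allocator is a stand-in for the pool allocator used by the LB controller;
// Allocate and Release are hypothetical signatures used for illustration only.
type allocator interface {
	Allocate(lbKey string) (string, error)
	Release(lbKey string) error
}

// allocateWithRecovery illustrates idea 1: when allocation fails because the
// IP is already recorded against the same LB (a dangling record left by a
// missed onRemove), release it so the next reconcile can allocate cleanly.
func allocateWithRecovery(a allocator, lbKey string) (string, error) {
	ip, err := a.Allocate(lbKey)
	if err == nil {
		return ip, nil
	}
	if strings.Contains(err.Error(), duplicateAllocationKeyword) {
		if rerr := a.Release(lbKey); rerr != nil {
			return "", errors.Join(err, rerr)
		}
		// Return the original error so the controller requeues; the retry
		// then gets a fresh, conflict-free allocation.
		return "", fmt.Errorf("%w (dangling IP released, will retry)", err)
	}
	return "", err
}

This matches the behaviour seen in the test plan: the controller logs the release, requeues once, and then allocates the same IP successfully.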

issue:
harvester/harvester#7449

Test plan: #47 (comment)

To solve the duplicated allocation error when LB is frequently
created and deleted.

Signed-off-by: Jian Wang <[email protected]>
@w13915984028 force-pushed the fix7449 branch 2 times, most recently from f2219b9 to d95dfea on February 7, 2025 14:23
@w13915984028 (Member, Author) commented:

Test plan:

(1) Create an IP pool

apiVersion: loadbalancer.harvesterhci.io/v1beta1
kind: IPPool
metadata:
  name: pool1
spec:
  ranges:
  - gateway: 192.168.5.1
    rangeEnd: 192.168.5.20
    rangeStart: 192.168.5.10
    subnet: 192.168.5.0/24
  selector: {}

(2) Set an old image tag (rancher/harvester-load-balancer:v0.4.4) on the harvester-load-balancer deployment to reproduce the error with the following script.

Creating and deleting the same object frequently triggers the bug harvester/harvester#7449:

cat > cluster1-lb-3.yaml << 'EOF'
apiVersion: loadbalancer.harvesterhci.io/v1beta1
kind: LoadBalancer
metadata:
  annotations:
    loadbalancer.harvesterhci.io/namespace: default
    loadbalancer.harvesterhci.io/network: ""
    loadbalancer.harvesterhci.io/project: ""
    loadbalancer.harvesterhci.io/cluster: "cluster-1"
  name: cluster1-lb-3
  namespace: default
spec:
  healthCheck: {}
  ipPool: pool1
  ipam: pool
  workloadType: cluster
EOF


cat > loop-test-lb-3.sh << 'EOF'
#!/bin/bash
while true;
do
 date && kubectl create -f cluster1-lb-3.yaml
 date && kubectl delete lb cluster1-lb-3
 date && kubectl create -f cluster1-lb-3.yaml
 date && kubectl get lb cluster1-lb-3 -ojsonpath="{.status}"
done
EOF
chmod +x loop-test-lb-3.sh

./loop-test-lb-3.sh

The harvester-load-balancer-*-* pod will show error logs like:

time="2025-02-07T13:50:06Z" level=error msg="error syncing 'default/cluster1-lb-3': handler harvester-lb-controller: 192.168.5.12 has been allocated to default/cluster1-lb-3, duplicate allocation is not allowed, requeuing"
time="2025-02-07T13:50:36Z" level=error msg="error syncing 'default/cluster1-lb-3': handler harvester-lb-controller: 192.168.5.12 has been allocated to default/cluster1-lb-3, duplicate allocation is not allowed, requeuing"

(3) Use the new image (master-head); it automatically releases the duplicated allocated IP that the error complains about:

time="2025-02-07T13:54:03Z" level=info msg="Starting loadbalancer.harvesterhci.io/v1beta1, Kind=IPPool controller"
time="2025-02-07T13:54:03Z" level=info msg="lb default/cluster1-lb-3 error: 192.168.5.12 has been allocated to default/cluster1-lb-3, duplicate allocation is not allowed, try to release ip to pool pool1, ok"
time="2025-02-07T13:54:03Z" level=error msg="error syncing 'default/cluster1-lb-3': handler harvester-lb-controller: 192.168.5.12 has been allocated to default/cluster1-lb-3, duplicate allocation is not allowed, requeuing"
time="2025-02-07T13:54:03Z" level=info msg="lb default/cluster1-lb-3 allocate ip 192.168.5.12 from pool pool1"
time="2025-02-07T13:54:03Z" level=info msg="Starting kubevirt.io/v1, Kind=VirtualMachineInstance controller"

(4) Manually edit the pool1 object to add the following annotation, and observe the pod log:

  annotations:
    loadbalancer.harvesterhci.io/manuallyReleaseIP: "192.168.5.12: default/cluster1-lb-3"

// the IP is still in use by an LB
time="2025-02-07T14:00:06Z" level=info msg="IP Pool pool1 has a manual IP release request 192.168.5.12: default/cluster1-lb-3, the lb default/cluster1-lb-3 is still existing, skip"

// the IP has already been released
time="2025-02-07T14:01:53Z" level=info msg="IP Pool pool1 has a manual IP release request 192.168.5.123: default/cluster1-lb-3, it has been released, skip"

// the annotation value has an invalid format
time="2025-02-07T14:03:04Z" level=info msg="IP Pool pool1 has a manual IP release request 192.168.5.123:: default/cluster1-lb-3, it is not valid, skip"

(5) Repeat (2) to reproduce the error, then delete the LB object and check the pool:

$ kubectl get ippools.loadbalancer pool1 -oyaml
apiVersion: loadbalancer.harvesterhci.io/v1beta1
kind: IPPool
metadata:
  creationTimestamp: "2025-01-27T13:46:08Z"
  finalizers:
  - wrangler.cattle.io/harvester-ipam-controller
  generation: 39
  labels:
    loadbalancer.harvesterhci.io/global-ip-pool: "false"
    loadbalancer.harvesterhci.io/vid: "0"
  name: pool1
  resourceVersion: "1024517"
  uid: 28a03db8-6d74-4cc7-8cdc-b7b0738d8aca
spec:
  ranges:
  - gateway: 192.168.5.1
    rangeEnd: 192.168.5.20
    rangeStart: 192.168.5.10
    subnet: 192.168.5.0/24
  selector: {}
status:
  allocated:
    192.168.5.10: default/lb1
    192.168.5.11: default/cluster1-lb-2
    192.168.5.12: default/cluster1-lb-3  // the allocation record remains
  available: 8
  conditions:
  - lastUpdateTime: "2025-01-27T13:46:08Z"
    status: "True"
    type: Ready
  lastAllocated: 192.168.5.12
  total: 11

(6) Switch the harvester-load-balancer deployment back to the new image, manually edit the pool1 object to add the following annotation (set the IP to the one left in the allocation record), and observe the pod log:

  annotations:
    loadbalancer.harvesterhci.io/manuallyReleaseIP: "192.168.5.12: default/cluster1-lb-3"

The pod should show logs like:

time="2025-02-07T14:10:32Z" level=info msg="IP Pool pool1 has a manual IP release request 192.168.5.12: default/cluster1-lb-3, it is successfully released"
time="2025-02-07T14:10:32Z" level=info msg="IP Pool pool1 has a manual IP release request 192.168.5.12: default/cluster1-lb-3, it has been released, skip"
time="2025-02-07T14:10:32Z" level=info msg="IP Pool pool1 has a manual IP release request 192.168.5.12: default/cluster1-lb-3, it has been released, skip"

Check the pool; the IP has been released:

$ kubectl get ippools.loadbalancer pool1 -oyaml
apiVersion: loadbalancer.harvesterhci.io/v1beta1
kind: IPPool
metadata:
  creationTimestamp: "2025-01-27T13:46:08Z"
  finalizers:
  - wrangler.cattle.io/harvester-ipam-controller
  generation: 40
  labels:
    loadbalancer.harvesterhci.io/global-ip-pool: "false"
    loadbalancer.harvesterhci.io/vid: "0"
  name: pool1
  resourceVersion: "1028843"
  uid: 28a03db8-6d74-4cc7-8cdc-b7b0738d8aca
spec:
  ranges:
  - gateway: 192.168.5.1
    rangeEnd: 192.168.5.20
    rangeStart: 192.168.5.10
    subnet: 192.168.5.0/24
  selector: {}
status:
  allocated:
    192.168.5.10: default/lb1
    192.168.5.11: default/cluster1-lb-2
  allocatedHistory:
    192.168.5.12: default/cluster1-lb-3 // manually freed
  available: 9
  conditions:
  - lastUpdateTime: "2025-01-27T13:46:08Z"
    status: "True"
    type: Ready
  lastAllocated: 192.168.5.12
  total: 11


@FrankYang0529 (Member) left a comment:

Overall LGTM. Leave a minor comment. Thanks.


a := h.allocatorMap.Get(ipPool.Name)
if a == nil {
	return ipPool, fmt.Errorf("IP Pool %s has a manual IP release request %s, fail to get allocator", ipPool.Name, ipStr)
@FrankYang0529 (Member) commented:

If there is an IPPool but no internal allocator, can we just create a new allocator for it and log a message? If we return the error directly, the controller may keep retrying.

@w13915984028 (Member, Author) replied:

There is already an OnChange controller, which ensures the allocator exists:

a, err := ipam.NewAllocator(ipPool.Name, ipPool.Spec.Ranges, h.ipPoolCache, h.ipPoolClient)

If this place happens to get a nil allocator, the next reconcile will get it normally. That's the consideration, thanks.
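
A self-contained sketch of that consideration (the Map type and releaseManually are illustrative stand-ins, not the repo's actual types): the manual-release path only reads the allocator map; on a nil hit it returns an error so the controller requeues, and the next reconcile finds the allocator that the IPPool OnChange handler has registered in the meantime.

package lb

import (
	"fmt"
	"sync"
)

// Allocator stands in for the ipam allocator; Map mirrors the role of
// h.allocatorMap in the quoted code.
type Allocator struct{}

type Map struct {
	mu sync.RWMutex
	m  map[string]*Allocator
}

func (s *Map) Get(name string) *Allocator {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.m[name]
}

// releaseManually never creates an allocator itself; it simply errors out,
// letting the controller requeue until the OnChange handler (which calls
// ipam.NewAllocator) has populated the map.
func releaseManually(pools *Map, poolName, ipStr string) error {
	a := pools.Get(poolName)
	if a == nil {
		return fmt.Errorf("IP Pool %s has a manual IP release request %s, fail to get allocator", poolName, ipStr)
	}
	// ... release ipStr through the allocator ...
	return nil
}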

@w13915984028 (Member, Author) commented:

@mergify backport v1.5


mergify bot commented Feb 14, 2025

backport v1.5

✅ Backports have been created

@w13915984028 w13915984028 merged commit 75d8b92 into harvester:master Feb 19, 2025
5 checks passed