Try to release IP if it has been allocated to current LB #47

Merged

2 commits merged into harvester:master on Feb 19, 2025

Conversation

@w13915984028 (Member) commented Jan 27, 2025:

This PR solves the duplicated-allocation error that occurs when an LB is frequently created and deleted.

When an LB is repeatedly created and deleted, the onRemove deletion handler may not be called for the old object, because the UID changes while the namespace/name stays the same.

Solution:

  1. Check the error returned by IP allocation; if it contains the duplicate-allocation keyword, try to release the IP (see the sketch below).
  2. Add an annotation to manually release a dangling IP allocation record.
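
A minimal Go sketch of idea 1 (not the PR's actual code: the allocator interface and function names are illustrative assumptions, and only the keyword string is taken from the controller's error message shown in the test plan below):

package lb

import (
	"errors"
	"fmt"
	"strings"
)

// Keyword from the duplicate-allocation error message; the real constant in
// the repo may be named differently.
const duplicateAllocationKeyword = "duplicate allocation is not allowed"

// allocator is a stand-in for the pool allocator used by the LB controller;
// Allocate and Release are hypothetical signatures used for illustration only.
type allocator interface {
	Allocate(lbKey string) (string, error)
	Release(lbKey string) error
}

// allocateWithRecovery illustrates idea 1: when allocation fails because the
// IP is already recorded against the same LB (a dangling record left by a
// missed onRemove), release it so the next reconcile can allocate cleanly.
func allocateWithRecovery(a allocator, lbKey string) (string, error) {
	ip, err := a.Allocate(lbKey)
	if err == nil {
		return ip, nil
	}
	if strings.Contains(err.Error(), duplicateAllocationKeyword) {
		if rerr := a.Release(lbKey); rerr != nil {
			return "", errors.Join(err, rerr)
		}
		// Return the original error so the controller requeues; the retry
		// then gets a fresh, conflict-free allocation.
		return "", fmt.Errorf("%w (dangling IP released, will retry)", err)
	}
	return "", err
}

This matches the behaviour seen in the test plan: the controller logs the release, requeues once, and then allocates the same IP successfully.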

issue:
harvester/harvester#7449

Test plan: #47 (comment)

To solve the duplicated allocation error when LB is frequently
created and deleted.

Signed-off-by: Jian Wang <[email protected]>
@w13915984028 force-pushed the fix7449 branch 2 times, most recently from f2219b9 to d95dfea on February 7, 2025 14:23
@w13915984028 (Member, Author) commented:

Test plan:

(1) Create an IP pool

apiVersion: loadbalancer.harvesterhci.io/v1beta1
kind: IPPool
metadata:
  name: pool1
spec:
  ranges:
  - gateway: 192.168.5.1
    rangeEnd: 192.168.5.20
    rangeStart: 192.168.5.10
    subnet: 192.168.5.0/24
  selector: {}

(2) Set an old image tag (rancher/harvester-load-balancer:v0.4.4) on the harvester-load-balancer deployment to reproduce the error with the following script.

Creating and deleting the same object frequently triggers the bug harvester/harvester#7449:

cat > cluster1-lb-3.yaml << 'EOF'
apiVersion: loadbalancer.harvesterhci.io/v1beta1
kind: LoadBalancer
metadata:
  annotations:
    loadbalancer.harvesterhci.io/namespace: default
    loadbalancer.harvesterhci.io/network: ""
    loadbalancer.harvesterhci.io/project: ""
    loadbalancer.harvesterhci.io/cluster: "cluster-1"
  name: cluster1-lb-3
  namespace: default
spec:
  healthCheck: {}
  ipPool: pool1
  ipam: pool
  workloadType: cluster
EOF


cat > loop-test-lb-3.sh << 'EOF'
#!/bin/bash
while true;
do
 date && kubectl create -f cluster1-lb-3.yaml
 date && kubectl delete lb cluster1-lb-3
 date && kubectl create -f cluster1-lb-3.yaml
 date && kubectl get lb cluster1-lb-3 -ojsonpath="{.status}"
done
EOF
chmod +x loop-test-lb-3.sh

./loop-test-lb-3.sh

The harvester-load-balancer-*-* pod will show error logs like:

time="2025-02-07T13:50:06Z" level=error msg="error syncing 'default/cluster1-lb-3': handler harvester-lb-controller: 192.168.5.12 has been allocated to default/cluster1-lb-3, duplicate allocation is not allowed, requeuing"
time="2025-02-07T13:50:36Z" level=error msg="error syncing 'default/cluster1-lb-3': handler harvester-lb-controller: 192.168.5.12 has been allocated to default/cluster1-lb-3, duplicate allocation is not allowed, requeuing"

(3) Use the new image (master-head); it automatically releases the duplicated allocated IP that the error complains about:

time="2025-02-07T13:54:03Z" level=info msg="Starting loadbalancer.harvesterhci.io/v1beta1, Kind=IPPool controller"
time="2025-02-07T13:54:03Z" level=info msg="lb default/cluster1-lb-3 error: 192.168.5.12 has been allocated to default/cluster1-lb-3, duplicate allocation is not allowed, try to release ip to pool pool1, ok"
time="2025-02-07T13:54:03Z" level=error msg="error syncing 'default/cluster1-lb-3': handler harvester-lb-controller: 192.168.5.12 has been allocated to default/cluster1-lb-3, duplicate allocation is not allowed, requeuing"
time="2025-02-07T13:54:03Z" level=info msg="lb default/cluster1-lb-3 allocate ip 192.168.5.12 from pool pool1"
time="2025-02-07T13:54:03Z" level=info msg="Starting kubevirt.io/v1, Kind=VirtualMachineInstance controller"

(4) Manually edit the pool1 object to add the following annotation, and observe the pod log:

  annotations:
    loadbalancer.harvesterhci.io/manuallyReleaseIP: "192.168.5.12: default/cluster1-lb-3"

// the IP is still in use by an LB
time="2025-02-07T14:00:06Z" level=info msg="IP Pool pool1 has a manual IP release request 192.168.5.12: default/cluster1-lb-3, the lb default/cluster1-lb-3 is still existing, skip"

// the IP has already been released
time="2025-02-07T14:01:53Z" level=info msg="IP Pool pool1 has a manual IP release request 192.168.5.123: default/cluster1-lb-3, it has been released, skip"

// the annotation value has an invalid format
time="2025-02-07T14:03:04Z" level=info msg="IP Pool pool1 has a manual IP release request 192.168.5.123:: default/cluster1-lb-3, it is not valid, skip"

(5) Repeat (2) to reproduce the error, then delete the LB object and check the pool:

$ kubectl get ippools.loadbalancer pool1 -oyaml
apiVersion: loadbalancer.harvesterhci.io/v1beta1
kind: IPPool
metadata:
  creationTimestamp: "2025-01-27T13:46:08Z"
  finalizers:
  - wrangler.cattle.io/harvester-ipam-controller
  generation: 39
  labels:
    loadbalancer.harvesterhci.io/global-ip-pool: "false"
    loadbalancer.harvesterhci.io/vid: "0"
  name: pool1
  resourceVersion: "1024517"
  uid: 28a03db8-6d74-4cc7-8cdc-b7b0738d8aca
spec:
  ranges:
  - gateway: 192.168.5.1
    rangeEnd: 192.168.5.20
    rangeStart: 192.168.5.10
    subnet: 192.168.5.0/24
  selector: {}
status:
  allocated:
    192.168.5.10: default/lb1
    192.168.5.11: default/cluster1-lb-2
    192.168.5.12: default/cluster1-lb-3  // the allocation record remains
  available: 8
  conditions:
  - lastUpdateTime: "2025-01-27T13:46:08Z"
    status: "True"
    type: Ready
  lastAllocated: 192.168.5.12
  total: 11

(6) Switch the harvester-load-balancer deployment back to the new image, manually edit the pool1 object to add the following annotation (set the IP to the one left in the allocation record), and observe the pod log:

  annotations:
    loadbalancer.harvesterhci.io/manuallyReleaseIP: "192.168.5.12: default/cluster1-lb-3"

The pod should show logs like:

time="2025-02-07T14:10:32Z" level=info msg="IP Pool pool1 has a manual IP release request 192.168.5.12: default/cluster1-lb-3, it is successfully released"
time="2025-02-07T14:10:32Z" level=info msg="IP Pool pool1 has a manual IP release request 192.168.5.12: default/cluster1-lb-3, it has been released, skip"
time="2025-02-07T14:10:32Z" level=info msg="IP Pool pool1 has a manual IP release request 192.168.5.12: default/cluster1-lb-3, it has been released, skip"

Check the pool; the IP has been released:

$ kubectl get ippools.loadbalancer pool1 -oyaml
apiVersion: loadbalancer.harvesterhci.io/v1beta1
kind: IPPool
metadata:
  creationTimestamp: "2025-01-27T13:46:08Z"
  finalizers:
  - wrangler.cattle.io/harvester-ipam-controller
  generation: 40
  labels:
    loadbalancer.harvesterhci.io/global-ip-pool: "false"
    loadbalancer.harvesterhci.io/vid: "0"
  name: pool1
  resourceVersion: "1028843"
  uid: 28a03db8-6d74-4cc7-8cdc-b7b0738d8aca
spec:
  ranges:
  - gateway: 192.168.5.1
    rangeEnd: 192.168.5.20
    rangeStart: 192.168.5.10
    subnet: 192.168.5.0/24
  selector: {}
status:
  allocated:
    192.168.5.10: default/lb1
    192.168.5.11: default/cluster1-lb-2
  allocatedHistory:
    192.168.5.12: default/cluster1-lb-3 // manually freed
  available: 9
  conditions:
  - lastUpdateTime: "2025-01-27T13:46:08Z"
    status: "True"
    type: Ready
  lastAllocated: 192.168.5.12
  total: 11


@FrankYang0529 (Member) left a comment:

Overall LGTM. Leave a minor comment. Thanks.


a := h.allocatorMap.Get(ipPool.Name)
if a == nil {
	return ipPool, fmt.Errorf("IP Pool %s has a manual IP release request %s, fail to get allocator", ipPool.Name, ipStr)
@FrankYang0529 (Member) commented:

If there is an IPPool but no internal allocator, can we just create a new allocator for it and log a message? If we return the error directly, the controller may keep retrying.

@w13915984028 (Member, Author) replied:

There is already an OnChange controller, which ensures the allocator exists:

a, err := ipam.NewAllocator(ipPool.Name, ipPool.Spec.Ranges, h.ipPoolCache, h.ipPoolClient)

If this place happens to get a nil allocator, the next reconcile will get it normally. That's the consideration, thanks.
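
A self-contained sketch of that consideration (the Map type and releaseManually are illustrative stand-ins, not the repo's actual types): the manual-release path only reads the allocator map; on a nil hit it returns an error so the controller requeues, and the next reconcile finds the allocator that the IPPool OnChange handler has registered in the meantime.

package lb

import (
	"fmt"
	"sync"
)

// Allocator stands in for the ipam allocator; Map mirrors the role of
// h.allocatorMap in the quoted code.
type Allocator struct{}

type Map struct {
	mu sync.RWMutex
	m  map[string]*Allocator
}

func (s *Map) Get(name string) *Allocator {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.m[name]
}

// releaseManually never creates an allocator itself; it simply errors out,
// letting the controller requeue until the OnChange handler (which calls
// ipam.NewAllocator) has populated the map.
func releaseManually(pools *Map, poolName, ipStr string) error {
	a := pools.Get(poolName)
	if a == nil {
		return fmt.Errorf("IP Pool %s has a manual IP release request %s, fail to get allocator", poolName, ipStr)
	}
	// ... release ipStr through the allocator ...
	return nil
}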

@w13915984028 (Member, Author) commented:

@mergify backport v1.5


mergify bot commented Feb 14, 2025

backport v1.5

✅ Backports have been created

@w13915984028 w13915984028 merged commit 75d8b92 into harvester:master Feb 19, 2025
5 checks passed