Race condition between netpol and IPVS based ipset updates #1732
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
Not stale.
Hi @alexcriss, I'm trying to replicate your issue but I haven't been able to. In your setup, do you have other services created and only one of them hitting the issue? Do you have network policies configured?
Hi @rbrtbnfgl, we have multiple services and all of them are impacted. These services all have ExternalIPs announced via BGP, and we see traffic failing on those IPs. They are also all targets of network policies, which allow traffic to said ExternalIPs only from specific IPs. So yes: we send traffic to the ExternalIP, and the IPs that the network policies allow to send traffic there are not in the ipset that permits traffic at the iptables layer. Hopefully this helps; I am here for any other questions!
Could it be related to the network policy you defined? How did you define it? I tried using
The netpol we use has multiple entries; we match on pod selectors and on raw IPs. I am not really sure it matters though, since the IP that is not getting set in the kube-router-svip-prt ipset is the service's ExternalIP. A stripped down version of the netpol looks like this:
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
not stale :)
Sorry, I didn't have time to look at it lately.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
not stale
@alexcriss - After spending some time on this, @rbrtbnfgl and I were never able to reproduce it. However, race conditions can be tricky on things like this; it likely depends on the size of your cluster, how many network policies you have, specific timing, and the like. After reviewing the logic, we do think we can see how kube-router's state could become unaligned with the ipset state loaded by the NetworkServicesController. Essentially, ipsets only get saved in the NPC at limited points, so its in-memory state can miss entries that the NetworkServicesController wrote in the meantime. @rbrtbnfgl added what we believe to be a fix for this issue in #1806; would you be able to test it and see if it resolves the issue you are experiencing? If it helps, I built a container containing this change and pushed it to Docker Hub: https://hub.docker.com/layers/cloudnativelabs/kube-router-git/amd64-prs1806/images/sha256-a3b126d49890b40408e5cd010806af1d28dab785fa7585d61f4e6708036e1e3a
@alexcriss ping - It would be really good to know whether the patch in #1806 fixes the issue for you before we merge it, since @rbrtbnfgl and I are unable to reproduce this issue in our cluster. Do you have time to test this in the next couple of days?
Sorry, I saw the comment and did not have time to look at it properly. We have been running basically the same patch and it solves the issue. We also call ipset.Save() in the other controller. Will post what we use later, but this is resolving it for us.
This is what we use, which looks exactly like your commit:
As I mentioned before, we added a call to ipset.Save() in the other controller as well.
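For readers following along, here is a minimal sketch of the "save before rebuild" pattern being discussed, under the assumption that restoring an ipset replaces its contents wholesale. The ipsetHandler type and its methods are hypothetical stand-ins, not kube-router's actual API:

```go
package main

import "fmt"

// ipsetHandler is a hypothetical stand-in for a controller's in-memory
// view of one kernel ipset; it is not kube-router's real type.
type ipsetHandler struct {
	entries map[string]bool
}

// Save re-reads the live system state into memory, analogous to what
// `ipset save` exposes.
func (h *ipsetHandler) Save(system map[string]bool) {
	h.entries = make(map[string]bool, len(system))
	for ip := range system {
		h.entries[ip] = true
	}
}

// Restore replaces the system state wholesale with the in-memory view,
// mirroring the semantics of `ipset restore`.
func (h *ipsetHandler) Restore(system map[string]bool) {
	for ip := range system {
		delete(system, ip)
	}
	for ip := range h.entries {
		system[ip] = true
	}
}

func main() {
	// The live ipset already contains an ExternalIP written by the
	// other controller.
	system := map[string]bool{"87.250.179.246": true}

	h := &ipsetHandler{entries: map[string]bool{}}

	// The fix: start from the current system state instead of the
	// possibly outdated in-memory contents.
	h.Save(system)
	h.entries["10.0.0.1"] = true // this controller's own addition
	h.Restore(system)

	// Both IPs survive; without the Save() the ExternalIP would be gone.
	fmt.Println(system)
}
```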
What happened?
I am observing a race condition between the NetworkPolicyController and the NetworkServicesController when updating IPVS entries. The scenario is as follows:

1. kube-router runs the periodic syncIpvsFirewall and adds the ExternalIP to the kube-router-svip-prt ipset. At this point, traffic to the ExternalIP coming from other nodes starts being ACCEPT-ed by iptables. At this stage, the NetworkServicesController also adds the ExternalIP to the ipSetHandlers map it maintains in memory.
2. kube-router runs syncNetworkPolicyChains. This refreshes the ipsets to include the IPs contained in NetworkPolicies, starting from the in-memory values that the NetworkPolicyController holds in its ipSetHandlers. The NetworkPolicyController's ipSetHandlers map doesn't know anything about the ExternalIP that was added by the NetworkServicesController, and hence the ExternalIP is removed from kube-router-svip-prt. Traffic to the ExternalIP gets REJECT-ed by iptables until syncIpvsFirewall runs again.
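To make the interleaving concrete, here is a hedged toy model of the race described above (all names are illustrative stand-ins, not kube-router's actual types): each controller restores the shared ipset from its own in-memory view, so whichever controller syncs last silently drops the other's entries.

```go
package main

import "fmt"

// restore replaces the "kernel" ipset contents with a controller's view,
// mirroring the replace-everything semantics of `ipset restore`.
func restore(kernel, view map[string]bool) {
	for ip := range kernel {
		delete(kernel, ip)
	}
	for ip := range view {
		kernel[ip] = true
	}
}

func main() {
	kernel := map[string]bool{} // stands in for kube-router-svip-prt

	// Step 1: syncIpvsFirewall adds the ExternalIP; only the
	// NetworkServicesController's in-memory view learns about it.
	nscView := map[string]bool{"87.250.179.246": true}
	restore(kernel, nscView)
	fmt.Println("after NSC sync:", kernel) // ExternalIP present -> ACCEPT

	// Step 2: syncNetworkPolicyChains rebuilds from the
	// NetworkPolicyController's own view, built before step 1, so it
	// does not contain the ExternalIP.
	npcView := map[string]bool{}
	restore(kernel, npcView)
	fmt.Println("after NPC sync:", kernel) // ExternalIP gone -> REJECT
}
```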
What did you expect to happen?
The ExternalIPs of services should be added to the kube-router-svip-prt ipset and remain there, instead of getting removed and re-added.

How can we reproduce the behavior you experienced?
Steps to reproduce the behavior:

1. Create a service with an ExternalIP, e.g. a.b.c.d.
2. Watch the kube-router-svip-prt ipset on the host where the pod started, with ipset list kube-router-svip-prt | grep -P "a\.b\.c\.d".
3. The IP will appear when kube-router runs syncIpvsFirewall and will disappear when kube-router runs fullPolicySync.
System Information (please complete the following information)
Kube-Router Version (kube-router --version): Running kube-router version v2.1.0-11-gac6b898c, built on 2024-03-18T20:39:38+0100, go1.22.0
Kube-Router Parameters:
Kubernetes Version (kubectl version): 1.27.13

Logs, other output, metrics
This is what I see in the logs (I extracted the relevant parts; the full run is attached):
When the ipsets are restored by the NetworkServicesController, kube-router-svip-prt contains 87.250.179.246, while when they are restored by the NetworkPolicyController, 87.250.179.246 is missing.

I am patching the issue for now by running ipset.Save() at each controller before they build their updated version, to make sure the base layer is the current config instead of the previous in-memory content, which might be outdated.

kube-router-ipset-race.log