
fix: [NPM] [Linux] race when deleting/readding NetPol with CIDR rules #2978

Merged
merged 1 commit into from
Aug 30, 2024

Conversation

huntergregory
Contributor

@huntergregory huntergregory commented Aug 29, 2024

Reason for Change:

On Linux, when adding a NetPol with CIDR rules, apply IPSets before modifying the IPSet cache if the previous apply call failed.

Issue Fixed:
Fix #2977

Requirements:

Notes:
Followed the steps to reproduce #2977; the issue no longer occurs because of the new step that applies IPSets first:
dataplane.go:331] [DataPlane] [ApplyDataPlane] [ADD-NETPOL-CIDR-PRECAUTION] starting to apply ipsets

Full logs:

I0829 20:25:12.654104       1 networkPolicyController.go:225] Network Policy default/repro is not found, may be it is deleted
I0829 20:25:12.654134       1 dataplane.go:569] [DataPlane] Remove Policy called for default/repro
I0829 20:25:12.654209       1 chain-management_linux.go:366] Executing iptables command with args [-w 60 -D AZURE-NPM-INGRESS -j AZURE-NPM-INGRESS-1739478793 -m set --match-set azure-npm-3888852483 dst -m set --match-set azure-npm-784554818 dst -m comment --comment INGRESS-POLICY-default/repro-TO-podlabel-pod:c-AND-ns-default-IN-ns-default]
I0829 20:25:12.659612       1 chain-management_linux.go:366] Executing iptables command with args [-w 60 -D AZURE-NPM-EGRESS -j AZURE-NPM-EGRESS-1739478793 -m set --match-set azure-npm-3888852483 src -m set --match-set azure-npm-784554818 src -m comment --comment EGRESS-POLICY-default/repro-FROM-podlabel-pod:c-AND-ns-default-IN-ns-default]
I0829 20:25:12.687485       1 restore.go:188] running this restore command: [iptables-nft-restore -w 60 -T filter --noflush]
I0829 20:25:12.693947       1 dataplane.go:331] [DataPlane] [ApplyDataPlane] [APPLY-DP] starting to apply ipsets
I0829 20:25:12.693984       1 ipsetmanager.go:467] [IPSetManager] dirty caches. toAddUpdateCache: to create: [], to update: [], toDeleteCache: map[cidr-repro-in-ns-default-0-1OUT:0xc0003730c0 cidr-repro-in-ns-default-0-2OUT:0xc000373160 cidr-repro-in-ns-default-0-3OUT:0xc0003731b0 cidr-repro-in-ns-default-0-4OUT:0xc000373200 cidr-repro-in-ns-default-0-5OUT:0xc000373250 cidr-repro-in-ns-default-0-6OUT:0xc0003732a0 cidr-repro-in-ns-default-0-7OUT:0xc0003732f0 cidr-repro-in-ns-default-0-8OUT:0xc000373340 cidr-repro-in-ns-default-3-5IN:0xc000372fc0]
I0829 20:25:12.694056       1 restore.go:188] running this restore command: [ipset restore]
I0829 20:25:12.695711       1 restore.go:299] continuing after line 10 for command [ipset restore]
I0829 20:25:12.695836       1 restore.go:188] running this restore command: [ipset restore]
2024/08/29 20:25:12 [1] skipping destroy line for set cidr-repro-in-ns-default-0-1OUT since the set is in use by a kernel component
2024/08/29 20:25:12 [1] error: on try number 1, failed to run command [ipset restore]. Rerunning with updated file. err: [line-number error for line [-X azure-npm-2296081723]: error running command [ipset restore] with err [exit status 1] and stdErr [ipset v7.5: Error in line 10: Set cannot be destroyed: it is in use by a kernel component
]]
2024/08/29 20:25:12 [1] skipping destroy line for set cidr-repro-in-ns-default-0-2OUT since the set is in use by a kernel component
I0829 20:25:12.698425       1 restore.go:299] continuing after line 1 for command [ipset restore]
2024/08/29 20:25:12 [1] error: on try number 2, failed to run command [ipset restore]. Rerunning with updated file. err: [line-number error for line [-X azure-npm-2668225420]: error running command [ipset restore] with err [exit status 1] and stdErr [ipset v7.5: Error in line 1: Set cannot be destroyed: it is in use by a kernel component
]]
I0829 20:25:12.699273       1 restore.go:188] running this restore command: [ipset restore]
I0829 20:25:12.706674       1 restore.go:299] continuing after line 1 for command [ipset restore]
I0829 20:25:12.706767       1 restore.go:188] running this restore command: [ipset restore]
2024/08/29 20:25:12 [1] skipping destroy line for set cidr-repro-in-ns-default-0-5OUT since the set is in use by a kernel component
2024/08/29 20:25:12 [1] error: on try number 3, failed to run command [ipset restore]. Rerunning with updated file. err: [line-number error for line [-X azure-npm-3363764167]: error running command [ipset restore] with err [exit status 1] and stdErr [ipset v7.5: Error in line 1: Set cannot be destroyed: it is in use by a kernel component
]]
I0829 20:25:12.710655       1 restore.go:299] continuing after line 1 for command [ipset restore]
I0829 20:25:12.710751       1 restore.go:188] running this restore command: [ipset restore]
2024/08/29 20:25:12 [1] skipping destroy line for set cidr-repro-in-ns-default-0-3OUT since the set is in use by a kernel component
2024/08/29 20:25:12 [1] error: on try number 4, failed to run command [ipset restore]. Rerunning with updated file. err: [line-number error for line [-X azure-npm-562284781]: error running command [ipset restore] with err [exit status 1] and stdErr [ipset v7.5: Error in line 1: Set cannot be destroyed: it is in use by a kernel component
]]
2024/08/29 20:25:12 [1] error: failed to apply ipsets: ipset restore failed when applying ipsets: Operation [RunCommandWithFile] failed with error code [999], full cmd [], full error after 5 tries, failed to run command [ipset restore] with error: error running command [ipset restore] with err [exit status 1] and stdErr [ipset v7.5: Error in line 1: Set cannot be destroyed: it is in use by a kernel component
]
2024/08/29 20:25:12 [1] syncNetPol error due to error syncing 'default/repro': [syncNetPol] error: [cleanUpNetworkPolicy] Error: failed to remove policy due to [DataPlane] [APPLY-DP] error while applying IPSets: ipset restore failed when applying ipsets: Operation [RunCommandWithFile] failed with error code [999], full cmd [], full error after 5 tries, failed to run command [ipset restore] with error: error running command [ipset restore] with err [exit status 1] and stdErr [ipset v7.5: Error in line 1: Set cannot be destroyed: it is in use by a kernel component
] when network policy is not found, requeuing
E0829 20:25:12.714336       1 networkPolicyController.go:195] error syncing 'default/repro': [syncNetPol] error: [cleanUpNetworkPolicy] Error: failed to remove policy due to [DataPlane] [APPLY-DP] error while applying IPSets: ipset restore failed when applying ipsets: Operation [RunCommandWithFile] failed with error code [999], full cmd [], full error after 5 tries, failed to run command [ipset restore] with error: error running command [ipset restore] with err [exit status 1] and stdErr [ipset v7.5: Error in line 1: Set cannot be destroyed: it is in use by a kernel component
] when network policy is not found, requeuing
I0829 20:25:12.719592       1 networkPolicyController.go:225] Network Policy default/repro is not found, may be it is deleted
I0829 20:25:12.719609       1 dataplane.go:569] [DataPlane] Remove Policy called for default/repro
I0829 20:25:12.719615       1 dataplane.go:584] [DataPlane] Policy default/repro is not found. Might been deleted already
I0829 20:25:12.719692       1 networkPolicyController.go:191] Successfully synced 'default/repro'
I0829 20:25:18.965998       1 dataplane.go:636] [DataPlane] Update Policy called for default/repro
I0829 20:25:18.966018       1 dataplane.go:639] [DataPlane] Policy default/repro is not found.
I0829 20:25:18.966025       1 dataplane.go:395] [DataPlane] Add Policy called for default/repro
I0829 20:25:18.966030       1 types.go:214] [DataPlane] enqueuing policy default/repro in netPolQueue
I0829 20:25:18.966035       1 dataplane.go:409] [DataPlane] [ADD-NETPOL] new pending netpol count: 1
I0829 20:25:18.966052       1 networkPolicyController.go:191] Successfully synced 'default/repro'
I0829 20:25:19.389008       1 dataplane.go:422] [DataPlane] adding policies [0xc0000b60b0]
I0829 20:25:19.389048       1 dataplane.go:331] [DataPlane] [ApplyDataPlane] [ADD-NETPOL-CIDR-PRECAUTION] starting to apply ipsets
I0829 20:25:19.389085       1 ipsetmanager.go:467] [IPSetManager] dirty caches. toAddUpdateCache: to create: [], to update: [], toDeleteCache: map[cidr-repro-in-ns-default-0-1OUT:0xc0003730c0 cidr-repro-in-ns-default-0-2OUT:0xc000373160 cidr-repro-in-ns-default-0-3OUT:0xc0003731b0 cidr-repro-in-ns-default-0-4OUT:0xc000373200 cidr-repro-in-ns-default-0-5OUT:0xc000373250 cidr-repro-in-ns-default-0-6OUT:0xc0003732a0 cidr-repro-in-ns-default-0-7OUT:0xc0003732f0 cidr-repro-in-ns-default-0-8OUT:0xc000373340 cidr-repro-in-ns-default-3-5IN:0xc000372fc0]
I0829 20:25:19.389212       1 restore.go:188] running this restore command: [ipset restore]
I0829 20:25:19.392142       1 dataplane.go:336] [DataPlane] [ApplyDataPlane] [ADD-NETPOL-CIDR-PRECAUTION] finished applying ipsets
I0829 20:25:19.392272       1 dataplane.go:331] [DataPlane] [ApplyDataPlane] [ADD-NETPOL] starting to apply ipsets
I0829 20:25:19.392313       1 ipsetmanager.go:467] [IPSetManager] dirty caches. toAddUpdateCache: to create: [cidr-repro-in-ns-default-0-3OUT: &{membersToAdd:map[139.0.0.0/32:{}] membersToDelete:map[]},cidr-repro-in-ns-default-0-5OUT: &{membersToAdd:map[139.0.0.0/32:{}] membersToDelete:map[]},cidr-repro-in-ns-default-3-5IN: &{membersToAdd:map[10.224.0.147/32:{}] membersToDelete:map[]},cidr-repro-in-ns-default-0-1OUT: &{membersToAdd:map[10.0.0.0/32:{}] membersToDelete:map[]},cidr-repro-in-ns-default-0-2OUT: &{membersToAdd:map[10.0.0.0/32:{}] membersToDelete:map[]},cidr-repro-in-ns-default-0-8OUT: &{membersToAdd:map[51.0.0.0/32:{}] membersToDelete:map[]},cidr-repro-in-ns-default-0-4OUT: &{membersToAdd:map[139.0.0.0/32:{}] membersToDelete:map[]},cidr-repro-in-ns-default-0-6OUT: &{membersToAdd:map[139.0.0.0/32:{}] membersToDelete:map[]},cidr-repro-in-ns-default-0-7OUT: &{membersToAdd:map[18.0.0.0/32:{}] membersToDelete:map[]}], to update: [], toDeleteCache: map[]
I0829 20:25:19.392381       1 restore.go:188] running this restore command: [ipset restore]
I0829 20:25:19.395198       1 dataplane.go:336] [DataPlane] [ApplyDataPlane] [ADD-NETPOL] finished applying ipsets
I0829 20:25:19.395374       1 restore.go:188] running this restore command: [iptables-nft-restore -w 60 -T filter --noflush]
I0829 20:25:19.401905       1 dataplane.go:427] [DataPlane] [BACKGROUND] added policies successfully

@huntergregory huntergregory added npm Related to NPM. linux labels Aug 29, 2024
@huntergregory huntergregory requested a review from a team as a code owner August 29, 2024 19:56
@huntergregory huntergregory requested a review from matmerr August 29, 2024 19:56
@huntergregory huntergregory force-pushed the huntergregory/npm-cidr-netpol-readd branch 3 times, most recently from bba689a to e8272be
@huntergregory huntergregory force-pushed the huntergregory/npm-cidr-netpol-readd branch from e8272be to 50af502
@huntergregory
Contributor Author

/azp run Azure Container Networking PR

@huntergregory
Contributor Author

/azp run NPM Conformance Tests

@huntergregory
Contributor Author

/azp run NPM Scale Test

Azure Pipelines successfully started running 1 pipeline(s).


@huntergregory huntergregory added this pull request to the merge queue Aug 29, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 29, 2024
@huntergregory huntergregory added this pull request to the merge queue Aug 29, 2024
@huntergregory huntergregory changed the title from "fix: [NPM] [Linux] race when deleting/readding CIDR NetPol" to "fix: [NPM] [Linux] race when deleting/readding NetPol" Aug 30, 2024
@huntergregory huntergregory removed this pull request from the merge queue due to a manual request Aug 30, 2024
@huntergregory huntergregory added this pull request to the merge queue Aug 30, 2024
@huntergregory huntergregory changed the title from "fix: [NPM] [Linux] race when deleting/readding NetPol" to "fix: [NPM] [Linux] race when deleting/readding NetPol with CIDR rules" Aug 30, 2024
Merged via the queue into master with commit e64d9d8 Aug 30, 2024
33 checks passed
@huntergregory huntergregory deleted the huntergregory/npm-cidr-netpol-readd branch August 30, 2024 03:27
huntergregory added a commit that referenced this pull request Aug 30, 2024
github-merge-queue bot pushed a commit that referenced this pull request Aug 31, 2024
[backport] fix: [NPM] [Linux] race when deleting/readding NetPol (#2978)

Signed-off-by: Hunter Gregory
Labels
linux npm Related to NPM.
Development

Successfully merging this pull request may close these issues.

[NPM] [Linux] another race condition when editing a NetPol or deleting then readding it
2 participants