perf: [NPM] [LINUX] add NetPols in background #1969
Conversation
Force-pushed from 21bda59 to 849b570
npm/config/config.go (Outdated)
@@ -4,6 +4,8 @@ import "github.com/Azure/azure-container-networking/npm/util"

const (
	defaultResyncPeriod     = 15
	defaultNetPolMaxBatches = 100
Any reason we cannot reuse defaultApplyMaxBatches and defaultApplyInterval?
Discussed offline: we'll have separate threads/config for Windows and Linux.
// 1. Delete jump rules from ingress/egress chains to ingress/egress policy chains.
// We ought to delete these jump rules here in the foreground since if we add an NP back after deleting, iptables-restore --noflush can add duplicate jump rules.
deleteErr := pMgr.deleteOldJumpRulesOnRemove(networkPolicy)
if deleteErr != nil {
We need to ignore errors here.
We actually already ignore doesNotExistErrorCode in case the rule doesn't exist.
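For illustration, here is a minimal Go sketch of that pattern: deleting a jump rule but swallowing a does-not-exist failure. The error type, helper names, and the exit-code value are assumptions for the sketch, not NPM's actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// Sketch only: tolerating an already-absent jump rule on delete.
// The exit-code value is an illustrative assumption, not NPM's constant.
const doesNotExistErrorCode = 1

type iptablesError struct {
	exitCode int
	msg      string
}

func (e *iptablesError) Error() string { return e.msg }

// deleteJumpRule stands in for running `iptables -D ...`.
func deleteJumpRule(chain, target string) error {
	return &iptablesError{exitCode: doesNotExistErrorCode, msg: "rule does not exist"}
}

// deleteOldJumpRulesOnRemove deletes the jump rule but treats a
// does-not-exist failure as success, since the desired end state
// (rule absent) already holds.
func deleteOldJumpRulesOnRemove(chain, target string) error {
	err := deleteJumpRule(chain, target)
	var iptErr *iptablesError
	if errors.As(err, &iptErr) && iptErr.exitCode == doesNotExistErrorCode {
		return nil
	}
	return err
}

func main() {
	fmt.Println(deleteOldJumpRulesOnRemove("AZURE-NPM-INGRESS", "AZURE-NPM-INGRESS-123456"))
}
```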
TODO: only apply netpol in background if nftables is present
@@ -7,6 +7,8 @@ const (
	defaultApplyMaxBatches      = 100
	defaultApplyInterval        = 500
	defaultMaxBatchedACLsPerPod = 30
	defaultMaxPendingNetPols    = 100
Isn't 100 too much? Can we have something like 10?
Tested with MaxPendingNetPols=10. This doesn't seem to give us much of a performance boost: it took 12 hours to add 640 NetPols. With MaxPendingNetPols=100, the same takes <20 minutes (the table in the PR description has more details).
With MaxPendingNetPols=10, I also saw 1/10 NPM Pods crash with this error after adding 625/640 NetPols, 11.5 hours after NPM came online:
I0622 05:55:28.926162 1 trace.go:219] Trace[1241999867]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169 (22-Jun-2023 05:54:24.710) (total time: 64119ms):
Trace[1241999867]: ---"Objects listed" error:<nil> 64118ms (05:55:28.828)
Trace[1241999867]: [1m4.119544167s] [1m4.119544167s] END
I0622 05:55:29.727548 1 trace.go:219] Trace[257377008]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169 (22-Jun-2023 05:54:24.512) (total time: 65114ms):
Trace[257377008]: ---"Objects listed" error:<nil> 65018ms (05:55:29.531)
Trace[257377008]: [1m5.114375378s] [1m5.114375378s] END
// Prevent netpol in background unless we're in Linux and using nftables.
// This step must be performed after bootupDataplane() because it calls util.DetectIptablesVersion(), which sets the proper value for util.Iptables
dp.netPolInBackground = cfg.NetPolInBackground && !util.IsWindowsDP() && (strings.Contains(util.Iptables, "nft") || dp.debug)
Are we saying we will only make this change for nft? Why not for legacy too?
Discussed offline: we'll only do this for nft. We will keep the codepath the same for legacy, which doesn't need the performance boost.
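As a rough sketch of the gate above: `iptables --version` reports its backend (e.g. `iptables v1.8.4 (nf_tables)` or `iptables v1.8.4 (legacy)`), and the PR's condition keys off an "nft" substring. The standalone function below is illustrative; the real detection lives in util.DetectIptablesVersion().

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// detectIptablesVersion runs `iptables --version` and returns its output,
// which names the backend, e.g. "iptables v1.8.4 (nf_tables)".
func detectIptablesVersion() (string, error) {
	out, err := exec.Command("iptables", "--version").CombinedOutput()
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(out)), nil
}

func main() {
	version, err := detectIptablesVersion()
	if err != nil {
		fmt.Println("could not detect iptables:", err)
		return
	}
	// Mirror the PR's condition: background NetPols only for nft-backed iptables.
	netPolInBackground := strings.Contains(version, "nft")
	fmt.Printf("%s => netPolInBackground=%v\n", version, netPolInBackground)
}
```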
// The caller must lock netPolQueue.
func (dp *DataPlane) addPoliciesWithRetry(context string) {
	netPols := dp.netPolQueue.dump()
	klog.Infof("[DataPlane] adding policies %+v", netPols)
Can we make this a debug statement?
Seems like we don't have debug logging capability via klog.
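For reference, klog does offer verbosity-gated logging via klog.V(n) (enabled with the -v flag), which can approximate a debug level; whether that fits NPM's logging setup is a separate question. A minimal sketch:

```go
package main

import (
	"flag"

	"k8s.io/klog/v2"
)

func main() {
	// klog registers its flags (including -v) on the default flag set.
	klog.InitFlags(nil)
	flag.Parse()

	// This line only prints when the process runs with -v=2 or higher,
	// approximating a "debug" log level.
	klog.V(2).Infof("[DataPlane] adding policies %+v", []string{"netpol-a", "netpol-b"})

	klog.Flush()
}
```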
npm/pkg/dataplane/dataplane.go (Outdated)
// 2. get Endpoints
var err error
var endpointList map[string]string
if !dp.inBootupPhase() {
Is this condition correct? The only way for wasInBootPhase to be true is if dp.inBootupPhase() returned true. Here you are checking the opposite along with wasInBootPhase; I think we will never hit this condition, unless I am missing something?
The idea is to protect against a race:
- DP is in the bootup phase.
- dp.incrementBatchAndApplyIfNeeded() is called in the last iteration of the above loop.
- dp.FinishBootupPhase() runs in another thread.
A few more details are in the code comment below.
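A sketch of that race guard, with illustrative field and method names (not the actual NPM code): snapshot the phase before the slow apply, then re-check it afterward so a concurrent FinishBootupPhase() isn't missed.

```go
package main

import (
	"fmt"
	"sync"
)

// DataPlane is a stripped-down stand-in for the sketch.
type DataPlane struct {
	mu       sync.Mutex
	inBootup bool
}

func (dp *DataPlane) inBootupPhase() bool {
	dp.mu.Lock()
	defer dp.mu.Unlock()
	return dp.inBootup
}

// FinishBootupPhase is called from another goroutine once the initial
// policy list has been processed.
func (dp *DataPlane) FinishBootupPhase() {
	dp.mu.Lock()
	defer dp.mu.Unlock()
	dp.inBootup = false
}

func (dp *DataPlane) applyBatch() {
	// Snapshot the phase before doing the (slow) apply work.
	wasInBootPhase := dp.inBootupPhase()

	// ... iptables-restore work would happen here ...

	// Re-check: FinishBootupPhase() may have run concurrently between the
	// snapshot above and this point. If we started in the bootup phase but
	// are no longer in it, endpoints must be fetched now instead of being
	// deferred to the end of bootup.
	if wasInBootPhase && !dp.inBootupPhase() {
		fmt.Println("bootup finished mid-apply: fetching endpoints now")
	}
}

func main() {
	dp := &DataPlane{inBootup: true}
	go dp.FinishBootupPhase() // races with the apply below
	dp.applyBatch()
}
```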
Force-pushed from 3503c1c to 6ad7639
// addPoliciesWithRetry tries adding all policies. If this fails, it tries adding policies one by one.
// The caller must lock netPolQueue.
func (dp *DataPlane) addPoliciesWithRetry(context string) {
	netPols := dp.netPolQueue.dump()
We're storing the background queue of netpols as a map, but when applying we convert from the map back to a slice. Any reason not to store the background queue as a slice and avoid this conversion?
Having the map helps with dp.RemovePolicy(), where we remove the NetPol from the queue.
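A minimal sketch of the trade-off: keying the pending queue by policy name makes removal an O(1) map delete, at the cost of a map-to-slice conversion at apply time. Types and method names here are illustrative, not the actual NPM implementation.

```go
package main

import "fmt"

type NetPol struct{ Name string }

// netPolQueue keys pending policies by name so a queued-but-not-yet-applied
// NetPol can be dropped in O(1) instead of scanning a slice.
type netPolQueue struct {
	toAdd map[string]*NetPol
}

func newNetPolQueue() *netPolQueue {
	return &netPolQueue{toAdd: make(map[string]*NetPol)}
}

func (q *netPolQueue) enqueue(p *NetPol) { q.toAdd[p.Name] = p }

// delete drops a pending policy; this is the RemovePolicy() path.
func (q *netPolQueue) delete(name string) { delete(q.toAdd, name) }

// dump converts the map to a slice for the batched apply.
func (q *netPolQueue) dump() []*NetPol {
	out := make([]*NetPol, 0, len(q.toAdd))
	for _, p := range q.toAdd {
		out = append(out, p)
	}
	return out
}

func main() {
	q := newNetPolQueue()
	q.enqueue(&NetPol{Name: "allow-dns"})
	q.enqueue(&NetPol{Name: "deny-all"})
	q.delete("deny-all") // RemovePolicy before the batch is applied
	fmt.Println(len(q.dump()), "policies pending")
}
```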
/azp run
No commit pushedDate could be found for PR 1969 in repo Azure/azure-container-networking
/azp run
Azure Pipelines successfully started running 4 pipeline(s), but failed to run 1 pipeline(s).
/azp run
Azure Pipelines successfully started running 4 pipeline(s), but failed to run 1 pipeline(s).
/azp run
Azure Pipelines successfully started running 4 pipeline(s), but failed to run 1 pipeline(s).
@huntergregory why not update the iptables binary that is used in the docker image?
Hi @xanecs, the limitation seems to be in the iptables version installed on the AKS node.
Reason for Change:
NPM must use nftables on Ubuntu 22 (k8s 1.25+). NPM SysCalls into nftables take much longer than into legacy iptables. Customers with ~100+ NetPols can see severe latency when adding NetworkPolicies (see the Latency Comparison below).
Latency Comparison
Ran 2 experiments for each version, using B-series burstable VMs. The cluster has 640 services with 2 Pods each.
Fix: Long Term
nftables do not scale for services · Issue #96018 · kubernetes/kubernetes (github.com)
iptables v1.8.8 has scale improvements. However, AKS is on iptables v1.8.4.
Fix: Near Term (this PR)
Optimizing for ADD NetworkPolicy.
Current Design
ADD NetworkPolicy:
This PR
Since iptables SysCalls take so long, add iptables rules for multiple NetworkPolicies at once.
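As a rough illustration of the batching idea (not the actual NPM implementation): rules for all pending policies are concatenated into a single iptables-restore --noflush payload so the expensive syscall runs once per batch rather than once per policy. Chain names below are made up for the sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// buildRestorePayload concatenates the chains/rules for every pending
// policy into one iptables-restore input, so one syscall covers the batch.
func buildRestorePayload(policies []string) string {
	var b strings.Builder
	b.WriteString("*filter\n")
	for _, p := range policies {
		chain := "AZURE-NPM-INGRESS-" + p
		fmt.Fprintf(&b, ":%s - -\n", chain)         // declare the policy chain
		fmt.Fprintf(&b, "-A %s -j RETURN\n", chain) // placeholder rule
	}
	b.WriteString("COMMIT\n")
	return b.String()
}

func main() {
	batch := []string{"POLICY-A", "POLICY-B", "POLICY-C"}
	payload := buildRestorePayload(batch)
	// In NPM this payload would be piped to `iptables-restore --noflush`;
	// here we just print it.
	fmt.Print(payload)
}
```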

This change will only take effect if nftables is present on the machine and NPM's ConfigMap has NetPolInBackground=true.
Other Changes in this PR
Pipelines
Metrics
Adds the following metrics:
- npm_linux_iptables_restore_latency_seconds
- npm_linux_iptables_delete_latency_seconds_bucket
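A hedged sketch of how such latency histograms are typically registered with Prometheus' client_golang (a histogram named npm_linux_iptables_delete_latency_seconds automatically exposes the _bucket series). Bucket choices below are assumptions, not the values used in the PR.

```go
package main

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// Histograms matching the metric names added in this PR.
// Buckets and registration details are illustrative assumptions.
var (
	iptablesRestoreLatency = prometheus.NewHistogram(prometheus.HistogramOpts{
		Name:    "npm_linux_iptables_restore_latency_seconds",
		Help:    "Latency of iptables-restore calls in seconds.",
		Buckets: prometheus.ExponentialBuckets(0.01, 2, 12),
	})
	iptablesDeleteLatency = prometheus.NewHistogram(prometheus.HistogramOpts{
		Name:    "npm_linux_iptables_delete_latency_seconds",
		Help:    "Latency of iptables delete calls in seconds.",
		Buckets: prometheus.ExponentialBuckets(0.01, 2, 12),
	})
)

func main() {
	prometheus.MustRegister(iptablesRestoreLatency, iptablesDeleteLatency)

	// Time an operation and record it.
	start := time.Now()
	// ... run iptables-restore here ...
	iptablesRestoreLatency.Observe(time.Since(start).Seconds())
}
```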