-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: [NPM] [LINUX] add NetPols in background #1969
Merged
Merged
Changes from all commits
Commits
Show all changes
88 commits
Select commit
Hold shift + click to select a range
01917c6
wip: apply dirty NetPols every 500ms in Linux
huntergregory 521417e
only build npm linux image
huntergregory c6b0da6
fix: check for empty cache
huntergregory 2b23891
feat: toggle for netpol interval. default 500 ms
huntergregory 6f3538c
ci: remove stages "build binaries" and "run windows tests"
huntergregory b664e9d
wip: max batched netpols (toggle-specified)
huntergregory b84f11b
ci: remove manifest build/push for win npm
huntergregory 1f9767a
wip: handle ipset deletion properly and max batch for delete too
huntergregory ab494e5
fix: correct remove policy
huntergregory 9d400d3
fix: only remove policy if it was in kernel
huntergregory a914945
finalize toggles, allowing ability to turn off iptablesInBackground
huntergregory 67743b3
ci: conf + cyc use PR's configmaps
huntergregory 71080f0
fix: lints
huntergregory 1ec9b81
fix dp toggle: iptablesInBackground
huntergregory adb4473
fix lock typo and config logging
huntergregory 5b3169f
fix background thread. add comments. only add tmp ref when enabled
huntergregory ef0efe2
copy pod selector list
huntergregory 940337d
fix: removepolicy needs namespace too
huntergregory 9e5e699
rename opInfo to event
huntergregory abf3638
fix: fix references and prevent concurrent map read/write
huntergregory c27e507
tmp: debug logging
huntergregory 0336346
fix: missing set references by swap keys and values
huntergregory 1e8d6af
Revert "tmp: debug logging"
huntergregory 05ba523
fix: add podSelectorList to fake NetPol
huntergregory 1da93f2
log: do not print error when failing to delete non-existent nft rule
huntergregory 546857d
log: verbose iptables bootup
huntergregory f7e06f3
log: use fmt.Errorf for clean logging
huntergregory c876d86
log: never return error for iptables in background and fix some lints
huntergregory 8267e80
fix: activate/deactivate azure chain rules
huntergregory ed5de88
fix: correctly decrement netpols in kernel
huntergregory 9a3cd62
ci: run UTs again
huntergregory 173fa77
ci: update profiles. default to placefirst=false
huntergregory d904251
address comment: rename batch to pendingPolicy
huntergregory bcf4ee7
refactor: make dirty cache OS-specific
huntergregory ed82252
test: UTs
huntergregory f0329ba
test: put UT cfg back to placefirst to not break things
huntergregory b8de686
ci: update cyclonus workflows
huntergregory 807f6df
fmt: address comment & lint
huntergregory 2cc8283
fmt: rename numInKernel to policiesInKernel
huntergregory a0b0335
log: switch to fmt.Errorf
huntergregory 134ec4e
fmt: whitespace
huntergregory 9dd332f
feat: resiliency to errors while reconciling dirty netpols
huntergregory b8a11ca
log: temporarily print everything for ipset restore
huntergregory 10e3257
fix: remove nomatch from ipset -D for cidr blocks
huntergregory 416d981
test: UTs for non-happy path
huntergregory ddcfaee
test: fix hns fake
huntergregory 57d7c17
fix: don't change windows. let it delete ipsets when removing policies
huntergregory 1d51247
fix windows lint
huntergregory 23ee4f1
fix: ignore chain doesn't exist errors for iptables -D
huntergregory 8518bda
feat: latency and failure metrics
huntergregory 76b37c7
test: update exit code for UT
huntergregory ae9d755
metrics: new metrics should go in node-metrics path
huntergregory 8f395ec
Merge branch 'master' into hgregory/05-18-netpol-background
huntergregory bb70d52
Merge branch 'master' into hgregory/05-18-netpol-background
huntergregory 7df05fa
style: simplify nesting
huntergregory 3700763
style: move identical windows & linux code to shared file
huntergregory a95bef4
ci: remove v1 conformance and cyclonus
huntergregory eace676
feat: add NetPols in background from the DP (revert background code i…
huntergregory d96413d
style: remove "background" from iptables metrics
huntergregory dec20ed
revert changes in ipsetmanager, const.go, and dp.Remove/UpdatePolicy
huntergregory a454548
style: whitespace
huntergregory 9502a42
perf: use len() instead of creating slice from map
huntergregory ca5e7ce
remove verbosity for iptables bootup
huntergregory c72ccdf
build: add return statement
huntergregory e952178
style: whitespace
huntergregory 86cce28
build: fix variable shadowing
huntergregory badd2c7
build: fix more import shadowing
huntergregory cb2b426
build: windows pointer issue and UT issue
huntergregory 0dc1df0
test: fix UT for iptables error code 2
huntergregory 82e8497
ci: enable linux scale test
huntergregory 6e0dc1b
ci: revert to master pipeline.yaml
huntergregory 695fc1f
revert changes to chain-management. do changes in PR #2012
huntergregory c89c7f8
Merge branch 'master' into hgregory/05-18-netpol-background
huntergregory 483123e
log: change wording
huntergregory 9e9812e
test: UTs for netpol in background
huntergregory 3ea0bab
log: wording
huntergregory 12b1318
feat: apply ipsets for each netpol individually
huntergregory f754343
config: rearrange ConfigMap & update capz yaml
huntergregory 43afc64
fix: windows bootup phase logic for addpolicy
huntergregory ff1dac6
feat: restrict netpol in background to linux + nftables
huntergregory 36da0d7
test: skip nftables check for UT
huntergregory 9a117d2
Merge branch 'master' into hgregory/05-18-netpol-background
vakalapa f615436
style: netpols[0] instead of loop
huntergregory 33732fd
log: address log comments
huntergregory 6ad7639
Merge branch 'master' into hgregory/05-18-netpol-background
huntergregory f337aee
style: lint for long line
huntergregory cd40a02
Merge branch 'master' into hgregory/05-18-netpol-background
huntergregory 35f07f3
Merge branch 'master' into hgregory/05-18-netpol-background
huntergregory File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,43 @@ | ||
package metrics | ||
|
||
import ( | ||
"github.com/prometheus/client_golang/prometheus" | ||
) | ||
|
||
func RecordIPTablesRestoreLatency(timer *Timer, op OperationKind) { | ||
labels := prometheus.Labels{ | ||
operationLabel: string(op), | ||
} | ||
itpablesRestoreLatency.With(labels).Observe(timer.timeElapsed()) | ||
} | ||
|
||
func RecordIPTablesDeleteLatency(timer *Timer) { | ||
iptablesDeleteLatency.Observe(timer.timeElapsed()) | ||
} | ||
|
||
func IncIPTablesRestoreFailures(op OperationKind) { | ||
labels := prometheus.Labels{ | ||
operationLabel: string(op), | ||
} | ||
iptablesRestoreFailures.With(labels).Inc() | ||
} | ||
|
||
func TotalIPTablesRestoreLatencyCalls(op OperationKind) (int, error) { | ||
return histogramVecCount(itpablesRestoreLatency, prometheus.Labels{ | ||
operationLabel: string(op), | ||
}) | ||
} | ||
|
||
func TotalIPTablesDeleteLatencyCalls() (int, error) { | ||
collector, ok := iptablesDeleteLatency.(prometheus.Collector) | ||
if !ok { | ||
return 0, errNotCollector | ||
} | ||
return histogramCount(collector) | ||
} | ||
|
||
func TotalIPTablesRestoreFailures(op OperationKind) (int, error) { | ||
return counterValue(iptablesRestoreFailures.With(prometheus.Labels{ | ||
operationLabel: string(op), | ||
})) | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
package metrics | ||
|
||
import ( | ||
"testing" | ||
"time" | ||
|
||
"github.com/stretchr/testify/require" | ||
) | ||
|
||
func TestRecordIPTablesRestoreLatency(t *testing.T) { | ||
timer := StartNewTimer() | ||
time.Sleep(1 * time.Millisecond) | ||
RecordIPTablesRestoreLatency(timer, UpdateOp) | ||
timer = StartNewTimer() | ||
time.Sleep(1 * time.Millisecond) | ||
RecordIPTablesRestoreLatency(timer, CreateOp) | ||
|
||
count, err := TotalIPTablesRestoreLatencyCalls(CreateOp) | ||
require.Nil(t, err, "failed to get metric") | ||
require.Equal(t, 1, count, "should have recorded create once") | ||
|
||
count, err = TotalIPTablesRestoreLatencyCalls(UpdateOp) | ||
require.Nil(t, err, "failed to get metric") | ||
require.Equal(t, 1, count, "should have recorded update once") | ||
} | ||
|
||
func TestRecordIPTablesDeleteLatency(t *testing.T) { | ||
timer := StartNewTimer() | ||
time.Sleep(1 * time.Millisecond) | ||
RecordIPTablesDeleteLatency(timer) | ||
|
||
count, err := TotalIPTablesDeleteLatencyCalls() | ||
require.Nil(t, err, "failed to get metric") | ||
require.Equal(t, 1, count, "should have recorded create once") | ||
} | ||
|
||
func TestIncIPTablesRestoreFailures(t *testing.T) { | ||
IncIPTablesRestoreFailures(CreateOp) | ||
IncIPTablesRestoreFailures(UpdateOp) | ||
IncIPTablesRestoreFailures(CreateOp) | ||
|
||
count, err := TotalIPTablesRestoreFailures(CreateOp) | ||
require.Nil(t, err, "failed to get metric") | ||
require.Equal(t, 2, count, "should have failed to create twice") | ||
|
||
count, err = TotalIPTablesRestoreFailures(UpdateOp) | ||
require.Nil(t, err, "failed to get metric") | ||
require.Equal(t, 1, count, "should have failed to update once") | ||
} |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isnt 100 too much ? can we have something like 10 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested with
MaxPendingNetPols=10
. This doesn't seem to give us much performance boost. Took 12 hours to add 640 NetPols. WithMaxPendingNetPols=100
, the same takes <20 minutes (table in PR description has more details)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With
MaxPendingNetPols=10
, I also saw 1/10 NPM Pods crash with this error after adding 625/640 NetPols, 11.5 hours after NPM came online: