Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: [WIN-NPM] allow readiness probes #1887

Merged
merged 15 commits into from
Apr 11, 2023
Merged

Conversation

huntergregory
Copy link
Contributor

@huntergregory huntergregory commented Apr 4, 2023

Reason for Change:
Allows readiness probes no matter the NetPol configuration. This is in-line with other NetworkPolicy managers including:

  • Linux NPM
  • Calico (Linux and Windows)
  • Cilium

Issue Fixed:
Fixes #1881

Requirements:

Notes:

@huntergregory huntergregory added npm Related to NPM. windows labels Apr 4, 2023
@huntergregory huntergregory force-pushed the hgregory/03-31-readiness-probe branch from e940588 to 1389016 Compare April 4, 2023 00:08
@huntergregory huntergregory marked this pull request as ready for review April 4, 2023 18:28
@huntergregory huntergregory requested a review from a team as a code owner April 4, 2023 18:28
@huntergregory huntergregory requested review from matmerr and removed request for a team April 4, 2023 18:28
@huntergregory
Copy link
Contributor Author

Testing

  • 100 windows nodes
  • 1 cx Pod on each node
  • NPM image: acnpublic.azurecr.io/azure-npm:d4269faf

Add deny-all policy for these Pods.

  1. NPM finds node IP correctly
  2. NPM adds deny-all policy
  3. Pods continue to run

// fixes #1881
hnsRules[len(policy.ACLs)] = &NPMACLPolSettings{
Id: policy.ACLPolicyID,
Action: hcn.ActionTypeAllow,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are allowing all rules, you should add the source ip as node ip here right ?
Add a comment on this too .

vakalapa
vakalapa previously approved these changes Apr 6, 2023
@huntergregory
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@huntergregory
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@huntergregory
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@huntergregory
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@@ -92,6 +94,11 @@ func (pMgr *PolicyManager) Bootup(epIDs []string) error {
// update Prometheus metrics on success
metrics.IncNumACLRulesBy(numLinuxBaseACLRules)
}

if util.IsWindowsDP() && pMgr.NodeIP == "" {
return npmerrors.Errorf(npmerrors.BootupPolicyMgr, false, "policy manager must have a configured nodeIP in Windows")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This return error might be fatal, but should be okay for time being. We will need to make our getnodeip logic robust in case we see any failures.

Can put up a alert on this error message from telemetry.

// Get preferred outbound ip of this machine
// source: https://stackoverflow.com/questions/23558425/how-do-i-get-the-local-ip-address-in-go
func NodeIP() (string, error) {
conn, err := net.Dial("udp", "8.8.8.8:80")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if 8.8.8.8 is blocked by some NSGs or other way , will this still work ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from the stack overflow post, it says that this actually does not actually create a connection since it is the udp protocol. Any IP address would work here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tested in the following way:

  1. Made sure NPM gets the right node IP in 100-node cluster
  2. On local Windows machine, got node IP by dialing 20 million different IPs
  3. Added deny rules for 8.8.8.8:80 in AKS NSG

@vakalapa vakalapa merged commit ddb3417 into master Apr 11, 2023
@vakalapa vakalapa deleted the hgregory/03-31-readiness-probe branch April 11, 2023 21:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
npm Related to NPM. windows
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Liveness/readiness probes time out with default deny all network policy on Windows 2022
2 participants