SNAT port exhaustion alerting #343

Merged
neillturner merged 1 commit into main from port-exhaustion on Jan 13, 2025

neillturner
Contributor

Context

We want to be alerted when SNAT port exhaustion happens on the Kubernetes load balancer, and also when SNAT port usage for a node on that load balancer is high.

Changes proposed in this pull request

Two Azure alerts are created, with actions that send notifications to Slack.
I don't believe we can use Prometheus for this: the load balancer sits outside Kubernetes, so as far as I know it does not report metrics to Prometheus.
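For reviewers unfamiliar with Azure metric alerts, the two alerts could look roughly like the Terraform below. This is a minimal sketch, not the code in this PR: it assumes the `azurerm` provider's `azurerm_monitor_metric_alert` resource, the Standard Load Balancer metrics `SnatConnectionCount` (split by `ConnectionState`) and `UsedSnatPorts` (split by backend IP), and the AKS convention of a load balancer named `kubernetes` in the node resource group. The cluster reference `azurerm_kubernetes_cluster.main`, the resource names, and `var.alert_action_group_id` are hypothetical.

```hcl
# Sketch only: resource/variable names are hypothetical, not taken from PR #343.

# Hypothetical: ID of an existing action group (e.g. one that posts to Slack).
variable "alert_action_group_id" {
  type        = string
  description = "Action group that receives the SNAT alerts"
}

# AKS-managed outbound load balancer, assumed to be named "kubernetes"
# and to live in the cluster's node resource group.
data "azurerm_lb" "kubernetes" {
  name                = "kubernetes"
  resource_group_name = azurerm_kubernetes_cluster.main.node_resource_group
}

# Alert 1: any failed SNAT connection is treated as port exhaustion.
resource "azurerm_monitor_metric_alert" "snat_failed_connections" {
  name                = "snat-port-exhaustion"
  resource_group_name = azurerm_kubernetes_cluster.main.resource_group_name
  scopes              = [data.azurerm_lb.kubernetes.id]
  description         = "Failed SNAT connections on the AKS load balancer"
  severity            = 1
  frequency           = "PT5M"
  window_size         = "PT5M"

  criteria {
    metric_namespace = "Microsoft.Network/loadBalancers"
    metric_name      = "SnatConnectionCount"
    aggregation      = "Total"
    operator         = "GreaterThan"
    threshold        = 0

    # Only count connections whose state is Failed.
    dimension {
      name     = "ConnectionState"
      operator = "Include"
      values   = ["Failed"]
    }
  }

  action {
    action_group_id = var.alert_action_group_id
  }
}

# Alert 2: high SNAT port usage, evaluated separately for each backend IP (node).
resource "azurerm_monitor_metric_alert" "snat_port_usage" {
  name                = "snat-port-usage-high"
  resource_group_name = azurerm_kubernetes_cluster.main.resource_group_name
  scopes              = [data.azurerm_lb.kubernetes.id]
  description         = "SNAT port usage for a node is above the threshold"
  severity            = 2
  frequency           = "PT5M"
  window_size         = "PT15M"

  criteria {
    metric_namespace = "Microsoft.Network/loadBalancers"
    metric_name      = "UsedSnatPorts"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 900

    # "*" splits the metric so each backend IP is evaluated on its own.
    dimension {
      name     = "BackendIPAddress"
      operator = "Include"
      values   = ["*"]
    }
  }

  action {
    action_group_id = var.alert_action_group_id
  }
}
```

Splitting the usage alert by backend IP is what makes it fire per node rather than on the load balancer as a whole; the dimension names are assumptions to check against the metric definitions in the portal.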

Guidance to review

This has been applied to the platform test environment with `make platform-test terraform-aks-cluster-apply CONFIRM_PLATFORM_TEST=yes`, so it can be seen in the Azure portal.
Not sure which Slack webhook to use.
To test the Slack integration, we could temporarily set `ConnectionState` to Success instead of Failed to trigger the exhaustion alert, and reduce the port threshold from 900 to a small value to test the other (port usage) alert.
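One way to run that test without editing the alert resources is to expose the two trigger values as input variables and override them for the test apply. A sketch under that assumption; the variable names are hypothetical and not part of this PR:

```hcl
# Hypothetical variables so the alerts can be exercised without editing resources.
variable "snat_port_usage_threshold" {
  type        = number
  description = "Average UsedSnatPorts per backend IP that triggers the usage alert"
  default     = 900
}

variable "snat_alert_connection_state" {
  type        = string
  description = "ConnectionState value that triggers the exhaustion alert"
  default     = "Failed"
}

# In the alert criteria, reference the variables instead of literals:
#   threshold = var.snat_port_usage_threshold
#   values    = [var.snat_alert_connection_state]
#
# For a test run, override temporarily, e.g. in a .tfvars file:
#   snat_port_usage_threshold = 20
```

Keeping the production values as defaults means a normal apply is unchanged and the override only exists for the duration of the test.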

Checklist

  • I have performed a self-review of my code, including formatting and typos
  • I have cleaned the commit history
  • I have added the Devops label
  • I have attached the pull request to the Trello card

@neillturner
Contributor Author

Tested this successfully before merging: I lowered the port usage threshold from 900 to 20, saw the alert trigger, and the email was sent.

neillturner merged commit cbdc41b into main on Jan 13, 2025
3 checks passed
neillturner deleted the port-exhaustion branch on January 13, 2025 at 10:14