Ingress returning 503s when using Topology Aware Routing and the controller has no endpoints in the zone #11342
Labels: lifecycle/frozen, needs-kind, needs-priority, needs-triage
What happened:
Ingress returns 503s in a multi-zone setup when the backend EndpointSlice has no endpoints in the same zone as the Ingress controller.
What you expected to happen:
Like kube-proxy, ingress-nginx should send traffic to a random endpoint in this case, because topology hints are meant to fail open, not closed (unlike xTP).
My impression is that all testing and thinking about this feature has assumed people are using topology-aware-routing:auto, which doesn't let you get into this situation. The hints feature, however, is explicitly designed to separate the responsibility for deciding to enable topology routing for a service from the responsibility for implementing it. The dataplane implementation of the hints therefore shouldn't make decisions based on assumptions about what is setting them.
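To make the expected fail-open behavior concrete, here is a minimal Go sketch. It is not the ingress-nginx implementation: the `Endpoint` type and the `selectEndpoints` function are hypothetical, simplified stand-ins for the EndpointSlice data, and the fallback rules are modeled on my understanding of how kube-proxy treats hints.

```go
package main

// Endpoint is a hypothetical, simplified stand-in for an EndpointSlice
// endpoint. It is not the real discovery.k8s.io/v1 type.
type Endpoint struct {
	Address  string
	Ready    bool
	ForZones []string // zones from the endpoint's hints; empty means no hint
}

// selectEndpoints returns the ready endpoints hinted for the given zone.
// It fails open: if the zone is unknown, if any ready endpoint carries no
// hint, or if no ready endpoint is hinted for the zone, it returns every
// ready endpoint instead of an empty list.
func selectEndpoints(endpoints []Endpoint, zone string) []Endpoint {
	var ready, local []Endpoint
	allHinted := true

	for _, ep := range endpoints {
		if !ep.Ready {
			continue
		}
		ready = append(ready, ep)

		if len(ep.ForZones) == 0 {
			allHinted = false
			continue
		}
		for _, z := range ep.ForZones {
			if z == zone {
				local = append(local, ep)
				break
			}
		}
	}

	// Fail open rather than returning nothing, which would surface as a 503.
	if zone == "" || !allHinted || len(local) == 0 {
		return ready
	}
	return local
}
```

With this shape, a controller in a zone that has no hinted endpoints would still get the full list of ready endpoints, while a controller in a zone that does have hinted endpoints would keep its traffic local.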
NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):
Note that this is still the current behavior in the latest commit of this repo, see this snippet:
https://github.com/kubernetes/ingress-nginx/blob/main/internal/ingress/controller/endpointslices.go#L144
Kubernetes version (use `kubectl version`): 1.27
Environment:
A multi-zone cluster, e.g.:
Then:
Ingress 3/4 on NodeC/D populate the endpoint list, including pod-1, and work.
Ingress 1/2 on NodeA/B do not populate the endpoint list, because pod-1 is marked as being in a different zone in the EndpointSlice.
Workaround:
Setting service-upstream and delegating the decision to kube-proxy makes this work, because kube-proxy handles this situation properly: it sends traffic to a random endpoint regardless of topology. It would be nice if ingress-nginx handled this itself, though, since service-upstream has many downsides, as I'm sure you folks know.