Avoid flapping of multiple Ingress definitions #3862
Conversation
Applying this PR on newer branches may require 5b5e8dc.
@timoreimann I guess the only question I have is: if you have two identical Ingresses with the same host/path combo but different services, are we supposed to merge the backends if they are valid? If so, then this resolution is fine. Otherwise, we should define what should happen. I agree having the frontend appear/disappear is not ideal, but that doesn't mean that merging all overlaps is the only other option. Thoughts?
We had an outage because the same host/path was defined by two Ingresses in two namespaces (in that particular case even for valid reasons). Only one of those was correctly scaled, and traffic was routed to one of the two namespaces (chosen randomly on each reload). We have had a testcase for this since then and see it as a must that Traefik merges those. Even if it is regarded as a misconfiguration, I would prefer anything to taking the frontend offline. I see the point that merging Ingress definitions is not desirable (annotations? restrictions? auth? ...), but what are the non-disruptive alternatives?
IMHO, the host/path tuple is a unique identifier for an Ingress'ed resource. As such, I feel it'd be natural to merge -- there should be no difference between one Ingress spec holding all backends and two Ingress specs holding all backends together. The only alternative I could see is to reject Ingresses that refer to an already existing host/path pair, or to make things configurable. I don't really have a use case for that, other than to protect Ingress objects from each other. If people want that, however, they can control access to Ingress resources more tightly (which is what my org does) or run separate Traefik instances per namespace. @dtomcej WDYT?
Sorry for the delayed response. I agree @timoreimann that the tuple should be unique. In the case at hand, the annotations from the Ingress resource loaded first would be applied. I do think that this case may be noteworthy as far as documentation goes, since it could lead to non-deterministic configuration. If @timoreimann is ok with this non-deterministic configuration, I am as well.
Documenting sounds like a good idea. I'd also emphasize that we now merge configurations that map to the same Ingress resource. @rtreffer would you do us the documentation honor? 🙂
Sure. Merging is not new (it already happens for Ingress definitions backed by two services), but I couldn't find it in the docs. I'll add a paragraph about it to the Kubernetes user guide.
One minor comment left.
Other than that, I find your addendum to the documentation quite cheesy -- exactly the way I like it. 🧀
docs/user-guide/kubernetes.md
Outdated
Træfik will now look for cheddar service endpoints (ports on healthy pods) in both the cheese and the default namespace. Deploying cheddar into the cheese namespace and afterwards shutting down cheddar in the default namespace is enough to migrate the traffic.

!!! note
    The kubernetes documentation does not specify this merging behavior. The reference nginx implementation has undefined behavior: If two ingress objects define the same host/port then one of them will randomly win on reload.
I'd omit the description pertaining to Nginx's behavior (second sentence to the end of the paragraph) as it may change in the future, leaving our docs outdated without us being able to notice the fact easily.
Removed. I had added it to explain the motivation, but that's not helpful for someone just reading the docs 🤷♂️
Great, LGTM. 👍 Time to merge.
LGTM
A frontend would be flapping between available and unavailable if

1. the hostname/path was defined by multiple ingress objects, and
2. one of the ingress objects referenced a non-existing service.

The flapping was caused by the undefined ordering of a Go map: if the last processed ingress was broken, a delete was issued on the frontends map, even if previous runs succeeded.

The new logic only adds a frontend to the map of frontends if the basic validation passed. The frontend will thus always be available if at least one ingress definition is not broken.
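To make the failure mode concrete, here is a minimal, self-contained Go sketch (not Traefik's actual code; all names such as `ingress`, `existingServices`, and `frontends` are hypothetical) of the buggy pattern: since Go randomizes map iteration order, whichever Ingress happens to be processed last decides whether the shared frontend survives the reload.

```go
package main

import "fmt"

type ingress struct {
	name, host, service string
}

func main() {
	// Two Ingress objects define the same host; "web-b" references a
	// service that does not exist.
	ingresses := map[string]ingress{
		"web-a": {name: "web-a", host: "host-a", service: "service-a"},
		"web-b": {name: "web-b", host: "host-a", service: "missing"},
	}
	existingServices := map[string]bool{"service-a": true}

	frontends := map[string]ingress{}
	// Map iteration order is randomized in Go, so web-a and web-b are
	// processed in a different order on each configuration reload.
	for _, ing := range ingresses {
		if !existingServices[ing.service] {
			// Buggy: delete the frontend on validation failure, even
			// if a previously processed, valid Ingress created it.
			delete(frontends, ing.host)
			continue
		}
		frontends[ing.host] = ing
	}

	// Prints 0 or 1 depending on iteration order: the frontend flaps.
	fmt.Println("frontends:", len(frontends))
}
```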
LGTM
What does this PR do?
If multiple Ingress objects define the same frontend (host + path) and one of the Ingresses is broken (e.g. the referenced service does not exist), then the frontend will flap (appear, disappear, ...).
Motivation
We found this in production and I've extracted the problem into a testcase...
Given 2 ingresses:
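The manifests were not preserved when this page was captured; below is a hypothetical reconstruction using only the names mentioned in the text (`web-a`, `web-b`, `host-a`, `missing`). The valid service name `service-a`, the API version, and the ports are assumptions.

```yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: web-a
spec:
  rules:
  - host: host-a
    http:
      paths:
      - backend:
          serviceName: service-a  # assumed name; this service exists
          servicePort: 80
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: web-b
spec:
  rules:
  - host: host-a
    http:
      paths:
      - backend:
          serviceName: missing    # this service does not exist
          servicePort: 80
```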
The `missing` service will cause the whole `host-a` frontend to disappear if the ingress objects are processed in the order `[ingress/web-a, ingress/web-b]`. We have seen in production that the ordering can be random, thus causing the frontend to become unavailable.
Additional Notes
Instead of writing to the `Frontends` map directly, this patch checks out the current `frontend` and writes it to the map only if basic validations pass.

There was a second test failure after applying this patch: the `root2` testcase has 2 ingress definitions, one of which is broken. The behavior now changed to accept the valid one and keep the frontend.
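Sketched in the same hypothetical Go terms as the example further above (this compresses the actual patch, which builds up the current frontend in a local variable before committing it): validation now happens before the write, so a broken Ingress can no longer remove a frontend that a valid Ingress established.

```go
package main

import "fmt"

type ingress struct {
	name, host, service string
}

func main() {
	ingresses := map[string]ingress{
		"web-a": {name: "web-a", host: "host-a", service: "service-a"},
		"web-b": {name: "web-b", host: "host-a", service: "missing"},
	}
	existingServices := map[string]bool{"service-a": true}

	frontends := map[string]ingress{}
	for _, ing := range ingresses {
		// Build the frontend locally instead of mutating the shared map.
		frontend := ing
		if !existingServices[frontend.service] {
			// Validation failed: skip this Ingress and leave any
			// frontend created by a valid Ingress untouched.
			continue
		}
		// Commit to the map only after validation passed.
		frontends[frontend.host] = frontend
	}

	// Always prints 1, regardless of map iteration order.
	fmt.Println("frontends:", len(frontends))
}
```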
This is a rebase of #3857 on v1.6
This version does not require the following testcase patch: 5b5e8dc