Avoid flapping of multiple Ingress definitions #3862
Conversation
Applying this PR on newer branches may require 5b5e8dc.
@timoreimann I guess the only question I have is: if you have two identical Ingresses with the same host/path combo but different services, are we supposed to merge the backends if they are valid? If so, then this resolution is fine. Otherwise, we should define what should happen. I agree having the frontend appear/disappear is not ideal, but that doesn't mean that merging all overlaps is the only other option. Thoughts?
We had an outage because the same host/path was defined by two Ingresses in two namespaces (in that particular case even for valid reasons). Only one of those was correctly scaled, and traffic was routed to one of the two namespaces (chosen randomly on each reload). We have had a testcase for this since then and see it as a must that Traefik merges those. Even if it is regarded as a misconfiguration, I would prefer anything to taking the frontend offline. I see the point that merging Ingress definitions is not desirable (annotations? restrictions? auth? ...), but what are the non-disruptive alternatives?
IMHO, the host/path tuple is a unique identifier for an Ingress'ed resource. As such, I feel it'd be natural to merge -- there should be no difference between one Ingress spec holding all backends and two Ingress specs holding all backends together. The only alternative I could see is to reject Ingresses that refer to an already existing host/path pair, or to make things configurable. I don't really have a use case for that, other than to protect Ingress objects from each other. If people want that, however, they can control access to Ingress resources more tightly (which is what my org does) or run separate Traefik instances per namespace. @dtomcej WDYT?
Sorry for the delayed response. I agree @timoreimann that the tuple should be unique. In the case at hand, the annotations from the Ingress resource loaded first would be applied. I do think that this case may be noteworthy as far as documentation goes, since it could lead to non-deterministic configuration. If @timoreimann is ok with this non-deterministic configuration, I am as well.
Documenting sounds like a good idea. I'd also emphasize that we now merge configurations that map to the same Ingress resource. @rtreffer would you do us the documentation honor? 🙂
Sure. Merging is not new (it already happens for Ingress definitions backed by two services), but I couldn't find it in the docs. I'll add a paragraph about it to the Kubernetes user guide.
One minor comment left.
Other than that, I find your addendum to the documentation quite cheesy -- exactly the way I like it. 🧀
docs/user-guide/kubernetes.md
Outdated
Træfik will now look for cheddar service endpoints (ports on healthy pods) in both the cheese and the default namespace. Deploying cheddar into the cheese namespace and afterwards shutting down cheddar in the default namespace is enough to migrate the traffic.

!!! note
    The kubernetes documentation does not specify this merging behavior. The reference nginx implementation has undefined behavior: If two ingress objects define the same host/port then one of them will randomly win on reload.
I'd omit the description pertaining to Nginx's behavior (second sentence to the end of the paragraph) as it may change in the future, leaving our docs outdated without us being able to notice the fact easily.
Removed. I had added it to explain the motivation, but that's not helpful for someone just reading the docs 🤷♂️
Great, LGTM. 👍 Time to merge.
LGTM
A frontend would be flapping between available and unavailable if

1. the hostname/path was defined by multiple ingress objects, and
2. one of the ingress objects referenced a non-existing service.

The flapping was caused by the undefined ordering of a Go map: if the last processed ingress was broken, a delete was issued on the frontends map, even if previous runs succeeded.

The new logic only adds a frontend to the map of frontends if the basic validation passed. The frontend will thus always be available if at least one ingress definition is not broken.
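To make the failure mode concrete, here is a minimal, self-contained Go sketch (not Traefik's actual code; all names such as `ingress`, `existingServices`, and `frontends` are hypothetical) of the buggy pattern: since Go randomizes map iteration order, whichever Ingress happens to be processed last decides whether the shared frontend survives the reload.

```go
package main

import "fmt"

type ingress struct {
	name, host, service string
}

func main() {
	// Two Ingress objects define the same host; "web-b" references a
	// service that does not exist.
	ingresses := map[string]ingress{
		"web-a": {name: "web-a", host: "host-a", service: "service-a"},
		"web-b": {name: "web-b", host: "host-a", service: "missing"},
	}
	existingServices := map[string]bool{"service-a": true}

	frontends := map[string]ingress{}
	// Map iteration order is randomized in Go, so web-a and web-b are
	// processed in a different order on each configuration reload.
	for _, ing := range ingresses {
		if !existingServices[ing.service] {
			// Buggy: delete the frontend on validation failure, even
			// if a previously processed, valid Ingress created it.
			delete(frontends, ing.host)
			continue
		}
		frontends[ing.host] = ing
	}

	// Prints 0 or 1 depending on iteration order: the frontend flaps.
	fmt.Println("frontends:", len(frontends))
}
```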
LGTM
What does this PR do?
If multiple Ingress objects define the same frontend (host + path) and one of the Ingresses is broken (e.g. the referenced service does not exist), then the frontend will flap (appear, disappear, ...).
Motivation
We found this in production and I've extracted the problem into a testcase...
Given 2 ingresses:
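The manifests were not preserved when this page was captured; below is a hypothetical reconstruction using only the names mentioned in the text (`web-a`, `web-b`, `host-a`, `missing`). The valid service name `service-a`, the API version, and the ports are assumptions.

```yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: web-a
spec:
  rules:
  - host: host-a
    http:
      paths:
      - backend:
          serviceName: service-a  # assumed name; this service exists
          servicePort: 80
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: web-b
spec:
  rules:
  - host: host-a
    http:
      paths:
      - backend:
          serviceName: missing    # this service does not exist
          servicePort: 80
```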
The `missing` service will cause the whole `host-a` frontend to disappear if the ingress objects are processed in the order `[ingress/web-a, ingress/web-b]`. We have seen in production that the ordering can be random, thus causing the frontend to become unavailable.
Additional Notes
Instead of writing to the `Frontends` map directly, this patch checks out the current `frontend` and writes it to the map only if basic validations pass.

There was a second test failure after applying this patch: the `root2` testcase has 2 ingress definitions, one of which is broken. The behavior now changed to accept the valid one and keep the frontend.
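Sketched in the same hypothetical Go terms as the example further above (this compresses the actual patch, which builds up the current frontend in a local variable before committing it): validation now happens before the write, so a broken Ingress can no longer remove a frontend that a valid Ingress established.

```go
package main

import "fmt"

type ingress struct {
	name, host, service string
}

func main() {
	ingresses := map[string]ingress{
		"web-a": {name: "web-a", host: "host-a", service: "service-a"},
		"web-b": {name: "web-b", host: "host-a", service: "missing"},
	}
	existingServices := map[string]bool{"service-a": true}

	frontends := map[string]ingress{}
	for _, ing := range ingresses {
		// Build the frontend locally instead of mutating the shared map.
		frontend := ing
		if !existingServices[frontend.service] {
			// Validation failed: skip this Ingress and leave any
			// frontend created by a valid Ingress untouched.
			continue
		}
		// Commit to the map only after validation passed.
		frontends[frontend.host] = frontend
	}

	// Always prints 1, regardless of map iteration order.
	fmt.Println("frontends:", len(frontends))
}
```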
This is a rebase of #3857 on v1.6
This version does not require the following testcase patch: 5b5e8dc