-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix unknown cluster issue #12160
fix unknown cluster issue #12160
Conversation
36e0341
to
067be3a
Compare
when the envoy sidecar get disconnected from the xds stream, the reconnecting request will contain `InitialResourceVersions` and `ResourceNamesSubscribe`. This is OK for endpoint and route type, as these two are driven by (child types of) cluster and listener type respectively. However, register `cluster` type as subscription instead of wildcard would cause envoy not able to get any new cluster updates for the rest of this life. Same goes for `listener`. This pr is to always set cluster and listener type to wildcard, to ensure the envoy sidecar will get those updates after disconnecting from xds stream for whatever reason (network blip/consul restart/etc).
067be3a
to
bcf26a3
Compare
this solves the failure case as described here: ingress log before:
after patching consul:
|
I found this reference to this strange behavior as being non-ideal in envoy: envoyproxy/envoy#16063 |
And the fix landed here: envoyproxy/envoy#16153 |
thanks @rboyer! do you know which envoy version has it? AFAIK we are on the latest envoy version (v1.18.4) that is compatible with consul 1.10.x, and are still seeing this issue. |
It doesn't show up in the changelog, but I scanned for a key line from the initial PR commit and it shows up in
|
So there's precedent in the Consul codebase for slightly altering xDS behavior based on the connected envoy versions. I'm going to look into bending your changes to that form soon so that we don't have to do something "off spec" for compliant envoy instances going forward. |
awesome, thank you for looking into this @rboyer ! |
Thank you for tracking down the shape of this proposed fix, which helped identify what was possibly up over on the envoy side. |
For my own notes, the envoy folks have slightly updated the surrounding code from this patch in this later patch: envoyproxy/envoy#16855 |
@fredwangwang I've made a new PR that replaces this one (and avoids the typo in your branch name) with the conditional behavior. I've carried your test over basically unchanged: #12174 |
great thanks @rboyer! Let me close this pr and keep an eye on urs instead :) |
when the envoy sidecar get disconnected from the xds stream, the reconnecting request will
contain
InitialResourceVersions
andResourceNamesSubscribe
. This is OK for endpoint and route type,as these two are driven by (child types of) cluster and listener type respectively.
However, register
cluster
type as subscription instead of wildcard would cause envoynot able to get any new cluster updates for the rest of this life. Same goes for
listener
.This pr is to always set cluster and listener type to wildcard, to ensure the envoy sidecar will get
those updates after disconnecting from xds stream for whatever reason (network blip/consul restart/etc).