
Support retaining clusters when still referenced by route/listeners #18958

Open
howardjohn opened this issue Nov 10, 2021 · 29 comments

@howardjohn
Contributor

Title: Support retaining clusters when still referenced by route/listeners

Description:

Currently, our ingress gets clusters for all services we know about. This is often extremely wasteful, since most of them are not referenced by any routes.

We added an option to filter these out, but started seeing issues with transitions like the following:

  • Initial state: route /foo -> cluster foo-v1
  • Desired end state: route /foo -> cluster foo-v2, remove foo-v1

In practice, there is a gap between the cluster being removed and the route being updated. This results in a period of downtime on route updates.

Our ideal would be that Envoy doesn't delete the cluster while a route still references it.


The alternative would be to solve it in the control plane. But that seems pretty hard... I guess we would need to realize foo-v1 is referenced by a route and keep it around until the route is updated.

This may be ~impossible for us (until we move to delta xDS, anyway), since "keep it around" with SotW means we literally need a copy of it in the control plane, and we don't store config after we send it, nor can we regenerate it.

And to know it's updated, we need to send the route and wait for the ACK, I guess? That also introduces a ton of complexity to the control plane and requires it to be stateful.

It seems a bit complex, and I am not even sure it can work if the proxy reconnects to a new control plane instance in the middle of this operation, unless our control plane instances sync data, which is extremely complicated.
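
For illustration, here is a rough Go sketch of what "keep it around" would mean on the control plane side, assuming we stored the clusters we previously sent and tracked route ACKs. All names here are hypothetical, not Istio or go-control-plane APIs.

```go
package main

import "fmt"

// Hypothetical types; not Istio or go-control-plane APIs.
type RouteConfig struct {
	Name     string
	Clusters []string // cluster names referenced by this route's actions
}

type State struct {
	desiredClusters map[string]bool // clusters generated from the current service registry
	sentClusters    map[string]bool // clusters previously sent (requires storing what we sent)
	ackedRoutes     []RouteConfig   // routes the proxy has ACKed
}

// clustersToPush keeps every currently desired cluster, plus any previously
// sent cluster that an ACKed route still references, so the proxy never has
// a route pointing at a cluster we just removed.
func (s *State) clustersToPush() map[string]bool {
	out := map[string]bool{}
	for c := range s.desiredClusters {
		out[c] = true
	}
	for _, r := range s.ackedRoutes {
		for _, c := range r.Clusters {
			if s.sentClusters[c] {
				out[c] = true // retain until the route stops referencing it
			}
		}
	}
	return out
}

func main() {
	s := &State{
		desiredClusters: map[string]bool{"foo-v2": true},
		sentClusters:    map[string]bool{"foo-v1": true, "foo-v2": true},
		ackedRoutes:     []RouteConfig{{Name: "/foo", Clusters: []string{"foo-v1"}}},
	}
	fmt.Println(s.clustersToPush()) // map[foo-v1:true foo-v2:true]
}
```

As noted above, this only works if the control plane stores the previously sent cluster config and tracks ACKs, which is exactly the statefulness we want to avoid.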

Relevant Links:
Issue tracker on our side: istio/istio#29131

@howardjohn added the enhancement and triage labels Nov 10, 2021
@Shikugawa
Member

So the expectation is to prevent the immediate drain of the cluster bound to the route when the route is updated via RDS, and to wait to update the cluster until the next cluster is available. Is that correct?

@howardjohn
Contributor Author

Yeah that would be ideal I think.

@mattklein123
Member

Adding this is going to be very complicated, and IMHO this is a control plane concern, not an Envoy concern. I'm going to push back pretty hard on adding this unless there is a very good reason.

@mattklein123 added the area/cluster_manager label and removed the enhancement and triage labels Nov 11, 2021
@howardjohn
Contributor Author

howardjohn commented Nov 11, 2021

I actually generally agree with that, but my concern is that I am not convinced a control plane can actually solve this in any reasonable way.

Consider:

  1. Initial state: route /foo -> cluster foo-v1. Connected to control plane instance CP-1.
  2. User config changes to point /foo -> cluster foo-v2.
  3. Control plane is smart, so it sends the new cluster foo-v2 first.
  4. Control plane crashes; a new instance CP-2 starts up and Envoy reconnects.
  5. Now CP-2 needs to resend all config. It seems that the user wants /foo -> cluster foo-v2; no config needs cluster foo-v1 (and in fact, CP-2 may not even know foo-v1 exists at all!), so it removes foo-v1.
  6. Traffic is now broken.
  7. Eventually CP-2 finishes the push and moves /foo -> foo-v2. Traffic is now fixed.

In our specific case, our control plane is essentially always in the 'Control plane crashes' state, because we do not maintain the state required to handle this complex operation; I used a crash because that is something I assume no control plane is immune to.

@mattklein123
Member

IMO if your control plane crashes it's sufficient for there to be a delay until things converge; downstream retries, etc. should take care of this.

@howardjohn
Contributor Author

When the control plane crashes I would expect configuration updates to be delayed. I would not expect traffic to be broken (i.e. 503s until the control plane comes back up).

@mattklein123
Member

When the control plane crashes I would expect configuration updates to be delayed. I would not expect traffic to be broken (i.e. 503s until the control plane comes back up).

I don't understand why your control plane can't come up in a consistent state and send the new config.

@howardjohn
Contributor Author

The user's state is configured in Kubernetes. This includes service discovery (Services/Endpoints) as well as routing configuration. In Kubernetes, we can only access the current configuration - we cannot view historical state unless we maintain it out of band.

In step 3 from #18958 (comment), the control plane can do this because it saw the old config, and now it sees the new configuration, so it could be smart enough to know we need to "stage" the update in two parts (in theory - Istio cannot do this today).

With a new control plane coming up, though, there is no way it can understand the state of the old config. All it sees is that the user has config for a foo-v2 Service (i.e. a foo-v2 Envoy Cluster), at which point it doesn't really have much choice other than to send just foo-v2. Even if it somehow knew it should send foo-v1, we don't have the Service for foo-v1, so we wouldn't know how to synthesize a cluster for it.

It's possible I am overcomplicating this :-)

@mattklein123
Member

I understand what you are saying in terms of the corner case, but I don't think Envoy holding configuration is the right solution to this problem. It's too complicated. If you need atomic updates I would rather just figure out a way in which a route update is held until all of the clusters it references are actually available. This has been discussed in the past. cc @htuch

@htuch
Member

htuch commented Nov 12, 2021

Yeah. The neat way to approach this would be to have the xDS Resource declare its dependencies in the wrapper and have the xDS machinery wait to deliver until these are available from all sources. The downsides are that this is going to add extra complexity and debug pain when using this feature (it would be opt-in), and our xDS delivery right now is not really set up as a cache (where this would be more natural). CC @adisuissa
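
To sketch the shape of that idea (hypothetical Go types only; today's xDS Resource wrapper has no dependency field):

```go
package xdsdeps

// ResourceRef names a resource of a given type; hypothetical, for illustration.
type ResourceRef struct {
	TypeURL string
	Name    string
}

// WrappedResource is a hypothetical wrapper shape, not the real xDS Resource proto.
type WrappedResource struct {
	Name         string
	TypeURL      string
	Dependencies []ResourceRef // e.g. a RouteConfiguration listing the Clusters it routes to
	Body         []byte        // the serialized resource itself
}

// Ready reports whether every declared dependency is already available, so
// the xDS machinery knows it can hand the resource to its consumer.
func Ready(r WrappedResource, have map[ResourceRef]bool) bool {
	for _, dep := range r.Dependencies {
		if !have[dep] {
			return false
		}
	}
	return true
}
```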

@howardjohn
Contributor Author

Yeah. The neat way to approach this would be to have the xDS Resource declare its dependencies in the wrapper and have the xDS machinery wait to deliver until these are available from all sources. The downsides are that this is going to add extra complexity and debug pain when using this feature (it would be opt-in), and our xDS delivery right now is not really set up as a cache (where this would be more natural). CC @adisuissa

I can definitely imagine other cases where this could be useful. One similar problem (not a high priority for us) is that we have a coupling of the SAN in the Cluster to EDS endpoints. A similar approach could probably work there (although putting the SAN in the EDS may be better). If I think about it more, there are probably other cross-resource dependencies where this would be useful.

@github-actions

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

@github-actions bot added the stale label Dec 13, 2021
@github-actions

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

@howardjohn
Contributor Author

Not stale; maybe we can add "help wanted"? Unless this feature would not be accepted.

@htuch
Member

htuch commented Dec 21, 2021

@howardjohn which variant? The original proposal or what we ended up discussing?

@howardjohn
Contributor Author

I am fine with either one; I think both will satisfy our issue.

@htuch
Member

htuch commented Dec 27, 2021

I'd be interested in hearing from @mattklein123 and @adisuissa here, but I think what would be ideal would be to open an issue to track making the xDS system more cache-like and using that structure to then accommodate the resource staging (probably two issues). I can file them if there is general agreement on this approach. It would then need to be help wanted as I'm not sure if anyone in the xDS maintainer orbit has cycles for a major undertaking like this.

Absent this, I'm not sure if the first approach is going to fly, but I'm open-minded to options.

@mattklein123 reopened this Jan 4, 2022
@mattklein123 added the help wanted label Jan 4, 2022
@mattklein123
Member

I can file them if there is general agreement on this approach. It would then need to be help wanted as I'm not sure if anyone in the xDS maintainer orbit has cycles for a major undertaking like this.

Yes this is the approach that I would recommend. @htuch I marked this help wanted and reopened, but feel free to close this out and replace it with a fresh issue if you prefer.

@adisuissa
Contributor

I also think the approach of using an xDS internal cache and explicitly setting the dependencies in the resource will make things easier to understand, and will generalize this cross-xDS dependency requirement for other use cases.

@github-actions bot removed the stale label Jan 4, 2022
@howardjohn
Contributor Author

Coming back to this, I actually tried implementing #18958 (comment) (ignoring the control plane crashing edge case). Even that is seemingly ~impossible to do correctly.

Despite a correct ordering:

  1. RDS -> points to foo
  2. CDS: push foo and bar
  3. RDS: update to point to bar
  4. CDS: push bar only

There is actually an intermediate step needed between 2 and 3: wait for "bar" to finish warming. This information is not exposed to control planes, so it's ambiguous when to do step 3.
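
As a sketch, the staged push from a control plane's point of view would look roughly like this (hypothetical Proxy interface; the warming wait is the part that has no real equivalent today):

```go
package staging

// Proxy abstracts a single connected Envoy; everything here is hypothetical.
type Proxy interface {
	PushCDS(clusters []string) error
	PushRDS(route string, cluster string) error
	// WaitForClusterWarmed is the missing piece: SotW xDS gives the control
	// plane no warming signal, so this cannot be implemented reliably today.
	WaitForClusterWarmed(cluster string) error
}

// StagedRouteUpdate walks through the ordering above, with the extra
// warming wait between steps 2 and 3 made explicit.
func StagedRouteUpdate(p Proxy) error {
	// 1. Routes currently point at "foo".
	// 2. Push both clusters so "bar" can start warming alongside "foo".
	if err := p.PushCDS([]string{"foo", "bar"}); err != nil {
		return err
	}
	// 2b. Wait until "bar" finishes warming before switching routes.
	if err := p.WaitForClusterWarmed("bar"); err != nil {
		return err
	}
	// 3. Only now is it safe to point the route at "bar".
	if err := p.PushRDS("route-foo", "bar"); err != nil {
		return err
	}
	// 4. Finally drop the old cluster.
	return p.PushCDS([]string{"bar"})
}
```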

@howardjohn
Contributor Author

I guess #23321 resolves the concern that #18958 (comment) is impossible to implement.

@vikaschoudhary16
Contributor

I guess #23321 resolves the concern that #18958 (comment) is impossible to implement.

@howardjohn I realized the pausing on CDS ACK is already working. In my earlier testing, it seems I was missing the proper changes on the Istiod side and incorrectly thought that Envoy's pausing was not working. I retested istio/istio#41229 with no custom proxy and it works, i.e. no 503s.

@jewertow
Contributor

@vikaschoudhary16 Pausing CDS works correctly, but the problem is that the listener manager and the cluster manager pause requesting resources independently. The listener manager pauses requesting Cluster, ClusterLoadAssignment, LbEndpoints, and Secret resources. The cluster manager, on the other hand, pauses requesting Listener, RouteConfiguration, and Secret resources.

I was trying to work around getting 503s by updating a route in two steps:

  1. Switch traffic from existing route to a new cluster.
  2. Remove an old cluster.

I expected that it would work without 503s, because I didn't remove and add clusters in the same update, but the update failed after the first step. I observed the following order of events in Envoy:

  1. Cluster test-app-8087 is serving requests successfully.
  2. CDS added test-app-8088.
  3. Envoy paused requesting Clusters, ClusterLoadAssignments, LbEndpoints and Secrets.
  4. Cluster test-app-8088 started being warmed.
  5. Envoy resumed requesting ClusterLoadAssignment.
  6. Received new LDS configuration.
  7. Received new RDS configuration.
  8. New route was loaded!
  9. Responding with 503, because cluster test-app-8088 is not yet warmed.
  10. Received EDS response with ClusterLoadAssignments.
  11. Cluster test-app-8088 is warmed.
  12. Next requests succeed.

This is because CDS does not pause requests for Listener and RouteConfiguration resources. You can find more logs here.

The documentation describes a similar case here and calls the solution the "make before break" model, but it complicates the control plane significantly. On the other hand, implementing dependent resources in the xDS API, as was suggested, is much more complex and couldn't be delivered in the short term.

I'm looking for a trade-off, so what do you think about extending the Envoy API with an option that would pause requesting Listener and RouteConfiguration resources on a CDS update? Then a control plane could send the clusters referenced both previously and currently by a route and update that route without 503s.
cc: @mattklein123 @htuch @howardjohn

@mattklein123
Member

I'm looking for a trade-off, so what do you think about extending the Envoy API with an option that would pause requesting Listener and RouteConfiguration resources on a CDS update? Then a control plane could send the clusters referenced both previously and currently by a route and update that route without 503s.

I still think implementing the caching/dependencies is the right solution, but this sounds reasonable to me at a high level. I'm not enough in the weeds anymore to understand the full implications of this change so I will defer to @adisuissa on that.

@jewertow
Contributor

jewertow commented Jan 3, 2023

@mattklein123 Thank you for your feedback, but it turned out that my solution does not make sense. @howardjohn helped me realize that in the case I described above, the route was loaded because it was pushed by the control plane - it was not requested by the proxy. So pausing requests does not help in this case.

@jewertow
Contributor

I can file them if there is general agreement on this approach. It would then need to be help wanted as I'm not sure if anyone in the xDS maintainer orbit has cycles for a major undertaking like this.

@htuch could you file these issues? We are interested in this feature and we would like to know its scope and requirements.

@htuch
Member

htuch commented Jan 27, 2023

@adisuissa this relates to your current arc with XdsManager I think, right?

@adisuissa
Contributor

@adisuissa this relates to your current arc with XdsManager I think, right?

There are a few threads in this issue, so I'll give a short summary of what I think can be done.

As discussed IRL, the idea is that the XdsManager will ensure warming by not exposing a resource to the upper-level component (RouteConfigProviderManager in this case) until the resource's dependencies arrive and it can become active.
Some details on how this will be achieved: when a resource arrives at Envoy, there will be a create() method that takes the resource proto, converts it to an Envoy-internal representation, and states whether there are any dependencies.
If there are no dependencies, the resource will be added to a cache and any of the internal observers (e.g., RouteConfigProviderManager) will be notified.
If there are dependencies, the resource will be stored in a "warming" state, and the internal observers will not be notified. Once the dependencies arrive, the resource will become "active", be stored in the cache, and the internal observers will be notified.

From the protocol side, Envoy will still send an ACK for a resource received and kept in a warmed state. It will just not use that resource until the dependencies arrive.
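
A rough Go sketch of that gating flow (hypothetical names; the real implementation would live in Envoy's C++ XdsManager):

```go
package xdscache

// DecodedResource is a hypothetical stand-in for what create() would return:
// an Envoy-internal representation plus the dependencies it declared.
type DecodedResource struct {
	Name         string
	Dependencies []string
}

// Observer is notified when a resource becomes active
// (e.g. RouteConfigProviderManager).
type Observer interface {
	OnResourceActive(name string)
}

type Cache struct {
	active    map[string]DecodedResource
	warming   map[string]DecodedResource
	observers []Observer
}

func NewCache(obs ...Observer) *Cache {
	return &Cache{
		active:    map[string]DecodedResource{},
		warming:   map[string]DecodedResource{},
		observers: obs,
	}
}

// Add is called when a resource arrives. The resource would still be ACKed,
// but observers are only notified once every dependency is active.
func (c *Cache) Add(r DecodedResource) {
	if c.depsSatisfied(r) {
		c.activate(r)
	} else {
		c.warming[r.Name] = r // hold in "warming" until its dependencies arrive
	}
	// A newly active resource may unblock resources that were warming.
	// Single pass for brevity; a real implementation would iterate to a fixpoint.
	for _, w := range c.warming {
		if c.depsSatisfied(w) {
			c.activate(w)
		}
	}
}

func (c *Cache) depsSatisfied(r DecodedResource) bool {
	for _, d := range r.Dependencies {
		if _, ok := c.active[d]; !ok {
			return false
		}
	}
	return true
}

func (c *Cache) activate(r DecodedResource) {
	c.active[r.Name] = r
	delete(c.warming, r.Name)
	for _, o := range c.observers {
		o.OnResourceActive(r.Name)
	}
}
```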

@jewertow
Contributor

@adisuissa thank you very much for the explanation!

  1. Is someone already working on this, or is it just an idea?
  2. Will it be the default behavior or an optional feature controlled by a config option?
