eds: introducing EDS resources cache #28079

adisuissa · 2023-06-21T20:29:22Z

Commit Message: introducing EDS resources cache
Additional Description:

Currently, after an EDS-cluster update, Envoy waits for an EDS response. If a timeout occurs, the EDS-cluster will be used without endpoints.
This PR introduces a cache that stores EDS resources (ClusterLoadAssignments) and allows invoking a callback when a resource is removed or expires.
This will be followed up by 2 PRs:

1. Plumbing the cache in the cluster-manager and GrpcMux implementations.
2. Modifying the EdsClusterImpl class to use the cache.
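
For readers who want a concrete picture of the component described above, here is a minimal sketch of a resource cache that notifies registered callbacks when an entry is removed. It is not the code from this PR: `Assignment` stands in for `ClusterLoadAssignment`, and `SketchEdsCache`, `RemovalCallback`, and `RecordingCallback` are hypothetical names chosen for illustration. Expiration is left out; the sketch only covers explicit removal.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for envoy::config::endpoint::v3::ClusterLoadAssignment.
struct Assignment {
  std::string cluster_name;
  std::vector<std::string> endpoints;
};

// Hypothetical callback interface, invoked when a cached resource is removed.
class RemovalCallback {
public:
  virtual ~RemovalCallback() = default;
  virtual void onRemoved(const std::string& resource_name) = 0;
};

// Simple callback that records the names it was notified about.
class RecordingCallback : public RemovalCallback {
public:
  void onRemoved(const std::string& resource_name) override {
    removed.push_back(resource_name);
  }
  std::vector<std::string> removed;
};

class SketchEdsCache {
public:
  // Adds the resource, or overwrites the stored copy if the name already exists.
  void setResource(const std::string& name, Assignment resource) {
    entries_[name].resource = std::move(resource);
  }

  // Returns the cached resource or nullptr. A non-null removal_cb is registered
  // (not owned) and will be invoked if the resource is later removed.
  const Assignment* getResource(const std::string& name, RemovalCallback* removal_cb) {
    auto it = entries_.find(name);
    if (it == entries_.end()) {
      return nullptr;
    }
    if (removal_cb != nullptr) {
      it->second.removal_cbs.push_back(removal_cb);
    }
    return &it->second.resource;
  }

  // Erases the entry first, then notifies every callback registered for it.
  void removeResource(const std::string& name) {
    auto it = entries_.find(name);
    if (it == entries_.end()) {
      return;
    }
    std::vector<RemovalCallback*> cbs = std::move(it->second.removal_cbs);
    entries_.erase(it);
    for (RemovalCallback* cb : cbs) {
      cb->onRemoved(name);
    }
  }

  std::size_t size() const { return entries_.size(); }

private:
  struct Entry {
    Assignment resource;
    std::vector<RemovalCallback*> removal_cbs;
  };
  std::unordered_map<std::string, Entry> entries_;
};
```

The callbacks are moved out of the entry before it is erased, so a callback that queries the cache during notification already sees the resource as gone.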

Risk Level: low - introducing a new component which isn't used anywhere.
Testing: Adding a unit test for the class.
Docs Changes: N/A.
Release Notes: N/A.
Platform Specific Features: N/A.
Part of #26749

The entire change can be viewed here: adisuissa@f0b7ac8

Signed-off-by: Adi Suissa-Peleg <[email protected]>

zuercher · 2023-06-22T17:20:59Z

/assign-from @envoyproxy/envoy-maintainers

repokitteh-read-only · 2023-06-22T17:21:05Z

@envoyproxy/envoy-maintainers assignee is @ggreenway

🐱

Caused by: a #28079 (comment) was created by @zuercher.

see: more, trace.

adisuissa · 2023-06-22T17:31:55Z

cc @abeyad as an expert on the config-plane pipeline

abeyad

Thanks for this thoroughly well-written and well-tested PR! Overall looks great, left a few minor comments.

abeyad · 2023-06-22T23:38:35Z

envoy/config/eds_resources_cache.h

+// Represents an xDS resources cache for EDS resources, and currently supports
+// a single config-source (ADS). The motivation is that clusters that are
+// updated (not added) during a CDS response will be able to use the current EDS
+// configuration, thus avoiding an additional EDS response.


Will the CDS response always be followed by an EDS response if it's an EDS cluster? Or not necessarily?
The reason I ask is, assuming connectivity to the xDS server is sound, would that mean the cached EDS resources would only be used for a very brief period until the EDS response comes soon after the CDS response's arrival? An explanation to that effect in the comments might be helpful.

I'll provide some more context here, and let me know if you think this should go into the comment.

A CDS response will create a cluster in warming mode. If an EDS response is not delivered within 15 seconds, that cluster becomes active with an empty assignment (no endpoints). This series of PRs will allow Envoy to use the previously cached assignment if no EDS response is received by the time the warming ends.

According to the xDS-protocol, Envoy should use the cached EDS assignment as soon as the CDS response arrives. However, Envoy intentionally doesn't do so, because it supports a case where the cluster's endpoints are substantially changed. For example, when a cluster is changed from a non-TLS cluster to a TLS cluster, the old assignment should not be used, and Envoy should use the new assignment.

This work is essentially allowing the use of cached resources, thus not requiring the server to send an assignment if there was no update, while still allowing a use-case as described above.
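
The fallback order described in this explanation can be sketched as a small decision function. This is an illustration only, not the logic of EdsClusterImpl: `Assignment` is a stand-in type and `assignmentAtWarmingEnd` is a hypothetical name.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a ClusterLoadAssignment.
struct Assignment {
  std::vector<std::string> endpoints;
};

// Picks the assignment a warming cluster goes active with:
// 1. a fresh EDS response, if one arrived before warming ended;
// 2. otherwise the previously cached assignment, if any;
// 3. otherwise an empty assignment (no endpoints).
Assignment assignmentAtWarmingEnd(const std::optional<Assignment>& fresh_eds_response,
                                  const std::optional<Assignment>& cached) {
  if (fresh_eds_response.has_value()) {
    return *fresh_eds_response;
  }
  if (cached.has_value()) {
    return *cached;
  }
  return Assignment{};
}
```

The fresh response always wins, which preserves the case mentioned above where a cluster changes substantially (for example, non-TLS to TLS) and the old assignment must not be reused once a new one is available.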

That makes sense, thanks for the explanation. I think just an extra note in the comments saying the xDS server isn't required to send an EDS response after a CDS response would help clarify things.

Clarified the comment.

envoy/config/eds_resources_cache.h

abeyad · 2023-06-23T00:46:13Z

source/extensions/config_subscription/grpc/eds_resources_cache_impl.h

+  // The value of the map, holds the resource and the removal callbacks.
+  struct ResourceData {
+    envoy::config::endpoint::v3::ClusterLoadAssignment resource_;
+    std::list<EdsResourceRemovalCallback*> removal_cbs_;


I think std::list probably makes sense here if removal from the list is likely to be frequent, but if we don't expect removal to be a regular occurrence, then maybe std::vector is a better choice for its cache locality.

Hmmm... good point. I expect the list to be of size 1 or 2 at most. I'm not sure if it will matter that much, but I'm willing to change it to a vector if you think it makes more sense.

I think std::vector probably makes more sense, with a call to reserve(2)? The only benefit of std::list would be frequent random insertion and deletion in a larger list, but with such a small list, that benefit wouldn't exist.

I've changed to a vector. I don't think that reserving the size to 2 makes sense here, as the typical use-case is an empty-vector.
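
To illustrate the trade-off discussed in this thread, here is a minimal sketch of removing one callback pointer from a small std::vector. `Callback` and `removeCallback` are hypothetical names, not identifiers from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical callback type; only its address matters here.
struct Callback {};

// Removes the first occurrence of cb from cbs.
// Returns true if a matching pointer was found and erased.
bool removeCallback(std::vector<Callback*>& cbs, const Callback* cb) {
  auto it = std::find(cbs.begin(), cbs.end(), cb);
  if (it == cbs.end()) {
    return false;
  }
  // erase() shifts the later elements down; with one or two entries this is
  // a trivial cost and the storage stays contiguous.
  cbs.erase(it);
  return true;
}
```

With at most one or two elements, the linear search and the shift are negligible, and an empty vector typically holds no heap allocation until the first push_back, which matches the expectation that most entries have no callbacks.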

test/extensions/config_subscription/grpc/eds_resources_cache_impl_test.cc

Signed-off-by: Adi Suissa-Peleg <[email protected]>

abeyad · 2023-06-23T13:23:57Z

envoy/config/eds_resources_cache.h

+   *         resource doesn't exist.
+   */
+  virtual OptRef<const envoy::config::endpoint::v3::ClusterLoadAssignment>
+  getResource(absl::string_view resource_name, EdsResourceRemovalCallback* removal_cb) PURE;


Can you add a comment on the lifetime expectations of the EdsResourceRemovalCallback pointer? i.e. why is it not owned by the EdsResourcesCache class?

I've updated the comment to clarify the lifetime.
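
The updated comment itself is not quoted in this thread. As a general illustration of what a non-owning callback pointer implies, the sketch below shows one common way to keep such a pointer valid: the object that registers itself also deregisters in its destructor. `Registry`, `Listener`, and `ScopedListener` are hypothetical names and this is not the PR's design.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical listener interface.
class Listener {
public:
  virtual ~Listener() = default;
  virtual void onRemoved() = 0;
};

// Holds raw, non-owning pointers. Callers must deregister before the
// listener is destroyed, otherwise notifyAll() would use a dangling pointer.
class Registry {
public:
  void add(Listener* listener) { listeners_.push_back(listener); }
  void remove(Listener* listener) {
    listeners_.erase(std::remove(listeners_.begin(), listeners_.end(), listener),
                     listeners_.end());
  }
  void notifyAll() {
    for (Listener* listener : listeners_) {
      listener->onRemoved();
    }
  }
  std::size_t size() const { return listeners_.size(); }

private:
  std::vector<Listener*> listeners_;
};

// Registers on construction and deregisters on destruction, so the registry
// never holds a pointer to a destroyed listener.
class ScopedListener : public Listener {
public:
  explicit ScopedListener(Registry& registry) : registry_(registry) { registry_.add(this); }
  ~ScopedListener() override { registry_.remove(this); }
  ScopedListener(const ScopedListener&) = delete;
  ScopedListener& operator=(const ScopedListener&) = delete;

  void onRemoved() override { ++calls; }
  int calls = 0;

private:
  Registry& registry_;
};
```

The sketch assumes the registry outlives every listener; if that cannot be guaranteed, ownership would need to be shared or inverted.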

Signed-off-by: Adi Suissa-Peleg <[email protected]>

abeyad

Thanks @adisuissa !

ggreenway

LGTM

Signed-off-by: Adi Suissa-Peleg <[email protected]> Signed-off-by: Ryan Eskin <[email protected]>

Continuation of PR #28079 (as part of the work for issue #26749).

Currently, after an EDS-cluster update, Envoy waits for an EDS response. If a timeout occurs, the EDS-cluster will be used without endpoints.
This PR adds the use of caching to the GrpcMux. The GrpcMux object adds an EDS resource to the cache when it is received or updated, and removes it when it no longer has any subscriptions (watchers).
A runtime flag is added to disable the use of the cache; it will be enabled in a future PR when ADS is used.
The next PR will plumb this into ADS, and add fetching of resources from the cache as part of the EdsClusterImpl.

The entire change can be viewed here: adisuissa/envoy@f0b7ac8

Risk Level: Low - the disabled runtime flag should prevent the use of the cache in non-test code.
Testing: Added unit tests.
Docs Changes: N/A.
Release Notes: N/A (future PR).
Platform Specific Features: N/A.
Runtime guard: disabled by default: envoy_restart_features_use_eds_cache_for_ads

Signed-off-by: Adi Suissa-Peleg <[email protected]>
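
The add-on-receive and remove-on-last-watcher behavior described for the GrpcMux can be sketched as follows. This is an assumption-laden illustration, not the GrpcMux code: `SketchMux` and its methods are hypothetical, the resource payload is a plain string, and the constructor flag only mimics the effect of the runtime guard.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

class SketchMux {
public:
  // cache_enabled plays the role of the runtime guard (hypothetical wiring).
  explicit SketchMux(bool cache_enabled) : cache_enabled_(cache_enabled) {}

  void addWatch(const std::string& name) { ++watchers_[name]; }

  // Drops one watcher; when the last one goes away the cached copy is evicted.
  void removeWatch(const std::string& name) {
    auto it = watchers_.find(name);
    if (it == watchers_.end()) {
      return;
    }
    if (--it->second == 0) {
      watchers_.erase(it);
      cache_.erase(name);
    }
  }

  // Called when a resource is received or updated.
  void onResource(const std::string& name, const std::string& payload) {
    if (cache_enabled_) {
      cache_[name] = payload;
    }
  }

  bool isCached(const std::string& name) const { return cache_.count(name) > 0; }

private:
  bool cache_enabled_;
  std::unordered_map<std::string, int> watchers_;
  std::unordered_map<std::string, std::string> cache_;
};
```

With the flag off, the mux behaves as before and nothing is ever cached, which is the property the risk assessment above relies on.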

adisuissa added 2 commits June 21, 2023 20:17

eds: introducing EDS resources cache

17de172

Signed-off-by: Adi Suissa-Peleg <[email protected]>

fix format

810920f

Signed-off-by: Adi Suissa-Peleg <[email protected]>

repokitteh-read-only bot assigned ggreenway Jun 22, 2023

abeyad reviewed Jun 23, 2023

addressing comments

ee3ee88

Signed-off-by: Adi Suissa-Peleg <[email protected]>

abeyad reviewed Jun 23, 2023

abeyad self-assigned this Jun 23, 2023

adisuissa added 3 commits June 26, 2023 13:55

addressing comments

00602ea

Signed-off-by: Adi Suissa-Peleg <[email protected]>

clang-tidy

7d99e32

Signed-off-by: Adi Suissa-Peleg <[email protected]>

Merge remote-tracking branch 'upstream/main' into eds_caching_part1

0318b83

Signed-off-by: Adi Suissa-Peleg <[email protected]>

abeyad approved these changes Jun 26, 2023

ggreenway approved these changes Jun 27, 2023

ggreenway merged commit 057c80b into envoyproxy:main Jun 28, 2023

adisuissa mentioned this pull request Jul 7, 2023

eds: Adding eds caching support to grpc-mux #28273

Merged

reskin89 pushed a commit to reskin89/envoy that referenced this pull request Jul 11, 2023

eds: introducing EDS resources cache (envoyproxy#28079)

0baebb3

Signed-off-by: Adi Suissa-Peleg <[email protected]> Signed-off-by: Ryan Eskin <[email protected]>
