Add support for explicit wildcard resource #16855
Changes from 38 commits
@@ -22,11 +22,59 @@ namespace Config { | |
// There can be multiple DeltaSubscriptionStates active. They will always all be | ||
// blissfully unaware of each other's existence, even when their messages are | ||
// being multiplexed together by ADS. | ||
// | ||
// There are two scenarios which affect how DeltaSubscriptionState manages the resources. The first | ||
// scenario is when we are subscribed to a wildcard resource, and the other is when we are not. | ||
// | ||
// Delta subscription state also divides the resources it cached into three categories: requested, | ||
// wildcard and ambiguous. | ||
// | ||
// The "requested" category is for resources that we have explicitly asked for (either through the | ||
// initial set of resources or through the on-demand mechanism). Resources in this category are in | ||
// one of two states: "complete" and "waiting for server". | ||
// | ||
// "Complete" resources are resources about which the server sent us the information we need (for | ||
// now - just resource version). | ||
// | ||
// The "waiting for server" state is either for resources that we have just requested but for which | ||
// we have not yet received any version information from the server, or for "complete" resources | ||
// that, according to the server, are gone, but that we are still interested in - in that case we | ||
// strip the version information from the resource. | ||
// | ||
// The "wildcard" category is for resources that we are not explicitly interested in, but we are | ||
// indirectly interested through the subscription to the wildcard resource. | ||
// | ||
// The "ambiguous" category is for resources that we stopped being interested in, but we may still | ||
// be interested indirectly through the wildcard subscription - resources in this category are | ||
// "waiting" for the config server to confirm their status. | ||
// | ||
// Please refer to the drawings (non-wildcard-resource-state-machine.png and | ||
// wildcard-resource-state-machine.png) for visual depictions of the resource state machine. | ||
// | ||
// In the "no wildcard subscription" scenario all the cached resources should be in the "requested" | ||
// category. Resources are added to the category upon the explicit request and dropped when we | ||
// explicitly unsubscribe from it. Transitions between "complete" and "waiting for server" happen | ||
// when we receive messages from the server - if a resource in the message is in "added resources" | ||
// list (thus contains version information), the resource becomes "complete". If the resource in the | ||
// message is in "removed resources" list, it changes into the "waiting for server" state. If a | ||
// server sends us a resource that we didn't request, it's going to be ignored. | ||
// | ||
// In the "wildcard subscription" scenario, "requested" category is the same as in "no wildcard | ||
// subscription" scenario, with one exception - the unsubscribed "complete" resource is not removed | ||
// from the cache, but it's moved to the "ambiguous" resources instead. At this point we are waiting | ||
// for the server to tell us that this resource should be either moved to the "wildcard" resources, | ||
// or dropped. Resources in "wildcard" category are only added there or dropped from there by the | ||
// server. Resources from both "wildcard" and "ambiguous" categories can become "requested" | ||
// "complete" resources if we subscribe to them again. | ||
// | ||
// The delta subscription state transitions between the two scenarios depending on whether we are | ||
// subscribed to wildcard resource or not. Nothing special happens when we transition from "no | ||
// wildcard subscription" to "wildcard subscription" scenario, but when transitioning in the other | ||
// direction, we drop all the resources in "wildcard" and "ambiguous" categories. | ||
class DeltaSubscriptionState : public Logger::Loggable<Logger::Id::config> { | ||
public: | ||
DeltaSubscriptionState(std::string type_url, UntypedConfigUpdateCallbacks& watch_map, | ||
const LocalInfo::LocalInfo& local_info, Event::Dispatcher& dispatcher, | ||
const bool wildcard); | ||
const LocalInfo::LocalInfo& local_info, Event::Dispatcher& dispatcher); | ||
|
||
// Update which resources we're interested in subscribing to. | ||
void updateSubscriptionInterest(const absl::flat_hash_set<std::string>& cur_added, | ||
|
@@ -60,15 +108,21 @@ class DeltaSubscriptionState : public Logger::Loggable<Logger::Id::config> { | |
|
||
class ResourceState { | ||
public: | ||
ResourceState(const envoy::service::discovery::v3::Resource& resource) | ||
: version_(resource.version()) {} | ||
|
||
// Builds a ResourceState in the waitingForServer state. | ||
ResourceState() = default; | ||
// Builds a ResourceState with a specific version | ||
ResourceState(absl::string_view version) : version_(version) {} | ||
// Self-documenting alias of default constructor. | ||
static ResourceState waitingForServer() { return ResourceState(); } | ||
// Self-documenting alias of constructor with version. | ||
static ResourceState withVersion(absl::string_view version) { return ResourceState(version); } | ||
|
||
// If true, we currently have no version of this resource - we are waiting for the server to | ||
// provide us with one. | ||
bool waitingForServer() const { return version_ == absl::nullopt; } | ||
bool isWaitingForServer() const { return version_ == absl::nullopt; } | ||
|
||
void setAsWaitingForServer() { version_ = absl::nullopt; } | ||
void setVersion(absl::string_view version) { version_ = std::string(version); } | ||
|
||
// Must not be called if isWaitingForServer() == true. | ||
std::string version() const { | ||
|
@@ -80,36 +134,37 @@ class DeltaSubscriptionState : public Logger::Loggable<Logger::Id::config> { | |
absl::optional<std::string> version_; | ||
}; | ||
|
||
// Use these helpers to ensure resource_state_ and resource_names_ get updated together. | ||
void addResourceState(const envoy::service::discovery::v3::Resource& resource); | ||
void setResourceWaitingForServer(const std::string& resource_name); | ||
void removeResourceState(const std::string& resource_name); | ||
void addResourceStateFromServer(const envoy::service::discovery::v3::Resource& resource); | ||
OptRef<ResourceState> getRequestedResourceState(absl::string_view resource_name); | ||
OptRef<const ResourceState> getRequestedResourceState(absl::string_view resource_name) const; | ||
|
||
void populateDiscoveryRequest(envoy::service::discovery::v3::DeltaDiscoveryResponse& request); | ||
bool isInitialRequestForLegacyWildcard(); | ||
|
||
// A map from resource name to per-resource version. The keys of this map are exactly the resource | ||
// names we are currently interested in. Those in the waitingForServer state currently don't have | ||
// any version for that resource: we need to inform the server if we lose interest in them, but we | ||
// also need to *not* include them in the initial_resource_versions map upon a reconnect. | ||
absl::node_hash_map<std::string, ResourceState> resource_state_; | ||
absl::node_hash_map<std::string, ResourceState> requested_resource_state_; | ||
// A map from resource name to per-resource version. The keys of this map are resource names we | ||
// have received as a part of the wildcard subscription. | ||
absl::node_hash_map<std::string, std::string> wildcard_resource_state_; | ||
// Used for storing resources that we lost interest in, but could | ||
// also be a part of wildcard subscription. | ||
absl::node_hash_map<std::string, std::string> ambiguous_resource_state_; | ||
Just getting back to review here (sorry for delay). I was a little surprised by the
I wrote a description of the state machine in
These diagrams look great!
Thanks. :)
+1, thanks so much.
||
|
||
// Not all xDS resources support heartbeats due to there being specific information encoded in | ||
// an empty response, which is indistinguishable from a heartbeat in some cases. For now we just | ||
// disable heartbeats for these resources (currently only VHDS). | ||
const bool supports_heartbeats_; | ||
TtlManager ttl_; | ||
// The keys of resource_versions_. Only tracked separately because std::map does not provide an | ||
// iterator into just its keys. | ||
absl::flat_hash_set<std::string> resource_names_; | ||
|
||
const std::string type_url_; | ||
// Whether the subscription is for a wildcard request. | ||
const bool wildcard_; | ||
UntypedConfigUpdateCallbacks& watch_map_; | ||
const LocalInfo::LocalInfo& local_info_; | ||
Event::Dispatcher& dispatcher_; | ||
std::chrono::milliseconds init_fetch_timeout_; | ||
|
||
bool in_initial_legacy_wildcard_{true}; | ||
bool any_request_sent_yet_in_current_stream_{}; | ||
bool must_send_discovery_request_{}; | ||
|
||
|
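The class comment in the diff above describes how explicit subscribe and unsubscribe calls move a resource between the "requested", "wildcard" and "ambiguous" categories. A minimal sketch of just that client-side bookkeeping follows. It is not the PR's implementation: `SketchCache` and its members are hypothetical names, standard containers stand in for the absl ones, the wildcard resource is assumed to be named `*`, and dropping an unsubscribed "waiting for server" resource outright is an assumption read from the comment rather than something the diff shows.

```cpp
#include <initializer_list>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical, simplified model of the three categories from the class comment.
struct SketchCache {
  // Requested resources; an empty optional models the "waiting for server" state.
  std::unordered_map<std::string, std::optional<std::string>> requested;
  // Resources known only through the wildcard subscription (name -> version).
  std::unordered_map<std::string, std::string> wildcard;
  // Unsubscribed resources that may still belong to the wildcard set (name -> version).
  std::unordered_map<std::string, std::string> ambiguous;
  bool wildcard_subscribed = false;

  void subscribe(const std::string& name) {
    if (name == "*") { // Assumed name of the explicit wildcard resource.
      wildcard_subscribed = true;
      return;
    }
    // A resource already cached as "wildcard" or "ambiguous" keeps its version and
    // becomes a "requested", "complete" resource.
    for (auto* source : {&wildcard, &ambiguous}) {
      auto it = source->find(name);
      if (it != source->end()) {
        requested[name] = it->second;
        source->erase(it);
        return;
      }
    }
    // Otherwise it starts out as "waiting for server" (no-op if already requested).
    requested.try_emplace(name, std::nullopt);
  }

  void unsubscribe(const std::string& name) {
    if (name == "*") {
      // Leaving the wildcard scenario drops both wildcard-derived categories.
      wildcard_subscribed = false;
      wildcard.clear();
      ambiguous.clear();
      return;
    }
    auto it = requested.find(name);
    if (it == requested.end()) {
      return;
    }
    // Only a "complete" resource is parked as "ambiguous"; a resource that is still
    // waiting for the server is simply dropped (assumption).
    if (wildcard_subscribed && it->second.has_value()) {
      ambiguous[name] = *it->second;
    }
    requested.erase(it);
  }
};
```

In this sketch, nothing special happens when the wildcard subscription is added; only removing it clears state, which mirrors the last paragraph of the class comment.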
This seems a little sub-optimal, because it means that if a client is subscribed to the wildcard and to another explicit resource, if it unsubscribes from that explicit resource, it cannot remove the resource from its cache unless the server explicitly sends a response with the resource in the removed_resources list. This could lead to the cache growing without bound.
Maybe we should add something to the xDS spec that requires the delta server to send the resource name in removed_resources in response to a client unsubscription, but only if the client is subscribed to the wildcard.
@htuch, WDYT?
Yeah, my assumption was that in such case (we are subscribed to wildcard and just unsubscribed from some explicit resource) the server will send us a response with the explicit resource in added_resources if the explicit resource is also a part of the wildcard set (so we would move the resource from ambiguous to wildcard) or otherwise in removed_resources if the explicit resource is not a part of the wildcard set (so we would drop the resource from the cache).
But yeah, probably it's best to add such a wording to xDS spec.
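The server behaviour assumed in the comment above can be sketched from the client's point of view as follows. This is only an illustration of the intended transitions, not the code in this PR: `SketchResource`, `SketchState` and `onResponse` are hypothetical names, and the two vectors stand in for the `added_resources` and `removed_resources` lists of a delta response.

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a resource carried in the "added resources" list.
struct SketchResource {
  std::string name;
  std::string version;
};

// Hypothetical, simplified client-side state.
struct SketchState {
  // Requested resources; an empty optional models the "waiting for server" state.
  std::unordered_map<std::string, std::optional<std::string>> requested;
  std::unordered_map<std::string, std::string> wildcard;
  std::unordered_map<std::string, std::string> ambiguous;
  bool wildcard_subscribed = false;

  void onResponse(const std::vector<SketchResource>& added,
                  const std::vector<std::string>& removed) {
    for (const auto& resource : added) {
      auto it = requested.find(resource.name);
      if (it != requested.end()) {
        // Explicitly requested: the resource becomes "complete".
        it->second = resource.version;
      } else if (wildcard_subscribed) {
        // The server confirms the resource is part of the wildcard set, so an
        // "ambiguous" entry (if any) is promoted to "wildcard".
        ambiguous.erase(resource.name);
        wildcard[resource.name] = resource.version;
      }
      // Without a wildcard subscription, unrequested resources are ignored.
    }
    for (const auto& name : removed) {
      auto it = requested.find(name);
      if (it != requested.end()) {
        // Still subscribed: keep the entry, but go back to "waiting for server".
        it->second.reset();
      } else {
        // Not part of the wildcard set (any more): drop it from the cache.
        wildcard.erase(name);
        ambiguous.erase(name);
      }
    }
  }
};
```

Under these assumptions an "ambiguous" entry leaves that category only when the server mentions it in one of the two lists, which is why the thread turns to what the xDS spec should require of the server.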
As mentioned above -- why cache a resource if we receive a complete copy from the server anyway?
I don't think this is current behaviour and it needs to be documented.
Good point. It seems inefficient to require the server to explicitly resend the resource if it is included in the wildcard, so I wasn't thinking it would do that. But I guess the client would need some signal to tell it to remove the resource from the "ambiguous" list in this case.
I've put together #17983 to update the spec.
@dmitri-d In the incremental protocol variant, the server does not need to resend any individual resource unless that resource has changed, so we won't necessarily receive a complete copy from the server anyway.
Could you expand on it, please? Right now I'm not sure if and how to address this. Thanks.
I'm not sure what the problem is with TTLs here. Can you say more about that?
With regard to the broader issue, I agree that the presence of "ambiguous" state is fairly ugly. But I think the root cause of the ugliness is actually the wildcard subscription semantics, which were not designed to coexist with per-resource subscriptions and do not provide any way for the client to know which returned resource is actually associated with the wildcard subscription. Given that inherent wire-level protocol problem, I don't see any choice other than to mandate a certain behavior for clients.
I will also note two things that I think make this more tolerable. First, the new xdstp: naming scheme replaces wildcard subscriptions with collection resources, and collection resources do not have the above problem: the wire protocol clearly indicates which returned resources are associated with which collection, so there is no ambiguity. This means that this problem will go away as we migrate to the new naming scheme.
Second, even for legacy clients and servers, this problem comes up only in a case where the client is subscribing to both the wildcard and to other resources on the same stream. In general, that will occur only when something like on-demand CDS is used, which means that clients that don't implement on-demand CDS will not need to worry about this.
Currently, when a resource ttl expires, the resource gets deleted. It happens regardless of whether the resource is a part of wildcard subscription or not. IIRC the original conversation, the assumption was that the server will just resend a resource if the server missed an update.
With the changes in this PR, the client now keeps the state for resources some of the time. There isn't a way for the server to distinguish between an unsubscribe request due to a TTL expiration and a regular unsubscribe based on data present in the discovery request. As we expect different responses from the server (a full resource in case of TTL expiration, a resource name otherwise), the server is now required to distinguish between the two based on its internal state, which wasn't a requirement before. Because of the above, the changes in this PR would require a corresponding update in server implementations.
I don't think the client actually generates an unsubscribe request when a TTL expires. A TTL is basically a server's way of saying "consider the resource to be removed at a particular time without my having to explicitly tell you that it's been removed". But a server indicating that a resource has been removed does not alter the fact that a client is subscribing to it. In effect, you can think of "does not exist" as just another possible value for a given resource; while a client is subscribed to a resource, the server is obligated to send updates whenever the value of that resource changes, so "resource previously did exist and now does not" and "resource previously did not exist and now does" are both just special cases of the resource's value changing.
A server that sends a resource with a TTL knows that the client will consider the resource to not exist when the TTL expires. If the server wants the client to continue using the resource after that, it needs to resend the resource with a new TTL value before the original TTL expires. But this should not be hard for the server to figure out, because the server knows that it sent the resource with a TTL in the first place. And I think this behavior is required independently of the change we're talking about here.
In the change we're talking about here, all we're saying is that a server that supports both wildcard and non-wildcard subscriptions on the same stream needs to know that if the client unsubscribes to a non-wildcard resource, the server needs to explicitly indicate whether that resource is still relevant to the client because of the wildcard or whether the client can remove the resource. I think this change is completely independent of TTL behavior.
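To make the TTL point concrete, here is a small sketch of a client cache in which TTL expiry only changes the resource's value to "does not exist" while the subscription stays in place. It is an illustration under stated assumptions, not Envoy's `TtlManager`: `SketchTtlCache`, its members, and the `pending_unsubscribes` queue are hypothetical, and time is passed in explicitly so the behaviour is easy to follow.

```cpp
#include <chrono>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified cache of explicitly subscribed resources.
struct SketchTtlCache {
  using Clock = std::chrono::steady_clock;

  struct Entry {
    std::optional<std::string> version;      // Empty means "does not exist (yet)".
    std::optional<Clock::time_point> expiry; // Empty means no TTL was sent.
  };

  std::map<std::string, Entry> subscribed;
  // Names that would go into the unsubscribe list of the next request.
  std::vector<std::string> pending_unsubscribes;

  void subscribe(const std::string& name) { subscribed.try_emplace(name); }

  // Only an explicit unsubscribe removes the entry and queues a message to the server.
  void unsubscribe(const std::string& name) {
    if (subscribed.erase(name) > 0) {
      pending_unsubscribes.push_back(name);
    }
  }

  // The server (re)sends a resource, optionally with a TTL.
  void onResource(const std::string& name, const std::string& version,
                  std::optional<std::chrono::milliseconds> ttl, Clock::time_point now) {
    auto it = subscribed.find(name);
    if (it == subscribed.end()) {
      return; // Not subscribed; ignored in this sketch.
    }
    it->second.version = version;
    if (ttl.has_value()) {
      it->second.expiry = now + *ttl;
    } else {
      it->second.expiry.reset();
    }
  }

  // TTL expiry: the resource is treated as removed, but the subscription remains and
  // nothing is queued for the server.
  std::vector<std::string> expire(Clock::time_point now) {
    std::vector<std::string> expired;
    for (auto& [name, entry] : subscribed) {
      if (entry.expiry.has_value() && *entry.expiry <= now) {
        entry.version.reset();
        entry.expiry.reset();
        expired.push_back(name);
      }
    }
    return expired;
  }
};
```

Because expiry never touches `pending_unsubscribes`, the server in this model only ever sees unsubscribes the client actually asked for, so it has no need to tell the two cases apart.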
Duh, indeed there's no unsubscribe request on ttl expiration. Ignore what I said.