-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cds: Configure LRS in CDS response #7953
Changes from 4 commits
8d3285e
6dd7370
cc3906d
1c66fa6
a3e0810
6fa9e39
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -52,7 +52,7 @@ service ClusterDiscoveryService { | |
// [#protodoc-title: Clusters] | ||
|
||
// Configuration for a single upstream cluster. | ||
// [#comment:next free field: 42] | ||
// [#comment:next free field: 43] | ||
message Cluster { | ||
// Supplies the name of the cluster which must be unique across all clusters. | ||
// The cluster name is used when emitting | ||
|
@@ -657,6 +657,35 @@ message Cluster { | |
// :ref:`lb_policy<envoy_api_field_Cluster.lb_policy>` field has the value | ||
// :ref:`LOAD_BALANCING_POLICY_CONFIG<envoy_api_enum_value_Cluster.LbPolicy.LOAD_BALANCING_POLICY_CONFIG>`. | ||
LoadBalancingPolicy load_balancing_policy = 41; | ||
|
||
// [#not-implemented-hide:] If present, tells the client where to send load reports via LRS. | ||
// | ||
// If multiple clusters are configured to use the same LRS server, then the data for all of | ||
// those clusters may be sent on the same LRS stream. | ||
// | ||
// Note that if LRS is configured this way, then there is no need for the LRS server to send | ||
// LoadStatsResponse messages on the LRS stream, because the data it would have sent is already | ||
// configured here. | ||
message LoadReportConfig { | ||
// A management server endpoint to stream load stats to via | ||
// *StreamLoadStats*. This must have :ref:`api_type | ||
// <envoy_api_field_core.ApiConfigSource.api_type>` :ref:`GRPC | ||
// <envoy_api_enum_value_core.ApiConfigSource.ApiType.GRPC>`. | ||
core.ApiConfigSource lrs_server = 1; | ||
markdroth marked this conversation as resolved.
Show resolved
Hide resolved
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In discussing this with @vishalpowar, one problem here is that some management server deployments want the LRS stream to be sent to the same management server that is providing the CDS request, but they don't currently know what name or credentials their clients are using to access the management server. It would be more work for them to figure that out just to give the client information that it already has (because it used the name and credentials to access the CDS request in the first place). Currently, ConfigSource has a special case for ADS that basically says "use the same ADS stream you're already using", but that won't work here, because LRS can't be sent over ADS. To address this, can we add a new field to the "oneof" in ConfigSource that basically says "talk to the same server you're getting this message from" but without being linked to ADS? I think this new type of ConfigSource might also be useful in cases that are not using ADS but where LDS points to RDS on the same server or where CDS points to EDS on the same server. @htuch and @mattklein123, WDYT? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This makes sense to me but can we track this in a new issue/proposal? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure. I've sent out #8201 for review. |
||
// The minimum time interval to collect stats over. This is only a | ||
// minimum for two reasons: | ||
// 1. There may be some delay between when the timer fires and when | ||
// stats sampling occurs. | ||
// 2. Whenever the load reporting interval for the cluster changes, any | ||
// traffic that is observed since the last load report for the | ||
// cluster will be included in the next load report. This avoids | ||
// a period of inobservability for the cluster. | ||
google.protobuf.Duration load_reporting_interval = 2; | ||
// If set to *true*, the client should generate endpoint-granularity | ||
// load reports. Otherwise, load reports should be at locality granularity. | ||
bool report_endpoint_granularity = 3; | ||
} | ||
LoadReportConfig load_report_config = 42; | ||
} | ||
|
||
// [#not-implemented-hide:] Extensible load balancing policy configuration. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -150,6 +150,7 @@ LDS | |
LEV | ||
LHS | ||
LF | ||
LRS | ||
MB | ||
MD | ||
MGET | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a federation perspective I think it would be better if the LRS server could still independently configure the interval, and potentially even turn off reports until they are needed. So I guess 2 questions:
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Playing devil's advocate for a moment, why is it better to configure the interval via LRS and via CDS for federation purposes? Shouldn't federation be able to configure this via CDS?
This seems similar to the federation use-case we discussed for setting load balancing weights. How did we wind up deciding to handle that use-case once we removed proto-merging from the UDPA federation design? Could we apply whatever solution we used for that to this case as well?
If there's a good reason why this can be more easily done via LRS than via CDS, then maybe we should just leave the negotiation in the LRS stream where it is today and instead restrict this change to simply point the client at the right LRS server. I'd prefer to avoid the complexity of trying to support two different ways to do the same thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. This reason I want this is because there are different implementations and a separation of concerns between CDS and LRDS, and IMO it's better for LRS to control the reporting rate, so my strong preference is to have that be done by LRS only as you propose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this further... I think the reason I was trying to move this out of LRS in the first place is that as soon as we start configuring the LRS server in the CDS response, if we leave the negotiation in the LRS stream, then a misconfiguration could cause duplicate load reporting. Consider the case where cluster A configures LRS server X and cluster B configures LRS server Y. If the LRS streams to X and Y both request data for both clusters, then the data could wind up being double-reported.
Also, I think Harvey's proposal for UDPA was that use-cases like LRS were going to be handled by having the client publish the load-report resources on the single UDPA stream. I'm not sure we've fully thought through how we were going to decide which management server was going to get which load-report data in a federation deployment.
We should probably talk this through in a bit more detail. How about we have a short meeting about this once Harvey is back?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM