Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cds: Configure LRS in CDS response #7953

Merged
merged 6 commits into from
Sep 20, 2019
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
31 changes: 30 additions & 1 deletion api/envoy/api/v2/cds.proto
Original file line number Diff line number Diff line change
Expand Up @@ -52,7 +52,7 @@ service ClusterDiscoveryService {
// [#protodoc-title: Clusters]

// Configuration for a single upstream cluster.
// [#comment:next free field: 42]
// [#comment:next free field: 43]
message Cluster {
// Supplies the name of the cluster which must be unique across all clusters.
// The cluster name is used when emitting
Expand Down Expand Up @@ -657,6 +657,35 @@ message Cluster {
// :ref:`lb_policy<envoy_api_field_Cluster.lb_policy>` field has the value
// :ref:`LOAD_BALANCING_POLICY_CONFIG<envoy_api_enum_value_Cluster.LbPolicy.LOAD_BALANCING_POLICY_CONFIG>`.
LoadBalancingPolicy load_balancing_policy = 41;

// [#not-implemented-hide:] If present, tells the client where to send load reports via LRS.
//
// If multiple clusters are configured to use the same LRS server, then the data for all of
// those clusters may be sent on the same LRS stream.
//
// Note that if LRS is configured this way, then there is no need for the LRS server to send
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a federation perspective I think it would be better if the LRS server could still independently configure the interval, and potentially even turn off reports until they are needed. So I guess 2 questions:

  1. Can we clarify potentially that duration can be a sentinel value (empty, 0, whatever) that means that no reports are sent?
  2. And that this can either be configured in the CDS or LRS response?

WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Playing devil's advocate for a moment, why is it better to configure the interval via LRS and via CDS for federation purposes? Shouldn't federation be able to configure this via CDS?

This seems similar to the federation use-case we discussed for setting load balancing weights. How did we wind up deciding to handle that use-case once we removed proto-merging from the UDPA federation design? Could we apply whatever solution we used for that to this case as well?

If there's a good reason why this can be more easily done via LRS than via CDS, then maybe we should just leave the negotiation in the LRS stream where it is today and instead restrict this change to simply point the client at the right LRS server. I'd prefer to avoid the complexity of trying to support two different ways to do the same thing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then maybe we should just leave the negotiation in the LRS stream where it is today and instead restrict this change to simply point the client at the right LRS server. I'd prefer to avoid the complexity of trying to support two different ways to do the same thing.

+1. This reason I want this is because there are different implementations and a separation of concerns between CDS and LRDS, and IMO it's better for LRS to control the reporting rate, so my strong preference is to have that be done by LRS only as you propose.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this further... I think the reason I was trying to move this out of LRS in the first place is that as soon as we start configuring the LRS server in the CDS response, if we leave the negotiation in the LRS stream, then a misconfiguration could cause duplicate load reporting. Consider the case where cluster A configures LRS server X and cluster B configures LRS server Y. If the LRS streams to X and Y both request data for both clusters, then the data could wind up being double-reported.

Also, I think Harvey's proposal for UDPA was that use-cases like LRS were going to be handled by having the client publish the load-report resources on the single UDPA stream. I'm not sure we've fully thought through how we were going to decide which management server was going to get which load-report data in a federation deployment.

We should probably talk this through in a bit more detail. How about we have a short meeting about this once Harvey is back?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

// LoadStatsResponse messages on the LRS stream, because the data it would have sent is already
// configured here.
message LoadReportConfig {
// A management server endpoint to stream load stats to via
// *StreamLoadStats*. This must have :ref:`api_type
// <envoy_api_field_core.ApiConfigSource.api_type>` :ref:`GRPC
// <envoy_api_enum_value_core.ApiConfigSource.ApiType.GRPC>`.
core.ApiConfigSource lrs_server = 1;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In discussing this with @vishalpowar, one problem here is that some management server deployments want the LRS stream to be sent to the same management server that is providing the CDS request, but they don't currently know what name or credentials their clients are using to access the management server. It would be more work for them to figure that out just to give the client information that it already has (because it used the name and credentials to access the CDS request in the first place).

Currently, ConfigSource has a special case for ADS that basically says "use the same ADS stream you're already using", but that won't work here, because LRS can't be sent over ADS. To address this, can we add a new field to the "oneof" in ConfigSource that basically says "talk to the same server you're getting this message from" but without being linked to ADS?

I think this new type of ConfigSource might also be useful in cases that are not using ADS but where LDS points to RDS on the same server or where CDS points to EDS on the same server.

@htuch and @mattklein123, WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me but can we track this in a new issue/proposal?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I've sent out #8201 for review.

// The minimum time interval to collect stats over. This is only a
// minimum for two reasons:
// 1. There may be some delay between when the timer fires and when
// stats sampling occurs.
// 2. Whenever the load reporting interval for the cluster changes, any
// traffic that is observed since the last load report for the
// cluster will be included in the next load report. This avoids
// a period of inobservability for the cluster.
google.protobuf.Duration load_reporting_interval = 2;
// If set to *true*, the client should generate endpoint-granularity
// load reports. Otherwise, load reports should be at locality granularity.
bool report_endpoint_granularity = 3;
}
LoadReportConfig load_report_config = 42;
}

// [#not-implemented-hide:] Extensible load balancing policy configuration.
Expand Down
6 changes: 6 additions & 0 deletions api/envoy/service/load_stats/v2/lrs.proto
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,12 @@ service LoadReportingService {
// 6. The management server uses the load reports from all reported Envoys
// from around the world, computes global assignment and prepares traffic
// assignment destined for each zone Envoys are located in. Goto 2.
//
// Note: If the management server configured use of LRS via via the CDS
// :ref:`Cluster.load_report_config
// <envoy_api_field_Cluster.load_report_config>` field, then it will never
// send a response message on the stream. Instead, that data will
// come from the CDS response.
rpc StreamLoadStats(stream LoadStatsRequest) returns (stream LoadStatsResponse) {
}
}
Expand Down
1 change: 1 addition & 0 deletions tools/spelling_dictionary.txt
Original file line number Diff line number Diff line change
Expand Up @@ -150,6 +150,7 @@ LDS
LEV
LHS
LF
LRS
MB
MD
MGET
Expand Down