Filter endpoint in LB based on metadata #1279
cc @zuercher too
Sure. It's going to be a few weeks until we've fully plumbed protos into the CDS/EDS/RDS aspects of Envoy; I was just trying to get a provisional notion of who might own this one. Will assign to @zuercher for now.
Our general focus will be on implementing the xDS APIs with our stuff, but this is definitely something we'll need, so we can sign up for it if nobody else needs it sooner.
It would also be nice to have fallback support, i.e. if no endpoints with the given labels are found, load balance among all endpoints. The ability to derive the metadata from request headers would also be useful.
I'm starting to look at this. I think a reasonable first step is to plumb the LbEndpoint Metadata from EDS into Upstream::HostDescription/Impl. Then it should be possible to use the data from inside the load balancers.
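For what that first step could look like, here is a minimal, self-contained sketch; it is not the actual Envoy interface, and the Metadata stand-in type, class names, and accessor are assumptions made for illustration:

```cpp
// Sketch only: expose the EDS LbEndpoint metadata on a host description so
// load balancers can read it during host selection.
#include <map>
#include <string>
#include <utility>

namespace Upstream {

// Stand-in for the data-plane-api Metadata proto: filter namespace -> key/value fields.
using Metadata = std::map<std::string, std::map<std::string, std::string>>;

class HostDescription {
public:
  virtual ~HostDescription() = default;

  // Metadata copied from the EDS LbEndpoint when the host was created.
  virtual const Metadata& metadata() const = 0;
};

class HostDescriptionImpl : public HostDescription {
public:
  explicit HostDescriptionImpl(Metadata metadata) : metadata_(std::move(metadata)) {}

  const Metadata& metadata() const override { return metadata_; }

private:
  const Metadata metadata_;
};

} // namespace Upstream
```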
Ready to take the next steps. This is a proposal for some further modifications (including a change to data-plane-api). It seems reasonable to me not to have to maintain alternate load balancer implementations (there are currently 5) that differ only by being configurable with subsets; a rough sketch of the wrapping approach follows the proto below. So I propose to add subset configuration to the Cluster message in cds.proto:

```proto
// Optionally divide the endpoints in this cluster into subsets defined by endpoint
// metadata.
message LbSubsetConfig {
  // The behavior used when no endpoint subset matches the LoadBalancerContext.
  // The options are no_fallback, any_endpoint, or default_subset. The value defaults
  // to no_fallback. If no_fallback is selected, a result equivalent to no healthy
  // hosts is reported. If any_endpoint is selected, any cluster endpoint may be
  // returned (subject to policy, health checks, etc). If default_subset is selected,
  // load balancing is performed over the hosts matching the values in default_subset.
  enum LbSubsetFallbackPolicy {
    NO_FALLBACK = 0;
    ANY_ENDPOINT = 1;
    DEFAULT_SUBSET = 2;
  }
  LbSubsetFallbackPolicy fallback_policy = 1;

  // Specifies the default subset used during fallback if fallback_policy is
  // default_subset. Each field in default_subset is compared to the matching
  // LbEndpoint.Metadata under the "envoy.lb" namespace. It is valid for no hosts to
  // match, in which case the behavior is the same as a fallback_policy of
  // no_fallback.
  google.protobuf.Struct default_subset = 2;

  // Specifications for subsets. For each entry, LbEndpoint.Metadata's "envoy.lb"
  // namespace is traversed and a subset is created for each unique combination of
  // key and value. For example:
  //   { "subset_keys": [
  //     { "keys": [ "version" ] },
  //     { "keys": [ "stage", "hardware_type" ] }
  //   ]}
  // Subsets may overlap. In the case of overlapping subsets, the first matching
  // subset is selected.
  message LbSubsetKeys {
    repeated string keys = 1;
  }
  repeated LbSubsetKeys subset_keys = 3;
}

LbSubsetConfig lb_subset_config = 22;
```
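To make the "wrap rather than fork the existing load balancers" idea concrete, here is a purely illustrative, self-contained sketch; the type names (LoadBalancer, Host, MatchCriteria) and the matching/fallback helpers are stand-ins, not the proposed Envoy code:

```cpp
// Sketch: a subset-aware LB that partitions hosts by "envoy.lb" metadata and
// delegates host selection to an ordinary (wrapped) load balancer per subset,
// so the five existing LB implementations need no subset-specific variants.
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

using Host = std::string;                                  // stand-in for a host handle
using MatchCriteria = std::map<std::string, std::string>;  // flattened route metadata

class LoadBalancer {
public:
  virtual ~LoadBalancer() = default;
  virtual Host* chooseHost(const MatchCriteria& criteria) = 0;
};

enum class FallbackPolicy { NoFallback, AnyEndpoint, DefaultSubset };

class SubsetLoadBalancer : public LoadBalancer {
public:
  Host* chooseHost(const MatchCriteria& criteria) override {
    // Find the first subset whose key/value pairs are all present in the route's criteria.
    for (auto& [subset_values, lb] : subsets_) {
      if (subsetMatches(subset_values, criteria)) {
        return lb->chooseHost(criteria);  // delegate to the wrapped LB for this subset
      }
    }
    // No subset matched: apply the configured fallback policy.
    switch (fallback_policy_) {
    case FallbackPolicy::AnyEndpoint:
      return any_endpoint_lb_->chooseHost(criteria);
    case FallbackPolicy::DefaultSubset:
      return default_subset_lb_ ? default_subset_lb_->chooseHost(criteria) : nullptr;
    case FallbackPolicy::NoFallback:
    default:
      return nullptr;  // equivalent to "no healthy hosts"
    }
  }

private:
  // A subset matches when every one of its key/value pairs appears in the criteria,
  // so the route may carry more fields than the subset defines.
  static bool subsetMatches(const MatchCriteria& subset, const MatchCriteria& criteria) {
    for (const auto& [key, value] : subset) {
      auto it = criteria.find(key);
      if (it == criteria.end() || it->second != value) {
        return false;
      }
    }
    return true;
  }

  FallbackPolicy fallback_policy_{FallbackPolicy::NoFallback};
  std::vector<std::pair<MatchCriteria, std::unique_ptr<LoadBalancer>>> subsets_;
  std::unique_ptr<LoadBalancer> any_endpoint_lb_;    // wraps all cluster hosts
  std::unique_ptr<LoadBalancer> default_subset_lb_;  // wraps hosts matching default_subset
};
```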
We've discussed making LBs pluggable like filters before, @rshriram. We could add an opaque config for these so we don't need to keep cramming stuff into CDS.
I'm not entirely sure I follow. Is this just adding
to the Cluster message and placing the subset config stuff in a separate file?
@htuch I agree we should allow for pluggable LBs like we are now allowing for filters, stats, etc., but in this case I see no reason not to add the subsetting support directly, mainly because, as @zuercher points out, it can be done in a way that does not change existing behavior. @zuercher ^ sounds reasonable/useful to me. My only concern is actually perf. It's not great if we have to compare the struct from the route to the struct from the subset via reflection during LB. I think we are going to have to pre-parse somehow, which we should be able to do when the route loads as well as when the LB refreshes. I'm not exactly sure how this would work though. @htuch any thoughts on how to best do this?
I expect that the structs will usually be flat (e.g., the field values will be string/bool/number and not nested structs or lists). I'm fairly new to protobufs, but it looks like it's possible to treat the struct as a map of field names to values, where each value has a kind that determines how to extract the typed value. Nested lists and structs would be more expensive since you have to traverse the hierarchy. That said, I think it makes sense to compute hashes to avoid doing many comparisons. And just to clarify, I had been thinking that there might be more fields in the structs from the route than in the subset struct in the LB. For example, the route might specify {"version":"123", "stage":"prod"} where the subset struct only includes {"version":"123"}. In this case I'd expect the subset to be used. So the hashing would be one level down, and there'd likely be a new type used to represent the route and subset structs to handle this.
Yeah, I think we're going to have to invent our own thing here; protobuf's internal representation of Struct is verbose. I agree that flat hashing is what we want for the common case. One idea is that we could build an index hashtable for each LB subset query we encounter as well, so we only pay this penalty once.
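A rough sketch of the pre-parse-and-hash idea under discussion, assuming only flat, top-level string fields are considered; the FlatMetadata type, hash combiner, and SubsetIndex alias are illustrative, not Envoy's implementation:

```cpp
// Sketch: flatten the "envoy.lb" struct into key/value pairs once, when the route
// or cluster update is processed, and precompute a combined hash so per-request
// subset lookups are a hashtable probe instead of repeated struct comparisons.
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <unordered_map>

// Flattened metadata: only top-level string fields in this sketch; nested
// structs/lists would need recursive traversal and are assumed to be rare.
using FlatMetadata = std::map<std::string, std::string>;  // ordered for stable hashing

// Combine the hashes of each key/value pair; the ordered map gives a deterministic order.
std::size_t hashMetadata(const FlatMetadata& metadata) {
  std::size_t seed = 0;
  std::hash<std::string> hasher;
  for (const auto& [key, value] : metadata) {
    seed ^= hasher(key) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    seed ^= hasher(value) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
  }
  return seed;
}

struct FlatMetadataHash {
  std::size_t operator()(const FlatMetadata& m) const { return hashMetadata(m); }
};

// Precomputed index from match criteria to the subset (e.g., its wrapped LB).
// The expensive flattening/hashing happens when routes load and when the LB
// refreshes, not on every host selection.
template <typename SubsetT>
using SubsetIndex = std::unordered_map<FlatMetadata, SubsetT, FlatMetadataHash>;
```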
Ok. I'll put up a review for the data-plane-api change and then get started on the actual implementation.
Provides a configuration point for a load balancer that can divide endpoints into subsets addressable by route metadata. See discussion in envoyproxy/envoy#1279. Signed-off-by: Stephan Zuercher <[email protected]>
Fixes #1279, and is the final part of splitting #1735. Provides the actual load balancer for subsets with basic stats for the number of subsets active, created, removed, selected (used for host selection), and fallbacks (the fallback path was used). Signed-off-by: Stephan Zuercher <[email protected]>
As decided in https://github.com/lyft/envoy-api/issues/81, we added metadata support to EDS and matching in routes to the API in envoyproxy/data-plane-api@abcd5ba. This issue will track the implementation work around endpoint subsetting in the LB stage.
@mccv @9len @rshriram