Capabilities in multi-endpoints configuration #61

t-chaik · 2019-02-27T10:54:04Z

The REAPI describes four services: Execution (EXEC), ActionCache (AC), ContentAddressableStorage (CAS) and Capabilities. The services do have inter-dependencies but the specification doesn't restrict nor advocate on how separated a server implementation can or should expose them. Clients tend to support only some possible configurations:

Bazel has support for two (as far as I know):
- EXEC + CAS + AC available at one end-point.
- EXEC separated from CAS + AC.
BuildStream has support for any: three different endpoints can be specified for the three main services.
RECC has support for two (goal is to support any):
- EXEC + CAS + AC available at one end-point.
- EXEC + AC separated from CAS.

The Capabilities allow requesting server capabilities at a given endpoint. The ServerCapabilities messages contains two sets of capabilties:

CacheCapabilities relevant for CAS and AC.
ExecutionCapabilities relevant for EXEC.

The specification currently doesn't mention anything about how the server should advertise capabilities. Disparities on client expectations already exists. For example Bazel expects all capabilities to be served at the EXEC end-point (event if CAS + AC are separated) while BuildStream expects a Capabilities service to be available at every endpoint but only advertising relevant capabilities for the services hosted at that endpoint .

I think we should discuss and agree on how capabilities should be advertised by server implementations and how client implementations should query them depending on the endpoint configuration. Sensible conclusions should probably be part of the specification.

The text was updated successfully, but these errors were encountered:

ola-rozenfeld · 2019-02-27T15:17:54Z

Yes, agreed, we should be more explicit in the spec! One thing I wanted to avoid is to spec out the full system configuration in the ServerCapabilities -- which endpoints there are altogether, what services do they support, and with what properties. I think it's useful to distinguish between properties of a particular endpoint vs properties of the service as a whole, and I chose to have ServerCapabilities represent the service as a whole, because that is a lot easier to describe (common elements, inter-dependencies), and also more useful to clients. But I see how that can be misleading -- should all endpoints in the system return all the capabilities, then? My idea was this:

If an execution endpoint is present in the system, it should return the ServerCapabilities of the whole system. All other endpoints should return the same response, except for ExecutionCapabilities.

It is a bit ugly idea, but it is simple.
One alternative I see would be to formalize the EndpointServices idea and add something like:

message EndpointServices {
  boolean action_cache = 1;
  boolean cas = 2;
  boolean execution = 3;
  // Be more specific about services? Add others, like Operations?
}

message Endpoint {
  string address = 1;
  EndpointServices = 2;
}

message ServerCapabilities {
  ...
  repeated Endpoint service_endpoints = 6;
  repeated Endpoint current_endpoint = 7;
}

WDYT?

bergsieker · 2019-02-27T15:42:01Z

I'd suggest something similar to what BuildStream expects: each endpoint should implement the Capabilities service, and populate the fields relevant to the services available at that endpoint. Clients may choose to ignore irrelevant capabilities if the client does not plan to use a given service on the specified endpoint.

As an example, consider this setup:

endpointA hosts Exec+AC+CAS. The exec service at this endpoint is capable of using either the local CAS, or an external CAS hosted at a different endpoint. Calls to GetCapabilities return the relevant capabilities for all three services, but there's no explicit annotation that an external CAS is permitted.
endpointB hosts only CAS. Calls to GetCapabilities return only the capabilities relevant to the CAS.

A client specifying endpointA for Exec+AC+CAS would call GetCapabilities on endpointA, and use the returned values for all three services.

A client specifying endpointA for Exec+AC and endpointB for CAS would call GetCapabilities on endpointA and use the result for only Exec+AC, and would also call GetCapabilities on endpointB and use the result for CAS.

It's the responsibility of the user to understand what combinations are permitted by their services.

EricBurnett · 2019-02-27T16:18:09Z

A couple thoughts:

What is the purpose of Capabilities? I think it serves two distinct but related purposes, which we may want to try to separate, though it's not that easy in practice. First, it allows for specifying the capabilities of a "service". Can a CAS implementation support symlinks? May priorities be used on Execution? Etc. Second, it allows (implicitly, today) - for specifying how endpoints relate. For example, the same digest function needs to be used across the different services; for most implementations Execution may only be used with the corresponding CAS (they're not freely composable), etc.

These feel separable, but I'm not sure that's true - even the "independent" capabilities are often not. For example, whether or not the CAS supports symlinks and whether or not you can send an Execution request that includes symlinks are related.

In RBE for a while we had planned to separate the CAS and Execution APIs onto separate endpoints. I still like it in concept, but we ultimately cancelled (or at least shelved indefinitely) that plan, in large part because of the complexity it introduced throughout the system - clients like bazel needed to gain complicated configuration knobs to allow specifying the multiple endpoints and their relationships; we lost an authoritative source for "capabilities" across the system; we had to answer questions about composability of constructs that aren't freely composable in practice, and it didn't buy us as much as we expected in terms of flexibility we'd actually capitalize on.

How can we make this usable in practice? That's really the ultimate goal here - separate endpoints and separate capabilities sound great, but it ultimately has to be borne out in configurations that are sane for clients and servers to implement and for users to configure, and we've struggled to achieve this in the past. One idea that was floated internally (I can't remember by who - @ulfjack or @ola-rozenfeld I think?) that I lean more and more towards is to have a single configuration API that returns capabilities as well as endpoints across all the different services. It would be responsible for returning internally consistent information - endpoints that work together, capabilities that aren't mutually contradictory, etc. But there are still open questions in my mind - what do we do for simple implementations that don't want to implement the capabilities API? If anyone actually wants to have composable endpoints, who hosts this API and how does it know what to return? Does this lead us to putting too much burden on clients to support a lot of configurability under the hood? Etc.

EricBurnett · 2019-02-27T16:22:22Z

Relevant context: https://groups.google.com/d/topic/bazel-buildfarm/Jl8yDjjLNSs/discussion

bergsieker · 2019-02-27T16:28:19Z

For this iteration of the API, I strongly prefer NOT to switch to a "capabilities endpoint returns all the configuration information you need, including the relevant endpoints" model. That model may be preferable for a v3 (or later) version of the API, but it introduces significant complexity in the clients, for the sake of a benefit that's not yet really clear. I think we don't yet really know what configurations of endpoints are practically useful, and it would be better to determine that over time than designing a complicated system to manage arbitrary complexity that ends up not being necessary.

t-chaik · 2019-02-28T10:15:01Z

Each endpoint should implement the Capabilities service, and populate the fields relevant to the services available at that endpoint. Clients may choose to ignore irrelevant capabilities if the client does not plan to use a given service on the specified endpoint.

I would prefer that option too, think it's the most intuitive solution.

It's the responsibility of the user to understand what combinations are permitted by their services.

Very much agreed, and least in the context of the v2 API:

One idea that was floated internally (...) that I lean more and more towards is to have a single configuration API that returns capabilities as well as endpoints across all the different services. It would be responsible for returning internally consistent information - endpoints that work together, capabilities that aren't mutually contradictory, etc.

Inverting the current logic and have the service advertise configuration for its client indeed is a interesting idea. But I quite agree with @bergsieker here, this would be a major change that we probably don't want to sneak into v2 of the API.

EricBurnett · 2019-02-28T16:25:59Z

Yes, sorry, I didn't mean to imply that we should make such a significant change to the semantics of V2 (we couldn't, it's a breaking change). To that end, yeah, Steven's suggested approach for V2 SGTM as well. I'm interested to see how well "It's the responsibility of the user to understand what combinations are permitted by their services" will hold up in practice - if it does, great, no need for any major restructurings at all :).

ola-rozenfeld · 2019-02-28T16:54:09Z

Am I understanding correctly that the difference between the suggestion and the status quo is that an execution-only endpoint would no longer return CacheCapabilities? If so, that is fine with me, but I will need to change Bazel accordingly.

…

On Thu, Feb 28, 2019 at 11:26 AM Eric Burnett ***@***.***> wrote: Yes, sorry, I didn't mean to imply that we should make such a significant change to the semantics of V2 (we couldn't, it's a breaking change). To that end, yeah, Steven's suggested approach for V2 SGTM as well. I'm interested to see how well *"It's the responsibility of the user to understand what combinations are permitted by their services"* will hold up in practice - if it does, great, no need for any major restructurings at all :). — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#61 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AYoKuNWAPrbHcpIgjYCG9lDKxMfVXDjbks5vSAMZgaJpZM4bUP_2> .

t-chaik · 2019-03-07T13:30:03Z

Am I understanding correctly that the difference between the suggestion and the status quo is that an execution-only endpoint would no longer return CacheCapabilities?

At least it would be acceptable for such an endpoint (Exec only) not to, and clients should ignore it if-ever it was returned by such an endpoint anyway.

Each endpoint should implement the Capabilities service, and populate the fields relevant to the services available at that endpoint.

My understanding then is:

Exec+CAS+AC endpoints should return CacheCapabilities + ExecutionCapabilities
Exec endpoints should return ExecutionCapabilities
CAS+AC endpoints should return CacheCapabilities
...

ola-rozenfeld · 2019-03-07T20:17:52Z

Sounds good! I will send a clarification patch soon.

…

On Thu, Mar 7, 2019 at 8:30 AM Martin Blanchard ***@***.***> wrote: Am I understanding correctly that the difference between the suggestion and the status quo is that an execution-only endpoint would no longer return CacheCapabilities? At least it would be acceptable for such an endpoint (Exec only) not to, and clients should ignore it if-ever it was returned by such an endpoint anyway. Each endpoint should implement the Capabilities service, and populate the fields relevant to the services available at that endpoint. My understanding then is: - Exec+CAS+AC endpoints should return CacheCapabilities + ExecutionCapabilities - Exec endpoints should return ExecutionCapabilities - CAS+AC endpoints should return CacheCapabilities - ... — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#61 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AYoKuFCm_BVWJnNivGAXe8WnB36cTS8Zks5vURRdgaJpZM4bUP_2> .

bazelbuild#61) * Transforming the provided local output paths into uploadable Chunkers. also constructing the ActionResult to be updated in the ActionCache. bazelbuild#58 * Addressing comments

According to changes in bazelbuild/remote-apis#76 (and discussion in bazelbuild/remote-apis#61): > Each endpoint should implement the Capabilities service, and populate the fields relevant to the services available at that endpoint. Clients may choose to ignore irrelevant capabilities if the client does not plan to use a given service on the specified endpoint. This PR: 1. Refactor `RemoteModule` to use a `ChannelFactory` when creating channels so that tests can mock into channels connected to a fake server. 2. Add tests for verifying server capabilities. 3. Only check required capabilities for a given endpoint. - If `--remote_executor` and `--remote_cache point` to the same endpoint, we require that endpoint has both execution and cache capabilities. - If they point to different endpoints, we check the endpoint with execution or cache capabilities respectively. Fixes #11901. Closes #12008. PiperOrigin-RevId: 329875749

t-chaik mentioned this issue Feb 27, 2019

Bazel remote execution services capabilities bazelbuild/bazel#7476

Closed

bergsieker added this to the Remote Execution API v2 milestone Mar 12, 2019

ola-rozenfeld mentioned this issue May 9, 2019

Split the Capabilities by endpoints #76

Merged

ola-rozenfeld closed this as completed in #76 Jun 3, 2019

ColdrickSotK mentioned this issue Aug 6, 2020

Remote Execution: Bazel should make multiple GetCapabilities requests when using multiple endpoints bazelbuild/bazel#11901

Closed

This was referenced Aug 25, 2020

Set --remote_cache to --remote_executor if it is empty. bazelbuild/bazel#11998

Closed

RemoteModule only check required capabilities for a given endpoint bazelbuild/bazel#12008

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Capabilities in multi-endpoints configuration #61

Capabilities in multi-endpoints configuration #61

t-chaik commented Feb 27, 2019

ola-rozenfeld commented Feb 27, 2019

bergsieker commented Feb 27, 2019

EricBurnett commented Feb 27, 2019

EricBurnett commented Feb 27, 2019

bergsieker commented Feb 27, 2019

t-chaik commented Feb 28, 2019

EricBurnett commented Feb 28, 2019

ola-rozenfeld commented Feb 28, 2019 via email

t-chaik commented Mar 7, 2019

ola-rozenfeld commented Mar 7, 2019 via email

Capabilities in multi-endpoints configuration #61

Capabilities in multi-endpoints configuration #61

Comments

t-chaik commented Feb 27, 2019

ola-rozenfeld commented Feb 27, 2019

bergsieker commented Feb 27, 2019

EricBurnett commented Feb 27, 2019

EricBurnett commented Feb 27, 2019

bergsieker commented Feb 27, 2019

t-chaik commented Feb 28, 2019

EricBurnett commented Feb 28, 2019

ola-rozenfeld commented Feb 28, 2019 via email

t-chaik commented Mar 7, 2019

ola-rozenfeld commented Mar 7, 2019 via email