Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Capabilities in multi-endpoints configuration #61

Closed
t-chaik opened this issue Feb 27, 2019 · 10 comments · Fixed by #76
Closed

Capabilities in multi-endpoints configuration #61

t-chaik opened this issue Feb 27, 2019 · 10 comments · Fixed by #76

Comments

@t-chaik
Copy link

t-chaik commented Feb 27, 2019

The REAPI describes four services: Execution (EXEC), ActionCache (AC), ContentAddressableStorage (CAS) and Capabilities. The services do have inter-dependencies but the specification doesn't restrict nor advocate on how separated a server implementation can or should expose them. Clients tend to support only some possible configurations:

  • Bazel has support for two (as far as I know):
    • EXEC + CAS + AC available at one end-point.
    • EXEC separated from CAS + AC.
  • BuildStream has support for any: three different endpoints can be specified for the three main services.
  • RECC has support for two (goal is to support any):
    • EXEC + CAS + AC available at one end-point.
    • EXEC + AC separated from CAS.

The Capabilities allow requesting server capabilities at a given endpoint. The ServerCapabilities messages contains two sets of capabilties:

  • CacheCapabilities relevant for CAS and AC.
  • ExecutionCapabilities relevant for EXEC.

The specification currently doesn't mention anything about how the server should advertise capabilities. Disparities on client expectations already exists. For example Bazel expects all capabilities to be served at the EXEC end-point (event if CAS + AC are separated) while BuildStream expects a Capabilities service to be available at every endpoint but only advertising relevant capabilities for the services hosted at that endpoint .

I think we should discuss and agree on how capabilities should be advertised by server implementations and how client implementations should query them depending on the endpoint configuration. Sensible conclusions should probably be part of the specification.

@ola-rozenfeld
Copy link
Contributor

Yes, agreed, we should be more explicit in the spec! One thing I wanted to avoid is to spec out the full system configuration in the ServerCapabilities -- which endpoints there are altogether, what services do they support, and with what properties. I think it's useful to distinguish between properties of a particular endpoint vs properties of the service as a whole, and I chose to have ServerCapabilities represent the service as a whole, because that is a lot easier to describe (common elements, inter-dependencies), and also more useful to clients. But I see how that can be misleading -- should all endpoints in the system return all the capabilities, then? My idea was this:

If an execution endpoint is present in the system, it should return the ServerCapabilities of the whole system. All other endpoints should return the same response, except for ExecutionCapabilities.

It is a bit ugly idea, but it is simple.
One alternative I see would be to formalize the EndpointServices idea and add something like:

message EndpointServices {
  boolean action_cache = 1;
  boolean cas = 2;
  boolean execution = 3;
  // Be more specific about services? Add others, like Operations?
}

message Endpoint {
  string address = 1;
  EndpointServices = 2;
}

message ServerCapabilities {
  ...
  repeated Endpoint service_endpoints = 6;
  repeated Endpoint current_endpoint = 7;
}

WDYT?

@bergsieker
Copy link
Collaborator

I'd suggest something similar to what BuildStream expects: each endpoint should implement the Capabilities service, and populate the fields relevant to the services available at that endpoint. Clients may choose to ignore irrelevant capabilities if the client does not plan to use a given service on the specified endpoint.

As an example, consider this setup:

  • endpointA hosts Exec+AC+CAS. The exec service at this endpoint is capable of using either the local CAS, or an external CAS hosted at a different endpoint. Calls to GetCapabilities return the relevant capabilities for all three services, but there's no explicit annotation that an external CAS is permitted.

  • endpointB hosts only CAS. Calls to GetCapabilities return only the capabilities relevant to the CAS.

A client specifying endpointA for Exec+AC+CAS would call GetCapabilities on endpointA, and use the returned values for all three services.

A client specifying endpointA for Exec+AC and endpointB for CAS would call GetCapabilities on endpointA and use the result for only Exec+AC, and would also call GetCapabilities on endpointB and use the result for CAS.

It's the responsibility of the user to understand what combinations are permitted by their services.

@EricBurnett
Copy link
Collaborator

A couple thoughts:

What is the purpose of Capabilities? I think it serves two distinct but related purposes, which we may want to try to separate, though it's not that easy in practice. First, it allows for specifying the capabilities of a "service". Can a CAS implementation support symlinks? May priorities be used on Execution? Etc. Second, it allows (implicitly, today) - for specifying how endpoints relate. For example, the same digest function needs to be used across the different services; for most implementations Execution may only be used with the corresponding CAS (they're not freely composable), etc.

These feel separable, but I'm not sure that's true - even the "independent" capabilities are often not. For example, whether or not the CAS supports symlinks and whether or not you can send an Execution request that includes symlinks are related.

In RBE for a while we had planned to separate the CAS and Execution APIs onto separate endpoints. I still like it in concept, but we ultimately cancelled (or at least shelved indefinitely) that plan, in large part because of the complexity it introduced throughout the system - clients like bazel needed to gain complicated configuration knobs to allow specifying the multiple endpoints and their relationships; we lost an authoritative source for "capabilities" across the system; we had to answer questions about composability of constructs that aren't freely composable in practice, and it didn't buy us as much as we expected in terms of flexibility we'd actually capitalize on.

How can we make this usable in practice? That's really the ultimate goal here - separate endpoints and separate capabilities sound great, but it ultimately has to be borne out in configurations that are sane for clients and servers to implement and for users to configure, and we've struggled to achieve this in the past. One idea that was floated internally (I can't remember by who - @ulfjack or @ola-rozenfeld I think?) that I lean more and more towards is to have a single configuration API that returns capabilities as well as endpoints across all the different services. It would be responsible for returning internally consistent information - endpoints that work together, capabilities that aren't mutually contradictory, etc. But there are still open questions in my mind - what do we do for simple implementations that don't want to implement the capabilities API? If anyone actually wants to have composable endpoints, who hosts this API and how does it know what to return? Does this lead us to putting too much burden on clients to support a lot of configurability under the hood? Etc.

@EricBurnett
Copy link
Collaborator

@bergsieker
Copy link
Collaborator

For this iteration of the API, I strongly prefer NOT to switch to a "capabilities endpoint returns all the configuration information you need, including the relevant endpoints" model. That model may be preferable for a v3 (or later) version of the API, but it introduces significant complexity in the clients, for the sake of a benefit that's not yet really clear. I think we don't yet really know what configurations of endpoints are practically useful, and it would be better to determine that over time than designing a complicated system to manage arbitrary complexity that ends up not being necessary.

@t-chaik
Copy link
Author

t-chaik commented Feb 28, 2019

Each endpoint should implement the Capabilities service, and populate the fields relevant to the services available at that endpoint. Clients may choose to ignore irrelevant capabilities if the client does not plan to use a given service on the specified endpoint.

I would prefer that option too, think it's the most intuitive solution.

It's the responsibility of the user to understand what combinations are permitted by their services.

Very much agreed, and least in the context of the v2 API:

One idea that was floated internally (...) that I lean more and more towards is to have a single configuration API that returns capabilities as well as endpoints across all the different services. It would be responsible for returning internally consistent information - endpoints that work together, capabilities that aren't mutually contradictory, etc.

Inverting the current logic and have the service advertise configuration for its client indeed is a interesting idea. But I quite agree with @bergsieker here, this would be a major change that we probably don't want to sneak into v2 of the API.

@EricBurnett
Copy link
Collaborator

Yes, sorry, I didn't mean to imply that we should make such a significant change to the semantics of V2 (we couldn't, it's a breaking change). To that end, yeah, Steven's suggested approach for V2 SGTM as well. I'm interested to see how well "It's the responsibility of the user to understand what combinations are permitted by their services" will hold up in practice - if it does, great, no need for any major restructurings at all :).

@ola-rozenfeld
Copy link
Contributor

ola-rozenfeld commented Feb 28, 2019 via email

@t-chaik
Copy link
Author

t-chaik commented Mar 7, 2019

Am I understanding correctly that the difference between the suggestion and the status quo is that an execution-only endpoint would no longer return CacheCapabilities?

At least it would be acceptable for such an endpoint (Exec only) not to, and clients should ignore it if-ever it was returned by such an endpoint anyway.

Each endpoint should implement the Capabilities service, and populate the fields relevant to the services available at that endpoint.

My understanding then is:

  • Exec+CAS+AC endpoints should return CacheCapabilities + ExecutionCapabilities
  • Exec endpoints should return ExecutionCapabilities
  • CAS+AC endpoints should return CacheCapabilities
  • ...

@ola-rozenfeld
Copy link
Contributor

ola-rozenfeld commented Mar 7, 2019 via email

@bergsieker bergsieker added this to the Remote Execution API v2 milestone Mar 12, 2019
santigl pushed a commit to santigl/remote-apis that referenced this issue Aug 26, 2020
bazelbuild#61)

* Transforming the provided local output paths into uploadable Chunkers.

also constructing the ActionResult to be updated in the ActionCache. bazelbuild#58

* Addressing comments
bazel-io pushed a commit to bazelbuild/bazel that referenced this issue Sep 3, 2020
According to changes in bazelbuild/remote-apis#76 (and discussion in bazelbuild/remote-apis#61):
> Each endpoint should implement the Capabilities service, and populate the fields relevant to the services available at that endpoint. Clients may choose to ignore irrelevant capabilities if the client does not plan to use a given service on the specified endpoint.

This PR:
1. Refactor `RemoteModule` to use a `ChannelFactory` when creating channels so that tests can mock into channels connected to a fake server.
2. Add tests for verifying server capabilities.
3. Only check required capabilities for a given endpoint.
    - If `--remote_executor` and `--remote_cache point` to the same endpoint, we require that endpoint has both execution and cache capabilities.
    - If they point to different endpoints, we check the endpoint with execution or cache capabilities respectively.

Fixes #11901.

Closes #12008.

PiperOrigin-RevId: 329875749
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants