Agents should report supported components #198
Related issue in the collector: open-telemetry/opentelemetry-collector#10570 |
We discussed in the SIG meeting that a generic We also discussed making a separate top level |
I think logically, you'd consider this part of the I think the primary advantage we discussed for the separate message was that it did not need to be sent if the OpAMP server does not use it. I'm not sure if that's a huge advantage or not though, given the |
Regarding the structure of the field, how would you propose representing the components for the collector in a map[string]string? Additionally, I feel like arbitrary metadata describing the agent is already covered in the identifying/non-identifying attributes; how does this differ? |
We discussed this again today at the SIG. We primarily discussed adding this information to the ComponentHealth message, or to the AgentDescription. |
Here's what I'm thinking in terms of concrete details. We could imagine doing something like this:
message AgentDescription {
  // ... snip ... //
  // The list of components that are available for this agent to configure.
  repeated AvailableComponent available_components = 4;
}
message AvailableComponent {
  // The ID of the component. This ID MUST be unique within the list of
  // available components.
  string id = 1;
  // Extra key/value pairs that may be used to describe the component.
  // The key/value pairs are according to semantic conventions, see:
  // https://opentelemetry.io/docs/specs/semconv/
  repeated KeyValue metadata = 2;
}
The available components list is pretty free-form; you are only required to specify an ID for each component. In the case of the collector, ID would map to the type string + class of the component (e.g. For extra metadata, we'd report the following:
We could report these either as two separate fields, or as one field: two fields
one field
Additionally, the component type string and class can be derived from the ID, but you could consider also reporting these in separate fields in the metadata so we don't have to rely on IDs being formatted a specific way. I think we were debating putting this on ComponentHealth; I think the two are a little different. ComponentHealth is more about instances of components, as opposed to the types of components. In addition, ComponentHealth would be sent pretty frequently, which could end up bloating network traffic. I'd expect AgentDescription to be sent very infrequently (essentially only at agent startup or when agent identity changes), which would help keep the amount of network data sent down. |
@tigrannajaryan @andykellr Seeing any immediate issues with the above? If not, I can write it up and put it in a PR. |
I think the general structure of the message and making it a part of An alternative would be using a map where the
I am less certain on the best semantic conventions to follow for the module name and version fields, but that is specific to the implementation in the OpAMP extension. It would be useful to show an example using Go in the spec, but in theory this could be any agent in any language reporting its available components. |
Since
|
I agree it belongs to AgentDescription. It would be nice if we could somehow make it uniform with ComponentHealth unless we think What would be some other agent examples (besides Otel Collector) where there is a concept of components and which could use this new data structure? Is there a |
Mirroring ComponentHealth definitely makes sense. I think that would end up something like this:
message AgentDescription {
  // ... snip ... //
  ComponentDetails component_details = 4;
}
message ComponentDetails {
  // A map of component ID to sub-component details. It can nest as deeply as needed to
  // describe the underlying system.
  map<string, ComponentDetails> sub_component_map = 1;
  // Extra key/value pairs that may be used to describe the component.
  // The key/value pairs are according to semantic conventions, see:
  // https://opentelemetry.io/docs/specs/semconv/
  repeated KeyValue metadata = 2;
}
This would be the closest to mirroring ComponentHealth. One thing to note here is that there is a single "root" component. On I think there are a lot of "pipeline"-based agents out there that have this concept of type, e.g.:
So I think it could make sense to have some concept of One thing I'm not clear on is You would already think I'll write this up into a PR, I think it'll be easier to comment on in that format. |
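To make the nested shape concrete, here is a minimal Python sketch of the ComponentDetails tree proposed above, with a helper that flattens it into component paths. The dataclass only mirrors the two proto fields (metadata is simplified to a plain string map rather than repeated KeyValue); the component names and the metadata key are hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Tuple


@dataclass
class ComponentDetails:
    # Mirrors the proposed proto: nested sub-components plus free-form metadata.
    sub_component_map: Dict[str, "ComponentDetails"] = field(default_factory=dict)
    metadata: Dict[str, str] = field(default_factory=dict)


def walk(details: ComponentDetails,
         prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[Tuple[str, ...], ComponentDetails]]:
    """Yield (path, details) for every component below the root, depth first."""
    for name in sorted(details.sub_component_map):
        child = details.sub_component_map[name]
        path = prefix + (name,)
        yield path, child
        yield from walk(child, path)


# Hypothetical tree: a single root whose children group components by class.
root = ComponentDetails(
    sub_component_map={
        "receivers": ComponentDetails(
            sub_component_map={
                "otlp": ComponentDetails(metadata={"example.version": "v0.0.0"}),
            }
        ),
        "exporters": ComponentDetails(
            sub_component_map={"debug": ComponentDetails()}
        ),
    }
)

paths = ["/".join(path) for path, _ in walk(root)]
```

The single root carries no ID of its own in this sketch; only its descendants show up in `paths`.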
Thinking about this some more, I'm leaning toward making this a separate message with a separate flag for The issue is that this will likely be a large message and if you deploy 1,000 agents of the same type and version, they will likely all have the same components. I think it should be up to the server to decide that it needs the agent to report this information instead of assuming that it should always be sent as part of AgentDescription. |
This is a good point. We can add |
This makes sense to me. I think we'd use the same structure that exists in the current PR right now, just move it out of AgentDescription to the top level:
message AgentToServer {
  // -- snip --
  map<string, ComponentDetails> available_components = 14;
}
We would expand ServerToAgentFlags to include a ReportComponents flag:
enum ServerToAgentFlags {
  ServerToAgentFlags_ReportComponents = 0x00000002;
}
In this case, the components would be requested separately from other state. That is, The assumption would be that if From the above, I'm wondering:
|
would adding a hash for the entire message be of use? - Meaning the server could compare and only take whatever action it desired should the hash be different than the previous reporting. |
Including a hash could help mitigate cases where the service name + version are the same, but there are actually different components there, which could definitely be useful. If we were to do that, I would definitely want another message wrapping the component map:
message AvailableComponents {
  map<string, ComponentDetails> available_components = 1;
  bytes hash = 2;
}
In this scheme, the hash would be reported when the If we feel that the name + version assumption is not solid enough to rely on, I think this doesn't add that much extra complexity for some extra safety. |
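One way such a hash could be computed, sketched in Python. The thread does not specify a hash algorithm or a canonical encoding, so SHA-256 over a key-sorted JSON encoding is purely an assumption here, and the component names are made up. The only property the sketch relies on is that the same component set always produces the same bytes, regardless of map ordering.

```python
import hashlib
import json
from typing import Any, Dict


def components_hash(available_components: Dict[str, Any]) -> bytes:
    """Return a deterministic digest of a (possibly nested) component map.

    Assumption: SHA-256 over canonical JSON. The proposal only says "hash".
    """
    canonical = json.dumps(available_components, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).digest()


# Hypothetical component maps; a and b differ only in insertion order.
a = {"receivers": {"otlp": {}}, "exporters": {"debug": {}}}
b = {"exporters": {"debug": {}}, "receivers": {"otlp": {}}}
c = {"exporters": {"debug": {}}}

same = components_hash(a) == components_hash(b)
different = components_hash(a) != components_hash(c)
```

Two agents with the same service name and version but different builds would produce different digests under this scheme, which is the case the hash is meant to catch.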
I like the separate AvailableComponents message. It allows us to add additional fields in the future as needed. I also like reporting the hash, but I think the spec should be clear that we expect the message with the hash on every request and the message with the full component list when the flag is set. I would propose changing the flag to |
@tigrannajaryan Does this make sense to you? I'll update the PR if you're good with it. |
What's the typical expected agent<->server interaction here? Is it going to be this:
I am a bit worried about the number of roundtrips, especially for http transport, where it does not necessarily use the same connection. Should we make it a client choice to include available_components in the first AgentToServer if the client wishes so? That way we eliminate one roundtrip. Agents which know they have a very large list of components may choose to omit it in the first message. It would be also useful to quantify this. How many / how large is the available_components list for Otel Collector contrib? |
I imagine the interaction to be like this:
So in reality the extra roundtrips would only be on initial connection for a given I do think it would be fine for the agent to decide to send the available components before the server asks for them. As for quantification, in contrib, there are: Which makes a total of 200 separate components. I'm not sure where the cutoff of what would be considered "large" would be, but this seems on the large side of things for sure. |
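The exchange discussed in this thread could look roughly like the following Python sketch: the agent sends the hash on every message, and the server sets a flag only when it has not yet seen that hash. The flag value 0x00000002 comes from the proposed enum entry earlier in the thread (its final name was still under discussion); the class names, the in-memory cache, and the dictionary message shape are hypothetical stand-ins for the real protobuf messages.

```python
from typing import Dict, Optional

REPORT_COMPONENTS = 0x00000002  # value from the proposed ServerToAgentFlags entry


class Server:
    def __init__(self) -> None:
        # hash -> full component map, shared by all agents reporting that hash
        self.known: Dict[bytes, dict] = {}

    def on_message(self, component_hash: bytes, components: Optional[dict]) -> int:
        """Return the flags to send back for one AgentToServer message."""
        if components is not None:
            self.known[component_hash] = components
        if component_hash in self.known:
            return 0
        return REPORT_COMPONENTS


class Agent:
    def __init__(self, component_hash: bytes, components: dict) -> None:
        self.component_hash = component_hash
        self.components = components
        self.send_full = False  # set when the server asked for the full list

    def next_message(self) -> dict:
        msg = {"hash": self.component_hash,
               "components": self.components if self.send_full else None}
        self.send_full = False
        return msg

    def on_flags(self, flags: int) -> None:
        self.send_full = bool(flags & REPORT_COMPONENTS)


def exchange(agent: Agent, server: Server) -> int:
    """Run messages until the server stops asking; return the number of roundtrips."""
    roundtrips = 0
    while True:
        msg = agent.next_message()
        flags = server.on_message(msg["hash"], msg["components"])
        roundtrips += 1
        agent.on_flags(flags)
        if not flags & REPORT_COMPONENTS:
            return roundtrips


server = Server()
first = exchange(Agent(b"h1", {"receivers": {}}), server)   # hash unknown to the server
second = exchange(Agent(b"h1", {"receivers": {}}), server)  # same hash, already cached
```

In this sketch the extra roundtrip is paid once per distinct hash rather than once per agent. An agent that chose to include the full list in its first message would simply start with `send_full = True`.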
|
@tigrannajaryan If you don't have any objections with the above, I'll get started on updating the PR to capture what was discussed here. |
SGTM |
I'll describe this in the context of OpenTelemetry, but similar issues will exist for other agents.
OpenTelemetry distributions may have different components included during the build. Before sending a configuration to an agent, the management server will want to know what components are available. Sending a configuration with unsupported components will cause an error. It may be desirable to avoid errors and only allow configurations to be applied to agents that are able to use them.
This is similar to agent capabilities but will need more than a bitmask to support arbitrary component names and versions.
This is similar to what is available in component health, but components not in use (but still supported) would not be reported.
This is similar to package statuses but components are not necessarily packages.
This could be reported with remote configuration but doesn't feel right.
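As a rough illustration of the check a management server might perform before sending a configuration, here is a Python sketch that compares the components a configuration refers to against the set an agent reports as available. The config shape and the component names are hypothetical, and a real collector configuration would need proper parsing first.

```python
from typing import Dict, List, Set


def unsupported_components(config: Dict[str, List[str]],
                           available: Dict[str, Set[str]]) -> List[str]:
    """Return "class/name" entries used by config but not reported as available."""
    missing = []
    for component_class, names in sorted(config.items()):
        supported = available.get(component_class, set())
        for name in names:
            if name not in supported:
                missing.append(f"{component_class}/{name}")
    return missing


# Hypothetical data: what one agent build reports, and a config to be sent.
available = {"receivers": {"otlp", "hostmetrics"}, "exporters": {"otlp"}}
config = {"receivers": ["otlp"], "exporters": ["otlp", "kafka"]}

missing = unsupported_components(config, available)
```

A server could refuse to apply the configuration whenever `missing` is non-empty, instead of letting the agent fail on it.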