Rename cloud.platform to cloud.service.name #344

Closed

kaiyan-sheng
Contributor

@kaiyan-sheng kaiyan-sheng commented Sep 25, 2023

Changes

This PR renames the cloud.platform field to cloud.service.name, which comes from ECS: https://www.elastic.co/guide/en/ecs/current/ecs-cloud.html#field-cloud-service-name. These two fields represent the same thing, but cloud.service.name is more readable IMHO.

cc @AlexanderWert @ChrsMark

Merge requirement checklist

@kaiyan-sheng kaiyan-sheng requested review from a team September 25, 2023 20:50
@linux-foundation-easycla

linux-foundation-easycla bot commented Sep 25, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

@Oberon00
Member

Oberon00 commented Sep 26, 2023

service.name is something completely different from cloud.platform (proposed cloud.service.name). IMHO this renaming is confusing.

@trask
Member

trask commented Sep 26, 2023

I also prefer cloud.platform, primarily because of potential confusion between cloud.service.name and service.name.

@kaiyan-sheng I'd recommend raising this PR in next week's semconv group since we've been discussing there what to do in case of conflict between ECS and OpenTelemetry naming.

@pyohannes
Contributor

I don't think the co-existence of service.name and cloud.service.name is ideal.

However, in my opinion the term "platform" is unfortunate too, as without reading the attribute description I'd think it refers to the cloud provider. "Service" seems to me a more appropriate name of what we want to capture in this attribute.

From Wikipedia:

Google Cloud Platform (GCP), offered by Google, is a suite of cloud computing services [...]

From What is Azure?:

The Azure cloud platform is more than 200 products and cloud services [...]

From the description of https://aws.amazon.com:

Amazon Web Services offers reliable, scalable, and inexpensive cloud computing services.

@AlexanderWert
Member

AlexanderWert commented Sep 27, 2023

@Oberon00

service.name is something completely different from cloud.platform (proposed cloud.service.name). IMHO this renaming is confusing.

@trask

I also prefer cloud.platform, primarily because of potential confusion between cloud.service.name and service.name.

I don't fully get that concern. Do you mean that cloud.service.name would be confusing because it contains service.name in its name? The proposal here is not related to service.name at all. And the cloud.* namespace makes it explicit that it's something different.

Following that logic, the attribute db.system would also be confusing because there is a system namespace with all the metrics like system.cpu.time.

Important clarification in this context:

I think some of the confusion also comes from the fact that ECS' cloud.service.name has a broader meaning than cloud.platform in semconv:

  • cloud.platform is "only" a resource attribute in OTel. So its intent is to describe compute-type cloud services (such as EC2, EKS, AWS Lambda, Azure Functions, etc.). But it would never describe other cloud services, such as SQS or DynamoDB.
  • cloud.service.name is an attribute that could be used in general to describe any kind of cloud service, including:
    • using it as a resource attribute to describe one of the compute-type cloud services that a service is running on
    • or using it as a signal-level attribute to describe a related cloud service. Examples:
      • Let's say we want to capture in an instrumentation that a message has been received from AWS SQS (e.g. as a more detailed information on the trigger in an AWS Lambda scenario, or any other messaging scenario). This scenario is not covered by cloud.platform at all.
      • Or, we simply collect / retrieve logs from cloud services such as AWS SQS, AWS ApiGateway, etc. and we want to annotate those logs with that information. Again, cloud.platform doesn't sound like a good fit for that.
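
The distinction drawn in the list above can be made concrete with a minimal sketch. It uses plain Python dicts rather than the OpenTelemetry SDK; the service name `order-processor`, the value `sqs`, and the helper `describe` are assumptions made for illustration only and do not come from either specification.

```python
# Hypothetical, simplified telemetry: plain dicts, not the OpenTelemetry SDK.

# Resource attributes describe the entity that produces the telemetry.
resource_attributes = {
    "service.name": "order-processor",  # hypothetical service name
    "cloud.provider": "aws",
    "cloud.platform": "aws_lambda",     # compute service the code runs on
}

# Signal-level attributes describe one log record, e.g. one emitted while
# handling a message received from SQS.
log_attributes = {
    "cloud.service.name": "sqs",  # ECS-style field; the value format is assumed
}


def describe(resource, signal):
    """Summarize where the code runs and which cloud service the signal concerns."""
    runs_on = resource.get("cloud.platform", "unknown")
    relates_to = signal.get("cloud.service.name", "unknown")
    return f"runs on {runs_on}, relates to {relates_to}"


print(describe(resource_attributes, log_attributes))
```

In this sketch the two attributes answer different questions, which is why cloud.platform alone cannot cover the SQS case.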

So I think the general question is: Do we want to allow attribute names to be used as resource attributes AND signal-level attributes depending on the context or not?

If yes, IMHO replacing cloud.platform with something more intuitive (like cloud.service.name) would be beneficial.
If not, then I think cloud.platform (as a resource attribute) and cloud.service.name (as a signal-level attribute) could co-exist.

@joaopgrassi
Member

joaopgrassi commented Sep 27, 2023

I don't fully get that concern. Do you mean that cloud.service.name would be confusing because it contains service.name in its name? The proposal here is not related to service.name at all. And the cloud.* namespace makes it explicit that it's something different.

Not sure if it's far-fetched or even part of others' concerns, but I can see that people who are already familiar with service.name might think cloud.service.name is a different attribute for reporting service.name when running "in the cloud", so they would start using it instead of service.name.

I don't think having cloud. makes it explicit that it's something different.

Different from ECS, but another idea could be: cloud.compute_service?

@AlexanderWert
Member

@joaopgrassi

What do you think about that aspect?:

So I think the general question is: Do we want to allow attribute names to be used as resource attributes AND signal-level attributes depending on the context or not?

If yes, IMHO replacing cloud.platform with something more intuitive (like cloud.service.name) would be beneficial.

If not, then I think cloud.platform (as a resource attribute) and cloud.service.name (as a signal-level attribute) could co-exist.

If we keep cloud.platform as a pure resource attribute, then the naming is not that bad.
But what about the other scenarios I described above (i.e. in use-cases where we need that as signal-level attribute)?

@joaopgrassi
Member

Using resource attributes as signal attributes is something that we probably need to discuss and assess further. I haven't thought much about the possible implications.

For the other scenarios, like log collection, maybe there could be a "source" attribute to denote where the log comes from. For the others, I'm not sure; maybe there's value in adding such attributes, but I suspect people usually already know where their things are coming from. Also, vendor-specific attributes exist, like the ones for AWS S3, and instrumentations of such providers could also add their own things. I'm not sure that's something we as semconv should do here.

@Oberon00
Member

Oberon00 commented Sep 27, 2023

@AlexanderWert For a signal-level attribute, we would use rpc.service I think. This is what we did for AWS at least, and it feels quite natural to me: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/cloud-providers/aws-sdk.md

@AlexanderWert
Member

For a signal-level attribute, we would use rpc.service I think. This is what we did for AWS at least, and it feels quite natural to me: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/cloud-providers/aws-sdk.md

@Oberon00 I agree that rpc.service makes sense when we are considering tracing (i.e. a span representing a call (RPC) to a cloud service). But I'd like us to think broader (than just tracing use cases) when we discuss semantic attributes (especially in the case of the ECS merger). I really think that one of the biggest values of semantic attributes is the cross signal / cross use case correlation of data through consistent naming of information. So, if we focus only on tracing now and ignore other relevant use cases we might need to break (stable) attributes in the future or, otherwise, not achieve that mentioned consistent naming. Also, I think it's important to think about the consumer-side of the data (not only about collection-side).

A very simple use case / example:
Let's say we want a simple dashboard that just lists all the AWS services (of any type: messaging, compute, storage, etc.) in use, independent of which signals are being collected for them, or whether they are resources, are being called from my other services, or have their metrics gathered externally. If we have a consistent name, namely cloud.service.name, that's an easy task, right? The query would be cloud.provider = aws AND GROUP BY cloud.service.name. Otherwise, you'd need to know that this information can be spread across cloud.platform, rpc.service, logs.source, etc.
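
The query described above can be sketched in a few lines. This is a minimal illustration over hypothetical in-memory records, not a real backend query; the sample data and the helper `services_in_use` are assumptions.

```python
from collections import Counter

# Hypothetical records from different signals (logs, metrics, spans),
# all annotated with the same attribute names.
records = [
    {"cloud.provider": "aws", "cloud.service.name": "sqs"},
    {"cloud.provider": "aws", "cloud.service.name": "s3"},
    {"cloud.provider": "aws", "cloud.service.name": "sqs"},
    {"cloud.provider": "gcp", "cloud.service.name": "pubsub"},
]


def services_in_use(records, provider):
    """Filter by cloud.provider, then count records per cloud.service.name."""
    return Counter(
        r["cloud.service.name"]
        for r in records
        if r.get("cloud.provider") == provider and "cloud.service.name" in r
    )


print(services_in_use(records, "aws"))
```

With the information spread across several attribute names, the same helper would need one branch per name.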

Don't get me wrong, I'm not pushing for the rename of the cloud.platform.
But with the example in this PR I'd like us to discuss in general:

  • whether we want resource attribute names to be reusable for signal-level attributes (I think there are multiple examples when this can be useful, I'll create a separate issue for that)
  • when defining new attributes think about how they might be related to other signals and use cases (especially logging use cases that are currently only defined on a general level in OTel and likely to evolve into more use-case specific definitions / semantic conventions)

@pyohannes
Contributor

  • whether we want resource attribute names to be reusable for signal-level attributes (I think there are multiple examples when this can be useful, I'll create a separate issue for that)

@AlexanderWert There already is the quite old (and unsolved) open-telemetry/opentelemetry-specification#1367 that touches the topic from the other way round: using signal-specific attributes as resource attributes.

@Oberon00
Member

Oberon00 commented Sep 28, 2023

I agree that rpc.service makes sense when we are considering tracing (i.e. a span representing a call (RPC) to a cloud service). But I'd like us to think broader (than just tracing use cases) when we discuss semantic attributes

@AlexanderWert Why do you think that rpc.service would only make sense for tracing? I think it should just as well make sense for other signal types.

If we have a consistent name that is cloud.service.name, that's an easy task, right?

If that name is rpc.service, it will also work, won't it?

cloud.platform on the resource means: This traced service / process is running on that cloud platform (e.g. this is an AWS Lambda function). On the other hand the signal-level cloud service will mean "this relates to a call to this cloud service" (typically on the client side, but in principle also possibly on the server side). These are different things semantically.

You will very typically have a cloud.platform that is different from the rpc.service when calling a cloud service, e.g. something running on cloud.platform aws_ec2 might be calling into rpc.service awslambda. If both are called cloud.service.name, then you have a name collision (at least as far as I know, most backends will have trouble dealing with overlapping signal/resource attributes or simply override in a particular direction, and AFAIK it is even an open spec point whether resource + signal attributes are logically in the same "object" or should be distinguishable semantically).
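
The collision described above can be shown with a minimal sketch. It assumes a hypothetical backend that flattens resource and signal attributes into one dict and lets the signal value win; real backends may behave differently, and `flatten` is an illustrative helper, not an existing API.

```python
def flatten(resource, signal):
    """Hypothetical backend behavior: merge both sets, signal attributes win."""
    return {**resource, **signal}


# Same name on both levels: the resource value is silently lost.
colliding = flatten(
    {"cloud.service.name": "aws_ec2"},    # where the process runs
    {"cloud.service.name": "awslambda"},  # what the process calls
)
print(colliding)

# Distinct names: both pieces of information survive the merge.
distinct = flatten(
    {"cloud.platform": "aws_ec2"},
    {"rpc.service": "awslambda"},
)
print(distinct)
```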

@AlexanderWert
Member

@AlexanderWert Why do you think that rpc.service would only make sense for tracing? I think it should just as well make sense for other signal types.

rpc.service implies through the namespace that there's a call (Remote Procedure Call) being made to that service and that the information is about that remote call. Agree, not necessarily tracing but in the context of a call being made.
But let's take the case when you collect data (e.g. logs, events or metrics) from a cloud service (and this data is not related to a remote call) that is not a compute-type service (e.g. messaging service, storage, etc.). Do you agree that both cloud.platform and rpc.service are not suitable to capture the information about the cloud service it refers to?

You will very typically have a cloud.platform that is different from the rpc.service when calling a cloud service, e.g. something running on cloud.platform aws_ec2 might be calling into rpc.service awslambda. If both are called cloud.service.name, then you have a name collision (at least as far as I know, most backends will have trouble dealing with overlapping signal/resource attributes or simply override in a particular direction, and AFAIK it is even an open spec point whether resource + signal attributes are logically in the same "object" or should be distinguishable semantically).

Agree with these challenges! At the same time there are several examples where reusing attribute names for resource and normal attributes would make sense (some are also listed here: open-telemetry/opentelemetry-specification#1367). And that's why I think we should discuss it and make it explicit (allow or disallow). But let's maybe move the discussion on this topic to a new, separate issue.

@Oberon00
Member

But let's take the case when you collect data (e.g. logs, events or metrics) from a cloud service (and this data is not related to a remote call) that is not a compute-type service (e.g. messaging service, storage, etc.). Do you agree that both cloud.platform and rpc.service are not suitable to capture the information about the cloud service it refers to?

Ok, so this would apply e.g. when you have a solution that actively queries, for example metrics, from dynamodb but not related to particular calls? I think this scenario has been outside the scope of OTel so far...
Was this discussed already, maybe on the ECS OTEP or elsewhere? I think I may be lacking in knowledge about this discussion.

@kaiyan-sheng
Contributor Author

Thanks everyone for the comments. At Elastic, we have integrations that collect metrics and logs from cloud providers like AWS, GCP and Azure. For example, metrics about an S3 bucket would have cloud.service.name set to s3 or aws_s3. When I first saw the field name cloud.platform, I immediately thought of aws, gcp, etc., basically cloud.provider instead of the actual public cloud service name.

@kaiyan-sheng I'd recommend raising this PR in next week's semconv group since we've been discussing there what to do in case of conflict between ECS and OpenTelemetry naming.

@trask Sorry, I'm on vacation next week, but I can definitely raise it the week after! Thanks!

@AlexanderWert
Member

AlexanderWert commented Sep 29, 2023

I think this scenario has been outside the scope of OTel so far...
Was this discussed already, maybe on the ECS OTEP or elsewhere? I think I may be lacking in knowledge about this discussion.

@Oberon00 I don't think we have explicitly discussed this yet, but it's described at a high level in the OTEP for the ECS merger. To make OTel the ubiquitous framework and standard for observability, I think these use cases (as @kaiyan-sheng outlined in #344 (comment)) are something we should include in the scope of OpenTelemetry in the future. And that's also the big value proposition of merging ECS with semconv, as it covers many of these use cases.

@Oberon00
Member

Oberon00 commented Sep 29, 2023

Just to note, I agree that cloud.platform is not a particularly good name, especially if we use it for non-compute services (platform is somewhat better when something runs "on" it, like AWS Lambda being the platform where you deploy your service).

Maybe it would make sense to have all 3:

  • One attribute for signals about operations involving calls to a cloud service: The existing rpc.service
  • One to denote the (compute) cloud service on which the service that the signal is about is running: The existing cloud.platform
  • A new one for when a signal should be annotated as relating to a cloud service but where no operation is directly involved. I still think cloud.service.name is a bit confusing (also: Why not cloud.service_name?) but probably the ECS compatibility could be worth the confusion.

For example, imagine you are instrumenting something part of DynamoDB (new attribute) itself making a call to CloudWatch (rpc.service) and it runs on EC2 (cloud.platform).
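
The three-attribute example above could be laid out as follows. This is a sketch only: the record layout, the attribute values, and the helper `summarize` are assumptions, and cloud.service.name stands in for the proposed new attribute.

```python
# Hypothetical record for a component of DynamoDB that calls CloudWatch
# while running on EC2. All values are illustrative.
record = {
    "resource": {"cloud.platform": "aws_ec2"},
    "attributes": {
        "rpc.service": "CloudWatch",
        "cloud.service.name": "dynamodb",  # stands in for the new attribute
    },
}

# Role each attribute plays in the three-way split.
ROLE_BY_ATTRIBUTE = {
    "cloud.platform": "runs on",
    "rpc.service": "calls",
    "cloud.service.name": "is about",
}


def summarize(record):
    """Map each of the three attributes to the role it plays for this record."""
    merged = {**record["resource"], **record["attributes"]}
    return {
        ROLE_BY_ATTRIBUTE[name]: value
        for name, value in merged.items()
        if name in ROLE_BY_ATTRIBUTE
    }


print(summarize(record))
```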

On second thought... For that 3rd use case, why can't we just use the existing service.name (without cloud.)?

@kaiyan-sheng
Contributor Author

Thank you all for your input! Sorry, I misunderstood the cloud.platform field. Originally I thought the cloud.platform field from OTel and cloud.service.name from ECS represented the same thing, but it turned out that cloud.platform is a subset of cloud.service.name. I will close this PR for now.

Instead of renaming cloud.platform, I think it's better to add cloud.service.name to the semantic conventions. Please see #425 for more details.

@kaiyan-sheng kaiyan-sheng deleted the cloud_service_name branch October 18, 2023 22:29
7 participants