This repository has been archived by the owner on May 23, 2023. It is now read-only.

Standard tag for service name #77

Closed
StephenWithPH opened this issue Jun 17, 2017 · 31 comments

Comments

@StephenWithPH

@tedsuo and @yurishkuro: this is a continuation of the conversation from #75.

This also overlaps with #58 from @ruinanchen

Referring back to that issue...several of the existing OT-compatible tracers make service name a first-class item. See:

https://github.com/openzipkin/zipkin-go-opentracing/blob/master/zipkin-recorder.go#L71
https://github.com/uber/jaeger-client-go/blob/master/tracer.go#L71
https://github.com/instana/golang-sensor/blob/master/options.go#L6
https://github.com/hawkular/hawkular-apm-opentracing-javascript/blob/master/lib/deployment-meta-data.js#L28

I believe guidance on a standard tag for service name would be useful. Yes, this need is largely driven by UI considerations, but "which service?" is a valid question.

Quoting @tedsuo

We set the lightstep.component tag to mean "service/tracer", which was then poached by the official component tag, which means something else. So we would also like an official tag for this. Otherwise we will change our tag to service only to have that get poached to mean something else.

Quoting @yurishkuro

We use the term "service name" to describe the process (aka microservice) that uses an instance of a tracer. It's not a span-level information, but process-level, so there is no need for a span tag, we capture it as a "tracer-level tag" if you will.

In our case, our "tracer" is just raw spans to stdout. That's why we want the span itself to reflect the service. We could certainly append something in the process that picks up stdout and ships, but we feel it's cleaner to just add a tag on the span.

Thoughts?

@yurishkuro
Member

@StephenWithPH

In our case, our "tracer" is just raw spans to stdout.

If you're implementing an OpenTracing API, you still have a Tracer object, don't you? So you can record the service name in that object and when instrumentation calls tracer.StartSpan() you can attach that as an attribute of the span.
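
A minimal sketch of that suggestion, assuming a tracer that only writes spans to stdout. `StdoutTracer`, `Span`, and the JSON layout are hypothetical names made up for illustration; they are not the opentracing-go interfaces.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Span is a hypothetical, minimal span record (not the opentracing-go Span).
type Span struct {
	Operation string            `json:"operation"`
	Tags      map[string]string `json:"tags"`
}

// StdoutTracer is a hypothetical tracer that remembers the service name
// it was constructed with.
type StdoutTracer struct {
	serviceName string
}

// StartSpan stamps every new span with the tracer-level service name,
// so instrumentation never needs to know which service it runs in.
func (t *StdoutTracer) StartSpan(operation string) *Span {
	return &Span{
		Operation: operation,
		Tags:      map[string]string{"service": t.serviceName},
	}
}

// Finish serializes the span as a single JSON line.
// It returns an empty string if serialization fails.
func (s *Span) Finish() string {
	b, err := json.Marshal(s)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	tracer := &StdoutTracer{serviceName: "checkout"}
	span := tracer.StartSpan("GET /cart")
	fmt.Println(span.Finish())
}
```

Instrumentation only calls `StartSpan`; the tag name used on the wire stays a private agreement between the tracer and whatever consumes stdout.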

The main point of OpenTracing is that people can reuse instrumentation of various frameworks, we already have plenty of those in opentracing-contrib. How would all those frameworks know what your service name is when they create new spans?

@StephenWithPH
Author

@yurishkuro "you can attach that as an attribute of the span"... that's exactly what we're doing.

I wanted to (re)open the conversation about a standard, suggested tag for this. I believe @tedsuo expressed similar interest.

Thoughts on whether or not you'd like to see a PR to update the documentation with a standard tag (service) for this?

@yurishkuro
Member

Well, the standard tags exist for the purpose of letting instrumentation know which tags to use for which semantic data elements. In your case the tag is added by your tracer implementation, not instrumentation, so it does not need any standardization, you can name it anything you want, as long as your tracer and the tracing backend agree on what it is. Am I missing something?

@StephenWithPH
Author

Nope, not missing anything. This is more of a "people (n >= 3) keep asking for it because it seems like it should be there."

I'll wait for any input from @tedsuo; otherwise, I'll close this in a few days.

@yurishkuro
Member

We just need a clear use case that affects vendor-neutral instrumentation.

@tedsuo
Member

tedsuo commented Jun 20, 2017

@yurishkuro we have some cases where instrumentations want to override the service tag we are setting by default, especially proxies and sidecar services that are reporting on behalf of, or multiplexing, other services. Right now these instrumentations must override lightstep.component_name to change this, which doesn't feel vendor-neutral; it feels like a leak. I would prefer they do the override with a standard tag like service instead, and hopefully other tracers will choose to support this behavior. Does that make sense?

@tedsuo
Member

tedsuo commented Jun 20, 2017

To be clear: if tracers don't support the service tag as an override for their service-name concept, then fine. The instrumentation/tagging is still useful to users of that tracer, they can still do searches/filters using the service tag. The issue is if tracers want to support the overriding of the tracer/process-level service-name concept without a vendor-specific tag.

So right now, the problem is:
a) Telling instrumentors to use lightstep.component_name is weird.
b) If we instead use the service tag to mean lightstep.component_name without standardization, we risk having the meaning of service be defined later to be something else, which risks instrumentations setting it for that purpose and creating unintended behavior. This has already happened to us with the component tag and we are concerned about it happening again.

@yurishkuro
Member

@tedsuo to make sure I understand: a single process (e.g. a routing proxy like Envoy) creates spans on behalf of many different services, via a single instance of the Tracer, therefore you want the instrumentation to tell the Tracer on a per-Span basis the service name it represents.

If that's correct, then it's a much larger scope than simply defining a standard tag; it's also prescribing new behavior to the tracers. For example, if you use Jaeger and do span.SetTag("service", "whatever"), it will have no impact on the service name reported in the Jaeger backend, because per-Span service name is simply not supported (the tag will be stored on the span, but not interpreted). It wouldn't be too difficult for us to support it, but we should be clear about the scope of this change.

@tedsuo
Member

tedsuo commented Jun 20, 2017

Yes, that's exactly it! Only I think there's no problem if Jaeger does not support it. You can still search by tag:service="myService" in Jaeger or Zipkin and gain insight with these instrumentations. So it's opt-in. But it would be a problem if Jaeger and other tracers started interpreting service to mean something else. That's why we would like to reserve its meaning.

Specifically, we want to reserve the tag service to mean a "standalone" service, which by default matches the "tracer name" or equivalent field set on global tracer initialization. It's the right complement to the component tag that exists to identify packages, frameworks and modules such as gRPC and JDBI that are incorporated into services.

@yurishkuro
Member

I think there's no problem if Jaeger does not support it. You can still search ...

Yes, one can search by the tag, but all aggregations will be incorrect as the spans will be attributed to the service name given at Tracer construction, not according to the "service" tag.
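
To make the aggregation concern concrete, here is a rough sketch assuming a backend that counts spans per service. `Span`, `TracerService`, and `CountByService` are hypothetical names for illustration and do not describe how Jaeger actually stores or aggregates spans.

```go
package main

import "fmt"

// Span is a hypothetical record: TracerService is the name given at tracer
// construction, Tags holds whatever instrumentation set on the span.
type Span struct {
	TracerService string
	Tags          map[string]string
}

// CountByService attributes each span to a service and counts them.
// With honorTag false, the "service" tag is stored but not interpreted.
// With honorTag true, a non-empty "service" tag overrides the tracer name.
func CountByService(spans []Span, honorTag bool) map[string]int {
	counts := make(map[string]int)
	for _, s := range spans {
		name := s.TracerService
		if v, ok := s.Tags["service"]; honorTag && ok && v != "" {
			name = v
		}
		counts[name]++
	}
	return counts
}

func main() {
	// A proxy reporting on behalf of two services through one tracer.
	spans := []Span{
		{TracerService: "proxy", Tags: map[string]string{"service": "orders"}},
		{TracerService: "proxy", Tags: map[string]string{"service": "billing"}},
		{TracerService: "proxy"},
	}
	fmt.Println("tag ignored:", CountByService(spans, false))
	fmt.Println("tag honored:", CountByService(spans, true))
}
```

When the tag is ignored, all three spans are attributed to `proxy`, even though a tag search would still find the `orders` and `billing` spans individually.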

I don't mind reserving the service tag with a description "reserved for future use". To define the tag with the actual semantics discussed above is a bit premature, imo, because we're saying it is a standard tag while no tracers actually support it yet.

@jpkrohling

We had something similar for Hawkular APM: during the "report" phase, we'd check if the application has set the service name. If not, we'd derive it from a configuration option, or from an env var:

https://github.com/hawkular/hawkular-apm/blob/master/client/opentracing/src/main/java/org/hawkular/apm/client/opentracing/EnvironmentAwareTraceRecorder.java#L43

I do think it makes sense to allow specific spans to set their service names, especially if they are dealing with out-of-band data or processing (like parts of a batch, queued items, ...).
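
The fallback order described for the report phase could look roughly like the sketch below. It is not the Hawkular APM code; `ResolveServiceName`, the `SERVICE_NAME` variable, and the `"unknown"` default are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"os"
)

// ResolveServiceName returns the first non-empty candidate, in priority order:
// the name set on the span, the configured name, then an environment variable.
// lookupEnv is injected (os.Getenv in real use) so the logic is easy to test.
func ResolveServiceName(spanLevel, configured string, lookupEnv func(string) string) string {
	if spanLevel != "" {
		return spanLevel
	}
	if configured != "" {
		return configured
	}
	// SERVICE_NAME is a hypothetical variable name chosen for this sketch.
	if v := lookupEnv("SERVICE_NAME"); v != "" {
		return v
	}
	return "unknown"
}

func main() {
	// A span that set its own name (e.g. a queued item) wins over the config.
	fmt.Println(ResolveServiceName("batch-worker", "api", os.Getenv))
	// Without a span-level name, the configured name is used.
	fmt.Println(ResolveServiceName("", "api", os.Getenv))
}
```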

@tedsuo
Member

tedsuo commented Jun 20, 2017

Thanks @jpkrohling.

@yurishkuro maybe we are still lacking some process around standardizing tags. I don't know what it means to reserve a tag but not define its meaning. Are we missing an incubation step? It feels kind of chicken and egg: either we provide a vendor-neutral tag like service for tracers to support multiplexing and changing the service name, or each tracer has to make up a vendor-specific solution. The second is already happening; it's what I'm trying to get in front of right now.

I agree that we should be slow to add standard tags, while avoiding overly narrow solutions and non-existent problems. But I believe service qualifies for standardization:

  • The concept of a standalone service is nearly universal.
  • Virtually every tracer supports a "tracer name" that maps to this concept.
  • Multiplexed/out-of-band reporting is a real, established pattern, which needs a facility to identify which service it is reporting for.

To solve this issue without vendor-specific solutions, a service tag can be introduced with the following properties:

a) the service tag defines a span to be the start of a standalone service, rather than a component or library. Libraries and shared components should not set this tag unless they are multiplexing and reporting on behalf of other services.

b) if a tracer provides a "tracer name," "service name," or similar concept, and would like to allow OT instrumentation to override it, the service tag is recommended for that facility.

@yurishkuro, is your preference to standardize on a), and "incubate" b)?

@yurishkuro
Member

maybe we are still lacking some process around standardizing tags.

Indeed. Partially because until now support for tags was entirely optional for tracer implementations, in the sense that almost none of them required any special behavior of the tracer aside from simply recording the tag on the span. The two minor exceptions are span.kind=server (only critical for Zipkin's single-span-per-RPC model) and sampling.priority (only relevant to tracers that use consistent sampling). The impact of the "service" tag, on the other hand, is major. The ability of a single tracer instance to represent multiple services wasn't a use case that we thought of when developing the OT API; if we had, then it probably would've been part of the API itself. For example, it's quite difficult to make any non-trivial upfront sampling decision if you don't even know the service name when starting the span.
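
As an illustration of the sampling point, here is a sketch of an upfront sampler that needs the service name as an input. `Sampler`, its rate table, and the trace-ID bucketing are assumptions for this example only, not how any particular tracer samples.

```go
package main

import "fmt"

// Sampler holds hypothetical per-service sampling rates in the range [0, 1].
type Sampler struct {
	Rates       map[string]float64
	DefaultRate float64
}

// IsSampled makes a deterministic decision at span start.
// It cannot pick the right rate unless the service name is already known.
func (s Sampler) IsSampled(service string, traceID uint64) bool {
	rate, ok := s.Rates[service]
	if !ok {
		rate = s.DefaultRate
	}
	// Map the trace ID onto [0, 1) and compare against the rate.
	return float64(traceID%10000)/10000 < rate
}

func main() {
	sampler := Sampler{
		Rates:       map[string]float64{"orders": 1.0, "billing": 0.0},
		DefaultRate: 0.5,
	}
	fmt.Println(sampler.IsSampled("orders", 42))
	fmt.Println(sampler.IsSampled("billing", 42))
}
```

If the service name only arrives later through a `SetTag` call, a decision made at span start would have been taken with the wrong rate.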

So what I am really trying to do is to take into account (intensive) past criticism of introducing features into the API before they are widely supported by the existing implementations. If we define the tag with the a) and/or b) definition, it sets the expectation that such behavior should be generally supported by many implementations, which in the case of Jaeger is a non-trivial amount of work across multiple languages.

So I am not sure how to answer your question. I do think we need some incubation process for the new tags of such critical impact. One possible way to do that is to say "It's incubating with a) and b) semantics, but the semantics might change prior to graduation".

@beberlei

beberlei commented Jul 7, 2017

Question about service: if you have multiple MySQL databases, would service be service=mysql, or would it be service=mysql://cluster or dbname?

@tedsuo
Member

tedsuo commented Jul 14, 2017

@yurishkuro I think you are right that we should develop the usecase further. I disagree that this tag would force anyone to change or be "broken" in regard to OT: your tracer will work fine, there is just nothing special about the service tag. I'm interested in getting out-of-band reporting to work via a tag convention specifically because it would allow us to add this concept to OT without making breaking changes, such as a new StartSpanOption or other changes that would literally break the API for tracers, regardless of whether you support this usecase.

So really, this is about incubating an entire concept: out of band reporting. I will think more about how we should do this: working groups, etc, so that when things are finally standardized they have been reviewed properly. You may well be right that there is interest in supporting out-of-band reporting in the OT community, but it requires deeper changes that need review and new API surface area.

In the meantime... we're going to start setting the service tag to mean this in Envoy, NGINX, Varnish, and other load balancers, to try this out. This gets that usecase unblocked and in a state where other tracers can choose to adopt or experiment with it. And I am going to try to prevent the term "Service" from being associated with a different concept in OT so that tracers which respond to the service tag in this manner will not run into trouble, or create confusion. If we choose a different, more official mechanism later we will switch from tags to that. But I would still like this concept to be named "Service".

@beberlei I think service naming conventions are user preference, based on how you think of your system. If you were going to draw a boxes and arrows diagram of your system, "service" would be the name you wrote on each box. So for systems that are more complicated "mysql-db" may not be enough. In general, db services I have seen tend to be named things like auth-db and image-cache.

As far as reserving the name "Service" for this concept in the OT universe: Tracers in general have a concept of "service name," often set on tracer initialization, that is a very important part of how they index things. What appears to be shaping up in OT is that "operations" exist in a "component" namespace, and components exist in a service namespace. So the "fully qualified span name" is often something like service:operation or service:component:operation. It's not too interesting to compare /user/account operations that are part of different services. So you end up wanting to look at app:/user/account in order to eliminate noise. I feel like this is fundamental and all tracers must deal with it somehow; in practice it's not possible to disambiguate all operation names in a distributed system without a namespace like this. But half of this concept lives outside OT at the moment. We should at least name it.
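
A small sketch of the namespacing idea above, assuming ":" as the separator as in the examples given. `QualifiedName` is a hypothetical helper, not part of any OpenTracing API.

```go
package main

import (
	"fmt"
	"strings"
)

// QualifiedName joins the non-empty parts with ":" to form
// service:operation or service:component:operation.
func QualifiedName(service, component, operation string) string {
	parts := make([]string, 0, 3)
	for _, p := range []string{service, component, operation} {
		if p != "" {
			parts = append(parts, p)
		}
	}
	return strings.Join(parts, ":")
}

func main() {
	// The same operation name in two services becomes two distinct names.
	fmt.Println(QualifiedName("app", "", "/user/account"))   // app:/user/account
	fmt.Println(QualifiedName("admin", "", "/user/account")) // admin:/user/account
	fmt.Println(QualifiedName("app", "grpc", "GetAccount"))  // app:grpc:GetAccount
}
```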

@tylerbenson
Contributor

At Datadog we have added some tags internally, but are waiting to see how this issue resolves before making further decisions.
The tag names we're currently using are "span-type", "service-name", and "resource-name". Type is web, db, cache, etc. Service name is synonymous with "service" here. Resource name is provided when an additional level of grouping makes sense, such as a table name or controller.

I guess my point is that I support better standardization of tags that would support finer grained grouping.

@StephenWithPH
Author

@tedsuo's entire paragraph from #77 (comment) ...

As far as reserving the name "Service" for this concept...

... perfectly summarizes what I was trying to articulate earlier on in this discussion.

@yurishkuro
Member

Is there anything blocking this from moving forward?

Does the following plan make sense?

  • add service tag
  • mark it as "incubating"; explain elsewhere that "incubating" means the meaning of the flag might (although unlikely to) change in the future if the current definition doesn't work out
  • describe it as a way to generate spans for different services from a single Tracer instance

@wu-sheng
Member

How do we define the difference between the peer.service and service tags?

This question is similar to @beberlei's. According to @tedsuo's explanation:

@beberlei I think service naming conventions are user preference, based on how you think of your system. If you were going to draw a boxes and arrows diagram of your system, "service" would be the name you wrote on each box. So for systems that are more complicated "mysql-db" may not be enough. In general, db services I have seen tend to be named things like auth-db and image-cache.

It is up to the user; if so, it is hard to tell the difference.

@yurishkuro
Member

"peer service" means "the other service". I don't think it's confusing. Both tags refer to the same domain of values - the names of the services in the architecture.

@wu-sheng
Member

IMO, if a span represents a client calling a remote service, the service tag for this span is also peer.service.

@mabn

mabn commented Jul 18, 2017

@wu-sheng When A calls B and they report spans on both sides of the RPC then:

A reports (the client-side of the RPC):

  • service: A
  • peer.service: B (because A called B)

B reports (the server-side of the RPC):

  • service: B
  • peer.service: A (because it was called by A)

peer.service may be unknown
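
The two-sided tagging above can be sketched as follows. `RPCTags` is a hypothetical helper; the tag names `service`, `peer.service`, and `span.kind` come from this discussion and the existing conventions, while the map representation is an assumption.

```go
package main

import "fmt"

// RPCTags returns the tags for one side of an RPC: "service" is the side
// reporting the span, "peer.service" is the other side when it is known.
func RPCTags(kind, self, peer string) map[string]string {
	tags := map[string]string{"span.kind": kind, "service": self}
	if peer != "" {
		tags["peer.service"] = peer
	}
	return tags
}

func main() {
	// A calls B: each side reports itself as service and the other as peer.
	fmt.Println(RPCTags("client", "A", "B"))
	fmt.Println(RPCTags("server", "B", "A"))
	// The server may not know who called it, so peer.service is omitted.
	fmt.Println(RPCTags("server", "B", ""))
}
```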

@wu-sheng
Member

@mabn If A is just a client, I don't think A will set itself as the service tag, and the same goes for B.

IMO, A and B are a pair for this RPC call, and they share the same service name. E.g., the /prod/order service name is correct for both the HTTP client and server sides. By the way, the operation names may differ, like apache/httpclient/post/order at the client and tomcat/http/post/order at the server.

@tedsuo
Member

tedsuo commented Jul 19, 2017

@wu-sheng I know, this is hard, right? There are only so many words, we are going to use them up quickly. :)

In this case I agree with @mabn, service and peer.service are different. I understand what you are saying about the client being an embedded part of the peer service, and so that is the name of its service, but I think that is the purpose of peer.service - to allow the client a place to put that information. So client code sets the peer.service tag to the target service, and out-of-band reporting code would set service as the reporting service. RPC client code (and application code in general) should never set the service tag - either it is set implicitly on the tracer, or explicitly by whatever mechanism is trying to report on behalf of multiple services, or some other special case where you have multiple services but one tracer.

I'm glad to see that @StephenWithPH @tylerbenson and others have a similar concept for service. @wu-sheng are you satisfied with this reasoning enough to try it out? If so I will make a PR along the lines of what @yurishkuro has suggested.

@wu-sheng
Member

wu-sheng commented Jul 20, 2017

RPC client code (and application code in general) should never set the service tag - either it is set implicitly on the tracer, or explicitly by whatever mechanism is trying to report on behalf of multiple services, or some other special case where you have multiple services but one tracer.

If we have service and peer.service at the same time, we really should add explicit usage guidance for them, like you said, @tedsuo. Otherwise, it is hard to use and support.

  • peer.service : Client side only. Remote service name (for some unspecified definition of "service"). E.g., "elasticsearch", "a_custom_microservice", "memcache"
  • service : Provided by the server side. E.g., http:/prod/order/ as an HTTP service.

@tedsuo
Member

tedsuo commented Jul 21, 2017

Thanks @wu-sheng, I'll try to make it clear.

BTW @tylerbenson, your tag service-name matches service, and span-type sounds like it matches component. But there is not currently an OT equivalent of resource-name. Possibly this is because most of the tags to date come from instrumenting libraries, which tend to have a narrow focus. I can see service:component:resource:operation being useful in large applications and frameworks; I wonder if that's what people are searching for in this issue: #72.

@yurishkuro
Member

I was thinking more about this recently. In Jaeger we identify the service (originator of the span) via a Process object that contains not only service name, but also a collection of key/value tags that typically represent other metadata about the service, such as host/ip where the service is running, maybe software version, zone/datacenter, deployment group, etc. Simply setting the service name via tag doesn't allow for expressing this level of service identity & metadata. It's also going to be quite inefficient to do for every span since the tracer internally pre-processes the service metadata into a format ready to be sent over the wire (or even communicating it to the backend upon establishing the connection). So it seems like instantiating multiple tracers in the proxy service for this scenario would be a better approach.
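
A rough sketch of the multiple-tracers approach for a proxy, assuming one tracer per represented service, each with its own process-level metadata. `Process`, `Tracer`, and `TracerPool` are hypothetical types written for this example; they are not the Jaeger client API.

```go
package main

import (
	"fmt"
	"sync"
)

// Process is a hypothetical stand-in for process-level identity:
// a service name plus metadata tags such as host or version.
type Process struct {
	ServiceName string
	Tags        map[string]string
}

// Tracer is a hypothetical tracer bound to exactly one Process.
type Tracer struct {
	Process Process
}

// TracerPool lazily creates and caches one Tracer per service name.
type TracerPool struct {
	mu      sync.Mutex
	tracers map[string]*Tracer
	common  map[string]string
}

// NewTracerPool creates a pool whose tracers all share the given metadata.
func NewTracerPool(common map[string]string) *TracerPool {
	return &TracerPool{tracers: make(map[string]*Tracer), common: common}
}

// For returns the tracer for a service, building its Process only once.
func (p *TracerPool) For(service string) *Tracer {
	p.mu.Lock()
	defer p.mu.Unlock()
	if t, ok := p.tracers[service]; ok {
		return t
	}
	tags := make(map[string]string, len(p.common))
	for k, v := range p.common {
		tags[k] = v
	}
	t := &Tracer{Process: Process{ServiceName: service, Tags: tags}}
	p.tracers[service] = t
	return t
}

func main() {
	pool := NewTracerPool(map[string]string{"hostname": "proxy-1"})
	orders := pool.For("orders")
	fmt.Println(orders.Process.ServiceName, orders.Process.Tags["hostname"])
	fmt.Println(pool.For("orders") == orders) // the tracer is reused
}
```

The service metadata is prepared once per service rather than once per span, which is the efficiency argument made above.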

@tylerbenson
Contributor

@yurishkuro in our case, we only set the service name tag on the top span of each name, then it implicitly cascades (until it is manually changed to something else). In my opinion, this is much easier to deal with than managing multiple different tracer instances.
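
A minimal sketch of that cascading behavior, assuming children copy the service name from their parent at creation time. `Span`, `StartRoot`, and `StartChild` are hypothetical names and do not reflect the Datadog tracer's actual API.

```go
package main

import "fmt"

// Span is a hypothetical span that carries its effective service name.
type Span struct {
	Operation string
	Service   string
}

// StartRoot creates a top-level span with an explicit service name.
func StartRoot(service, operation string) *Span {
	return &Span{Operation: operation, Service: service}
}

// StartChild creates a child that inherits the parent's current service name.
func (s *Span) StartChild(operation string) *Span {
	return &Span{Operation: operation, Service: s.Service}
}

func main() {
	root := StartRoot("web", "GET /orders")
	render := root.StartChild("render") // inherits "web"

	query := root.StartChild("SELECT orders")
	query.Service = "orders-db" // manual change
	rows := query.StartChild("fetch rows") // inherits "orders-db"

	fmt.Println(root.Service, render.Service, query.Service, rows.Service)
}
```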

@yurishkuro
Member

I don't have strong objection to introducing this tag.

@tylerbenson
Contributor

@yurishkuro @tedsuo Should I just submit a PR, or is there a different process for this?

@yurishkuro
Member

I think a PR is fine. As it's adding a new tag it doesn't need to go through the full RFC cycle. The PR is just to add the tag to the data conventions. Its description should spell out what the tracers are expected to do with it if they support it.

tylerbenson added a commit that referenced this issue May 30, 2018
Per discussion in #77, service name is deemed a widely enough used concept to warrant adoption by the community.

(Closes #77)

8 participants