Amazon X-Ray interop #1754
So these are the interop models for the out-of-band data?
- [zipkin SDK] -> [X-Ray backend] directly
- [zipkin SDK] -> [zipkin collector] -> [X-Ray backend]
TL;DR: correct :)
nit: the component relevant to out-of-band in zipkin lingo is "reporter".
We don't define the term SDK. http://zipkin.io/pages/architecture.html
So, something already recording data in a zipkin format sends directly to
X-Ray or via a "collector". A collector can be any component on a transport
(streaming or otherwise, zipkin-server or otherwise) that takes standard
zipkin encoding and pushes it to X-Ray. I'm intentionally being abstract as
people have different ways they accept data.
The first (direct) technique gives folks who, for example, are running lambdas and want to pay per trace a way to proceed without any infrastructure. They just align permissions and go. For example, one can write a lambda and run it inside or outside AWS (reporting to Zipkin when outside).
Random note: http://docs.aws.amazon.com/xray/latest/api/API_PutTraceSegments.html The POST format is a list of escaped JSON, as the doc implies. Ex.

```json
{"TraceSegmentDocuments": [
  "{\"id\": \"0b89f1dec76af795\", ..."
]}
```
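For context, a minimal sketch of driving that API from code, assuming the AWS SDK for Java (v1) X-Ray client is on the classpath; the segment fields and IDs below are illustrative, not from a real trace:

```java
import com.amazonaws.services.xray.AWSXRay;
import com.amazonaws.services.xray.AWSXRayClientBuilder;
import com.amazonaws.services.xray.model.PutTraceSegmentsRequest;
import com.amazonaws.services.xray.model.PutTraceSegmentsResult;

public class PutTraceSegmentsExample {
  public static void main(String[] args) {
    // Each entry in TraceSegmentDocuments is a whole segment serialized as a JSON string,
    // which is why the raw POST body shows escaped JSON.
    String segment = "{"
        + "\"trace_id\": \"1-5759e988-bd862e3fe1be46a994272793\","
        + "\"id\": \"0b89f1dec76af795\","
        + "\"name\": \"my-service\","
        + "\"start_time\": 1.478293361271E9,"
        + "\"end_time\": 1.478293361449E9"
        + "}";

    AWSXRay client = AWSXRayClientBuilder.defaultClient(); // default credentials and region
    PutTraceSegmentsResult result = client.putTraceSegments(
        new PutTraceSegmentsRequest().withTraceSegmentDocuments(segment));
    // Segments the service rejected (e.g. a bad trace_id timestamp) come back here.
    System.out.println("unprocessed: " + result.getUnprocessedTraceSegments());
  }
}
```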
This customizes the trace ID generator to make the high bits convertible to Amazon X-Ray trace ID format v1. See openzipkin/zipkin#1754
Added an example implementation of a trace ID with a time component. Unsurprisingly, it is slower than a fully random ID. However, the scale is still sub-microsecond (on my laptop™), and it only affects the root span: openzipkin/brave#509 Next step is to add a converter which proves the concept.
Experimental work is starting in Brave here: openzipkin/brave#510
Thanks to @jcarres-mdsol for making the new trace ID provisioning instructions a bit simpler:

|---- 32 bits for epoch seconds ----|---- 96 bits for random number ----|

Optional: a cheap sanity check that the high 32 bits are epoch seconds.
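To make that layout concrete, here is a minimal sketch of the idea (not Brave's actual implementation from openzipkin/brave#509): a 128-bit trace ID whose high 32 bits are epoch seconds and whose remaining 96 bits are random.

```java
import java.security.SecureRandom;
import java.util.concurrent.TimeUnit;

final class EpochSecondsTraceId {
  static final SecureRandom RANDOM = new SecureRandom();

  /** Returns {traceIdHigh, traceIdLow}: 32 bits of epoch seconds followed by 96 random bits. */
  static long[] nextTraceId() {
    long epochSeconds = TimeUnit.MILLISECONDS.toSeconds(System.currentTimeMillis());
    long traceIdHigh = (epochSeconds & 0xffffffffL) << 32 // high 32 bits: epoch seconds
        | (RANDOM.nextInt() & 0xffffffffL);               // next 32 bits: random
    long traceIdLow = RANDOM.nextLong();                  // low 64 bits: random
    return new long[] {traceIdHigh, traceIdLow};
  }

  /** Optional cheap sanity check: the high 32 bits should decode to a plausible timestamp. */
  static boolean looksConvertible(long traceIdHigh) {
    long epochSeconds = traceIdHigh >>> 32;
    return epochSeconds > 1262304000L; // after 2010-01-01, for example
  }
}
```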
With this pull request we have rewritten the whole Sleuth internals to use Brave. That way we can leverage all the functionalities & instrumentations that Brave already has (https://github.com/openzipkin/brave/tree/master/instrumentation). The migration guide is available here: https://github.com/spring-cloud/spring-cloud-sleuth/wiki/Spring-Cloud-Sleuth-2.0-Migration-Guide
- fixes #711 - Brave instrumentation
- fixes #92 - we move to Brave's Sampler
- fixes #143 - Brave is capable of passing context
- fixes #255 - we've moved away from the Zipkin Stream server
- fixes #305 - Brave has gRPC instrumentation (https://github.com/openzipkin/brave/tree/master/instrumentation/grpc)
- fixes #459 - Brave (openzipkin/brave#510) & Zipkin (openzipkin/zipkin#1754) will deal with the AWS X-Ray instrumentation
- fixes #577 - Messaging instrumentation has been rewritten
Closing, as I believe we have addressed this from zipkin-aws.
@devinsba the issue is about providing interoperability with AWS X-Ray. Looking at
We have support for the X-Ray propagation format and for sending traces/spans to X-Ray. Is there some kind of integration that you are looking for that isn't either of these? Also, any feature requests for more integration should be handled in the zipkin-aws repo.
@devinsba that's great. Is there any documentation available for setting it up or is that still to come? I'm just thinking of the good documentation at https://cloud.google.com/trace/docs/zipkin and https://github.com/openzipkin/zipkin-gcp for the equivalent Google Stackdriver Trace integration. Thanks
The docs for what's in the repo are in the README files at the moment. Our penalty for lack of docs is answering 1:1 :) Mind doing Q&A on Gitter, as we don't use issues for Q&A? https://gitter.im/openzipkin/zipkin
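Until fuller docs land, here is a rough sketch of the propagation half of that setup. It assumes the brave-propagation-aws module from the zipkin-aws repo is on the classpath; the class name is from memory, so treat it as an assumption and check that repo's README:

```java
import brave.Tracing;
import brave.propagation.aws.AWSPropagation; // from zipkin-aws (brave-propagation-aws), assumed name

public class TracingSetup {
  public static void main(String[] args) {
    // Propagate trace context using the X-Amzn-Trace-Id header format instead of B3.
    Tracing tracing = Tracing.newBuilder()
        .localServiceName("my-service")
        .propagationFactory(AWSPropagation.FACTORY)
        .build();
    // ... wire `tracing` into your instrumentation of choice, then close it on shutdown.
    tracing.close();
  }
}
```

The pieces that actually send spans to X-Ray also live in the zipkin-aws repo; as noted above, its README files are the current documentation.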
Similar to last year, when we had users requesting Google Stackdriver compatibility, there are users explicitly requesting Amazon X-Ray, as well as general support and interest from @abhiksingh (X-Ray product lead), and indirect comments which I can't currently find about tension between lambda and Zipkin architecture.
There's no doubt that Zipkin and Amazon interop has been important in the past. Many of our core team rely on AWS and/or make custom components for AWS. This issue will explore how we could fit in, and how we could let third-party tracers designed for B3 support X-Ray with the smallest possible impact.
Similar to StackDriver, there are two major concerns: propagation and out-of-band data.
Unlike StackDriver, propagation is wider than the X-Ray service. For example, in AWS the same propagation format is used even when X-Ray isn't: ELB uses the same format even though ELB doesn't write to X-Ray. There are also interesting concerns, such as API Gateway restarting traces at its edge, also in X-Ray format. These types of concerns weren't present when we integrated with StackDriver, although they are building in the new trace-context format, targeted initially inside Google at StackDriver and gRPC services.
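For concreteness, the propagation format in question is the X-Amzn-Trace-Id header; a representative value (the IDs here are illustrative) looks like this:

```
X-Amzn-Trace-Id: Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1
```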
Also unlike StackDriver, the trace ID requires a 32-bit timestamp. This has some impact: if there's an invalid timestamp, the service will drop any related data. For this reason, a pragmatic ID generation strategy is something we must consider. For example, creating "interop" IDs by default, where the first 32 bits are a timestamp and the remaining 96 bits are random. This discussion occurred at the end of #1262.
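As a sketch of the converter idea, once IDs are generated with that layout, mapping a 128-bit lower-hex Zipkin trace ID to X-Ray trace ID format v1 is just string slicing (an illustration, not the planned Brave/Zipkin code):

```java
final class XRayTraceIdFormat {
  /**
   * Converts a 32-character lower-hex Zipkin trace ID into X-Ray trace ID format v1,
   * e.g. "5759e988bd862e3fe1be46a994272793" -> "1-5759e988-bd862e3fe1be46a994272793".
   * Only meaningful when the first 8 hex characters (high 32 bits) encode epoch seconds.
   */
  static String toXRayTraceId(String zipkinTraceId) {
    if (zipkinTraceId.length() != 32) {
      throw new IllegalArgumentException("expected a 128-bit (32 hex character) trace ID");
    }
    return "1-" + zipkinTraceId.substring(0, 8) + "-" + zipkinTraceId.substring(8);
  }
}
```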
Reporting is very much like what we did for StackDriver. The X-Ray format is span-structured, with an API to post data to. While it has more structure than Zipkin data, mapping rules like those that exist for StackDriver are more than possible. Zipkin-compatible (or otherwise) tracers could send data to X-Ray's daemon (automatically present in Lambda), to the X-Ray POST API, or to a Zipkin destination that does one of the two.
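To illustrate the daemon path, here is a minimal sketch of sending one segment document to a local X-Ray daemon, which listens on UDP port 2000 and expects a small JSON header line followed by the segment; the segment values are made up:

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class XRayDaemonSender {
  public static void main(String[] args) throws Exception {
    // Daemon protocol: a JSON header, a newline, then one segment document per packet.
    String header = "{\"format\": \"json\", \"version\": 1}";
    String segment = "{\"trace_id\": \"1-5759e988-bd862e3fe1be46a994272793\","
        + "\"id\": \"0b89f1dec76af795\", \"name\": \"my-service\","
        + "\"start_time\": 1.478293361271E9, \"end_time\": 1.478293361449E9}";
    byte[] message = (header + "\n" + segment).getBytes(StandardCharsets.UTF_8);

    try (DatagramSocket socket = new DatagramSocket()) {
      socket.send(new DatagramPacket(message, message.length,
          InetAddress.getByName("127.0.0.1"), 2000)); // default daemon address
    }
  }
}
```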
The above is fairly reliable from initial explorations, but could change with experience. This issue will track the exploration and any related issues on Zipkin's side.