feat: add basic OT tracing for incoming requests #283
OK, so I'm adding some error paths to this, I can see how this can be done in a fairly minimal way. But the problem I have now is that some of the tests are not getting through to the
Implemented a wait condition on the TaskQueue to avoid the timing problems of testing this, see #284. This can also be used to deal with the flakiness introduced in the responsemanager recently.
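To illustrate the kind of wait condition described in the comment above, here is a minimal sketch using only the standard library. The `taskQueue` type and its `Run` and `WaitForNoActiveTasks` methods are hypothetical stand-ins made up for this example; they are not taken from #284 or from the go-graphsync TaskQueue, whose actual implementation may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// taskQueue is a hypothetical stand-in for a worker queue; it is not the
// go-graphsync TaskQueue API.
type taskQueue struct {
	mu     sync.Mutex
	idle   *sync.Cond // broadcast whenever active drops to zero
	active int        // tasks started but not yet finished
}

func newTaskQueue() *taskQueue {
	q := &taskQueue{}
	q.idle = sync.NewCond(&q.mu)
	return q
}

// Run counts the task as active before launching it, so a waiter can never
// observe an idle queue while work is still pending.
func (q *taskQueue) Run(task func()) {
	q.mu.Lock()
	q.active++
	q.mu.Unlock()
	go func() {
		task()
		q.mu.Lock()
		q.active--
		if q.active == 0 {
			q.idle.Broadcast()
		}
		q.mu.Unlock()
	}()
}

// WaitForNoActiveTasks blocks until every task started so far has finished.
func (q *taskQueue) WaitForNoActiveTasks() {
	q.mu.Lock()
	defer q.mu.Unlock()
	for q.active > 0 {
		q.idle.Wait()
	}
}

// runAll starts n tasks, waits for the queue to drain, and reports how many
// tasks completed before the wait returned.
func runAll(n int) int {
	q := newTaskQueue()
	var mu sync.Mutex
	done := 0
	for i := 0; i < n; i++ {
		q.Run(func() {
			mu.Lock()
			done++
			mu.Unlock()
		})
	}
	q.WaitForNoActiveTasks()
	mu.Lock()
	defer mu.Unlock()
	return done
}

func main() {
	fmt.Println(runAll(5)) // all five tasks have finished once the wait returns
}
```

A test can call the wait method before asserting on collected spans or errors, instead of sleeping and hoping the workers have finished.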
Have added some test assertions around collecting errors in here too, all looks pretty reasonable. I did wonder whether it's worthwhile plumbing in some stats too, like block and byte count, but that's pretty deep in the call stack, deeper than the current spans. So maybe later, if a user says that's actually going to be helpful.
These traces stop at
edit: perhaps better visualised as:
Should the last one be If we want to hang this off |
My only concern is how we're terminating spans. It doesn't look like it takes goroutines into account.
We should define what we expect these spans to cover, and then work to implement them properly.
I am assuming:
- request -> from the call to request all the way to all responses collected?
- newRequest -> from the request being popped off the queue all the way to the end of the request
- executeTask -> the length of the execute task.
If I am correct, only executeTask is currently correct.
OK, good points, I'll investigate.
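The review concern above (spans ending without regard to goroutines) can be sketched with the standard library alone. The `span` type below is a hypothetical stand-in that only records whether `End` was called; it is not the OpenTelemetry `trace.Span` interface, and `launch` is an illustrative helper, not code from this PR. The sketch assumes the traced work runs in a goroutine that outlives the function that started the span.

```go
package main

import (
	"fmt"
	"sync"
)

// span is a hypothetical stand-in that only tracks whether End was called;
// it is not the OpenTelemetry trace.Span interface.
type span struct {
	mu    sync.Mutex
	ended bool
}

func (s *span) End() {
	s.mu.Lock()
	s.ended = true
	s.mu.Unlock()
}

func (s *span) Ended() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.ended
}

// launch starts some work in a goroutine. The work begins once release is
// closed and reports whether the span had already ended by that point.
// endInWorker selects where End is called.
func launch(endInWorker bool, release <-chan struct{}) <-chan bool {
	s := &span{}
	result := make(chan bool, 1)
	if !endInWorker {
		defer s.End() // ends as soon as launch returns, before the work runs
	}
	go func() {
		if endInWorker {
			defer s.End() // ends only after the work is complete
		}
		<-release
		result <- s.Ended()
	}()
	return result
}

// endedDuringWork reports whether the span was already closed while the
// goroutine was still doing its work.
func endedDuringWork(endInWorker bool) bool {
	release := make(chan struct{})
	result := launch(endInWorker, release)
	close(release) // let the work start only after launch has returned
	return <-result
}

func main() {
	fmt.Println(endedDuringWork(false)) // true: the span closed too early
	fmt.Println(endedDuringWork(true))  // false: the span covers the work
}
```

The design point is that the code path that finishes the work should own the call to `End`, rather than a `defer` in the function that merely schedules it.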
Force-pushed from 40d0926 to a6c77ee.
I think this is ready for review. Rearranged how we're managing spans. There's a parent
Example spans and their data seen in the details below; note in this one that the first
I also implemented the mirror of #287 but for the RequestManager since I needed the executor to block until the request was actually done. As I said over there, I think this will be a good addition to both sides to make sure our async executors have a strict parallelism and don't do funky overlaps at the edges.
`TestPauseResumeResponse` spans
{
"Name": "newRequest",
"SpanContext": {
"TraceID": "4ec0852efb6fb6c3cdd0a8b081bd1f65",
"SpanID": "0f81fd9be8d71479",
"TraceFlags": "01",
"TraceState": "",
"Remote": false
},
"Parent": {
"TraceID": "4ec0852efb6fb6c3cdd0a8b081bd1f65",
"SpanID": "8553f61a304a42ca",
"TraceFlags": "01",
"TraceState": "",
"Remote": false
},
"SpanKind": 1,
"StartTime": "2021-11-30T15:12:50.136492228+11:00",
"EndTime": "2021-11-30T15:12:50.136548264+11:00",
"Attributes": null,
"Events": null,
"Links": null,
"Status": {
"Code": "Unset",
"Description": ""
},
"DroppedAttributes": 0,
"DroppedEvents": 0,
"DroppedLinks": 0,
"ChildSpanCount": 0,
"Resource": [
{
"Key": "service.name",
"Value": {
"Type": "STRING",
"Value": "unknown_service:impl.test"
}
},
{
"Key": "telemetry.sdk.language",
"Value": {
"Type": "STRING",
"Value": "go"
}
},
{
"Key": "telemetry.sdk.name",
"Value": {
"Type": "STRING",
"Value": "opentelemetry"
}
},
{
"Key": "telemetry.sdk.version",
"Value": {
"Type": "STRING",
"Value": "1.2.0"
}
}
],
"InstrumentationLibrary": {
"Name": "graphsync",
"Version": "",
"SchemaURL": ""
}
},
{
"Name": "executeTask",
"SpanContext": {
"TraceID": "4ec0852efb6fb6c3cdd0a8b081bd1f65",
"SpanID": "59380d18efc2aa6a",
"TraceFlags": "01",
"TraceState": "",
"Remote": false
},
"Parent": {
"TraceID": "4ec0852efb6fb6c3cdd0a8b081bd1f65",
"SpanID": "8553f61a304a42ca",
"TraceFlags": "01",
"TraceState": "",
"Remote": false
},
"SpanKind": 1,
"StartTime": "2021-11-30T15:12:50.136588429+11:00",
"EndTime": "2021-11-30T15:12:50.139372173+11:00",
"Attributes": null,
"Events": [
{
"Name": "exception",
"Attributes": [
{
"Key": "exception.type",
"Value": {
"Type": "STRING",
"Value": "github.com/ipfs/go-graphsync/requestmanager/hooks.ErrPaused"
}
},
{
"Key": "exception.message",
"Value": {
"Type": "STRING",
"Value": "request has been paused"
}
}
],
"DroppedAttributeCount": 0,
"Time": "2021-11-30T15:12:50.139360731+11:00"
}
],
"Links": null,
"Status": {
"Code": "Unset",
"Description": ""
},
"DroppedAttributes": 0,
"DroppedEvents": 0,
"DroppedLinks": 0,
"ChildSpanCount": 0,
"Resource": [
{
"Key": "service.name",
"Value": {
"Type": "STRING",
"Value": "unknown_service:impl.test"
}
},
{
"Key": "telemetry.sdk.language",
"Value": {
"Type": "STRING",
"Value": "go"
}
},
{
"Key": "telemetry.sdk.name",
"Value": {
"Type": "STRING",
"Value": "opentelemetry"
}
},
{
"Key": "telemetry.sdk.version",
"Value": {
"Type": "STRING",
"Value": "1.2.0"
}
}
],
"InstrumentationLibrary": {
"Name": "graphsync",
"Version": "",
"SchemaURL": ""
}
},
{
"Name": "request",
"SpanContext": {
"TraceID": "4ec0852efb6fb6c3cdd0a8b081bd1f65",
"SpanID": "8553f61a304a42ca",
"TraceFlags": "01",
"TraceState": "",
"Remote": false
},
"Parent": {
"TraceID": "00000000000000000000000000000000",
"SpanID": "0000000000000000",
"TraceFlags": "00",
"TraceState": "",
"Remote": false
},
"SpanKind": 1,
"StartTime": "2021-11-30T15:12:50.136446021+11:00",
"EndTime": "2021-11-30T15:12:50.242104668+11:00",
"Attributes": [
{
"Key": "peerID",
"Value": {
"Type": "STRING",
"Value": "1WRsHVXpAnpLmF"
}
},
{
"Key": "root",
"Value": {
"Type": "STRING",
"Value": "bafyreih6oouc26gzpf7okbwixooccyxdftbe42x3pwqzjdd4ff3mm37g2u"
}
},
{
"Key": "extensions",
"Value": {
"Type": "STRINGSLICE",
"Value": [
"AppleSauce/McGee"
]
}
},
{
"Key": "requestID",
"Value": {
"Type": "INT64",
"Value": 0
}
}
],
"Events": null,
"Links": null,
"Status": {
"Code": "Unset",
"Description": ""
},
"DroppedAttributes": 0,
"DroppedEvents": 0,
"DroppedLinks": 0,
"ChildSpanCount": 4,
"Resource": [
{
"Key": "service.name",
"Value": {
"Type": "STRING",
"Value": "unknown_service:impl.test"
}
},
{
"Key": "telemetry.sdk.language",
"Value": {
"Type": "STRING",
"Value": "go"
}
},
{
"Key": "telemetry.sdk.name",
"Value": {
"Type": "STRING",
"Value": "opentelemetry"
}
},
{
"Key": "telemetry.sdk.version",
"Value": {
"Type": "STRING",
"Value": "1.2.0"
}
}
],
"InstrumentationLibrary": {
"Name": "graphsync",
"Version": "",
"SchemaURL": ""
}
},
{
"Name": "terminateRequest",
"SpanContext": {
"TraceID": "4ec0852efb6fb6c3cdd0a8b081bd1f65",
"SpanID": "07e33ccce2569478",
"TraceFlags": "01",
"TraceState": "",
"Remote": false
},
"Parent": {
"TraceID": "4ec0852efb6fb6c3cdd0a8b081bd1f65",
"SpanID": "8553f61a304a42ca",
"TraceFlags": "01",
"TraceState": "",
"Remote": false
},
"SpanKind": 1,
"StartTime": "2021-11-30T15:12:50.242062248+11:00",
"EndTime": "2021-11-30T15:12:50.242109046+11:00",
"Attributes": null,
"Events": null,
"Links": null,
"Status": {
"Code": "Unset",
"Description": ""
},
"DroppedAttributes": 0,
"DroppedEvents": 0,
"DroppedLinks": 0,
"ChildSpanCount": 0,
"Resource": [
{
"Key": "service.name",
"Value": {
"Type": "STRING",
"Value": "unknown_service:impl.test"
}
},
{
"Key": "telemetry.sdk.language",
"Value": {
"Type": "STRING",
"Value": "go"
}
},
{
"Key": "telemetry.sdk.name",
"Value": {
"Type": "STRING",
"Value": "opentelemetry"
}
},
{
"Key": "telemetry.sdk.version",
"Value": {
"Type": "STRING",
"Value": "1.2.0"
}
}
],
"InstrumentationLibrary": {
"Name": "graphsync",
"Version": "",
"SchemaURL": ""
}
},
{
"Name": "executeTask",
"SpanContext": {
"TraceID": "4ec0852efb6fb6c3cdd0a8b081bd1f65",
"SpanID": "6d6240e1091eaf9a",
"TraceFlags": "01",
"TraceState": "",
"Remote": false
},
"Parent": {
"TraceID": "4ec0852efb6fb6c3cdd0a8b081bd1f65",
"SpanID": "8553f61a304a42ca",
"TraceFlags": "01",
"TraceState": "",
"Remote": false
},
"SpanKind": 1,
"StartTime": "2021-11-30T15:12:50.240871254+11:00",
"EndTime": "2021-11-30T15:12:50.242113534+11:00",
"Attributes": null,
"Events": null,
"Links": null,
"Status": {
"Code": "Unset",
"Description": ""
},
"DroppedAttributes": 0,
"DroppedEvents": 0,
"DroppedLinks": 0,
"ChildSpanCount": 0,
"Resource": [
{
"Key": "service.name",
"Value": {
"Type": "STRING",
"Value": "unknown_service:impl.test"
}
},
{
"Key": "telemetry.sdk.language",
"Value": {
"Type": "STRING",
"Value": "go"
}
},
{
"Key": "telemetry.sdk.name",
"Value": {
"Type": "STRING",
"Value": "opentelemetry"
}
},
{
"Key": "telemetry.sdk.version",
"Value": {
"Type": "STRING",
"Value": "1.2.0"
}
}
],
"InstrumentationLibrary": {
"Name": "graphsync",
"Version": "",
"SchemaURL": ""
}
}
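To make the parent/child relationships in the span output above easier to see, the following sketch groups spans by their `Parent.SpanID`. The `sample` constant is abbreviated from that output (only `Name`, `SpanContext.SpanID` and `Parent.SpanID` are kept) and is wrapped in a JSON array as an assumption so that it parses as one document; `spanRecord` and `childrenOf` are names made up for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// spanRecord holds only the fields needed to rebuild the hierarchy.
type spanRecord struct {
	Name        string
	SpanContext struct{ SpanID string }
	Parent      struct{ SpanID string }
}

// sample is abbreviated from the TestPauseResumeResponse output and wrapped
// in an array (an assumption made for this sketch).
const sample = `[
 {"Name":"newRequest","SpanContext":{"SpanID":"0f81fd9be8d71479"},"Parent":{"SpanID":"8553f61a304a42ca"}},
 {"Name":"executeTask","SpanContext":{"SpanID":"59380d18efc2aa6a"},"Parent":{"SpanID":"8553f61a304a42ca"}},
 {"Name":"request","SpanContext":{"SpanID":"8553f61a304a42ca"},"Parent":{"SpanID":"0000000000000000"}},
 {"Name":"terminateRequest","SpanContext":{"SpanID":"07e33ccce2569478"},"Parent":{"SpanID":"8553f61a304a42ca"}},
 {"Name":"executeTask","SpanContext":{"SpanID":"6d6240e1091eaf9a"},"Parent":{"SpanID":"8553f61a304a42ca"}}
]`

// childrenOf maps each parent span name to its child span names, in input
// order. Spans whose parent ID is not in the input (for example the all-zero
// ID) are treated as roots and listed under the empty key.
func childrenOf(data []byte) (map[string][]string, error) {
	var spans []spanRecord
	if err := json.Unmarshal(data, &spans); err != nil {
		return nil, err
	}
	names := make(map[string]string, len(spans)) // span ID -> span name
	for _, s := range spans {
		names[s.SpanContext.SpanID] = s.Name
	}
	tree := make(map[string][]string)
	for _, s := range spans {
		parent := names[s.Parent.SpanID] // "" when the parent is unknown
		tree[parent] = append(tree[parent], s.Name)
	}
	return tree, nil
}

func main() {
	tree, err := childrenOf([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Println("roots:", tree[""])
	fmt.Println("children of request:", tree["request"])
}
```

With this sample, `request` is the only root and has four children, which agrees with the `ChildSpanCount` of 4 reported on the `request` span. Keying by name merges spans that share a name, which is good enough for a quick look but not for a general tool.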
LGTM! Looking good!
Closes: #271
Implements some very basic OpenTelemetry tracing for incoming requests.
- `Request()` call: we add `peerID`, `root` and `extensions` to the attributes
- `newRequest()`: we add `requestID` to the attributes
- `executeTask()`: we don't add any more attributes
- [...] `Span` around and joining it with the local context
- [...] (`request->newRequest->executeTask`). Some tests don't reach `executeTask`, and in one test we assert that we got the attributes we expect on each of the spans.

I haven't added any errors to the traces yet; there are just so many ways a request can go bad and I'm not sure we want to add all of those. We can `RecordError` for any of these errors, and where the error is fatal (pretty much always) we `SetStatus(error)` for the span. That might make for interesting tracing output, but a lot of extra code I suspect. Thoughts on that welcome.

Below is the trace data output as JSON for the 3 spans so far.
Trace data as JSON