Support for Context Propagation for OTel trace mode #6785
Conversation
span_name = span[0]["name"]
span_names.append(span_name)
span_events = span[0]["events"]
event_names_only = rex_name_field.findall(str(span_events))
Can you share sample event format that's getting parsed? Just curious what's being parsed by the regex.
Here is the format, parsed in a Python REPL:
>>> traces = []
>>> with open(filename) as f:
... for json_obj in f:
... entry = json.loads(json_obj)
... traces.append(entry)
...
>>> rex_name_field = re.compile("(?<='name': ')[a-zA-Z0-9_\- ]+(?=')")
>>>
>>> parsed_spans = [
... entry["scopeSpans"][0]["spans"] for entry in traces[0]["resourceSpans"]
... ]
>>>
>>> parsed_spans
[[{'traceId': 'a26527be267cd796db6ca11a9da80d02', 'spanId': '328c3e08db9fb885', 'parentSpanId': 'dba197f625f0d47f', 'name': 'compute', 'kind': 2, 'startTimeUnixNano': '1705013284351565335', 'endTimeUnixNano': '1705013284352786278', 'events': [{'timeUnixNano': '1705013284351565335', 'name': 'COMPUTE_START'}, {'timeUnixNano': '1705013284352280806', 'name': 'COMPUTE_INPUT_END'}, {'timeUnixNano': '1705013284352734157', 'name': 'COMPUTE_OUTPUT_START'}, {'timeUnixNano': '1705013284352786278', 'name': 'COMPUTE_END'}], 'status': {}}], [{'traceId': 'a26527be267cd796db6ca11a9da80d02', 'spanId': 'dba197f625f0d47f', 'parentSpanId': 'f85c6361d1187a56', 'name': 'simple', 'kind': 2, 'startTimeUnixNano': '1705013284351347347', 'endTimeUnixNano': '1705013284355884227', 'attributes': [{'key': 'triton.model_name', 'value': {'stringValue': 'simple'}}, {'key': 'triton.model_version', 'value': {'intValue': '1'}}, {'key': 'triton.trace_id', 'value': {'intValue': '1'}}, {'key': 'triton.trace_parent_id', 'value': {'intValue': '0'}}], 'events': [{'timeUnixNano': '1705013284351347347', 'name': 'REQUEST_START'}, {'timeUnixNano': '1705013284351378305', 'name': 'QUEUE_START'}, {'timeUnixNano': '1705013284355884227', 'name': 'REQUEST_END'}], 'status': {}}], [{'traceId': 'a26527be267cd796db6ca11a9da80d02', 'spanId': 'f85c6361d1187a56', 'parentSpanId': '', 'name': 'InferRequest', 'kind': 2, 'startTimeUnixNano': '1705013284351158969', 'endTimeUnixNano': '1705013284357092351', 'events': [{'timeUnixNano': '1705013284351081189', 'name': 'HTTP_RECV_START'}, {'timeUnixNano': '1705013284351104757', 'name': 'HTTP_RECV_END'}, {'timeUnixNano': '1705013284352828926', 'name': 'INFER_RESPONSE_COMPLETE'}, {'timeUnixNano': '1705013284352956556', 'name': 'HTTP_SEND_START'}, {'timeUnixNano': '1705013284352983290', 'name': 'HTTP_SEND_END'}], 'status': {}}]]
>>> span_names = []
>>> for span in parsed_spans:
... span_name = span[0]["name"]
... span_names.append(span_name)
... span_events = span[0]["events"]
... event_names_only = rex_name_field.findall(str(span_events))
... print(event_names_only)
...
['COMPUTE_START', 'COMPUTE_INPUT_END', 'COMPUTE_OUTPUT_START', 'COMPUTE_END']
['REQUEST_START', 'QUEUE_START', 'REQUEST_END']
['HTTP_RECV_START', 'HTTP_RECV_END', 'INFER_RESPONSE_COMPLETE', 'HTTP_SEND_START', 'HTTP_SEND_END']
I see, thanks! So is the regex needed? Or can it just be parsed as json/dict? Looks like it's well structured, unless I'm misunderstanding. Something like:
for span in parsed_spans:
    span_events = span[0]["events"]
    event_names_only = [event["name"] for event in span_events]
    print(event_names_only)
I only suggest because I find the regex hard to parse/understand at a glance. I'd have to sit and think about what it's doing as a reader.
In this case it is easier to pass span[0]["events"] as a string and run the regex on it. span[0]["events"] is actually a list of dictionaries (it is easier to see on the image), which is why for this test I didn't want to iterate over that list to collect all the names.
Nice, my bad. Will remove the regex.
def test_http_trace_simple_model(self):
    """
    Tests trace, collected from executing one inference request
    for a `simple` model and HTTP client.
    """
    triton_client_http = httpclient.InferenceServerClient(
        "localhost:8000", verbose=True
    )
    inputs = prepare_data(httpclient)
    triton_client_http.infer(self.simple_model_name, inputs)

    self._test_simple_trace()

def test_http_trace_simple_model_context_propagation(self):
    """
    Tests trace, collected from executing one inference request
    for a `simple` model, HTTP client and context propagation,
    i.e. client specifies OTel headers, defined in `self.client_headers`.
    """
    triton_client_http = httpclient.InferenceServerClient(
        "localhost:8000", verbose=True
    )
    inputs = prepare_data(httpclient)
    triton_client_http.infer(
        self.simple_model_name, inputs, headers=self.client_headers
    )
This is totally optional "follow-up PR" material if you like the idea, but I think a lot of these functions for:
- http/grpc
- with/without context propagation
- simple/ensemble/bls

could be condensed a lot with something like unittest's subTest or parameterized:
def test_trace(self):
    clients = [httpclient, grpcclient]
    header_list = [None, self._client_headers]
    models = self.test_models  # [self.simple_model_name, self.ensemble, self.bls ...]
    permutations = itertools.product(clients, header_list, models)
    for p in permutations:
        client_type, headers, model = p[0], p[1], p[2]
        with self.subTest(permutation=p):
            client = client_type.InferenceServerClient(...)  # can probably be smart and include url somewhere
            inputs = prepare_data(client_type)
            client.infer(model, inputs, headers=headers)
            self._test_trace(model, headers=headers)
Doesn't have to be exactly like this, but seems like there's room for cleanup or re-using logic
src/grpc/infer_handler.cc
Outdated
std::move(trace_manager_->SampleTrace(request.model_name()));
TraceStartOptions start_options;
InferenceTraceMode mode = TRACE_MODE_TRITON;
trace_manager_->GetTraceMode(request.model_name(), &mode);
Looks like this adds a lock to the hot path. Will DM you to discuss further
Technically, I can do the following re-factoring: std::shared_ptr<TraceSetting> trace_setting can become a part of TraceStartOptions. Then, when we initialize start_options on line 919, I can get trace_setting for the current model through the steps. This way, the lock happens only once (as it does now), and then we just pass the start options to TraceManager::SampleTrace, where we move on to this line since trace_setting is already obtained.
@GuanLuo I would like to have your opinion on this as well if possible
discussed offline
Logic and tests LGTM - just want to discuss this one comment: https://github.com/triton-inference-server/server/pull/6785/files#r1449559949.
Most nits/cleanups/helper function comments can be a follow-up PR if it will impact code freeze and isn't changing any logic.
Overall looks good; left a comment for clarification.
src/http_server.cc
Outdated
auto prop =
    otel_cntxt::propagation::GlobalTextMapPropagator::GetGlobalPropagator();
auto ctxt = opentelemetry::context::Context();
start_options.propagated_context = prop->Extract(carrier, ctxt);
Will propagated_context be set to a valid default if there is no context from the client?
Good question. Here we use an instance of class HttpTraceContext as the global propagator; we set it here. The Extract method will use ExtractImpl, meaning our Get (e.g. the HTTP Get), to extract the traceparent and tracestate info from the headers/metadata. If none is found, it will return an invalid SpanContext, and this line will create an invalid span, so the returned context will store an invalid span.
I believe the reason everything works when no context is passed is that by default StartSpanOptions.parent is initialized to an invalid context here. So essentially, OpenTelemetry handles invalid spans as parents on their end and starts a parent-less span.
However, I will add changes to handle this behavior better.
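For intuition, the same contract can be seen in the OpenTelemetry Python API (the server itself uses the C++ SDK, so this is only an illustrative analogue; the traceparent value is a placeholder):

```python
from opentelemetry import trace
from opentelemetry.propagate import extract

# Client sent a W3C traceparent header -> Extract yields a context whose
# span context is valid (a remote parent).
headers = {"traceparent": "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"}
ctx = extract(headers)
print(trace.get_current_span(ctx).get_span_context().is_valid)  # True

# No OTel headers -> Extract returns a context holding an invalid (default)
# span context, so the IsValid()/is_valid check fails and a parent-less span
# is started instead.
ctx = extract({})
print(trace.get_current_span(ctx).get_span_context().is_valid)  # False
```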
src/http_server.cc
Outdated
start_options.propagated_context = prop->Extract(carrier, ctxt);
otel_trace_api::SpanContext span_context =
    otel_trace_api::GetSpan(start_options.propagated_context)->GetContext();
if (span_context.IsValid()) {
Will this always evaluate to true in the case of no client context, and cause force sampling as long as tracing is in OpenTelemetry mode?
I've touched on this in the comment above. When no context is propagated, an invalid span will be stored in start_options.propagated_context, so span_context.IsValid() will return false.
This behavior is tested in test_http_trace_triggered, where I set rate=5 and send 5 requests without OTel headers (only one is traced), then send 5 requests with OTel headers and check that all 5 are traced.
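To make that check concrete, here is a minimal sketch of how the traced-request count can be derived from an OTLP JSON dump with one JSON object per line, as in the REPL session earlier (the field layout is taken from that session; the file name is whatever the collector writes):

```python
import json


def count_unique_traces(trace_file):
    """Count distinct traceIds in an OTLP JSON dump (one JSON object per line)."""
    trace_ids = set()
    with open(trace_file) as f:
        for line in f:
            entry = json.loads(line)
            for resource_span in entry["resourceSpans"]:
                for scope_span in resource_span["scopeSpans"]:
                    for span in scope_span["spans"]:
                        trace_ids.add(span["traceId"])
    return len(trace_ids)


# With --trace-rate=5: 5 requests without OTel headers should add exactly 1
# trace, while 5 requests with OTel headers should add 5 more.
```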
auto m_it = model_settings_.find(model_name);
trace_setting =
    (m_it == model_settings_.end()) ? global_setting_ : m_it->second;
std::lock_guard<std::mutex> r_lk(r_mu_);
It would be good to use more descriptive variable names. It is hard to know what r_mu_ means.
Currently, I am re-using already existing locks. I believe the next step (since we've moved to C++17) is not only to rename them, but also to change them to dedicated read locks (std::shared_lock, I believe).
@@ -3055,11 +3055,13 @@ HTTPAPIServer::StartTrace(
    TRITONSERVER_InferenceTrace** triton_trace)
{
#ifdef TRITON_ENABLE_TRACING
  HttpTextMapCarrier carrier(req->headers_in);
do these carriers just record all the headers associated with gRPC/HTTP requests?
This one is specific to HTTP requests. Carriers are initialized with headers/metadata, and then the OTel propagator will use them to extract the relevant headers.
In this comment I provide a deep dive of what is going on: #6785 (comment)
In HttpTextMapCarrier we've also implemented a Get method, so that OTel knows how to extract headers.
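For comparison, the OpenTelemetry Python API exposes the same idea as a Getter passed to extract(); a rough sketch (the request object and its .headers attribute are hypothetical, not Triton code):

```python
from typing import List, Optional

from opentelemetry.propagate import extract
from opentelemetry.propagators.textmap import Getter


class RequestHeadersGetter(Getter):
    """Tells the propagator how to read headers from a framework-specific request."""

    def get(self, carrier, key: str) -> Optional[List[str]]:
        value = carrier.headers.get(key)
        return [value] if value is not None else None

    def keys(self, carrier) -> List[str]:
        return list(carrier.headers.keys())


# Usage: ctx = extract(request, getter=RequestHeadersGetter())
```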
}

TraceManager::TraceStartOptions
TraceManager::GetTraceStartOptions(
As a future refactor it might be worth having separate classes for TritonTrace and OTelTrace. This could help with not having to propagate #ifdefs for Windows and with not checking for OpenTelemetry in every tracing function.
Yes, I agree, and a similar idea was added to the ticket: https://jirasw.nvidia.com/browse/DLIS-4803
@@ -507,9 +507,9 @@ if(${TRITON_ENABLE_TRACING})
  tracer.cc tracer.h
)

if (NOT WIN32)
  target_compile_features(tracing-library PRIVATE cxx_std_${TRITON_MIN_CXX_STANDARD})
  target_compile_features(tracing-library PRIVATE cxx_std_${TRITON_MIN_CXX_STANDARD})
I've noticed that this line was previously skipped on Windows, so it stays that way in this PR as well.
src/tracer.cc
Outdated
// [FIXME: DLIS-6033]
// A current WAR for initiating trace based on propagated context only
// Currently this is implemented through setting trace rate = 0
if (!force_sample && rate_ != 0) {
So if I'm understanding correctly, we will always trace when a context is propagated (client header provided -> force_sample=True), but these propagation-initiated traces will not affect the counts/logic used when deciding to do normal traces (ex: every 1000th request)?
So for example with the default rate == 1000, before:
- request 0 and request 1000 will be traced.

But now, if I added client headers to some intermediate request, say request 7, then instead:
- request 0, request 7 (context propagation), and request 1001 would get traced? Is that correct?

If so, will this have any impact or confusing behavior on related tools like the trace_summary tool or others?
Also out of curiosity, can rate_ == 0 here? I didn't look through all the code, but I noticed it would probably hit a division by zero / floating point exception when running:
create_trace = (((++sample_) % rate_) == 0);
without this check you added, so I wanted to try it out:
tritonserver --model-store /mnt/triton/models --trace-rate 0 --trace-level TIMESTAMPS
and I sent some inferences, but didn't get any errors.
Yes, that is also what I found. I think I talk about it in the description of this PR. % by 0 was definitely happening before this PR, and I have no idea why it wasn't segfaulting. With this PR, if rate_ == 0, we are skipping this block.
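For readers following along, a small Python model of the guarded block from the diff above (not Triton code; how force-sampled requests should interact with the counters was still being discussed in the replies below):

```python
class SamplerModel:
    """Models the guarded sampling block: force_sample always traces,
    and the rate counter is only advanced when rate != 0."""

    def __init__(self, rate):
        self.rate = rate
        self.sample = 0

    def sample_trace(self, force_sample=False):
        if force_sample:
            return True
        if self.rate != 0:
            self.sample += 1
            return self.sample % self.rate == 0
        return False


s = SamplerModel(rate=5)
print([s.sample_trace() for _ in range(5)])                     # [False, False, False, False, True]
print([s.sample_trace(force_sample=True) for _ in range(5)])    # [True, True, True, True, True]
print([SamplerModel(rate=0).sample_trace() for _ in range(3)])  # [False, False, False]
```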
Regarding the earlier comment: good catch. We've discussed with Guan not to update counters when tracing happens "by force". Let me correct this.
I've decided to keep updating the counters as is (if rate_ > 0) for now. I've started checking all the parameters and it seems like, e.g., created_ is not used, so this area would benefit from a cleanup. I made it part of the scope of this ticket: DLIS-6033.
Cleanup is still planned; counters' handling is coming.
Nice work. The cleanup of the TraceStartOptions helpers into the trace manager looks great!
Added support for OTel context propagation --------- Co-authored-by: Markus Hennerbichler <[email protected]> Co-authored-by: Ryan McCormick <[email protected]>
Resolves issue: #5853
I've picked up @HennerM's work; he has a CLA: #5853 (comment)

This PR adds 2 classes, HttpTextMapCarrier and GrpcServerCarrier, that take care of extracting OTel-relevant headers/metadata from an inference request. When a request comes to the server, the OTel-relevant context is extracted and stored in TraceStartOptions start_options.

TraceStartOptions was added for the following reason. Currently we can set up trace options through rate and count. However, users may want to trace every request for which an OTel trace was initiated on the client side, and in that case it is hard (I would say even impossible) to properly tune rate. Thus, I've added logic to initiate a trace every time a valid OTel context is passed, ignoring rate and count.

Interestingly, we allow starting tritonserver with rate=0 (checked locally). I've added a check for rate_ != 0 in TraceManager::TraceSetting::SampleTrace(bool force_sample) and would like to keep this for now for the following reason: if users would like to trace only requests for which a trace was initiated on the client side, they can start tritonserver with rate=0. I will follow up with a separate PR, which will introduce a dedicated trace-config option, something like --trace-config opentelemetry,context-based-tracing=ON, which will eliminate the current rate=0 WAR, since this is slightly out of scope of this PR and the current PR is already big.

I've re-factored the tests: extracted small sub-routines and made the OpenTelemetry tests easy to work with and to expand.

Additionally, there were issues with ABSEIL, so I removed it from OTel's installation: triton-inference-server/third_party#52
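For reference, a minimal client-side sketch of the context propagation the tests above exercise (the server URL, model, input shapes, and the traceparent value are illustrative placeholders, not taken from this PR):

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient("localhost:8000")

# Inputs for the example `simple` model (two [1, 16] INT32 tensors).
inputs = [
    httpclient.InferInput("INPUT0", [1, 16], "INT32"),
    httpclient.InferInput("INPUT1", [1, 16], "INT32"),
]
inputs[0].set_data_from_numpy(np.arange(16, dtype=np.int32).reshape(1, 16))
inputs[1].set_data_from_numpy(np.arange(16, dtype=np.int32).reshape(1, 16))

# W3C trace context headers from a client-side OTel trace; with OpenTelemetry
# trace mode enabled, the server extracts these and continues the client's
# trace instead of starting a new one.
otel_headers = {
    "traceparent": "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"
}
client.infer("simple", inputs, headers=otel_headers)
```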