
Support for Context Propagation for OTel trace mode #6785

Merged: 23 commits merged into main on Jan 17, 2024

Conversation

oandreeva-nv (Contributor):
Resolves issue: #5853

I've picked up @HennerM's work; he has a CLA: #5853 (comment)

This PR adds two classes, HttpTextMapCarrier and GrpcServerCarrier, which take care of extracting the OTel-relevant headers/metadata from an inference request. When a request arrives at the server, the OTel-relevant context is extracted and stored in TraceStartOptions start_options.
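
As context for reviewers, here is a minimal sketch of what such a carrier looks like against the OpenTelemetry C++ TextMapCarrier interface. The class name and the plain header map below are illustrative, not the PR's actual code, which wraps the real HTTP/gRPC request structures:

#include <string>
#include <unordered_map>

#include "opentelemetry/context/propagation/text_map_propagator.h"

// Illustrative carrier over a plain header map; the PR's HttpTextMapCarrier
// and GrpcServerCarrier wrap the actual request header containers instead.
class HeaderMapCarrier
    : public opentelemetry::context::propagation::TextMapCarrier {
 public:
  explicit HeaderMapCarrier(
      const std::unordered_map<std::string, std::string>& headers)
      : headers_(headers)
  {
  }

  // The propagator calls Get("traceparent") / Get("tracestate") on extract.
  opentelemetry::nostd::string_view Get(
      opentelemetry::nostd::string_view key) const noexcept override
  {
    auto it = headers_.find(std::string(key.data(), key.size()));
    if (it == headers_.end()) {
      return "";
    }
    return it->second;
  }

  // Required by the interface; only used on the inject path.
  void Set(
      opentelemetry::nostd::string_view /*key*/,
      opentelemetry::nostd::string_view /*value*/) noexcept override
  {
  }

 private:
  const std::unordered_map<std::string, std::string>& headers_;
};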

TraceStartOptions was added for the following reason: currently, we can set up trace options through rate and count; however, users may want to trace every request for which an OTel trace was initiated on the client side. In that case it is hard (I would say even impossible) to tune rate properly. Thus, I've added logic to initiate a trace every time a valid OTel context is passed, ignoring rate and count.

Interestingly, we currently allow starting tritonserver with rate=0 (checked locally). I've added a check for rate_ != 0 in TraceManager::TraceSetting::SampleTrace(bool force_sample) and would like to keep it for now, for the following reason: if users would like to trace only those requests for which a trace was initiated on the client side, they can start tritonserver with rate=0. I will follow up with a separate PR that introduces a dedicated trace-config option, something like --trace-config opentelemetry,context-based-tracing=ON, which will eliminate the current rate=0 WAR. That is slightly out of scope here, and this PR is already big.

I've refactored the tests: extracted small subroutines and made the OpenTelemetry tests easier to work with and to expand.

Additionally, there were issues with ABSEIL, so I removed it from OTel's installation: triton-inference-server/third_party#52

oandreeva-nv requested review from rmccorm4, dyastremsky, Tabrizian, tanmayv25, GuanLuo and nnshah1 and removed the request for dyastremsky, January 11, 2024 19:20
span_name = span[0]["name"]
span_names.append(span_name)
span_events = span[0]["events"]
event_names_only = rex_name_field.findall(str(span_events))
Contributor:

Can you share sample event format that's getting parsed? Just curious what's being parsed by the regex.

Contributor Author:

[image: screenshot of the parsed span events]

Contributor Author:

or:

>>> traces = []
>>> with open(filename) as f:
...     for json_obj in f:
...             entry = json.loads(json_obj)
...             traces.append(entry)
... 
>>> rex_name_field = re.compile("(?<='name': ')[a-zA-Z0-9_\- ]+(?=')")
>>> 
>>> parsed_spans = [
...             entry["scopeSpans"][0]["spans"] for entry in traces[0]["resourceSpans"]
...         ]
>>> 
>>> parsed_spans
[[{'traceId': 'a26527be267cd796db6ca11a9da80d02', 'spanId': '328c3e08db9fb885', 'parentSpanId': 'dba197f625f0d47f', 'name': 'compute', 'kind': 2, 'startTimeUnixNano': '1705013284351565335', 'endTimeUnixNano': '1705013284352786278', 'events': [{'timeUnixNano': '1705013284351565335', 'name': 'COMPUTE_START'}, {'timeUnixNano': '1705013284352280806', 'name': 'COMPUTE_INPUT_END'}, {'timeUnixNano': '1705013284352734157', 'name': 'COMPUTE_OUTPUT_START'}, {'timeUnixNano': '1705013284352786278', 'name': 'COMPUTE_END'}], 'status': {}}], [{'traceId': 'a26527be267cd796db6ca11a9da80d02', 'spanId': 'dba197f625f0d47f', 'parentSpanId': 'f85c6361d1187a56', 'name': 'simple', 'kind': 2, 'startTimeUnixNano': '1705013284351347347', 'endTimeUnixNano': '1705013284355884227', 'attributes': [{'key': 'triton.model_name', 'value': {'stringValue': 'simple'}}, {'key': 'triton.model_version', 'value': {'intValue': '1'}}, {'key': 'triton.trace_id', 'value': {'intValue': '1'}}, {'key': 'triton.trace_parent_id', 'value': {'intValue': '0'}}], 'events': [{'timeUnixNano': '1705013284351347347', 'name': 'REQUEST_START'}, {'timeUnixNano': '1705013284351378305', 'name': 'QUEUE_START'}, {'timeUnixNano': '1705013284355884227', 'name': 'REQUEST_END'}], 'status': {}}], [{'traceId': 'a26527be267cd796db6ca11a9da80d02', 'spanId': 'f85c6361d1187a56', 'parentSpanId': '', 'name': 'InferRequest', 'kind': 2, 'startTimeUnixNano': '1705013284351158969', 'endTimeUnixNano': '1705013284357092351', 'events': [{'timeUnixNano': '1705013284351081189', 'name': 'HTTP_RECV_START'}, {'timeUnixNano': '1705013284351104757', 'name': 'HTTP_RECV_END'}, {'timeUnixNano': '1705013284352828926', 'name': 'INFER_RESPONSE_COMPLETE'}, {'timeUnixNano': '1705013284352956556', 'name': 'HTTP_SEND_START'}, {'timeUnixNano': '1705013284352983290', 'name': 'HTTP_SEND_END'}], 'status': {}}]]
>>> span_names = []
>>> for span in parsed_spans:
...     span_name = span[0]["name"]
...     span_names.append(span_name)
...     span_events = span[0]["events"]
...     event_names_only = rex_name_field.findall(str(span_events))
...     print(event_names_only)
... 
['COMPUTE_START', 'COMPUTE_INPUT_END', 'COMPUTE_OUTPUT_START', 'COMPUTE_END']
['REQUEST_START', 'QUEUE_START', 'REQUEST_END']
['HTTP_RECV_START', 'HTTP_RECV_END', 'INFER_RESPONSE_COMPLETE', 'HTTP_SEND_START', 'HTTP_SEND_END']

Contributor (@rmccorm4, Jan 12, 2024):

I see, thanks! So is the regex needed? Or can it just be parsed as json/dict? Looks like it's well structured, unless I'm misunderstanding. Something like:

for span in parsed_spans:
  ...
  event_names_only = [event["name"] for event in span_events]
  print(event_names_only)

I only suggest because I find the regex hard to parse/understand at a glance. I'd have to sit and think about what it's doing as a reader.

Contributor Author:

In this case it is easier to pass span[0]["events"] as a string and use a regex.

span[0]["events"] is actually a list of dictionaries (it is easier to see on the image); that is why, for this test, I didn't want to parse the span[0]["events"] list to collect all the names.

Contributor:

Isn't it the same?
[image]

Contributor Author:

Nice, my bad. Will remove the regex.

Comment on lines +476 to +508
def test_http_trace_simple_model(self):
    """
    Tests trace, collected from executing one inference request
    for a `simple` model and HTTP client.
    """
    triton_client_http = httpclient.InferenceServerClient(
        "localhost:8000", verbose=True
    )
    inputs = prepare_data(httpclient)
    triton_client_http.infer(self.simple_model_name, inputs)

    self._test_simple_trace()

def test_http_trace_simple_model_context_propagation(self):
    """
    Tests trace, collected from executing one inference request
    for a `simple` model, HTTP client and context propagation,
    i.e. client specifies OTel headers, defined in `self.client_headers`.
    """
    triton_client_http = httpclient.InferenceServerClient(
        "localhost:8000", verbose=True
    )
    inputs = prepare_data(httpclient)
    triton_client_http.infer(
        self.simple_model_name, inputs, headers=self.client_headers
    )
Contributor:

This is totally optional "follow-up PR" material if you like the idea, but I think a lot of these functions for:

  1. http/grpc
  2. with/without context propagation
  3. simple/ensemble/bls

Could be condensed a lot with something like unittest's subTest or parameterized:

def test_trace(self):
    clients = [httpclient, grpcclient]
    header_list = [None, self._client_headers]
    models = self.test_models  # [self.simple_model_name, self.ensemble, self.bls, ...]

    permutations = itertools.product(clients, header_list, models)
    for client_type, headers, model in permutations:
        with self.subTest(client=client_type, headers=headers, model=model):
            client = client_type.InferenceServerClient(...)  # can probably be smart and include url somewhere
            inputs = prepare_data(client_type)
            client.infer(model, inputs, headers=headers)
            self._test_trace(model, headers=headers)

Doesn't have to be exactly like this, but seems like there's room for cleanup or re-using logic

Resolved review threads: qa/L0_trace/test.sh, src/grpc/infer_handler.cc, src/http_server.cc, src/tracer.h, src/tracer.cc
std::move(trace_manager_->SampleTrace(request.model_name()));
TraceStartOptions start_options;
InferenceTraceMode mode = TRACE_MODE_TRITON;
trace_manager_->GetTraceMode(request.model_name(), &mode);
Contributor:

Looks like this adds a lock to the hot path. Will DM you to discuss further

Contributor Author:

Technically, I can do the following refactoring: std::shared_ptr<TraceSetting> trace_setting can become a part of TraceStartOptions.

Then, when we initialize start_options on line 919, I can get the trace_setting for the current model through those steps. This way, the lock is taken only once (as it is now), and then we just pass the start options to TraceManager::SampleTrace, which moves on to this line since trace_setting has already been obtained.
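
To make that concrete, here is a rough sketch of the proposed shape; the field names are illustrative, not final:

// Sketch only: trace_setting is looked up once, under the existing lock,
// and carried along so SampleTrace doesn't have to look it up again.
struct TraceStartOptions {
  std::shared_ptr<TraceSetting> trace_setting;
  opentelemetry::context::Context propagated_context;
  bool force_sample{false};
};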

Contributor Author:

@GuanLuo, I would like to have your opinion on this as well, if possible.

Contributor Author:

Discussed offline.

Contributor (@rmccorm4) left a comment:

Logic and tests LGTM - just want to discuss this one comment: https://github.com/triton-inference-server/server/pull/6785/files#r1449559949.

Most nits/cleanups/helper function comments can be a follow-up PR if it will impact code freeze and isn't changing any logic.

Contributor (@GuanLuo) left a comment:

Overall looks good; I left a comment for clarification.

auto prop =
    otel_cntxt::propagation::GlobalTextMapPropagator::GetGlobalPropagator();
auto ctxt = opentelemetry::context::Context();
start_options.propagated_context = prop->Extract(carrier, ctxt);
Contributor:

Will propagated_context be set to a valid default if there is no context from the client?

Contributor Author:

Good question. Here we use an instance of the HttpTraceContext class as the global propagator. We set it here.

The Extract method will use ExtractImpl, meaning our Get (e.g., the HTTP Get), to extract the traceparent and tracestate info from the headers/metadata. If none is found, it will return an invalid SpanContext, and this line will create an invalid span. So an invalid span will be stored in the returned context.

I believe the reason everything works when no context is passed is that, by default, StartSpanOptions.parent is initialized to an invalid context here. So essentially, OpenTelemetry handles invalid spans as parents on their end by starting a parent-less span.

However, I will add changes to handle this behavior better.

start_options.propagated_context = prop->Extract(carrier, ctxt);
otel_trace_api::SpanContext span_context =
    otel_trace_api::GetSpan(start_options.propagated_context)->GetContext();
if (span_context.IsValid()) {
Contributor:

Will this always evaluate to true in the case of no client context? And does it cause force sampling as long as the trace is in OpenTelemetry mode?

Contributor Author:

I touched on this in the comment above. When no context is propagated, an invalid span will be stored in start_options.propagated_context, so span_context.IsValid() will return false.

This behavior is tested in test_http_trace_triggered, where I set rate=5 and send 5 requests without OTel headers -> only one is traced; then I send 5 requests with OTel headers and check that all 5 are traced.

oandreeva-nv force-pushed the oandreeva_opentelemetry_cntxt_propagation_main branch 2 times, most recently from 867d3bf to aae29ca, January 12, 2024 09:27
auto m_it = model_settings_.find(model_name);
trace_setting =
    (m_it == model_settings_.end()) ? global_setting_ : m_it->second;
std::lock_guard<std::mutex> r_lk(r_mu_);
Member:

It would be good to use more descriptive variable names; it is hard to know what r_mu_ means.

Contributor Author:

Currently, I am re-using the already existing locks. I believe the next step (since we've moved to C++17) is not only to rename them, but also to change them to dedicated read locks (std::shared_lock, I believe).
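
For reference, a minimal sketch of that C++17 direction; all names here are illustrative:

#include <mutex>
#include <shared_mutex>

std::shared_mutex settings_mu_;  // descriptive replacement for r_mu_

void ReadSettings()
{
  // multiple readers may hold the lock concurrently
  std::shared_lock<std::shared_mutex> lk(settings_mu_);
  // ... read model_settings_ ...
}

void UpdateSettings()
{
  // writers still take an exclusive lock
  std::unique_lock<std::shared_mutex> lk(settings_mu_);
  // ... modify model_settings_ ...
}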

@@ -3055,11 +3055,13 @@ HTTPAPIServer::StartTrace(
TRITONSERVER_InferenceTrace** triton_trace)
{
#ifdef TRITON_ENABLE_TRACING
HttpTextMapCarrier carrier(req->headers_in);
Member:

Do these carriers just record all the headers associated with gRPC/HTTP requests?

Contributor Author:

This one is specific to HTTP requests. Carriers are initialized with the headers/metadata, and the OTel propagator then uses them to extract the relevant headers.
In this comment I provide a deep dive into what is going on:
#6785 (comment)

Contributor Author:

In HttpTextMapCarrier we've also implemented the Get method, so that OTel knows how to extract the headers.

}

TraceManager::TraceStartOptions
TraceManager::GetTraceStartOptions(
Member:

As a future refactor, it might be worth having separate classes for TritonTrace and OTelTrace. This could help avoid propagating #ifdefs for Windows and checking for OpenTelemetry in every tracing function.
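
A rough sketch of what that direction could look like; these names are purely illustrative, not a concrete proposal:

#include <cstdint>

// Illustrative only: a common interface would let call sites dispatch
// virtually instead of repeating mode checks and platform #ifdefs.
class ITrace {
 public:
  virtual ~ITrace() = default;
  virtual void ReportActivity(uint64_t timestamp_ns, const char* activity) = 0;
};

class TritonTrace : public ITrace { /* Triton trace mode */ };
#ifndef _WIN32
class OTelTrace : public ITrace { /* OpenTelemetry trace mode */ };
#endif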

Contributor Author:

Yes, I agree; a similar idea was added to the ticket: https://jirasw.nvidia.com/browse/DLIS-4803

Resolved review threads: qa/L0_trace/test.sh, qa/L0_trace/opentelemetry_unittest.py
Tabrizian previously approved these changes Jan 12, 2024
@@ -507,9 +507,9 @@ if(${TRITON_ENABLE_TRACING})
tracer.cc tracer.h
)

if (NOT WIN32)
target_compile_features(tracing-library PRIVATE cxx_std_${TRITON_MIN_CXX_STANDARD})
target_compile_features(tracing-library PRIVATE cxx_std_${TRITON_MIN_CXX_STANDARD})
Contributor Author:

I've noticed that this line was previously excluded on Windows, so it stays that way in this PR as well.

src/tracer.cc Outdated
// [FIXME: DLIS-6033]
// A current WAR for initiating trace based on propagated context only
// Currently this is implemented through setting trace rate = 0
if (!force_sample && rate_ != 0) {
Contributor (@rmccorm4, Jan 12, 2024):

So if I'm understanding correctly, we will always trace when a context is propagated (client header provided -> force_sample=True), but these propagation-initiated traces will not affect the counts/logic used when deciding to do normal traces (ex: every 1000th request)?

So for example with default rate == 1000, before:

  • request 0 and request 1000 will be traced.

But now, if I added client headers to some intermediate request, say request 7, then instead:

  • request 0, request 7 (context propagation), and request 1001 would get traced? Is that correct?

If so, will this have any impact or confusing behavior on related tools like the trace_summary tool or others?

Contributor (@rmccorm4, Jan 12, 2024):

Also out of curiosity, can rate_ == 0 here? I didn't look through all the code, but I noticed it would probably hit a division by zero / floating point exception when running:

    create_trace = (((++sample_) % rate_) == 0);

without this check you added, so I wanted to try it out:

tritonserver --model-store /mnt/triton/models --trace-rate 0 --trace-level TIMESTAMPS

and I sent some inferences, but didn't get any errors.

Contributor Author:

Yes, that is also what I found; I think I talk about it in the description of this PR. The % by 0 was definitely happening before this PR, and I have no idea why it wasn't segfaulting. With this PR, if rate_ == 0, we skip this block.
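
In other words, a simplified sketch of the sampling decision after this PR (not the exact tracer.cc code):

bool create_trace = false;
if (force_sample) {
  // a valid OTel context was propagated from the client: always trace
  create_trace = true;
} else if (rate_ != 0) {
  // rate-based sampling; the rate_ != 0 guard avoids the modulo-by-zero
  // that starting tritonserver with --trace-rate 0 could previously hit
  create_trace = (((++sample_) % rate_) == 0);
}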

Contributor Author:

Regarding the earlier comment: good catch. Guan and I have discussed not updating the counters when tracing happens "by force". Let me correct this.

Contributor Author (@oandreeva-nv, Jan 13, 2024):

I've decided to keep updating the counters as-is (if rate_ > 0) for now. I've started checking all the parameters, and it seems that e.g. created_ is not used, so this area would benefit from a clean-up. I made it part of the scope of this ticket: DLIS-6033.

The clean-up is still planned; the counters' handling is coming.

rmccorm4 previously approved these changes Jan 13, 2024
Contributor (@rmccorm4) left a comment:

Nice work. The cleanup of the TraceStartOptions helpers into the trace manager looks great!

oandreeva-nv merged commit af2c54e into main, Jan 17, 2024
3 checks passed
oandreeva-nv added a commit that referenced this pull request Jan 17, 2024
Added support for OTel context propagation

---------

Co-authored-by: Markus Hennerbichler <[email protected]>
Co-authored-by: Ryan McCormick <[email protected]>
oandreeva-nv deleted the oandreeva_opentelemetry_cntxt_propagation_main branch, January 17, 2024 20:29
GuanLuo pushed a commit that referenced this pull request Jan 17, 2024
Added support for OTel context propagation

---------

Co-authored-by: Markus Hennerbichler <[email protected]>
Co-authored-by: Ryan McCormick <[email protected]>