
Support for Context Propagation for OTel trace mode #6785

Merged: 23 commits merged into main on Jan 17, 2024

Conversation

oandreeva-nv (Contributor):
Resolves issue: #5853

I've picked up @HennerM's work; he has a CLA: #5853 (comment)

This PR adds two classes, HttpTextMapCarrier and GrpcServerCarrier, which take care of extracting the OTel-relevant headers/metadata from an inference request. When a request arrives at the server, the OTel-relevant context is extracted and stored in TraceStartOptions start_options.
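
As context for reviewers, here is a minimal sketch of what such a carrier looks like against the OpenTelemetry C++ TextMapCarrier interface. The class name and the plain header map below are illustrative, not the PR's actual code, which wraps the real HTTP/gRPC request structures:

#include <string>
#include <unordered_map>

#include "opentelemetry/context/propagation/text_map_propagator.h"

// Illustrative carrier over a plain header map; the PR's HttpTextMapCarrier
// and GrpcServerCarrier wrap the actual request header containers instead.
class HeaderMapCarrier
    : public opentelemetry::context::propagation::TextMapCarrier {
 public:
  explicit HeaderMapCarrier(
      const std::unordered_map<std::string, std::string>& headers)
      : headers_(headers)
  {
  }

  // The propagator calls Get("traceparent") / Get("tracestate") on extract.
  opentelemetry::nostd::string_view Get(
      opentelemetry::nostd::string_view key) const noexcept override
  {
    auto it = headers_.find(std::string(key.data(), key.size()));
    if (it == headers_.end()) {
      return "";
    }
    return it->second;
  }

  // Required by the interface; only used on the inject path.
  void Set(
      opentelemetry::nostd::string_view /*key*/,
      opentelemetry::nostd::string_view /*value*/) noexcept override
  {
  }

 private:
  const std::unordered_map<std::string, std::string>& headers_;
};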

TraceStartOptions was added for the following reason: currently, we can set up trace options through rate and count; however, users may want to trace every request for which an OTel trace was initiated on the client side. In that case it is hard (I would say even impossible) to tune rate properly. Thus, I've added logic to initiate a trace every time a valid OTel context is passed, ignoring rate and count.

Interestingly, we currently allow starting tritonserver with rate=0 (checked locally). I've added a check for rate_ != 0 in TraceManager::TraceSetting::SampleTrace(bool force_sample) and would like to keep it for now, for the following reason: if users would like to trace only those requests for which a trace was initiated on the client side, they can start tritonserver with rate=0. I will follow up with a separate PR that introduces a dedicated trace-config option, something like --trace-config opentelemetry,context-based-tracing=ON, which will eliminate the current rate=0 WAR. That is slightly out of scope here, and this PR is already big.

I've refactored the tests: extracted small subroutines and made the OpenTelemetry tests easier to work with and to expand.

Additionally, there were issues with ABSEIL, so I removed it from OTel's installation: triton-inference-server/third_party#52

oandreeva-nv requested review from rmccorm4, dyastremsky, Tabrizian, tanmayv25, GuanLuo and nnshah1 and removed the request for dyastremsky, January 11, 2024 19:20
span_name = span[0]["name"]
span_names.append(span_name)
span_events = span[0]["events"]
event_names_only = rex_name_field.findall(str(span_events))
Contributor:

Can you share sample event format that's getting parsed? Just curious what's being parsed by the regex.

Contributor Author:

[image: screenshot of the parsed span events]

Contributor Author:

or:

>>> traces = []
>>> with open(filename) as f:
...     for json_obj in f:
...             entry = json.loads(json_obj)
...             traces.append(entry)
... 
>>> rex_name_field = re.compile("(?<='name': ')[a-zA-Z0-9_\- ]+(?=')")
>>> 
>>> parsed_spans = [
...             entry["scopeSpans"][0]["spans"] for entry in traces[0]["resourceSpans"]
...         ]
>>> 
>>> parsed_spans
[[{'traceId': 'a26527be267cd796db6ca11a9da80d02', 'spanId': '328c3e08db9fb885', 'parentSpanId': 'dba197f625f0d47f', 'name': 'compute', 'kind': 2, 'startTimeUnixNano': '1705013284351565335', 'endTimeUnixNano': '1705013284352786278', 'events': [{'timeUnixNano': '1705013284351565335', 'name': 'COMPUTE_START'}, {'timeUnixNano': '1705013284352280806', 'name': 'COMPUTE_INPUT_END'}, {'timeUnixNano': '1705013284352734157', 'name': 'COMPUTE_OUTPUT_START'}, {'timeUnixNano': '1705013284352786278', 'name': 'COMPUTE_END'}], 'status': {}}], [{'traceId': 'a26527be267cd796db6ca11a9da80d02', 'spanId': 'dba197f625f0d47f', 'parentSpanId': 'f85c6361d1187a56', 'name': 'simple', 'kind': 2, 'startTimeUnixNano': '1705013284351347347', 'endTimeUnixNano': '1705013284355884227', 'attributes': [{'key': 'triton.model_name', 'value': {'stringValue': 'simple'}}, {'key': 'triton.model_version', 'value': {'intValue': '1'}}, {'key': 'triton.trace_id', 'value': {'intValue': '1'}}, {'key': 'triton.trace_parent_id', 'value': {'intValue': '0'}}], 'events': [{'timeUnixNano': '1705013284351347347', 'name': 'REQUEST_START'}, {'timeUnixNano': '1705013284351378305', 'name': 'QUEUE_START'}, {'timeUnixNano': '1705013284355884227', 'name': 'REQUEST_END'}], 'status': {}}], [{'traceId': 'a26527be267cd796db6ca11a9da80d02', 'spanId': 'f85c6361d1187a56', 'parentSpanId': '', 'name': 'InferRequest', 'kind': 2, 'startTimeUnixNano': '1705013284351158969', 'endTimeUnixNano': '1705013284357092351', 'events': [{'timeUnixNano': '1705013284351081189', 'name': 'HTTP_RECV_START'}, {'timeUnixNano': '1705013284351104757', 'name': 'HTTP_RECV_END'}, {'timeUnixNano': '1705013284352828926', 'name': 'INFER_RESPONSE_COMPLETE'}, {'timeUnixNano': '1705013284352956556', 'name': 'HTTP_SEND_START'}, {'timeUnixNano': '1705013284352983290', 'name': 'HTTP_SEND_END'}], 'status': {}}]]
>>> span_names = []
>>> for span in parsed_spans:
...     span_name = span[0]["name"]
...     span_names.append(span_name)
...     span_events = span[0]["events"]
...     event_names_only = rex_name_field.findall(str(span_events))
...     print(event_names_only)
... 
['COMPUTE_START', 'COMPUTE_INPUT_END', 'COMPUTE_OUTPUT_START', 'COMPUTE_END']
['REQUEST_START', 'QUEUE_START', 'REQUEST_END']
['HTTP_RECV_START', 'HTTP_RECV_END', 'INFER_RESPONSE_COMPLETE', 'HTTP_SEND_START', 'HTTP_SEND_END']

Contributor (@rmccorm4, Jan 12, 2024):

I see, thanks! So is the regex needed? Or can it just be parsed as json/dict? Looks like it's well structured, unless I'm misunderstanding. Something like:

for span in parsed_spans:
  ...
  event_names_only = [event["name"] for event in span_events]
  print(event_names_only)

I only suggest because I find the regex hard to parse/understand at a glance. I'd have to sit and think about what it's doing as a reader.

Contributor Author:

In this case it is easier to pass span[0]["events"] as a string and use a regex.

span[0]["events"] is actually a list of dictionaries (it is easier to see on the image); that is why, for this test, I didn't want to parse the span[0]["events"] list to collect all the names.

Contributor:

Isn't it the same?
[image]

Contributor Author:

Nice, my bad. Will remove the regex.

Comment on lines +476 to +508
def test_http_trace_simple_model(self):
    """
    Tests trace, collected from executing one inference request
    for a `simple` model and HTTP client.
    """
    triton_client_http = httpclient.InferenceServerClient(
        "localhost:8000", verbose=True
    )
    inputs = prepare_data(httpclient)
    triton_client_http.infer(self.simple_model_name, inputs)

    self._test_simple_trace()

def test_http_trace_simple_model_context_propagation(self):
    """
    Tests trace, collected from executing one inference request
    for a `simple` model, HTTP client and context propagation,
    i.e. client specifies OTel headers, defined in `self.client_headers`.
    """
    triton_client_http = httpclient.InferenceServerClient(
        "localhost:8000", verbose=True
    )
    inputs = prepare_data(httpclient)
    triton_client_http.infer(
        self.simple_model_name, inputs, headers=self.client_headers
    )
Contributor:

This is totally optional "follow-up PR" material if you like the idea, but I think a lot of these functions for:

  1. http/grpc
  2. with/without context propagation
  3. simple/ensemble/bls

Could be condensed a lot with something like unittest's subTest or parameterized:

def test_trace(self):
    clients = [httpclient, grpcclient]
    header_list = [None, self._client_headers]
    models = self.test_models  # [self.simple_model_name, self.ensemble, self.bls, ...]

    permutations = itertools.product(clients, header_list, models)
    for client_type, headers, model in permutations:
        with self.subTest(client=client_type, headers=headers, model=model):
            client = client_type.InferenceServerClient(...)  # can probably be smart and include url somewhere
            inputs = prepare_data(client_type)
            client.infer(model, inputs, headers=headers)
            self._test_trace(model, headers=headers)

Doesn't have to be exactly like this, but seems like there's room for cleanup or re-using logic

Resolved review threads: qa/L0_trace/test.sh, src/grpc/infer_handler.cc, src/http_server.cc, src/tracer.h, src/tracer.cc
std::move(trace_manager_->SampleTrace(request.model_name()));
TraceStartOptions start_options;
InferenceTraceMode mode = TRACE_MODE_TRITON;
trace_manager_->GetTraceMode(request.model_name(), &mode);
Contributor:

Looks like this adds a lock to the hot path. Will DM you to discuss further

Contributor Author:

Technically, I can do the following refactoring: std::shared_ptr<TraceSetting> trace_setting can become a part of TraceStartOptions.

Then, when we initialize start_options on line 919, I can get the trace_setting for the current model through those steps. This way, the lock is taken only once (as it is now), and then we just pass the start options to TraceManager::SampleTrace, which moves on to this line since trace_setting has already been obtained.
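
To make that concrete, here is a rough sketch of the proposed shape; the field names are illustrative, not final:

// Sketch only: trace_setting is looked up once, under the existing lock,
// and carried along so SampleTrace doesn't have to look it up again.
struct TraceStartOptions {
  std::shared_ptr<TraceSetting> trace_setting;
  opentelemetry::context::Context propagated_context;
  bool force_sample{false};
};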

Contributor Author:

@GuanLuo, I would like to have your opinion on this as well, if possible.

Contributor Author:

Discussed offline.

Contributor (@rmccorm4) left a comment:

Logic and tests LGTM - just want to discuss this one comment: https://github.com/triton-inference-server/server/pull/6785/files#r1449559949.

Most nits/cleanups/helper function comments can be a follow-up PR if it will impact code freeze and isn't changing any logic.

Contributor (@GuanLuo) left a comment:

Overall looks good; I left a comment for clarification.

auto prop =
    otel_cntxt::propagation::GlobalTextMapPropagator::GetGlobalPropagator();
auto ctxt = opentelemetry::context::Context();
start_options.propagated_context = prop->Extract(carrier, ctxt);
Contributor:

Will propagated_context be set to a valid default if there is no context from the client?

Contributor Author:

Good question. Here we use an instance of the HttpTraceContext class as the global propagator. We set it here.

The Extract method will use ExtractImpl, meaning our Get (e.g., the HTTP Get), to extract the traceparent and tracestate info from the headers/metadata. If none is found, it will return an invalid SpanContext, and this line will create an invalid span. So an invalid span will be stored in the returned context.

I believe the reason everything works when no context is passed is that, by default, StartSpanOptions.parent is initialized to an invalid context here. So essentially, OpenTelemetry handles invalid spans as parents on their end by starting a parent-less span.

However, I will add changes to handle this behavior better.

start_options.propagated_context = prop->Extract(carrier, ctxt);
otel_trace_api::SpanContext span_context =
    otel_trace_api::GetSpan(start_options.propagated_context)->GetContext();
if (span_context.IsValid()) {
Contributor:

Will this always evaluate to true in the case of no client context? And does it cause force sampling as long as the trace is in OpenTelemetry mode?

Contributor Author:

I touched on this in the comment above. When no context is propagated, an invalid span will be stored in start_options.propagated_context, so span_context.IsValid() will return false.

This behavior is tested in test_http_trace_triggered, where I set rate=5 and send 5 requests without OTel headers -> only one is traced; then I send 5 requests with OTel headers and check that all 5 are traced.

oandreeva-nv force-pushed the oandreeva_opentelemetry_cntxt_propagation_main branch 2 times, most recently from 867d3bf to aae29ca, January 12, 2024 09:27
auto m_it = model_settings_.find(model_name);
trace_setting =
    (m_it == model_settings_.end()) ? global_setting_ : m_it->second;
std::lock_guard<std::mutex> r_lk(r_mu_);
Member:

It would be good to use more descriptive variable names; it is hard to know what r_mu_ means.

Contributor Author:

Currently, I am re-using the already existing locks. I believe the next step (since we've moved to C++17) is not only to rename them, but also to change them to dedicated read locks (std::shared_lock, I believe).
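
For reference, a minimal sketch of that C++17 direction; all names here are illustrative:

#include <mutex>
#include <shared_mutex>

std::shared_mutex settings_mu_;  // descriptive replacement for r_mu_

void ReadSettings()
{
  // multiple readers may hold the lock concurrently
  std::shared_lock<std::shared_mutex> lk(settings_mu_);
  // ... read model_settings_ ...
}

void UpdateSettings()
{
  // writers still take an exclusive lock
  std::unique_lock<std::shared_mutex> lk(settings_mu_);
  // ... modify model_settings_ ...
}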

@@ -3055,11 +3055,13 @@ HTTPAPIServer::StartTrace(
TRITONSERVER_InferenceTrace** triton_trace)
{
#ifdef TRITON_ENABLE_TRACING
HttpTextMapCarrier carrier(req->headers_in);
Member:

Do these carriers just record all the headers associated with gRPC/HTTP requests?

Contributor Author:

This one is specific to HTTP requests. Carriers are initialized with the headers/metadata, and the OTel propagator then uses them to extract the relevant headers.
In this comment I provide a deep dive into what is going on:
#6785 (comment)

Contributor Author:

In HttpTextMapCarrier we've also implemented the Get method, so that OTel knows how to extract the headers.

}

TraceManager::TraceStartOptions
TraceManager::GetTraceStartOptions(
Member:

As a future refactor, it might be worth having separate classes for TritonTrace and OTelTrace. This could help avoid propagating #ifdefs for Windows and checking for OpenTelemetry in every tracing function.
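
A rough sketch of what that direction could look like; these names are purely illustrative, not a concrete proposal:

#include <cstdint>

// Illustrative only: a common interface would let call sites dispatch
// virtually instead of repeating mode checks and platform #ifdefs.
class ITrace {
 public:
  virtual ~ITrace() = default;
  virtual void ReportActivity(uint64_t timestamp_ns, const char* activity) = 0;
};

class TritonTrace : public ITrace { /* Triton trace mode */ };
#ifndef _WIN32
class OTelTrace : public ITrace { /* OpenTelemetry trace mode */ };
#endif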

Contributor Author:

Yes, I agree; a similar idea was added to the ticket: https://jirasw.nvidia.com/browse/DLIS-4803

Resolved review threads: qa/L0_trace/test.sh, qa/L0_trace/opentelemetry_unittest.py
Tabrizian previously approved these changes Jan 12, 2024
@@ -507,9 +507,9 @@ if(${TRITON_ENABLE_TRACING})
tracer.cc tracer.h
)

if (NOT WIN32)
target_compile_features(tracing-library PRIVATE cxx_std_${TRITON_MIN_CXX_STANDARD})
target_compile_features(tracing-library PRIVATE cxx_std_${TRITON_MIN_CXX_STANDARD})
Contributor Author:

I've noticed that this line was previously excluded on Windows, so it stays that way in this PR as well.

src/tracer.cc Outdated
// [FIXME: DLIS-6033]
// A current WAR for initiating trace based on propagated context only
// Currently this is implemented through setting trace rate = 0
if (!force_sample && rate_ != 0) {
Contributor (@rmccorm4, Jan 12, 2024):

So if I'm understanding correctly, we will always trace when a context is propagated (client header provided -> force_sample=True), but these propagation-initiated traces will not affect the counts/logic used when deciding to do normal traces (ex: every 1000th request)?

So for example with default rate == 1000, before:

  • request 0 and request 1000 will be traced.

But now, if I added client headers to some intermediate request, say request 7, then instead:

  • request 0, request 7 (context propagation), and request 1001 would get traced? Is that correct?

If so, will this have any impact or confusing behavior on related tools like the trace_summary tool or others?

Contributor (@rmccorm4, Jan 12, 2024):

Also out of curiosity, can rate_ == 0 here? I didn't look through all the code, but I noticed it would probably hit a division by zero / floating point exception when running:

    create_trace = (((++sample_) % rate_) == 0);

without this check you added, so I wanted to try it out:

tritonserver --model-store /mnt/triton/models --trace-rate 0 --trace-level TIMESTAMPS

and I sent some inferences, but didn't get any errors.

Contributor Author:

Yes, that is also what I found; I think I talk about it in the description of this PR. The % by 0 was definitely happening before this PR, and I have no idea why it wasn't segfaulting. With this PR, if rate_ == 0, we skip this block.
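
In other words, a simplified sketch of the sampling decision after this PR (not the exact tracer.cc code):

bool create_trace = false;
if (force_sample) {
  // a valid OTel context was propagated from the client: always trace
  create_trace = true;
} else if (rate_ != 0) {
  // rate-based sampling; the rate_ != 0 guard avoids the modulo-by-zero
  // that starting tritonserver with --trace-rate 0 could previously hit
  create_trace = (((++sample_) % rate_) == 0);
}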

Contributor Author:

Regarding the earlier comment: good catch. Guan and I have discussed not updating the counters when tracing happens "by force". Let me correct this.

Contributor Author (@oandreeva-nv, Jan 13, 2024):

I've decided to keep updating the counters as-is (if rate_ > 0) for now. I've started checking all the parameters, and it seems that e.g. created_ is not used, so this area would benefit from a clean-up. I made it part of the scope of this ticket: DLIS-6033.

The clean-up is still planned; the counters' handling is coming.

rmccorm4 previously approved these changes Jan 13, 2024
Contributor (@rmccorm4) left a comment:

Nice work. The cleanup of the TraceStartOptions helpers into the trace manager looks great!

oandreeva-nv merged commit af2c54e into main, Jan 17, 2024
3 checks passed
oandreeva-nv added a commit that referenced this pull request Jan 17, 2024
Added support for OTel context propagation

---------

Co-authored-by: Markus Hennerbichler <[email protected]>
Co-authored-by: Ryan McCormick <[email protected]>
oandreeva-nv deleted the oandreeva_opentelemetry_cntxt_propagation_main branch, January 17, 2024 20:29
GuanLuo pushed a commit that referenced this pull request Jan 17, 2024
Added support for OTel context propagation

---------

Co-authored-by: Markus Hennerbichler <[email protected]>
Co-authored-by: Ryan McCormick <[email protected]>