Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[refactor]: Refactor Frontend Trace OpenTelemetry Implementation #7390

Merged
merged 9 commits into from
Jul 5, 2024

Conversation

oandreeva-nv
Copy link
Contributor

What does the PR do?

This PR introduces an improvement to our frontend OpenTelemetry tracing implementation. The main goal of this refactoring is to enhance the reliability and flexibility of the system.

Key Changes:

  • Introduced a Stack-Based Approach
    • The previous implementation relied on specific span names to preserve the trace hierarchy.
    • Current solution uses stacks and doesn't need to know what type of span was started.
    • For ensemble and bls models, any number of sub-traces can be spawned. Since we preserve parent_id for every trace, I utilize this and introduced an unordered map to keep spans in stacks. Each trace and sub-trace has its own stack. When stack is empty, I use parent_id to check the top of the parent's stack to get parent span's information.
    • This refactor also helps the following Custom backend tracing implementation to avoid handling of the complex logic for multiple nested custom spans, i.e. we simply look at the top of the stack.

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated github labels field
  • Added test plan and verified test passes.
  • [] Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

Check the conventional commit type
box here and add the label to the github PR.

  • build
  • ci
  • docs
  • feat
  • fix
  • perf
  • refactor
  • revert
  • style
  • test

Related PRs:

N/A

Where should the reviewer start?

trace.h

Test plan:

  • CI Pipeline ID: 16139557

Caveats:

Background

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

N/A

@oandreeva-nv oandreeva-nv added module: server Issues related to the server core PR: refactor A code change that neither fixes a bug nor adds a feature labels Jun 27, 2024

/// Adds event to the span on the top of the stack, related to trace
/// with `trace_id`. If activity is TRITONSERVER_TRACE_REQUEST_START,
/// or TRITONSERVER_TRACE_COMPUTE_START, starts a new span and adds it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not let caller use StartSpan for those event?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please clarify?

src/tracer.cc Show resolved Hide resolved
src/tracer.cc Show resolved Hide resolved
Copy link
Member

@Tabrizian Tabrizian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add an example for how the start/end detection looks like for custom backend tracing API?

src/tracer.h Outdated Show resolved Hide resolved
src/tracer.h Outdated Show resolved Hide resolved
src/tracer.h Outdated Show resolved Hide resolved
src/tracer.h Outdated Show resolved Hide resolved
@oandreeva-nv
Copy link
Contributor Author

@Tabrizian this is just a re-factor for the frontend. The actual custom backend implementation PR will be up shortly

@oandreeva-nv oandreeva-nv force-pushed the oandreeva-trace-refactor branch from 6b60d93 to 864cc75 Compare July 2, 2024 21:27
@oandreeva-nv oandreeva-nv mentioned this pull request Jul 2, 2024
20 tasks
@oandreeva-nv oandreeva-nv merged commit 5f10d61 into main Jul 5, 2024
3 checks passed
@oandreeva-nv oandreeva-nv deleted the oandreeva-trace-refactor branch July 5, 2024 21:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module: server Issues related to the server core PR: refactor A code change that neither fixes a bug nor adds a feature
Development

Successfully merging this pull request may close these issues.

3 participants