Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement config to ignore the sampled flag in traceparent #1310

Merged
merged 5 commits into from
Jun 15, 2021
Merged

Implement config to ignore the sampled flag in traceparent #1310

merged 5 commits into from
Jun 15, 2021

Conversation

gregkalapos
Copy link
Contributor

@gregkalapos gregkalapos commented May 26, 2021

The problem

We started seeing environments where the full trace is not within elastic and the sample rate is effectively 0% regardless of the TransactionSampleRate setting. Very likely this is because unmonitored .NET5+ services send a traceparent header with a sampled flag set to 0 (doc on the change from Microsoft)

Proposed solution in this PR

We add a new setting called SuppressTraceContextHeaders. If this is true we ignore any distributed tracing data during transaction start and we make a new sampling decision regardless of the traceparent value. Update: see comments below, the solution evolved during the PR reviews.

@gregkalapos gregkalapos self-assigned this May 26, 2021
@apmmachine
Copy link
Contributor

apmmachine commented May 26, 2021

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: Branch indexing

  • Start Time: 2021-06-14T13:34:41.472+0000

  • Duration: 48 min 19 sec

  • Commit: 3ce68a6

Test stats 🧪

Test Results
Failed 0
Passed 19429
Skipped 101
Total 19530

Trends 🧪

Image of Build Times

Image of Tests

@gregkalapos
Copy link
Contributor Author

gregkalapos commented May 27, 2021

Alternatives:

  • make the setting into a URL pattern (set URLs where you ignore the headers and apply it for the rest)
  • only apply when tracestate=es:... is NOT present (aka the request is not from our agent)

Maybe make it TraceContextHeadersIgnore.

  • See if traceparent or tracestate has any indication if it was set from Activity.

@russcam
Copy link
Contributor

russcam commented Jun 4, 2021

@gregkalapos, @AlexanderWert Another alternative, which is along similar lines to ones proposed:

New configuration value: TraceContextIgnoreSampledFalse: true|false, with a default value of false.

If TraceContextIgnoreSampledFalse is set to true, when a transaction is started, if there is Trace Context:

  • If the sampled flag is 1, proceed as per current implementation
  • If the sampled flag is 0, check the tracestate
    • If tracestate contains es vendor and s value, proceed as per current implementation
    • If tracestate does not contain es vendor and s value
      1. determine whether the transaction should be sampled based on the transaction id and set IsSampled
      2. set the sample rate from the sampler
      3. use the trace id, parent id of Trace Context
      4. mutate tracestate of Trace Context to add es vendor and s value to and use

The end result is that sampled flag value of 0 is ignored when tracestate does not contain es vendor and s value

@gregkalapos
Copy link
Contributor Author

@gregkalapos, @AlexanderWert Another alternative, which is along similar lines to ones proposed:

New configuration value: TraceContextIgnoreSampledFalse: true|false, with a default value of false.

If TraceContextIgnoreSampledFalse is set to true, when a transaction is started, if there is Trace Context:

  • If the sampled flag is 1, proceed as per current implementation

  • If the sampled flag is 0, check the tracestate

    • If tracestate contains es vendor and s value, proceed as per current implementation

    • If tracestate does not contain es vendor and s value

      1. determine whether the transaction should be sampled based on the transaction id and set IsSampled
      2. set the sample rate from the sampler
      3. use the trace id, parent id of Trace Context
      4. mutate tracestate of Trace Context to add es vendor and s value to and use

The end result is that sampled flag value of 0 is ignored when tracestate does not contain es vendor and s value

@russcam the suggestion sounds good to me.

The advantage is clearly that this would solve the issues we have with .NET5+ unmonitored services while it causes the least harm in other environments.

The only disadvantage I see is that people will have a super hard time to understand this - I don't expect users to understand the W3C TraceContext spec, so it'll be super hard to explain in the docs what this config exactly does.

Overall I think this is a good way to proceed and worth accepting the more complicated setting - I'll update the PR.

@gregkalapos
Copy link
Contributor Author

Pushed the changes discussed above - ready for another review round.

Copy link
Contributor

@russcam russcam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've had an initial look and left some comments

@@ -32,6 +32,7 @@ public static class DefaultValues
public const string SpanFramesMinDuration = "5ms";
public const double SpanFramesMinDurationInMilliseconds = 5;
public const int StackTraceLimit = 50;
public const bool SuppressTraceContextHeaders = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public const bool SuppressTraceContextHeaders = false;
public const bool TraceContextIgnoreSampledFalse = false;

Consistent name for the default value constant

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Fixed this and the usage of it as well.

[[config-trace-context-ignore-sampled-false]]
==== `TraceContextIgnoreSampledFalse`

By setting this to `true` the agent will ignore the sampled part of the W3C TraceContext traceparent header when it's false and when the upstream transaction is not coming from a non-Elastic APM agent.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be useful to explain the "normal" flow when Trace Context is received, and then lead into describing what this setting does. In doing so, I think it would provide more context around why it is needed and if it might be relevant to a specific user.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

This can be useful when a caller service always sets the sampled flag to false and the agent would have no chance to
create any sampled transaction.

NOTE: .NET 5 by default sets the W3C TraceContext headers, but without an active agent it sets the sampled flag to `false` leading to 0% sampling. If your service is called by an unmonitored .NET 5 application, setting this flag will instruct the agent to start a new trace and make a new sampling decision based on <<config-transaction-sample-rate>>.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
NOTE: .NET 5 by default sets the W3C TraceContext headers, but without an active agent it sets the sampled flag to `false` leading to 0% sampling. If your service is called by an unmonitored .NET 5 application, setting this flag will instruct the agent to start a new trace and make a new sampling decision based on <<config-transaction-sample-rate>>.
[IMPORTANT]
--
.NET 5 applications set the https://www.w3.org/TR/trace-context/[W3C Trace Context] headers for outgoing HTTP requests by default, but with the traceparent sampled flag set to `false`. If the application has an active agent, the agent will ensure that the sampled flag is propagated appropriately. If the application does not have an active agent however, and the application calls another application (the callee) that does have an active agent, the propagation of a sampled flag of `false` results in no sampled traces in the callee application.
If your application is called by an unmonitored .NET 5 application, setting `TraceContextIgnoreSampledFalse` to `true` will instruct the agent to start a new trace and make a new sampling decision based on <<config-transaction-sample-rate>> when the traceparent sampled flag is `false` and there is no agent specific tracestate present.
--

Maybe the section should lead with this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, moved this to the top of the section.


[IMPORTANT]
--
..NET 5 applications set the https://www.w3.org/TR/trace-context/[W3C Trace Context] headers for outgoing HTTP requests by default, but with the traceparent sampled flag set to `false`. If the application has an active agent, the agent will ensure that the sampled flag is propagated appropriately. If the application does not have an active agent however, and the application calls another application (the callee) that does have an active agent, the propagation of a sampled flag of `false` results in no sampled traces in the callee application.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems we need an additional . to render .NET in asciidoc when it's the 1. word in the section.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it renders the paragraph in red italics in asciidoctor.js

image

An attribute entry can be used

Suggested change
..NET 5 applications set the https://www.w3.org/TR/trace-context/[W3C Trace Context] headers for outgoing HTTP requests by default, but with the traceparent sampled flag set to `false`. If the application has an active agent, the agent will ensure that the sampled flag is propagated appropriately. If the application does not have an active agent however, and the application calls another application (the callee) that does have an active agent, the propagation of a sampled flag of `false` results in no sampled traces in the callee application.
:dot: .
{dot}NET 5 applications set the https://www.w3.org/TR/trace-context/[W3C Trace Context] headers for outgoing HTTP requests by default, but with the traceparent sampled flag set to `false`. If the application has an active agent, the agent will ensure that the sampled flag is propagated appropriately. If the application does not have an active agent however, and the application calls another application (the callee) that does have an active agent, the propagation of a sampled flag of `false` results in no sampled traces in the callee application.

which renders as

image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can ignore the suggestion here as it's included in the next suggestion

@gregkalapos gregkalapos changed the title Implement SuppressTraceContextHeaders config to ignore W3C headers Implement config to ignore the sampled flag in traceparent Jun 14, 2021
Copy link
Contributor

@russcam russcam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I've left a suggestion for the docs


[IMPORTANT]
--
..NET 5 applications set the https://www.w3.org/TR/trace-context/[W3C Trace Context] headers for outgoing HTTP requests by default, but with the traceparent sampled flag set to `false`. If the application has an active agent, the agent will ensure that the sampled flag is propagated appropriately. If the application does not have an active agent however, and the application calls another application (the callee) that does have an active agent, the propagation of a sampled flag of `false` results in no sampled traces in the callee application.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can ignore the suggestion here as it's included in the next suggestion

@gregkalapos gregkalapos merged commit 708ece0 into elastic:master Jun 15, 2021
@gregkalapos gregkalapos deleted the SuppressTraceparent branch June 15, 2021 18:31
@russcam russcam added the enhancement New feature or request label Jun 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants