ActivityId missing in a newly added HttpTelemetry Stop events. #40776
Tagging subscribers to this area: @dotnet/ncl |
I might have found the root cause of this problem. The Details:
Several options how to solve it:
|
@ManickaP nice investigating 👍 I'm not knowledgeable about the networking code, but is there an option to call Start() sooner so that it runs in the same Task where Stop() will eventually be called? Other more customized solutions may also be possible, but I've never attempted to author one and my suspicion is it would get messy. |
We could, theoretically, run it from Is there a way to manually retrieve and later attach the ActivityId to the stop event? |
As @ManickaP said, we could only do that for buffered responses. For unbuffered ones, we hand out the response that the user can interact with, and that interaction will eventually trigger a Stop event. If I'm understanding this correctly, we can't flow the ActivityID up to the parent? If we give up on tracking the response portion, we would cut into the value of the activity. At that point we would be telling users to log their own Start/Stop activity enclosing the processing of a request. Alternatively, would introducing a "SendAndProcessWithTelemetryAsync" method that takes a callback make any sense? In that case we could ensure that everything happens in the same scope. Something along the lines of (just a rough idea):
// On HttpClient
async Task<HttpResponseMessage> SendAndProcessWithTelemetryAsync(HttpRequestMessage request, HttpCompletionOption completionOption, Func<HttpResponseMessage, Task> responseCallback, CancellationToken cancellationToken)
{
    TelemetryStart();
    try
    {
        HttpResponseMessage response = await this.SendAsync(request, completionOption, cancellationToken);
        await responseCallback(response); // user processing runs inside the Start/Stop scope
        TelemetryStop();
        return response;
    }
    catch
    {
        TelemetryAbort(); // Which also logs Stop
        throw;
    }
} |
That would measure/trace different things depending of |
The idea is that it would be captured in the user-provided callback. Essentially, if a user had code like
Task DoSomethingWithResponseAsync(HttpResponseMessage response) {}
var response = await client.SendAsync(request, completionOption);
await DoSomethingWithResponseAsync(response);
it would turn into
await client.SendAndProcessWithTelemetryAsync(request, completionOption, DoSomethingWithResponseAsync);
That way the Telemetry Start/Stop would be in the enclosing scope of the response processing. |
Now I see. |
One could save EventSource.CurrentThreadActivityId after the start event, use EventSource.SetCurrentThreadActivityId to temporarily change the activity ID before and after the stop event, and set EventActivityOptions.Disable on the stop event to prevent ActivityTracker from overriding the activity ID. However, if the request causes any other activities, then those would not get the correct parent in their hierarchical GUIDs, so the Related: dotnet/diagnostics#1347 |
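The save-and-restore approach described above could look roughly like the following. This is only a sketch under stated assumptions: `DemoTelemetry` and `RequestScope` are hypothetical names made up for illustration and are not the real `HttpTelemetry` code, and the start event only gets a non-empty activity ID when activity tracking is enabled by a listener. As noted in the comment above, events raised by other activities between start and stop would still not be parented correctly.

```csharp
using System;
using System.Diagnostics.Tracing;

// Hypothetical event source for illustration only; not the real HttpTelemetry.
[EventSource(Name = "Demo-ManualActivity")]
public sealed class DemoTelemetry : EventSource
{
    public static readonly DemoTelemetry Log = new DemoTelemetry();

    [Event(1)]
    public void RequestStart(string url) => WriteEvent(1, url);

    // Disable opts this event out of automatic activity tracking,
    // so the ID set manually by the caller is not overridden.
    [Event(2, ActivityOptions = EventActivityOptions.Disable)]
    public void RequestStop() => WriteEvent(2);
}

// Hypothetical helper that remembers the activity ID seen right after Start.
public sealed class RequestScope
{
    private readonly Guid _activityId;

    private RequestScope(Guid activityId) => _activityId = activityId;

    public Guid ActivityId => _activityId;

    public static RequestScope Start(string url)
    {
        DemoTelemetry.Log.RequestStart(url);
        // Capture whatever ID the start event left on the current thread.
        return new RequestScope(EventSource.CurrentThreadActivityId);
    }

    public void Stop()
    {
        // Temporarily switch to the saved ID, log Stop, then restore the previous ID.
        EventSource.SetCurrentThreadActivityId(_activityId, out Guid previous);
        try
        {
            DemoTelemetry.Log.RequestStop();
        }
        finally
        {
            EventSource.SetCurrentThreadActivityId(previous);
        }
    }
}
```

A caller would keep the `RequestScope` returned by `RequestScope.Start(url)` alongside the response and call `Stop()` from whichever context finishes the work.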
Good question, but no :-) We do not want to encourage that style of programming, and it won't fix the 99% case (on top of which it's more expensive). |
What @KalleOlaviNiemitalo suggests would work to set the activity id of the stop event, however there is also the consideration of what activity id is attached to all events that happen to occur in between the start and stop. Perhaps we are in position to know that we can control all of those events because we own 100% of the code that executes between those two points? I didn't get a chance to read all the updates above and it is almost 5:30am so I need to sleep before I'll be able to do a good job thinking about this. When I'm awake again I'll dig into it further : ) |
We can move the All of this is based on the premise that tracing the request until the whole response is consumed is what we want. Alternatively, we could narrow the scope to the point of creating the response and exclude reading the response content. |
If the response is unbuffered, then we should make the stop be when we hand back control to the user code, as it's not ultimately us in control. |
Yes (at least in the cases where the caller will still need to download the remainder of the response after SendAsync task completes). Apologies for not recognizing the potential for this issue earlier. The main premises of Activities are:
Even if we are capable of creating events that don't adhere to these rules, the tools doing analysis are likely to struggle because we expect Activities to form a tree. So a generic pattern that is hard for the library to represent with a single Activity looks like this:
class Worker
{
    Task DoWorkPart1(...); // in our case this is HttpClient.SendAsync()
    Task DoWorkPart2(...); // and Stream.ReadAsync() on the returned stream
}
A user can consume that API and do very not-treeish things with it:
List<string> workList = new List<string>();
foreach (string work in workList) { await Worker.DoWorkPart1(work); }
foreach (string work in workList) { await Worker.DoWorkPart2(work); }
The individual parts of the work can be sequenced/nested, but if part1 and part2 are combined into a single activity they don't nest:
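As a rough illustration of the nesting problem (a sketch, not anything from the runtime): if each combined activity starts when part 1 begins and stops when part 2 ends, two work items produce the sequence Start a, Start b, Stop a, Stop b. The hypothetical `ActivityNesting.IsProperlyNested` helper below checks a sequence of start/stop records against stack discipline, which is the shape a tree of activities requires.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; the names are not from any real library.
public static class ActivityNesting
{
    // Returns true when every Stop closes the most recently opened activity,
    // which is the condition for the activities to form a tree.
    public static bool IsProperlyNested(IEnumerable<(string Kind, string Id)> events)
    {
        var open = new Stack<string>();
        foreach (var (kind, id) in events)
        {
            if (kind == "Start")
            {
                open.Push(id);
            }
            else if (open.Count == 0 || open.Pop() != id)
            {
                // Stop for an activity that is not the innermost open one.
                return false;
            }
        }
        return open.Count == 0;
    }
}
```

With this check, the interleaved sequence Start a, Start b, Stop a, Stop b is rejected, while Start a, Stop a, Start b, Stop b (each part traced as its own activity) is accepted.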
However, there are still options we could look at to improve the situation, now and/or in the future:
I also want to loop in @brianrob who has probably had more experience with this type of issue than I have. He may know of some pre-existing approaches that handle this better without needing to do anything novel. Hope this helps! |
For 5, I'd recommend we scope down and send the stop at the end of SendAsync, and make it clear either with the event name, or parameters in the payload as to what type of request it is, and so what the Stop will actually correspond to. |
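To make the scoping difference concrete, here is a small sketch of where a scoped-down Stop would land relative to body reading when `HttpCompletionOption.ResponseHeadersRead` is used. `StubHandler` and `TimedRequest` are hypothetical names introduced only for this example; the stub replaces a real server so the sketch needs no network, and the `Stopwatch` readings merely stand in for telemetry events.

```csharp
using System;
using System.Diagnostics;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a real server so the sketch runs offline.
public sealed class StubHandler : HttpMessageHandler
{
    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent("hello")
        };
        return Task.FromResult(response);
    }
}

public static class TimedRequest
{
    public static async Task<(string Body, TimeSpan UntilHeaders, TimeSpan Total)> GetAsync(
        HttpClient client, string url)
    {
        var stopwatch = Stopwatch.StartNew();
        using var request = new HttpRequestMessage(HttpMethod.Get, url);

        // SendAsync completes once headers are available; the body is not read yet.
        using HttpResponseMessage response =
            await client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead);
        TimeSpan untilHeaders = stopwatch.Elapsed; // roughly where a scoped-down Stop would fire

        // Reading the body happens in user code, outside the SendAsync call.
        string body = await response.Content.ReadAsStringAsync();
        return (body, untilHeaders, stopwatch.Elapsed);
    }
}
```

A caller who wants body reading included in the measurement has to wrap both steps, as `GetAsync` does here, for example via `await TimedRequest.GetAsync(new HttpClient(new StubHandler()), "http://example.invalid/")`.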
FYI: For your test app, make sure you aren't running under the debugger. VS TPL analysis can conflict with the activity tracking. It's something Noah mentioned to me when I was looking at some other event stuff.
Sam
|
Closing as resolved by #41022 |
In non-trivial number of cases, the RequestStop event has empty activity id. 9 945 out of 35 452 requests sent during HTTP FunctionalTests were missing activity id.
Other Start/Stop event pairs introduced with the new telemetry are not properly linked with ActivityId as well.
Also easily reproducible with a simple HttpClient scenario:
Result (with the change in #41022)
Un-paired Stop events:
- ResolutionStop: ran in continuation: https://github.com/dotnet/runtime/blob/master/src/libraries/System.Net.NameResolution/src/System/Net/Dns.cs#L485
- ConnectStop: ran in SocketsEventArgs: https://github.com/dotnet/runtime/blob/master/src/libraries/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncEventArgs.cs#L744
- HandshakeStop: ran in continuation: https://github.com/dotnet/runtime/blob/master/src/libraries/System.Net.Security/src/System/Net/Security/SslStream.Implementation.cs#L240
They're paired, I misinterpreted the logs. The Stop event doesn't carry parent activity id, only the start one.
What needs to be done to achieve desired statistics:
- Fix Resolution events: wrongly interpreted logs
- Fix Connect events: wrongly interpreted logs
- Fix Handshake events: wrongly interpreted logs
- ResponseHeadersRead for the whole method Moved HTTP request telemetry to HttpClient.SendAsync #41022
- ResponseHeadersRead that want to properly time response body reading (based on our solution for convenience methods)
We can also add or leave existing unpaired telemetry (though it shouldn't be named Start/Stop then) for events that cannot be bound by a single async context. We can link them to the Start/Stop events manually either via remembering ActivityId or using some other custom ID.
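For anyone wanting to check activity IDs on these events in-process, a listener along the following lines may help. It is only a sketch and not the reproduction code from this issue: `ActivityIdListener` is a hypothetical name, and the event source names `System.Net.Http` and `System.Threading.Tasks.TplEventSource` plus the `0x80` (TasksFlowActivityIds) keyword are assumptions about how the telemetry and activity tracking are exposed.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics.Tracing;

// Hypothetical listener for illustration; records the activity ID seen on each HTTP event.
public sealed class ActivityIdListener : EventListener
{
    public ConcurrentQueue<(string Name, Guid ActivityId)> Events { get; } =
        new ConcurrentQueue<(string Name, Guid ActivityId)>();

    protected override void OnEventSourceCreated(EventSource source)
    {
        if (source.Name == "System.Threading.Tasks.TplEventSource")
        {
            // Assumed keyword 0x80 (TasksFlowActivityIds) turns on activity ID flow across awaits.
            EnableEvents(source, EventLevel.LogAlways, (EventKeywords)0x80);
        }
        else if (source.Name == "System.Net.Http")
        {
            EnableEvents(source, EventLevel.LogAlways);
        }
    }

    protected override void OnEventWritten(EventWrittenEventArgs e)
    {
        if (e.EventSource.Name == "System.Net.Http")
        {
            // An empty Guid here on a Stop event is the symptom described in this issue.
            Events.Enqueue((e.EventName ?? "(unnamed)", e.ActivityId));
        }
    }
}
```

Creating the listener before issuing requests and then inspecting `Events` for entries whose `ActivityId` equals `Guid.Empty` would show which events lost their activity ID.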
cc: @samsp-msft @MihaZupan @karelz