TraceLog streaming/in-memory EventPipe support #1867

vaind · 2023-05-04T21:02:34Z

I'm trying to make changes to enable TraceLog use without creating an ETLX file from an EventPipe (nettrace). I've taken the existing ETW streaming support as a baseline and tried to adapt that. However, I'm having trouble getting the newly added tests to pass because the events coming from TraceLog don't match what I'd get from EventPipeEventSource (the previous test case, Streaming in the same file).

@brianrob, if you could have a look at my changes, maybe you could spot what I'm doing wrong (or not doing but should be) to get this working properly? Or is there some bigger problem I'm missing completely that prevents this from working?

The use-case I'm going after is a continuously running sampling profiler that only samples as needed (because actually starting the profiler takes a lot of time in the CLR so I want to do it only once).

This is a follow-up to, though not strictly an implementation of, #1829.

vaind · 2023-05-08T18:25:23Z

@microsoft-github-policy-service agree

brianrob · 2023-05-08T23:28:14Z

@vaind, thanks for working on this. I think at a high level this should work. A couple of thoughts:

It would be worthwhile to architect this similar to how we do with TraceEventSession for live sessions. This allows the TraceLog to know that it is being used in a live mode instead of a file-based mode. There are some differences in how one can use the TraceLog when it's live, because it hasn't seen the "end" of the trace, unlike in the file-based mode.
Can you share what differences you're seeing? It's possible that it's just some piece of data that hasn't made it over to the TraceLog.

vaind · 2023-05-09T19:19:23Z

> @vaind, thanks for working on this. I think at a high level this should work. A couple of thoughts:

Thanks for getting back to me, it indeed seems to work from my testing in the Sentry profiling SDK, with some quirks so far (I didn't get stack trace method resolution working yet, but from my understanding it should work because all the info has already come from the runtime). Also the ActivityComputer (and others) don't seem to work when streaming at the moment so that would also need some tuning I guess. I was able to get around them for now but I think they're useful so I'll likely get back to them in the future.

> It would be worthwhile to architect this similar to how we do with TraceEventSession for live sessions. This allows the TraceLog to know that it is being used in a live mode instead of a file-based mode. There are some differences in how one can use the TraceLog when it's live, because it hasn't seen the "end" of the trace, unlike in the file-based mode.

Yes, I've made the change by reusing the same code, really, with a minor change of setting up the kernelSession in the TraceEventSession factory instead of the common constructor.

> Can you share what differences you're seeing? It's possible that it's just some piece of data that hasn't made it over to the TraceLog.

E.g., the following line is missing from the captured output but is present in eventpipe-dotnetcore2.1-linux-x64-objver3.netperf.baseline.txt:

System.Threading.Tasks.TplEventSource/SetActivityId, 1, <Event MSec=   "111.4517" PID="80749" PName="Process(80749)" TID="80749" EventName="SetActivityId" ProviderName="System.Threading.Tasks.TplEventSource" NewId="10000000-0000-0000-0000-000000000001"/>

Also, all the custom events are missing, as present in eventpipe-dotnetcore2.1-linux-x64-tracelogging.netperf.baseline.txt

I'm pretty sure it's a test-code issue, because as I've mentioned earlier, the overall approach seems to work in general, based on my changes to the profiling code in the Sentry SDK which now starts the profiler early in the process and slices profiles when actually needed (drops the other samples if not currently needed). I think this functionality will need better tests here, instead of trying to reuse what was already present in EventPipeParsing/Streaming.
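The "start early, slice when needed" idea described above can be modelled in a few lines. This is only a rough, language-neutral sketch in Python: `Sample`, `ProfileSlicer`, and their methods are hypothetical names invented for illustration and are not part of TraceEvent or the Sentry SDK.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sample:
    """One profiler sample (hypothetical shape, for illustration only)."""
    timestamp_ms: float
    thread_id: int
    stack: tuple  # frame names, innermost first


@dataclass
class ProfileSlicer:
    """Receives every sample from an always-on profiler, keeps only wanted ones."""
    _active: Optional[List[Sample]] = field(default=None)

    def start_slice(self) -> None:
        # Begin collecting; the underlying profiler is assumed to be running already.
        self._active = []

    def on_sample(self, sample: Sample) -> None:
        # Called for every sample; the sample is dropped unless a slice is open.
        if self._active is not None:
            self._active.append(sample)

    def stop_slice(self) -> List[Sample]:
        # Close the slice and hand back whatever was collected.
        collected, self._active = self._active or [], None
        return collected
```

The expensive part (starting the profiler) happens once, while opening and closing a slice is just a matter of flipping where incoming samples go.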

vaind · 2023-05-11T15:32:54Z

So the issue I'm having with stacktrace info not being available is due to missing modules & method info because the rundown on a session only happens when a session is stopped, while I'm trying to access this info while the sample-profiler is running (on each sample). I've raised an inquiry for this issue: dotnet/runtime#86103

I was wondering whether I could start & stop a session, in the beginning, to force the rundown to happen and then use these events to feed TraceLog. Afterwards, the runtime information would rely on the runtime provider to get updates. @brianrob do you think this could work, conceptually or do you see any issues that would need to be overcome (or prevent this completely)?
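To make the idea concrete, here is a minimal model of why an up-front rundown helps. It is a sketch in Python rather than actual TraceEvent code, and `MethodMap`, `add_method`, and `resolve` are hypothetical names. The assumption is that rundown records and later live method-load records both carry a start address, a size, and a name.

```python
import bisect
from typing import List, Optional, Tuple


class MethodMap:
    """Maps instruction addresses to method names using load/rundown records."""

    def __init__(self) -> None:
        self._starts: List[int] = []                     # sorted start addresses
        self._methods: List[Tuple[int, int, str]] = []   # (start, size, name), same order

    def add_method(self, start: int, size: int, name: str) -> None:
        # Used for rundown records first, then for live method-load records.
        i = bisect.bisect_left(self._starts, start)
        self._starts.insert(i, start)
        self._methods.insert(i, (start, size, name))

    def resolve(self, address: int) -> Optional[str]:
        # Find the last method that starts at or before the address.
        i = bisect.bisect_right(self._starts, address) - 1
        if i < 0:
            return None
        start, size, name = self._methods[i]
        # The address only resolves if it falls inside that method's range.
        return name if address < start + size else None
```

Without the initial rundown the map starts empty, so samples taken from already-loaded code cannot be resolved. Seeding it from a short start-and-stop session fills in that history, and live load events keep it current afterwards.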

vaind · 2023-05-19T17:59:27Z

> I was wondering whether I could start & stop a session, in the beginning, to force the rundown to happen and then use these events to feed TraceLog. Afterwards, the runtime information would rely on the runtime provider to get updates. @brianrob do you think this could work, conceptually or do you see any issues that would need to be overcome (or prevent this completely)?

So I've verified this works fine, I'll just need to add test cases to this repo so that it's covered here.

vaind · 2023-05-24T07:09:33Z

@brianrob, any chance you could have a first look at the TraceLog.cs changes to see if they seem reasonable? If so, I'd update the tests to cover the changes and mark this PR as ready for review.

brianrob · 2023-05-25T01:44:17Z

@vaind, apologies for not getting to this sooner. I am reviewing your code.

brianrob · 2023-05-25T02:25:52Z

@vaind, the concept here looks good. One thing I think we will need to resolve before this is ready for review is the rundown design. Your instinct to trigger a rundown at the beginning of the trace is the right one. I was going to recommend that you use the StartRundown keyword to do this, but it doesn't look like this is plumbed through the runtime, so it wouldn't have any impact. If I recall correctly, EventPipe has a relatively hardcoded rundown setup, in which case we may not have many options here.

@davmason, do you know if rundown is configurable at all? If I recall correctly, end rundown is hardcoded at the end of the session. Is that right? I'm trying to figure out the best option for @vaind to trigger a start rundown at the beginning of streaming so that TraceLog gets method rundown information and can resolve stacks.

@vaind, once this is resolved, then I think it will make sense to review this more. I saw some diffs that I want to examine further, but I'm not on a great network connection today and so I don't have access to a quality diff tool.

vaind · 2023-05-25T06:24:34Z

> @vaind, the concept here looks good. One thing I think we will need to resolve before this is ready for review is the rundown design. Your instinct to trigger a rundown at the beginning of the trace is the right one. I was going to recommend that you use the StartRundown keyword to do this, but it doesn't look like this is plumbed through the runtime, so it wouldn't have any impact. If I recall correctly, EventPipe has a relatively hardcoded rundown setup, in which case we may not have many options here.

Thanks for getting back to me, @brianrob. And yes, there doesn't seem to be a way to trigger a rundown on the same session. I've discussed that in the runtime issue I created two weeks ago: dotnet/runtime#86103

davmason · 2023-05-31T08:23:13Z

> @davmason, do you know if rundown is configurable at all? If I recall correctly, end rundown is hardcoded at the end of the session. Is that right? I'm trying to figure out the best option for @vaind to trigger a start rundown at the beginning of streaming so that TraceLog gets method rundown information and can resolve stacks.

Rundown is not currently configurable with EventPipe. You are remembering correctly, it is hardcoded to run at the end of the session. The internal code has the option to configure whether to do rundown or not, but we don't expose it over the IPC commands so customers can't turn it off or on. It would be fairly easy to expose the option to have rundown or not, and I don't think it would be that much work to have it at the start of a trace. I think we could reuse all the existing code.

brianrob · 2023-05-31T14:48:49Z

> > @davmason, do you know if rundown is configurable at all? If I recall correctly, end rundown is hardcoded at the end of the session. Is that right? I'm trying to figure out the best option for @vaind to trigger a start rundown at the beginning of streaming so that TraceLog gets method rundown information and can resolve stacks.
>
> Rundown is not currently configurable with EventPipe. You are remembering correctly, it is hardcoded to run at the end of the session. The internal code has the option to configure whether to do rundown or not, but we don't expose it over the IPC commands so customers can't turn it off or on. It would be fairly easy to expose the option to have rundown or not, and I don't think it would be that much work to have it at the start of a trace. I think we could reuse all the existing code.

Thanks @davmason. Sounds like we should keep dotnet/runtime#86103 open to track making it possible to configure rundown via IPC, but for now, hacking things a bit here to use EndRundown sounds fine to me.

@vaind, once you're ready, go ahead and mark this as ready for review and I'll take a more thorough pass on it.

vaind · 2023-05-31T15:21:25Z

> @vaind, once you're ready, go ahead and mark this as ready for review and I'll take a more thorough pass on it.

I've done so. It does need tests but I want to make sure the approach and APIs are OK-ish before trying to devise tests for them.

vaind · 2023-05-31T15:22:38Z

P.S. I've tested it as a consumer in Sentry .NET SDK. What I meant by needing tests is that it needs tests in this repo.

brianrob

Thanks for your patience. Here's a first cut at the review. I think we may have a few iterations, but overall, this is looking good.

vaind

I've made some changes as requested and replied to some questions.

Keeping the testing open until the questions are resolved.

vaind · 2023-07-06T09:19:57Z

> I've made some changes as requested and replied to some questions.
>
> Keeping the testing open until the questions are resolved.

@brianrob Have you been able to finish reviewing this? Any more concerns or do you think it would be acceptable after adding tests for the changes/new APIs? Also, do you have any preferences for the testing approach?

bruno-garcia · 2023-07-24T18:49:52Z

@brianrob this would really help unblock some work we want to do for .NET here at Sentry. Another round of review would be much appreciated.

vaind · 2024-03-20T15:36:44Z

Hi @brianrob, I've updated this PR with the latest changes from the main branch and all the CI jobs pass. Is there anything I should do so that these changes can eventually land?

brianrob

@vaind thank you for your patience on this and for all of the work that you've done thus far. I think we're getting close. I've added a few comments here, but I don't think any are big.

I see the following as the path to merge:

@noahfalk, can I get you to give this a review please? This is for live session support for EventPipe.
I would like to validate that we don't regress live session support for ETW traces. Have you done any testing in this space? I know we don't have tests here in the repo, but presumably, we can do some ad hoc testing, similar to https://github.com/brianrob/examples/tree/main/src/realtime-session-with-stacks.

Assuming we can resolve this, I think we'll be ready to merge (or at least know what we need to do).

noahfalk

This looked fine to me, but I warn my knowledge of the TraceLog portion of TraceEvent is limited so I wouldn't read too much into it. If there were anything subtly wrong I don't expect I would catch it without spending a while researching the internals of TraceLog.

brianrob · 2024-04-03T02:39:28Z

> This looked fine to me, but I warn my knowledge of the TraceLog portion of TraceEvent is limited so I wouldn't read too much into it. If there were anything subtly wrong I don't expect I would catch it without spending a while researching the internals of TraceLog.

Thanks @noahfalk. No worries - I just wanted to get another set of eyes on this, and also to make you aware of the functionality.

bruno-garcia · 2024-04-05T00:53:52Z

wooo thanks @noahfalk !

Getting there

vaind · 2024-05-23T10:49:13Z

@brianrob thanks for the review, somehow I've missed it until now.

I've addressed your requests and updated the branch. Let me know if there's anything else needed.

vaind · 2024-05-23T10:59:14Z

P.S. Besides the review changes, there was one more fix, 36d2e2c (#1867): these structures would grow indefinitely otherwise (getsentry/sentry-dotnet#3375). Something similar is done for the other realtime session (when a queue is used), but in that scenario it keeps a buffer of events. Here, we don't need it because events are processed one by one as they come, so after each event, all the callbacks have already been called.
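A simplified model of that reasoning, written as a Python sketch rather than the actual TraceLog code: `StreamingDispatcher`, its `_scratch` dictionary, and the event shape are all hypothetical stand-ins for the temporary structures mentioned above.

```python
from typing import Callable, Dict, List


class StreamingDispatcher:
    """Dispatches events one at a time and discards per-event bookkeeping."""

    def __init__(self) -> None:
        self._callbacks: List[Callable[[dict], None]] = []
        self._scratch: Dict[int, dict] = {}  # temporary per-event data

    def subscribe(self, callback: Callable[[dict], None]) -> None:
        self._callbacks.append(callback)

    def process(self, event: dict) -> None:
        # Record temporary data that callbacks may need while handling this event.
        self._scratch[event["id"]] = event
        try:
            for callback in self._callbacks:
                callback(event)
        finally:
            # Every callback has run synchronously, so nothing can still need
            # the temporary data; clearing it keeps memory use bounded.
            self._scratch.clear()

    @property
    def scratch_size(self) -> int:
        return len(self._scratch)
```

A queue-based session would have to retain entries for events still waiting in the buffer, which is why the synchronous case can be simpler.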

bruno-garcia · 2024-06-07T18:58:44Z

This PR had its 1 year anniversary 😅

It's got all the review points addressed. Could we please get some help pushing this through 🙏

ericsampson · 2024-06-10T15:05:44Z

@bruno-garcia too bad that Elon tanked Twitter 😝

brianrob

Looking good. One outstanding issue and then I think we're about ready to merge.

brianrob · 2024-06-12T21:17:16Z

> This PR had its 1 year anniversary 😅
>
> It's got all the review points addressed. Could we please get some help pushing this through 🙏

Yup! Happy Anniversary?

vaind · 2024-06-27T13:04:40Z

FYI, I'm investigating unbounded memory growth when streaming EventPipe. At first, I suspected the cause was this GC.KeepAlive, but it seems there's (also) another issue. Maybe it has to do with pinned buffers in or in EventCache or something else entirely. In any case, I'm unable to pinpoint the issue, mainly because while memory profiling, the managed memory doesn't actually grow. More specifically, managed memory grows, but it is freed when a GC is triggered, while the unmanaged memory stays the same.

On the other hand, if I run the same app with procgov64 --maxmem 70M -- app-name.exe (https://github.com/lowleveldesign/process-governor), the memory usage is stable (with an occasional crash, which may be caused by some temporary memory pressure, I guess). This, to me, sounds like it may have something to do with how the runtime reuses (or doesn't reuse) heaps when there are pinned objects inside... Any ideas? In any case, I wouldn't want to block this PR, so feel free to merge it if you're fine with the current status; any changes can come in a follow-up.

brianrob · 2024-06-28T21:02:53Z

I don't expect the GC.KeepAlive call to be a problem. That's just ensuring that the JIT reports the object as live until the call to GC.KeepAlive completes.

It's hard to say where the unmanaged memory growth is coming from. Does it repro in a minimal repro app? It might be worth looking at a GC trace (GCStats in PerfView) to see if the GC heap is increasing in size and this is just fragmentation being held, or if there is truly a native leak.

Either way, I am OK taking this change as is, and we can address this in a follow-up.

brianrob

Thanks for your patience and for bringing this across the finish line!

bruno-garcia mentioned this pull request May 8, 2023

.NET Profiling for client apps (Desktop/CLI/Mobile) getsentry/sentry-dotnet#2315

Open

12 tasks

vaind force-pushed the feat/eventpipe-tracelog-streaming branch from 3140653 to 031ce2d on May 19, 2023 17:57

vaind mentioned this pull request May 22, 2023

profiling streaming TraceLog getsentry/sentry-dotnet#2385

Merged

vaind force-pushed the feat/eventpipe-tracelog-streaming branch from 927865d to 8a3ac13 on May 24, 2023 07:06

vaind force-pushed the feat/eventpipe-tracelog-streaming branch from 8a3ac13 to 0aeb827 on May 30, 2023 20:19

vaind marked this pull request as ready for review May 31, 2023 15:20

brianrob reviewed Jun 6, 2023

vaind commented Jun 7, 2023

rapetum228 reviewed Jun 30, 2023

src/TraceEvent/TraceEvent.Tests/Parsing/EventPipeParsing.cs (outdated)

bruno-garcia reviewed Jul 24, 2023

src/TraceEvent/TraceEvent.Tests/Parsing/EventPipeParsing.cs (outdated)

bruno-garcia reviewed Jul 24, 2023

src/TraceEvent/TraceEvent.Tests/Parsing/EventPipeParsing.cs (outdated)

brianrob reviewed Jul 25, 2023

src/TraceEvent/TraceLog.cs (outdated)

brianrob reviewed Jul 25, 2023

src/TraceEvent/TraceLog.cs (outdated)

brianrob reviewed Jul 25, 2023

src/TraceEvent/TraceLog.cs

brianrob reviewed Jul 25, 2023

src/TraceEvent/TraceLog.cs

roll back debug assert change in tracelog

b2bb4e5

brianrob reviewed Apr 1, 2024

src/PerfView.Tests/StackViewer/StackWindowTests.cs (outdated)

src/TraceEvent/TraceLog.cs

src/TraceEvent/TraceLog.cs (outdated)

noahfalk previously approved these changes Apr 2, 2024

vaind dismissed noahfalk’s stale review via 418dd93 May 23, 2024 08:23

vaind added 4 commits May 23, 2024 11:06

fix: clean up termporary data structures for realtime eventpipe source

36d2e2c

Merge remote-tracking branch 'origin/main' into feat/eventpipe-tracelog-streaming

69a09db

chore: roll back some changes

fee63f9

docs

5153769

vaind force-pushed the feat/eventpipe-tracelog-streaming branch from 418dd93 to 5153769 on May 23, 2024 10:47

Merge branch 'main' into feat/eventpipe-tracelog-streaming

44c8fee

brianrob reviewed Jun 12, 2024

src/TraceEvent/TraceEvent.csproj (outdated)

vaind and others added 3 commits June 27, 2024 08:47

set NuspecProperties

2270d69

Merge branch 'main' into feat/eventpipe-tracelog-streaming

d161492

fix traceevent.csproj

e21e9c7

brianrob approved these changes Jun 28, 2024

brianrob merged commit c847ec9 into microsoft:main Jun 29, 2024
5 checks passed

vaind deleted the feat/eventpipe-tracelog-streaming branch July 10, 2024 08:37

brianrob mentioned this pull request Jul 14, 2024

TraceEvent: Getting allocation stacks on Linux #2057

Closed

vaind mentioned this pull request Jul 16, 2024

deps: update perfview getsentry/sentry-dotnet#3492

Merged

mdh1418 mentioned this pull request Oct 28, 2024

Bump TraceEvent package version dotnet/runtime#109192

Merged
