Improve Eventing Fundamentals #45518

josalem · 2020-12-03T02:06:02Z

This issue is tracking investigation and potential work on improving eventing fundamentals in .NET.

User Statement: As a user, I should be able to easily enable and collect profiling and event data from my .NET application with minimal overhead.

Make C implementation of EventPipe the default (Switch to C implementation of EventPipe #46079; Use C EventPipe implementation by default #47665)
Aligned reader causes latency in dispatching events live (now tracked in microsoft/perfview#1447, "Aligned reading strategy in StreamReaderWriter causes event latency for EventPipe")
- Turn DiagnosticPort tests back on (a TraceEvent issue is breaking the test) (#44072). A band-aid fix would be to turn on the sample profiler so that data continues to flow and we don't get broken by an unaligned block size.
- ~~Fix issue in TraceEvent~~ Out-of-band (OOB) fix in TraceEvent; see microsoft/perfview#1447
Improve EventPipe high core count CPU scalability (Support EventPipe high core count CPU scalability diagnostics#1412)
Fix libcoreclr.so!EventPipeInternal::GetNextEvent high CPU use (#43985)

Potential future work (none of this is committed work or in scope for .NET 6):

Fix deadlock in EventPipe when the reader tries to take the EventPipe config lock while tracing self (#1892)
Fix EventCounter events not filtered by EventSource/EventListener relationship (#31927)
Fix assert in gcstress-extra CI (Assertion failed in test 'Loader\binding\tracing\BinderTracingTest.Basic\BinderTracingTest.Basic.cmd' #47698)
Reduce or remove impact of Rundown when collecting traces via EventPipe
- Currently, users must collect a sequence of events called Rundown that contains information like loaded modules and IP -> Method Name mappings.
- Rundown requires indexing the code manager table to get symbols for all IPs. This prevents the JIT from... JIT-ing.
- Sufficiently large processes (by method/type count) can take a long time to send Rundown.
- We have historically run into issues with self-tracing processes if the act of reading Rundown events causes something to JIT.
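
To illustrate why the IP -> Method Name mappings in Rundown matter, here is a minimal sketch of how a trace reader could resolve a sampled instruction pointer to a method name. The record layout (`start`, `size`, `name`), the addresses, and the helper names `build_index` and `resolve` are all hypothetical and chosen for illustration; they are not the actual Rundown event schema or any TraceEvent API.

```python
import bisect

# Hypothetical rundown records: (start_address, size_in_bytes, method_name).
# Real Rundown events carry more fields; this layout is an assumption.
RUNDOWN_METHODS = [
    (0x3000, 0x20, "C"),
    (0x1000, 0x40, "A"),
    (0x2000, 0x80, "B"),
]


def build_index(methods):
    """Sort records by start address so lookups can use binary search."""
    ordered = sorted(methods)
    starts = [start for start, _, _ in ordered]
    return starts, ordered


def resolve(ip, index):
    """Return the method name whose code range contains ip, or None."""
    starts, ordered = index
    pos = bisect.bisect_right(starts, ip) - 1
    if pos < 0:
        return None  # ip is below every known method
    start, size, name = ordered[pos]
    return name if ip < start + size else None


index = build_index(RUNDOWN_METHODS)
print(resolve(0x2010, index))
print(resolve(0x2080, index))
```

Without the mapping records, a sampled IP such as `0x2010` stays an opaque address, which is why a trace that lacks Rundown data cannot show method names for its stacks.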
Quantify limits of EventPipe throughput in resource constrained environments
- We don't currently document any limits or expected throughput of EventPipe. While we may choose not to document values like this, we should still have remedial advice for situations where events are being dropped.
Quantify overhead of CPU sample profiling via EventPipe
Determine if we can mitigate safe-point bias in SampleProfiler
- Samples are currently collected by suspending the runtime using the same infrastructure as the GC. Suspension on each thread is therefore deferred to a GC "safe point" in the code, which is typically a method return. This biases sampling towards the second-to-last frame on the stack (the caller of the leaf). For example, if method A calls method B, which calls method C in a tight loop, and method C does not contain any potential safe points, then samples will be biased to show stacks containing A->B. The suspend request could have arrived while C was running, but the nearest safe point ends up being on its return.
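
The A->B->C example can be sketched as a small deterministic simulation. This is a toy model under stated assumptions, not the runtime's suspension logic: time is counted in abstract ticks, each loop iteration spends `b_ticks` in B and `c_ticks` in C, and a sample requested while C is running is recorded only once C returns to B. The function name `simulate` and all parameters are hypothetical.

```python
from collections import Counter


def simulate(b_ticks, c_ticks, sample_interval, total_ticks,
             c_has_safe_point=False):
    """Compare where the thread really was with what the sampler records."""
    period = b_ticks + c_ticks
    true_leaf = Counter()
    recorded = Counter()
    for t in range(0, total_ticks, sample_interval):
        in_c = (t % period) >= b_ticks
        actual = "A->B->C" if in_c else "A->B"
        true_leaf[actual] += 1
        if in_c and not c_has_safe_point:
            # Assumption: suspension is deferred until C returns,
            # so the recorded stack ends at B.
            recorded["A->B"] += 1
        else:
            recorded[actual] += 1
    return true_leaf, recorded


true_leaf, recorded = simulate(b_ticks=1, c_ticks=9,
                               sample_interval=7, total_ticks=70)
print("actual:  ", dict(true_leaf))
print("recorded:", dict(recorded))
```

In this model the thread spends most of its time in C, yet every recorded stack ends at B. Passing `c_has_safe_point=True` makes the recorded distribution match the actual one, which is the effect a mitigation would aim for.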
Determine if it is possible to include mixed-mode stacks in SampleProfiler
- Currently, SampleProfiler is only capable of sending managed stacks without any native frames.
Allow configuring SampleProfiler frequency
- SampleProfiler defaults to a 1 ms sampling interval. Some applications of profiling data don't require this level of resolution and could benefit from the reduced event volume of a lower sample frequency.
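
As a rough back-of-the-envelope sketch of what a configurable interval would change, the arithmetic below estimates sample events per second. It assumes one sample event per managed thread per interval, which is a simplification made for illustration; the function name `samples_per_second` is hypothetical.

```python
def samples_per_second(interval_ms, thread_count):
    """Estimated sample events per second.

    Assumes one sample per thread per interval (an illustrative
    simplification, not a measured figure).
    """
    return thread_count * 1000.0 / interval_ms


for interval_ms in (1, 10, 100):
    rate = samples_per_second(interval_ms, thread_count=8)
    print(f"{interval_ms} ms interval, 8 threads: {rate:.0f} samples/s")
```

Under this assumption, moving from a 1 ms to a 10 ms interval cuts the sample event volume by a factor of ten, at the cost of coarser time resolution.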

CC @tommcdon @sywhang @noahfalk @shirhatti


ghost · 2020-12-03T02:06:04Z

Tagging subscribers to this area: @tommcdon
See info in area-owners.md if you want to be subscribed.

josalem · 2021-08-06T21:27:02Z

I'm going to move this back to 6, close it, and open a new one for 7 to prevent confusion over what work goes where.

josalem added the area-Diagnostics-coreclr label Dec 3, 2020

Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Dec 3, 2020

josalem removed the untriaged New issue has not been triaged by the area owner label Dec 3, 2020

tommcdon added this to the 6.0.0 milestone Dec 3, 2020

tommcdon added the User Story A single user-facing feature. Can be grouped under an epic. label Dec 9, 2020

This was referenced Dec 10, 2020

Slow AssemblyLoadContext.StartAssemblyLoad() at startup #45466

Open

Does locking for EventSource.DoCommand need to be a global lock? #45059

Open

FinalizeObject event sends a BulkType event for every finalized object #39887

Open

tommcdon added the Bottom Up Work Not part of a theme, epic, or user story label Dec 10, 2020

josalem mentioned this issue Feb 18, 2021

Prevent unbounded lock holds in BufferManager of EventPipe #48435

Merged

sywhang mentioned this issue Feb 18, 2021

Future of this project djluck/prometheus-net.DotNetRuntime#36

Open

shirhatti mentioned this issue Mar 11, 2021

.NET applications are observable and easily diagnosable dotnet/core#5929

Closed

18 tasks

This was referenced May 24, 2021

What's new in .NET 6 Preview 4 dotnet/core#6098

Closed

Reduce lock holds during EP session disable #53395

Closed

tommcdon modified the milestones: 6.0.0, 7.0.0 Jun 28, 2021

josalem modified the milestones: 7.0.0, 6.0.0 Aug 6, 2021

josalem closed this as completed Aug 6, 2021

ghost locked as resolved and limited conversation to collaborators Sep 7, 2021
