Improve Eventing Fundamentals #45518

Closed
6 tasks done
Tracked by #5929
josalem opened this issue Dec 3, 2020 · 2 comments

Labels: area-Diagnostics-coreclr, Bottom Up Work (Not part of a theme, epic, or user story), User Story (A single user-facing feature. Can be grouped under an epic.)

josalem (Contributor) commented Dec 3, 2020

This issue is tracking investigation and potential work on improving eventing fundamentals in .NET.

User Statement: As a user, I should be able to easily enable and collect profiling and event data from my .NET application with minimal overhead.


Potential future work (none of these items are considered committed work or in scope for .NET 6):

  • Fix deadlock in EventPipe when the reader tries to take the EventPipe config lock while the process is tracing itself (#1892)
  • Fix EventCounter events not being filtered by the EventSource/EventListener relationship (#31927)
  • Fix assertion failure in gcstress-extra CI in test 'Loader\\binding\\tracing\\BinderTracingTest.Basic\\BinderTracingTest.Basic.cmd' (#47698)
  • Reduce or remove impact of Rundown when collecting traces via EventPipe
    • Currently, users must collect a sequence of events called Rundown that contains information like loaded modules and IP -> Method Name mappings.
    • Rundown requires indexing the code manager table to get symbols for all IPs. While this happens, the JIT is blocked from compiling methods.
    • Sufficiently large processes (by method/type count) can take a long time to send Rundown.
    • We have historically run into issues with self-tracing processes if the act of reading Rundown events causes something to JIT.
  • Quantify limits of EventPipe throughput in resource-constrained environments
    • We don't currently document any limits or expected throughput of EventPipe. While we may choose not to document values like this, we should still have remedial advice for situations where events are being dropped.
  • Quantify overhead of CPU sample profiling via EventPipe
  • Determine if we can mitigate safe-point bias in SampleProfiler
    • Samples are currently collected by suspending the runtime using the same infrastructure as the GC. This means that suspension on each thread is deferred to a "safe point" in the code for the GC; typically this is on method return. This biases sampling toward the caller of the leaf frame. For example, if method A calls method B, which calls method C in a tight loop, and C contains no potential safe points, then samples will be biased toward stacks showing A->B. The suspension could have been requested while in C, but the nearest safe point ends up being C's return.
  • Determine if it is possible to include mixed-mode stacks in SampleProfiler
    • Currently, SampleProfiler is only capable of sending managed stacks without any native frames.
  • Allow configuring SampleProfiler frequency
    • SampleProfiler defaults to a 1ms sample rate. Some uses of profiling data don't require this level of resolution and could benefit from the lower event volume of a lower sample frequency.
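Rundown's IP -> Method Name mappings are what let a trace consumer turn raw instruction pointers in stack samples into method names. The Python sketch below illustrates only the consumer-side lookup, assuming each method is reported as a hypothetical (start address, code size, name) record and that method ranges do not overlap. `MethodRange` and `MethodMap` are illustrative names; they do not model the actual rundown event schema.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MethodRange:
    # Hypothetical record, one per method reported during rundown.
    start: int  # address of the method's first byte of native code
    size: int   # length of the method's native code in bytes
    name: str


class MethodMap:
    """Resolves instruction pointers (IPs) to method names."""

    def __init__(self, methods: List[MethodRange]) -> None:
        # Sort by start address so lookups can binary-search.
        self._methods = sorted(methods, key=lambda m: m.start)
        self._starts = [m.start for m in self._methods]

    def resolve(self, ip: int) -> Optional[str]:
        # Find the last method whose start address is <= ip.
        i = bisect.bisect_right(self._starts, ip) - 1
        if i < 0:
            return None
        m = self._methods[i]
        # The IP must also fall inside that method's code range.
        return m.name if ip < m.start + m.size else None


methods = MethodMap([MethodRange(0x1000, 0x80, "A"), MethodRange(0x1100, 0x40, "B")])
print(methods.resolve(0x1010))  # A
print(methods.resolve(0x1090))  # None (between methods)
```

An IP with no covering record resolves to `None`, which is the situation a consumer is in for any code whose mapping was never collected.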
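The safe-point bias described in the list can be reproduced with a toy model. This Python sketch is an illustration only, not how SampleProfiler is implemented: it assumes a timeline of simulated instructions, each tagged with the current stack and whether it is a GC safe point, and assumes a deferred sample request is honored at the next safe point at or after the requested instruction. `build_timeline` and `sample` are hypothetical names.

```python
import random
from collections import Counter
from typing import List, Tuple

# One entry per simulated instruction: (stack at that instruction, is_safe_point).
Timeline = List[Tuple[str, bool]]


def build_timeline(iterations: int, c_length: int) -> Timeline:
    """Model A calling B, which calls C in a tight loop.

    C's body contains no safe points; the only safe point is the
    instruction in B immediately after each call to C returns.
    """
    timeline: Timeline = []
    for _ in range(iterations):
        timeline += [("A->B->C", False)] * c_length  # inside C
        timeline.append(("A->B", True))              # back in B after C returns
    return timeline


def sample(timeline: Timeline, requests: List[int], defer: bool) -> Counter:
    """Count the stack observed for each sample request (an index into timeline).

    With defer=True, each request is honored at the next safe point at or
    after the requested instruction, mimicking GC-style suspension.
    With defer=False, the stack is read exactly where the request lands.
    """
    counts: Counter = Counter()
    for t in requests:
        if defer:
            # Terminates because the timeline always ends on a safe point.
            while not timeline[t][1]:
                t += 1
        counts[timeline[t][0]] += 1
    return counts


timeline = build_timeline(iterations=1000, c_length=99)
rng = random.Random(0)
requests = [rng.randrange(len(timeline)) for _ in range(10_000)]
print(sample(timeline, requests, defer=False))  # mostly A->B->C
print(sample(timeline, requests, defer=True))   # every sample is A->B
```

Even though roughly 99% of the simulated time is spent in C, deferring every sample to a safe point attributes all of them to A->B.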
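To put the default rate in perspective, here is a rough estimate of sample volume. It assumes, for illustration only, one stack-sample event per thread on each profiler tick; the per-tick behavior and per-event size are not specified in this issue. `samples_per_second` is a hypothetical helper.

```python
def samples_per_second(interval_ms: float, threads: int) -> float:
    """Approximate stack-sample events per second.

    Assumes (hypothetically) one sample per thread on each profiler
    tick, so volume scales with 1 / interval.
    """
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return threads * 1000.0 / interval_ms


# Default 1 ms interval vs. a 10 ms interval, for 50 threads.
print(samples_per_second(1.0, 50))   # 50000.0
print(samples_per_second(10.0, 50))  # 5000.0
```

Under this assumption, moving from 1 ms to 10 ms cuts event volume tenfold, at the cost of coarser resolution.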

CC @tommcdon @sywhang @noahfalk @shirhatti

ghost commented Dec 3, 2020

Tagging subscribers to this area: @tommcdon
See info in area-owners.md if you want to be subscribed.

josalem (Contributor, Author) commented Aug 6, 2021

I'm going to move this back to 6, close it, and open a new one for 7 to prevent confusion over what work goes where.

josalem modified the milestones: 7.0.0, 6.0.0 Aug 6, 2021
josalem closed this as completed Aug 6, 2021
ghost locked as resolved and limited conversation to collaborators Sep 7, 2021