Increased CPU and memory overhead while profiling #3199
cc @vaind Could we have some details about what the app is doing? With a repro we can look at exactly what's going on and make improvements, but with the currently provided information it's hard for us to figure out where the problem is. Is this a web server? How many requests are running in parallel? If this has high throughput, try changing the sample rates to:
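(The suggested snippet did not survive the export; presumably it was a reduced-rate configuration along these lines, with illustrative values only.)

```csharp
// Illustrative only: much lower sample rates for a high-throughput server.
var builder = WebApplication.CreateBuilder(args);

builder.WebHost.UseSentry((ctx, sentryConfig) =>
{
    sentryConfig.Dsn = "<your DSN>";       // placeholder
    sentryConfig.TracesSampleRate = 0.01;  // trace 1% of requests
    sentryConfig.ProfilesSampleRate = 0.1; // profile 10% of the sampled transactions
});

builder.Build().Run();
```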
Does that reduce the CPU overhead considerably?
@bruno-garcia I tried setting the sample rates to 0.001 and there is still quite a significant spike (from 7% CPU to 40% CPU). I mentioned before that we are using Hangfire to queue background jobs, and I have noticed a call like Sentry.AspNetCore.SentryTracingMiddleware.InvokeAsync(HttpContext context). But when it is a Hangfire background job, HttpContext will be null, so that might somehow explain what is going on.
This didn't come up during our tests, so I'm a bit surprised, but I wonder if it might be the overhead of the .NET profiler itself, which unfortunately samples at a high frequency; we downsample after the fact.
Do you get the overhead without profiling turned on? Just keep transactions (tracing) enabled and disable profiling. By any chance, in the profiles in Sentry, do you see any frames from Sentry that seem to be taking up a significant part of the time?
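For reference, keeping transactions while leaving profiling out entirely looks roughly like this (a sketch; the DSN and rate are placeholders):

```csharp
// Sketch: keep transaction tracing but leave profiling out entirely.
var builder = WebApplication.CreateBuilder(args);

builder.WebHost.UseSentry((ctx, sentryConfig) =>
{
    sentryConfig.Dsn = "<your DSN>";      // placeholder
    sentryConfig.TracesSampleRate = 0.2;  // keep transactions
    // No ProfilesSampleRate and no ProfilingIntegration, so the profiler never starts.
});

builder.Build().Run();
```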
Thanks @bruno-garcia. Without profiling it works like a charm, without any noticeable overhead.
Sorry for the delay here. Any chance you can get a repro to help us debug this?
I've tried reproducing this locally with our aspnetcore sample, but I haven't seen this kind of overhead. A reproducible example would be of great help here, or any more details on what is actually happening. Maybe you can run a profiler or trace on your side and share the results.
Thanks @vaind. I will try to reproduce it on some minimal-functionality branch; maybe I will be able to catch it...
I just ran into this issue also. Last night I installed the Sentry profiler. With profiling enabled at 0.2 and tracing at 0.2, my app, which normally runs at 5-10% CPU, was redlining at 100% to the point where HTTP requests were timing out and I was getting 502 gateway errors and health check alerts. I disabled the profiler (keeping only tracing) and things returned to normal. There's definitely something up with the .NET Core profiler.

My app is a web app, .NET Core 8, hosted on Azure, using standard MVC plus some SignalR for live data feeds, backed by MSSQL. In the above image you can pinpoint where I deployed the profiler, and also where I just turned it off.

Here's my Sentry config:

// Sentry monitoring
builder.WebHost.UseSentry((ctx, sentryConfig) =>
{
sentryConfig.Dsn = "<REDACTED>";
// Don't need debug messages
sentryConfig.Debug = false;
// Set TracesSampleRate to 1.0 to capture 100% of transactions for performance monitoring. We recommend adjusting this value in production
sentryConfig.TracesSampleRate = 0.2;
// Sample rate for profiling, applied on top of the TracesSampleRate e.g. 0.2 means we want to profile 20 % of the captured transactions. We recommend adjusting this value in production.
//sentryConfig.ProfilesSampleRate = 0.2;
// Note: By default, the profiler is initialized asynchronously. This can be tuned by passing a desired initialization timeout to the constructor.
//sentryConfig.AddIntegration(new ProfilingIntegration(
// During startup, wait up to 500ms to profile the app startup code. This could make launching the app a bit slower so comment it out if you prefer profiling to start asynchronously.
//TimeSpan.FromMilliseconds(500)
//));
});
No change disabling all
Thanks for continuing to try things out and for the updates, @haneytron.
I've hacked my app down to literally the bare bones, a single route that just shows a 404 page, and the issues persist. Here are some interesting graphs from today:

Graphs

CPU - you can see pretty clearly where I first fired it up, and also where I disabled profiling, then later re-enabled it.
Memory - about double the usual working set when profiling is active. Gen 0, 1, 2 GC are not exciting or out of the ordinary.
Server Errors - ignore the initial spike, I had things configured incorrectly. 0 errors all day.
Total (HTTP) requests - again not exciting. Not a prod site instance, only traffic is me randomly playing.
Thread Count - not exciting, fairly stable.
Average Response Time - VERY exciting! Consistently low with profiling off, 30+ second bursts when on.

What's Next

Two things next.

1. I'm gonna enable it on my prod site again, right now. I'll leave it until it explodes or 24 hours goes by, and we'll see whether things are working properly OR the issue actually remains and these graphs from Azure Portal are garbage. Honestly, they might be; perf counters are funky at best in my experience. I'll report back tomorrow with the results!

2. You can try replicating my setup. You could also try to replicate my app and hosting env: .NET Core 8 (MVC Web App). Here is the entirety of my current startup code:

using CodeSession;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Sentry.Profiling;
using System;
var builder = WebApplication.CreateBuilder(args);
var isDev = builder.Environment.IsDevelopment();
builder.Host.ConfigureServices((ctx, services) =>
{
// Enable MVC
services.AddControllersWithViews(options =>
{
options.SuppressAsyncSuffixInActionNames = true;
options.EnableEndpointRouting = true;
});
if (!isDev)
{
services.AddHttpsRedirection(options =>
{
// Prevents an error that occurs in the TRACE log and reports to Azure infrequently, seemingly at app startup
options.HttpsPort = 443;
});
}
});
// Sentry monitoring
builder.WebHost.UseSentry((ctx, sentryConfig) =>
{
// For Sentry config later on
var sentryOptions = new SentryOptions();
ctx.Configuration.GetSection("SentrySettings").Bind(sentryOptions);
sentryConfig.Dsn = sentryOptions.Dsn;
// Don't need debug messages
sentryConfig.Debug = false;
// Set TracesSampleRate to 1.0 to capture 100% of transactions for performance monitoring. We recommend adjusting this value in production
sentryConfig.TracesSampleRate = 0.2;
// Sample rate for profiling, applied on top of the TracesSampleRate e.g. 0.2 means we want to profile 20 % of the captured transactions. We recommend adjusting this value in production.
sentryConfig.ProfilesSampleRate = 0.2;
// Note: By default, the profiler is initialized asynchronously. This can be tuned by passing a desired initialization timeout to the constructor.
sentryConfig.AddIntegration(new ProfilingIntegration(
// During startup, wait up to 500ms to profile the app startup code. This could make launching the app a bit slower so comment it out if you prefer profiling to start asynchronously.
TimeSpan.FromMilliseconds(500)
));
});
// Build it
var app = builder.Build();
// Redirect from HTTP to HTTPS
app.UseHttpsRedirection();
app.UseRouting();
// Default to 404
app.MapFallbackToController(action: "NotFound", controller: "Error");
// Run it
await app.RunAsync();
Welp, bad news this AM. Multiple 502 and 503 errors (with NOTHING logged to console, app logs, or HTTP logs in Azure in terms of details), and multiple app restarts (as a result of my health checks, I think) as well.

500 errors, starting when I deployed last night just after my last update:

Here are GSheets of the Console and HTTP logs for the last 24 hours, limited to the last 1000 logs, with all PII scrubbed: https://docs.google.com/spreadsheets/d/1dqrHKF8DrBACtgoaQYxb7H59-8FO7SQhdYfVHogaZNI/edit#gid=940056293

Note that all of the 503, 502, and 499 errors happen on the SignalR endpoints (/session/live/* routes), so my best guess is that SignalR / WebSocket usage is an issue with the profiler. Let me know how I can be of further help. Disabling profiling for now.
Thx for all the updates. Any chance you could try running the dotnet-trace profiler (with Sentry's built-in profiling disabled at the time) on your app and see how that influences CPU usage? https://learn.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-trace#collect-a-trace-with-dotnet-trace
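For example, something along these lines (the PID is a placeholder; see the linked docs for the full option list):

```
dotnet-trace collect --process-id <PID> --profile cpu-sampling --output trace.nettrace
```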
@vaind ran a trace for almost 3 hours, ending just now:
(output to dev/null to avoid file storage issues and I/O penalties)

Results:
- CPU was high, but the trace was also profiling at 100%.
- Response times remained good (the longer ones are incorrect measurements on long-lived SignalR WebSocket connections).
- No 5xx or 499 errors thrown during this period.
- Average memory working set was constant with the profiler active (no 2x jump like with the Sentry profiler).
- Console / App / HTTP logs are clean.

No app service restarts since I turned off the Sentry profiler yesterday afternoon. I suspect the SignalR / WebSockets issues with the Sentry profiler remain.
I've identified at least one memory leak in perfview's EventPipe processing.
TLDR: not resolved yet.

I'll post a PR decreasing the circular buffer we use when setting up a diagnostic session, but other than that, the main problem seems to be with object pinning in perfview, and changing that would require changing quite a few things.
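For context, here is a minimal sketch of how an in-process EventPipe session is started with Microsoft.Diagnostics.NETCore.Client and where the circular buffer size comes in. This is not Sentry's actual implementation; the provider and buffer values are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Tracing;
using Microsoft.Diagnostics.NETCore.Client;

// Connect to the current process's diagnostics IPC endpoint.
var client = new DiagnosticsClient(Environment.ProcessId);

var providers = new List<EventPipeProvider>
{
    // CPU sampling events come from the standard .NET sample profiler provider.
    new EventPipeProvider("Microsoft-DotNETCore-SampleProfiler", EventLevel.Informational),
};

// circularBufferMB defaults to 256; a smaller value bounds the unmanaged memory held by the session.
using var session = client.StartEventPipeSession(providers, requestRundown: false, circularBufferMB: 16);

// session.EventStream would then be fed to a TraceEvent parser to build the profile.
```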
FYI, I've raised an issue over at dotnet/runtime#105132 for the unmanaged memory growth.
@filipnavara has made a fix related to a leak. I've updated our fork to include the fix. We'll bump the submodule here and make a release.
Woo! Thank you @vaind and @filipnavara.

I'm closing this issue as resolved. If you have any issues with our profiling offering, please raise a ticket. We'd like to make sure this is working properly for everyone!
Package
Sentry.AspNetCore
.NET Flavor
.NET
.NET Version
.NET 7 / .NET 8
OS
Any (not platform specific)
SDK Version
4.1.2
Self-Hosted Sentry Version
No response
Steps to Reproduce
Enable profiling through the following setup:
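(The snippet referenced here wasn't preserved in this export. A typical setup, following the same pattern as the config quoted earlier in the thread, would look something like the sketch below; the DSN and sample rates are placeholders.)

```csharp
using Sentry.Profiling;

var builder = WebApplication.CreateBuilder(args);

builder.WebHost.UseSentry((ctx, sentryConfig) =>
{
    sentryConfig.Dsn = "<your DSN>";      // placeholder
    sentryConfig.TracesSampleRate = 1.0;  // rates here are placeholders
    sentryConfig.ProfilesSampleRate = 1.0;
    // Wait up to 500 ms during startup so the app launch itself is profiled.
    sentryConfig.AddIntegration(new ProfilingIntegration(TimeSpan.FromMilliseconds(500)));
});

var app = builder.Build();
app.Run();
```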
Expected Result
The CPU overhead is about +5 to 10%.
Actual Result
The CPU overhead is around +70%, as seen through the Google Cloud monitoring system.
The following Sentry packages are used:
Sentry 4.1.2
Sentry.AspNetCore 4.1.2
Sentry.Profiling 4.1.2
Slack conversation
Zendesk ticket