-
Notifications
You must be signed in to change notification settings - Fork 205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Capture errors with startup hook auto instrumentation #1298
Conversation
This commit updates the startup hook auto instrumentation feature to capture thrown exceptions through the subscribed AspNetCoreDiagnosticListener. When an unhandled exception is thrown, ASP.NET Core's DeveloperExceptionPageMiddleware will raise a diagnostic event with key "Microsoft.AspNetCore.Diagnostics.UnhandledException" if there is a listener listening to the event. If no listener is subscribed, the event is not captured. Update how assemblies are loaded in the startup hook auto instrumentation. Register an event handler to the AssemblyLoadContext.Default.Resolve event to load assemblies that exist in the loader directory with matching version and public key. For assemblies starting with Elastic.Apm, that may contain DiagnosticListeners, load into AssemblyLoadContext.Default to allow listeners to register via DiagnosticListener.AllListeners.Subscribe(). For any other assemblies, load them into a separate AssemblyLoadContext to avoid version conflicts with assemblies that the application may reference. An example of the problem this avoids is System.Reflection.Metadata; The APM agent's source included version of Ben.Demystifier depends on System.Reflection.Metadata 5.0.0, but an application may reference a different version, such as ASP.NET Core 3.0, which references System.Reflection.Metadata 1.4.4.0 by default. Add integration tests for capturing errors from startup hooks for netcoreapp3.0, netcoreapp3.1 and net5.0. Update documentation for startup hooks to indicate that this feature is supported in .NET Core 3.0 and newer. .NET Core 2.2 is End of Life (EOL) and no longer supported by Microsoft. In testing startup hooks with .NET Core 2.2, Diagnostic Listeners do not subscribe to ASP.NET Core diagnostic events. On initial investigation, the System.Diagnostics.DiagnosticSource assembly loaded in the startup hooks is version 4.0.3.1. When listeners come to subscribe, System.Diagnostics.DiagnosticSource 4.0.4.0 that listeners are compiled against is loaded into the separate AssemblyLoadContext, which might be why listeners do not end up subscribing. Considering .NET Core 2.2 is EOL however, no further effort has been put into making this work. Fixes elastic#1233
💔 Tests Failed
Expand to view the summary
Build stats
Test stats 🧪
Trends 🧪Test errors
Expand to view the tests failures> Show only the first 10 test failures
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really sure this fixes #1233 - or changes anything around that.
I did some tests to reproduce the original problem (no success so far) and documented my findings here: #1233 (comment).
If you add the new test Auto_Instrument_With_StartupHook_Should_Capture_Error
(with throw new Exception("...");
to Elastic.Apm.StartupHook.Sample
) on master
, it'll be green. With my testing today and this fact I think the original problem is more like an edge case.
I think the main change here is that we load dependencies into an extra AssemblyLoadContext
- I think that does not change how the ASP.NET Core diagnostic source events are sent once we already subscribed (and subscribing worked before as well).
I added some comments regarding this change below.
@@ -37,23 +37,9 @@ private static string AssemblyDirectory | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static string AssemblyDirectory
can be removed as well
@@ -20,7 +20,7 @@ | |||
namespace Elastic.Apm.StartupHook.Loader | |||
{ | |||
/// <summary> | |||
/// Loads the agent assemblies, its dependent assemblies and starts it | |||
/// Starts the agent | |||
/// </summary> | |||
internal class Loader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this change the name of the type and the whole project isn't very accurate - it's not really loading things anymore, it only subscribes. It's not a big deal, and I don't have super strong opinion, but wanted to note this. Maybe we could consider renaming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Loading the loader assembly and invoking the Initialize()
method causes the Elastic.Apm.* assemblies it references to be loaded, though this is implicit now rather than being explicitly loaded. Thoughts on a better name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts on a better name?
Not really - and you have a good point, it still triggers the agent dlls to be loaded. Let's leave it for now.
{ | ||
// load Elastic.Apm assemblies with the default assembly load context, to allow DiagnosticListeners to subscribe. | ||
// For all other dependencies, load with a separate load context to not conflict with application dependencies. | ||
return name.Name.StartsWith("Elastic.Apm", StringComparison.Ordinal) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we totally sure about this? 🤔
This means if we have let's say Elstic.Apm.X
which depends on assembly Y
, we load Elastic.Apm.X
into the default context, but load Y
into the custom context - in this case Elastic.Apm.X
would have missing typerefs in the default load context.
This seems to run and with our current references it does not seem to have any problems, but this looks a bit strange to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two problems that this is attempting to address:
- The Elastic.Apm assemblies that contain Diagnostic Listeners must be loaded into the default load context to be able to subscribe to
DiagnosticListeners.AllListeners
and observe diagnostic events - subscription will not occur if loaded into a different load context. - Dependencies of Elastic.Apm assemblies cannot version conflict with dependencies of the application. In testing and debugging, the dependencies of the internalized Ben.Demystifier, namely, System.Reflection.Metadata, can very likely be a different version to the one referenced by the application, which was true for ASP.NET Core 3.0 if I recall.
Are there additional tests that can be written to provide more assurance in this approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there additional tests that can be written to provide more assurance in this approach?
I did some manual testing and so far everything worked. Maybe later we could add more tests where:
- We make sure we also trigger stack trace capturing (e.g. making an outgoing HTTP call in the sample app)
- Where we have agent assemblies with additional dependencies (e.g. using elastisearch-net and make the app depend on a different version than the agent)
I think those will trigger code paths that are critical. I did those tests manually and those worked for me.
private static void LoadAssembliesFromLoaderDirectory(string loaderDirectory) | ||
{ | ||
var context = new ElasticApmAssemblyLoadContext(); | ||
AssemblyLoadContext.Default.Resolving += (_, name) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously when we had Newtonsoft.Json and Ben.Demystifier as normal dependencies, often I ran into stackoverflow exceptions with this. Seems working now, just noting it here.
When I first started looking at this, there were a couple of runs with ASP.NET Core 3.0 where an error was not captured. I don't recall being able to replicate, but I did find an issue with a |
We discussed this in a meeting - we agreed to merge this as it is. The Merging this in now. |
…u-20 * upstream/master: (21 commits) Prefer W3C traceparent over elastic-apm-traceparent (elastic#1302) fix spacing and cross references in docs (elastic#1328) Update README (elastic#1325) Mark MicrosoftAzureBlobStorageTracer internal (elastic#1326) Update docs (elastic#1327) Update context.destination.address (elastic#1324) synchronize json schema specs (elastic#1320) Don't package Elastic.Apm.Specification (elastic#1316) Update setup.asciidoc (elastic#1318) Prepare release v.1.10.0 (elastic#1314) Fix nullref in Elastic.Apm.Extensions.Logging (elastic#1311) Capture errors with startup hook auto instrumentation (elastic#1298) Use Logger to log exception in AgentComponents initialization (elastic#1305) fix: use .NET native SDK for build and test (elastic#1301) Skip running Elasticsearch docker test when docker not available (elastic#1312) Use TraceLogger as default logger in ASP.NET Full Framework (elastic#1288) Create receive messaging span when inside transaction (elastic#1308) Fix SanitizeFieldNamesTests (elastic#1299) Do not capture HTTP child spans for Elasticsearch (elastic#1306) Use storage account in destination.service.resource (elastic#1284) ...
This commit updates the startup hook auto instrumentation feature to capture thrown exceptions through the subscribed
AspNetCoreDiagnosticListener
.When an unhandled exception is thrown, ASP.NET Core's
DeveloperExceptionPageMiddleware
will raise a diagnosticevent with key "Microsoft.AspNetCore.Diagnostics.UnhandledException" if there is a listener listening to the event. If no listener is subscribed, the event is not captured.
Update how assemblies are loaded in the startup hook auto instrumentation. Register an event handler to the
AssemblyLoadContext.Default.Resolving
event to load assemblies that exist in the loader directory with matching version and public key. For assemblies starting with Elastic.Apm, that may contain DiagnosticListeners, load intoAssemblyLoadContext.Default
to allow listeners to register viaDiagnosticListener.AllListeners.Subscribe()
. For any other assemblies, load them into a separate AssemblyLoadContext to avoid version conflicts with assemblies that the application may reference. An example of the problem this avoids is System.Reflection.Metadata; The APM agent's source included version of Ben.Demystifier depends on System.Reflection.Metadata 5.0.0, but an application may reference a different version, such as ASP.NET Core 3.0, which references System.Reflection.Metadata 1.4.4.0 by default.Add integration tests for capturing errors from startup hooks for netcoreapp3.0, netcoreapp3.1 and net5.0.
Update documentation for startup hooks to indicate that this feature is supported in .NET Core 3.0 and newer. .NET Core 2.2 is End of Life (EOL) and no longer supported by Microsoft. In testing startup hooks with .NET Core 2.2, Diagnostic Listeners do not subscribe to ASP.NET Core diagnostic events. On initial investigation, the System.Diagnostics.DiagnosticSource assembly loaded in the startup hooks is version 4.0.3.1. When listeners come to subscribe, System.Diagnostics.DiagnosticSource 4.0.4.0 that listeners are compiled against is loaded into the separate AssemblyLoadContext, which might be why listeners do not end up subscribing. Considering .NET Core 2.2 is EOL however, no further effort has been put into making this work.
Fixes #1233