
An error occurred while writing to logger(s) - Safe handle has been closed #3269

Closed
p10tyr opened this issue May 11, 2022 · 13 comments · Fixed by #3291
Labels
bug Something isn't working

Comments

@p10tyr

p10tyr commented May 11, 2022

Bug Report

List of all OpenTelemetry NuGet packages and version that you are using (e.g. OpenTelemetry 1.0.2):

-package id="OpenTelemetry" version="1.2.0" targetFramework="net472" />
-package id="OpenTelemetry.Api" version="1.2.0" targetFramework="net472" />
-package id="OpenTelemetry.Contrib.Extensions.AWSXRay" version="1.0.1" targetFramework="net472" />
-package id="OpenTelemetry.Contrib.Instrumentation.AWS" version="1.0.1" targetFramework="net472" />
-package id="OpenTelemetry.Exporter.OpenTelemetryProtocol" version="1.2.0" targetFramework="net472" />
-package id="OpenTelemetry.Exporter.OpenTelemetryProtocol.Logs" version="1.0.0-rc9.2" targetFramework="net472" />
-package id="OpenTelemetry.Instrumentation.AspNet" version="1.0.0-rc9.2" targetFramework="net472" />
-package id="OpenTelemetry.Instrumentation.AspNet.TelemetryHttpModule" version="1.0.0-rc9.2" targetFramework="net472" />
-package id="OpenTelemetry.Instrumentation.Http" version="1.0.0-rc9.2" targetFramework="net472" />
-package id="OpenTelemetry.Instrumentation.SqlClient" version="1.0.0-rc9.2" targetFramework="net472" />

Runtime version (e.g. net462, net48, netcoreapp3.1, net6.0 etc. You can find this information from the *.csproj file):

NET 4.7.2 - WebAPI - IIS

Symptom


Random requests are dying with a 500 error and a lot of errors are logged with the message below.
Funnily enough, the errors are coming from the OTLP Logger itself, but we verified them in our other APM as well.

An error occurred while writing to logger(s) - Safe handle has been closed

System.AggregateException: An error occurred while writing to logger(s).
 ---> System.ObjectDisposedException: Safe handle has been closed
   at System.Runtime.InteropServices.SafeHandle.DangerousAddRef(Boolean& success)
   at System.StubHelpers.StubHelpers.SafeHandleAddRef(SafeHandle pHandle, Boolean& success)
   at Microsoft.Win32.Win32Native.SetEvent(SafeWaitHandle handle)
   at System.Threading.EventWaitHandle.Set()
   at OpenTelemetry.BatchExportProcessor`1.OnExport(T data)
   at OpenTelemetry.BatchLogRecordExportProcessor.OnEnd(LogRecord data)
   at OpenTelemetry.Logs.OpenTelemetryLogger.Log[TState](LogLevel logLevel, EventId eventId, TState state, Exception exception, Func`3 formatter)
   at Microsoft.Extensions.Logging.Logger.<Log>g__LoggerLog|12_0[TState](LogLevel logLevel, EventId eventId, ILogger logger, Exception exception, Func`3 formatter, List`1& exceptions, TState& state)
   --- End of inner exception stack trace ---
   at Microsoft.Extensions.Logging.Logger.ThrowLoggingError(List`1 e

What is the expected behavior?

Nothing.. just works

What is the actual behavior?

ERROR 500 on random requests

Reproduce

Not possible? Very flaky issue.
We have been using the Tracing BatchExporter for a few months and that is OK.
Using the Logs BatchExporter, we can clearly see an issue there.

Not sure if it's the STATE that we are passing in? Some guesses (see the sketch after this list):

  • NULL state?
  • Maybe the state is disposed?
  • Maybe the logger is disposed inside a using?
  • Maybe the state is not in a using itself, but inside another using that got disposed after it reached the logger?
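
To illustrate that last guess, this is the kind of shape I mean (hypothetical names, not our actual code): disposable state that gets captured by a logging scope inside a using block.

```csharp
// Hypothetical shape of the pattern I suspect (not our actual code):
// disposable state captured by a logging scope inside a using block.
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;

class RequestContext : IDisposable            // hypothetical disposable state
{
    public int OrderId { get; }
    public RequestContext(int orderId) => OrderId = orderId;
    public void Dispose() { /* releases handles etc. */ }
}

class Handler
{
    public async Task HandleAsync(ILogger logger, int orderId)
    {
        using (var context = new RequestContext(orderId))
        using (logger.BeginScope("Processing order {OrderId}", context.OrderId))
        {
            await Task.Delay(10);             // stand-in for real work
            logger.LogInformation("Finished order {OrderId}", context.OrderId);
        } // context (and the scope) are disposed here - is anything still holding them?
    }
}
```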

We will close this issue if:

  • The repro project you share with us is complex. We can't investigate custom projects, so please don't point us to one.
  • We can not reproduce the behavior you're reporting.


@p10tyr p10tyr added the bug Something isn't working label May 11, 2022
@cijothomas
Member

#2905 same issue I think.

@p10tyr
Author

p10tyr commented May 11, 2022

OK, a workaround for me is to change this option to false:
options.IncludeScopes = false;

I do not really need scopes. I understand this package is not stable, so I was very careful not to release this to production and caught it in the early days.

Worth maybe just updating the document on the separate package page to say DO NOT USE SCOPES 😆

--edit
This document page
https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/docs/logs/getting-started/README.md
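
For context, this is roughly where that option sits in our setup (a simplified sketch, not our exact code):

```csharp
// Simplified sketch of the logging setup (not our exact code), showing
// where the IncludeScopes workaround is applied.
using Microsoft.Extensions.Logging;
using OpenTelemetry.Logs;

public static class LoggingSetup
{
    public static ILoggerFactory Create()
    {
        return LoggerFactory.Create(builder =>
            builder.AddOpenTelemetry(options =>
            {
                options.IncludeScopes = false;   // the workaround: scopes off
                options.AddOtlpExporter();       // OTLP log exporter (rc package)
            }));
    }
}
```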

@cijothomas
Member

cijothomas commented May 11, 2022

The linked issue is not caused by scopes. We are not aware of any issues with scopes.
Also, the OTLP LogExporter only recently added scope support, and it is not yet released: https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/src/OpenTelemetry.Exporter.OpenTelemetryProtocol/CHANGELOG.md#unreleased

So I can't see how your issue was solved by disabling scopes. Will investigate more... If you can get a repro, please do share it, as that'd be super helpful.

@p10tyr
Author

p10tyr commented May 11, 2022

I am trying to see if I can reproduce it locally. It usually only happens on our test servers.

Also, the area that the stack trace reports as being disposed does not make sense to me, because there are no usings or state context there... apart from the use of Scope, which I think is somehow related. I have backed it out again on our test servers and hope to not see any more errors for now.

@p10tyr
Author

p10tyr commented May 12, 2022

PS-

I did not have ParseStateValues = true; I had it turned off the whole time (default)

So this safe handle issue does not relate to that? Just my opinion; I don't know if internally it still tried to access the state even though it's set to false.

I have checked with it on and off and can see the difference in Elastic, with the state KVs being saved (true) and not being saved (false).

I have gone over all our logs and NO local machine ever throws this error. Only our remote machines seem to have some kind of issue here. I don't know if it's related to infrastructure/networking or something specific on the hosted machines.

We do get a lot of "a task has been cancelled" errors because of requests ending sooner than we expected due to front-end stuff.

@reyang
Member

reyang commented May 18, 2022

From the callstack, the LogProcessor is already disposed (either by the user code, if the user is controlling the lifecycle of the LoggerProvider, or by the hosting environment - e.g. ASP.NET or ASP.NET Core).

I can imagine there are cases where the user has little or even no control over the lifecycle of the LoggerProvider, and it'd be better if we can avoid punishing the app by throwing exceptions from the LogProcessor/SDK.

System.AggregateException: An error occurred while writing to logger(s).
 ---> System.ObjectDisposedException: Safe handle has been closed
   at System.Runtime.InteropServices.SafeHandle.DangerousAddRef(Boolean& success)
   at System.StubHelpers.StubHelpers.SafeHandleAddRef(SafeHandle pHandle, Boolean& success)
   at Microsoft.Win32.Win32Native.SetEvent(SafeWaitHandle handle)
   at System.Threading.EventWaitHandle.Set()
   at OpenTelemetry.BatchExportProcessor`1.OnExport(T data)
   at OpenTelemetry.BatchLogRecordExportProcessor.OnEnd(LogRecord data)
   at OpenTelemetry.Logs.OpenTelemetryLogger.Log[TState](LogLevel logLevel, EventId eventId, TState state, Exception exception, Func`3 formatter)
   at Microsoft.Extensions.Logging.Logger.<Log>g__LoggerLog|12_0[TState](LogLevel logLevel, EventId eventId, ILogger logger, Exception exception, Func`3 formatter, List`1& exceptions, TState& state)
   --- End of inner exception stack trace ---
   at Microsoft.Extensions.Logging.Logger.ThrowLoggingError(List`1 e
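
For illustration, a minimal sketch of the kind of usage that can lead to this (assuming the standard LoggerFactory wiring; the names are illustrative, not taken from the reporter's app):

```csharp
// Minimal sketch of the disposal race described above (illustrative only):
// an ILogger captured earlier keeps writing after the LoggerProvider,
// and with it the BatchLogRecordExportProcessor, has been disposed.
using Microsoft.Extensions.Logging;
using OpenTelemetry.Logs;

ILogger captured;

using (var loggerFactory = LoggerFactory.Create(builder =>
           builder.AddOpenTelemetry(options => options.AddOtlpExporter())))
{
    captured = loggerFactory.CreateLogger("Demo");
}   // disposing the factory disposes the provider and its processors

// Later - e.g. from a request still in flight - the stale logger is used:
captured.LogInformation("Hello after dispose");
// The processor's wait handle is already closed, which can surface as the
// ObjectDisposedException ("Safe handle has been closed") seen above.
```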

@p10tyr
Author

p10tyr commented May 18, 2022

We are not doing anything explicitly with the LoggerProviders. In this case it is a NET472 Web API project (IIS) with a logger factory created and OpenTelemetry attached to it as per the documentation (roughly as in the sketch after this comment).

As mentioned, we get a lot of cancelled requests, but the problem here is that random requests now just get completely bricked because this error comes up.
I can not find any kind of pattern as to why, and it is quite possible that our Castle DI, or maybe some crazy custom filters, is disposing of something at the wrong time.. but until then we can not use OTLP Logging, and the ASP.NET Core upgrade is already delayed by 2 years 😢

OTLP Tracing works fine for us.
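
For reference, the setup looks roughly like this in a classic ASP.NET app (a simplified sketch, not the reporter's exact Global.asax; names are illustrative):

```csharp
// Simplified sketch of creating a long-lived logger factory in a classic
// ASP.NET (net472) app - illustrative only, not the actual Global.asax.
using System;
using System.Web;
using Microsoft.Extensions.Logging;
using OpenTelemetry.Logs;

public class Global : HttpApplication
{
    public static ILoggerFactory AppLoggerFactory { get; private set; }

    protected void Application_Start(object sender, EventArgs e)
    {
        AppLoggerFactory = LoggerFactory.Create(builder =>
            builder.AddOpenTelemetry(options => options.AddOtlpExporter()));
    }

    protected void Application_End(object sender, EventArgs e)
    {
        // The factory (and the underlying LoggerProvider) lives for the whole
        // app domain; it is only disposed when the application shuts down.
        AppLoggerFactory?.Dispose();
    }
}
```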

@reyang
Member

reyang commented May 19, 2022

I feel this is something that OpenTelemetry .NET should provide a workaround for. I've seen many places where ILogger gets used after the LoggerProvider is disposed.

@p10tyr
Author

p10tyr commented May 19, 2022

I wonder if anybody else is having this issue?
Is it possible that not many people are using this yet because it's not officially in the main package, so there's no user testing?

It would be amazing to get a fix out for this, as I am waiting to turn this on in production. Unfortunately I can't use MyGet feeds, so it would either need me to compile it myself or have a fix in *beta

@cijothomas
Member

#2905 same issue I think.

^^ this was incorrect. This is a separate issue!

@reyang
Member

reyang commented May 19, 2022

As mentioned, we get a lot of cancelled requests, but the problem here is that random requests now just get completely bricked because this error comes up. I can not find any kind of pattern as to why, and it is quite possible that our Castle DI, or maybe some crazy custom filters, is disposing of something at the wrong time..

@ppumkin #3291 should help to mitigate the issue you're seeing. Meanwhile I hope to get to the bottom of this so we understand whether it was due to the misuse of the DI, or a bug in the underlying framework. Would you be willing to collect more information?

Here is my thinking: add a line of code to `protected override void Dispose(bool disposing)` that records the full callstack when the processor is being disposed. When we see the ObjectDisposedException, take a dump and see which code path disposed the processor. In this way we can understand if the race condition was a framework flaw or a user mistake.
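
Something along these lines — illustrative only, the diagnostic field name is made up, and it would sit inside the processor's Dispose override:

```csharp
// Illustrative diagnostic only (the field name is made up): record who
// disposed the processor so a later ObjectDisposedException can be
// correlated with the disposing call stack in a memory dump.
private string disposedStackTrace;

protected override void Dispose(bool disposing)
{
    // Capture the full call stack at the moment of disposal.
    this.disposedStackTrace = Environment.StackTrace;
    System.Diagnostics.Trace.WriteLine(
        "BatchExportProcessor disposed at:" + Environment.NewLine + this.disposedStackTrace);

    base.Dispose(disposing);
}
```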

@p10tyr
Author

p10tyr commented May 20, 2022

@reyang I'm more than happy to help.
But I would need a little guidance on what I need to do.. What makes it a bit more complicated is that it only happens on remote servers, but I have admin access to the test ones, so I can install tools and take a dump.. I just need instructions (I've never really done that before).

@p10tyr
Author

p10tyr commented Jul 6, 2022

Just wanted to come back and say we have been running this in test across several environments for 4 weeks now. No more occurrences of this issue.
Thank you very much!
