Segfault in libcoreclr.so #10856
If I execute the program with env var COMPlus_GCStress=0xC, it crashes at startup:
Note that the external dependencies we are using are GraphQL, AWSSDK, Dapper, Twilio, NLog, Npgsql, JetBrains.Annotations - the rest is Microsoft.
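For readers who want to try the GC stress setting mentioned above, here is a minimal sketch of applying it to a single process. The helper name `run_with_gcstress` and the `MyApp.dll` placeholder are illustrative and do not come from this thread. It also assumes a checked/debug build of CoreCLR, since release builds are expected to ignore the setting. My understanding is that `0xC` requests a GC at JIT-compiled and precompiled instructions, but treat that as an assumption.

```shell
#!/bin/sh
# Hypothetical helper: run a command with GC stress enabled for that process only.
# Assumption: the runtime is a checked/debug CoreCLR build; release builds are
# expected to ignore COMPlus_GCStress.
run_with_gcstress() {
    # env places the variable in the child's environment without exporting it
    # into the calling shell.
    env COMPlus_GCStress=0xC "$@"
}

# Example invocation (MyApp.dll is a placeholder, not a file from this issue):
# run_with_gcstress dotnet MyApp.dll
```

Using `env` rather than `export` keeps the setting scoped to the one command, so other `dotnet` invocations in the same shell are not slowed down by stress mode.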
Could you please try to collect this info:
Does this help?
This appears to be happening the first time my code calls into EFCore, and the exception is occurring as part of the initial Npgsql database info loading procedure. There are some unsafe parts in the NpgsqlReadBuffer class: https://github.com/npgsql/npgsql/blob/2ac23b7520aa53ab67e6e0192dbe3379a71bb8b9/src/Npgsql/NpgsqlReadBuffer.cs#L341 ....
Yes, this helps. We are trying to find type for Are you able to tell why
Here is the disassembly. It looks like it's dereferencing a null pointer. Note that it's from a different process - but with the same backtrace. LLDB crashed while I was investigating the last one.
It seems to be accessing the current The problem is that we should have found the type inside It is a bug, however it is unlikely to be the root cause for the crash at the top of this issue.
I have narrowed down the original issue I reported. In my environment, this program will reliably segfault if left running long enough. For example:
I'm still trying to remove bits from the program until I can get to a minimal example, but you should hopefully be able to reproduce the issue with the provided repository, using the following commands:
Thanks for the repro @corruptmem. I was able to narrow down the root cause for the failure. The problem is bad GC info generated by the JIT for assignments of large structs on Unix.

The Repro:

    using System;
    using System.Threading;
    using System.Runtime.CompilerServices;

    struct S1
    {
        public long a,b;
        public object str;
    }

    struct S2
    {
        public long a,b,c,d,e;
    }

    struct S
    {
        public S1 s1;
        public S2 s2;
    }

    class Test
    {
        class E
        {
            public S a;
            public S b;

            [MethodImpl(MethodImplOptions.NoInlining)]
            public void Update(int c)
            {
                for (int i = 0; i < c; i++) {
                    a = b; // GCInfo generated for this assignment is wrong
                }
            }
        }

        static void Main()
        {
            E e = new E();
            e.Update(1);
        }
    }

Compile with
Here are the JIT dumps. It is interesting that the problem does not repro on Windows: @dotnet/jit-contrib Could you please take a look?
I have found the issue, will push a fix soon.
I'm experiencing a segmentation fault on .NET Core 2 on Linux/docker, under versions 2.1.2 and 2.0.9, when my process is under heavy load and using a lot of memory.
I haven't tried to reproduce outside of Linux/docker, but I have reproduced on both AWS Kubernetes and my local Docker for Windows with identical stack traces.
Here are some details from capturing the segfault under the debugger:
I don't have the coredump from this particular crash (because it got OOM killed after calling VerifyHeap!), but most times it crashes, the top of the stack is identical to this (see attached file 4 for one of the times it didn't look like this).
Unfortunately, I can't post the code because it's proprietary. We're working on trying to produce a minimal test case, but that's a lot easier said than done. However, if you have somewhere secure and private to upload core dumps to, I could do that.
I have also attached information from the debugger from other crashes (the one above is file 3):
dumpinfo1.txt
dumpinfo2.txt
dumpinfo3.txt
dumpinfo4.txt
The environment I set up to reproduce the issue is here: https://github.com/ist-ltd/dotnet-core-debug-helper-tools - but it relies on a private docker image.