Parallel creation of `Mutex` with `initiallyOwned: true` can cause `SIGSEGV` on Ubuntu 19.04 #34271

jburger · 2020-03-30T08:06:19Z

Creating `System.Threading.Mutex` instances in parallel can fail when `initiallyOwned` is `true`. This causes a segmentation fault in libpthread.so, which appears to be handled in libcoreclr.so, but no managed exception is thrown and the process fails. SIGABRT is listed as the stop reason for thread #1.

Steps to reproduce

Create a netcoreapp3.1 console application with the following Program.cs code:

using System;
using System.Threading;
using System.Threading.Tasks;

namespace repro
{
    class Program
    {
        static void Main(string[] args)
        {
            try {
                var t = 10000;
                
                Parallel.For(1, t, (i) => {
                    CreateMutex("t" + i.ToString());
                    CreateMutex("t" + (i-1).ToString());
                });
            } catch (Exception exception) {
                Console.WriteLine(exception);
            }
            finally {
                Console.WriteLine("here");
            }
        }

        private static void CreateMutex(string name)
        {
            using (var mutex = new Mutex(true, name))
            {
                Console.WriteLine($"mutex: {name}");
            }
        }
    }
}

dotnet build -c Release
export COMPlus_DbgEnableMiniDump=1
Run ./bin/Release/netcoreapp3.1/repro
Observe an exit code of 139
Change the `Mutex` constructor to use `initiallyOwned: false` and observe that the code works OK
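
The last step above avoids taking ownership in the constructor. A minimal sketch of that variant follows, with the lock taken explicitly and released before the handle is disposed. `SafeNamedMutex` and `RunLocked` are hypothetical names, and whether this pattern avoids the crash on affected runtimes is an assumption; the thread only confirms that `initiallyOwned: false` works.

```csharp
using System;
using System.Threading;

public static class SafeNamedMutex
{
    // Hypothetical helper: runs 'action' while holding the named mutex.
    public static void RunLocked(string name, Action action)
    {
        // initiallyOwned: false, so the constructor never takes the lock.
        using (var mutex = new Mutex(false, name))
        {
            bool owned = false;
            try
            {
                try
                {
                    owned = mutex.WaitOne();
                }
                catch (AbandonedMutexException)
                {
                    // A previous owner exited without releasing;
                    // the calling thread still receives ownership.
                    owned = true;
                }

                action();
            }
            finally
            {
                // Release before Dispose so the handle is never
                // closed while this thread holds the lock.
                if (owned)
                {
                    mutex.ReleaseMutex();
                }
            }
        }
    }
}
```

In the repro, `CreateMutex(name)` could then become `SafeNamedMutex.RunLocked(name, () => Console.WriteLine($"mutex: {name}"))`.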

Expected behaviour

A managed exception is thrown when invalid Mutex access is attempted.

lldb output

When analyzing the resulting core dump in lldb, thread #1 shows something like this:

* thread #1, name = 'repro', stop reason = signal SIGABRT
  * frame #0: 0x00007f59550a191a libpthread.so.0`__waitpid(pid=17428, stat_loc=0x00007f59523c044c, options=0) at waitpid.c:30:10
    frame #1: 0x00007f59543900fd libcoreclr.so`PROCCreateCrashDump(argv=0x00007f595468e5a0) at process.cpp:3346:22
    frame #2: 0x00007f595435d95d libcoreclr.so`sigsegv_handler(int, siginfo_t*, void*) [inlined] invoke_previous_action(code=<unavailable>, siginfo=<unavailable>, context=<unavailable>, signalRestarts=true) at signal.cpp:304:5
    frame #3: 0x00007f595435d90f libcoreclr.so`sigsegv_handler(code=11, siginfo=0x00007f59523c0af0, context=0x00007f59523c09c0) at signal.cpp:501
    frame #4: 0x00007f59550a1f40 libpthread.so.0`___lldb_unnamed_symbol1$$libpthread.so.0 + 1
    frame #5: 0x00007f595509ae24 libpthread.so.0`__pthread_mutex_unlock_full(mutex=0x00007f595012a008, decr=1) at pthread_mutex_unlock.c:149:7
    frame #6: 0x00007f5954384ebe libcoreclr.so`NamedMutexProcessData::Close(bool, bool) [inlined] MutexHelpers::ReleaseLock(mutex=<unavailable>) at mutex.cpp:932:24
    frame #7: 0x00007f5954384eb6 libcoreclr.so`NamedMutexProcessData::Close(bool, bool) [inlined] NamedMutexProcessData::ActuallyReleaseLock() at mutex.cpp:1619
    frame #8: 0x00007f5954384e8b libcoreclr.so`NamedMutexProcessData::Close(bool, bool) [inlined] NamedMutexProcessData::Abandon() at mutex.cpp:1606
    frame #9: 0x00007f5954384e6e libcoreclr.so`NamedMutexProcessData::Close(this=0x00000000009d75f0, isAbruptShutdown=<unavailable>, releaseSharedData=true) at mutex.cpp:1294
    frame #10: 0x00007f59543821ff libcoreclr.so`SharedMemoryProcessDataHeader::Close(this=<unavailable>) at sharedmemory.cpp:927:17
    frame #11: 0x00007f5954381f8b libcoreclr.so`SharedMemoryProcessDataHeader::PalObject_Close(CorUnix::CPalThread*, CorUnix::IPalObject*, bool, bool) [inlined] SharedMemoryProcessDataHeader::~SharedMemoryProcessDataHeader(this=0x00000000009d7550) at sharedmemory.cpp:868:5
    frame #12: 0x00007f5954381f83 libcoreclr.so`SharedMemoryProcessDataHeader::PalObject_Close(CorUnix::CPalThread*, CorUnix::IPalObject*, bool, bool) [inlined] void CorUnix::InternalDelete<SharedMemoryProcessDataHeader>(p=0x00000000009d7550) at malloc.hpp:148
    frame #13: 0x00007f5954381f83 libcoreclr.so`SharedMemoryProcessDataHeader::PalObject_Close(CorUnix::CPalThread*, CorUnix::IPalObject*, bool, bool) [inlined] SharedMemoryProcessDataHeader::DecRefCount(this=0x00000000009d7550) at sharedmemory.cpp:1028
    frame #14: 0x00007f5954381f7e libcoreclr.so`SharedMemoryProcessDataHeader::PalObject_Close(thread=<unavailable>, object=<unavailable>, isShuttingDown=<unavailable>, cleanUpPalSharedState=<unavailable>) at sharedmemory.cpp:813
    frame #15: 0x00007f5954375c86 libcoreclr.so`CorUnix::CPalObjectBase::ReleaseReference(this=0x00000000008e60a0, pthr=0x00000000008578a0) at palobjbase.cpp:309:13
    frame #16: 0x00007f5954368787 libcoreclr.so`CorUnix::CSimpleHandleManager::FreeHandle(this=<unavailable>, pThread=0x00000000008578a0, h=<unavailable>) at handlemgr.cpp:253:15
    frame #17: 0x00007f59543681ce libcoreclr.so`::CloseHandle(HANDLE) [inlined] CorUnix::InternalCloseHandle(pThread=0x00000000008578a0, hObject=0x0000000000000498) at handleapi.cpp:312:38
    frame #18: 0x00007f5954368188 libcoreclr.so`::CloseHandle(hObject=0x0000000000000498) at handleapi.cpp:287

Seems like the coreclr SIGSEGV handler is being called, so I'm not sure why there is no managed exception.

OS Details

NAME="Ubuntu"
VERSION="19.04 (Disco Dingo)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 19.04"
VERSION_ID="19.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=disco
UBUNTU_CODENAME=disco

.NET details

.NET Core SDK (reflecting any global.json):
 Version:   3.1.201
 Commit:    b1768b4ae7

Runtime Environment:
 OS Name:     ubuntu
 OS Version:  19.04
 OS Platform: Linux
 RID:         ubuntu.19.04-x64
 Base Path:   /usr/share/dotnet/sdk/3.1.201/

Host (useful for support):
  Version: 3.1.3
  Commit:  4a9f85e9f8

.NET Core SDKs installed:
  2.2.207 [/usr/share/dotnet/sdk]
  2.2.402 [/usr/share/dotnet/sdk]
  3.0.103 [/usr/share/dotnet/sdk]
  3.1.201 [/usr/share/dotnet/sdk]

.NET Core runtimes installed:
  Microsoft.AspNetCore.All 2.2.8 [/usr/share/dotnet/shared/Microsoft.AspNetCore.All]
  Microsoft.AspNetCore.App 2.2.8 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 3.0.3 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 3.1.3 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 2.2.8 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 3.0.3 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 3.1.3 [/usr/share/dotnet/shared/Microsoft.NETCore.App]

To install additional .NET Core runtimes or SDKs:
  https://aka.ms/dotnet-download

Please let me know if there is any more information I can provide.

janvorli · 2020-03-30T10:22:24Z

> so I'm not sure why there is no managed exception.

SIGSEGV is converted into a managed exception only if it happens in managed code or in a couple of thin helpers that are executed on behalf of the managed code. Everywhere else, SIGSEGV represents a bug in the native runtime or in external shared libraries, so we fail fast instead. Converting it to a managed exception would be dangerous because we don't know what state the current thread was in. For example, if the SIGSEGV happened in code running under a lock, handling it would soon result in a deadlock.
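
As a small illustration of the managed-code case described above: a null dereference in managed code surfaces as a catchable `NullReferenceException` rather than terminating the process. `SegvDemo` and `Probe` are hypothetical names, and whether a given dereference is implemented through a hardware fault or an explicit check is up to the JIT.

```csharp
using System;

public static class SegvDemo
{
    // Hypothetical helper: dereferences null in managed code and
    // reports what the runtime surfaced to the caller.
    public static string Probe()
    {
        string text = null;
        try
        {
            // The fault happens in managed code, so the runtime
            // can turn it into a managed exception.
            return text.Length.ToString();
        }
        catch (NullReferenceException)
        {
            return "caught NullReferenceException";
        }
    }
}
```

The crash in this issue is different: the fault occurs inside `pthread_mutex_unlock`, called from native runtime code, so the runtime fails fast instead.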

janvorli · 2020-03-30T10:25:49Z

I can confirm it repros on my Ubuntu 16.04 machine too.

danmoseley · 2020-03-30T20:12:41Z

Thanks @jburger for the excellent, clear bug report.

Fix Unix named mutex crash during some race conditions

Below when I refer to "mutex" I'm referring to the underlying mutex object, not an instance of the `Mutex` class.

- When the last reference to a mutex is closed while the lock is held by some thread and a pthread mutex is used, the mutex was attempted to be destroyed but that has undefined behavior
- There doesn't seem to be a way to behave exactly like on Windows for this corner case, where the mutex is destroyed when the last reference to it is released, regardless of which process has the mutex locked and which process releases the last reference to it (they could be two different processes), including in cases of abrupt shutdown
- For this corner case I settled on what seems like a decent solution and compatible with older runtimes:
  - When a process releases its last reference to the mutex
    - If that mutex is locked by the same thread, the lock is abandoned and the process no longer references the mutex
    - If that mutex is locked by a different thread, the lifetime of the mutex is extended with an implicit ref. The implicit ref prevents this or other processes from attempting to destroy the mutex while it is locked. The implicit ref is removed in either of these cases:
      - The mutex gets another reference from within the same process
      - The thread that owns the lock exits and abandons the mutex, at which point that would be the last reference to the mutex and the process would not reference the mutex anymore
  - The implementation based on file locks is less restricted, but for consistency that implementation also follows the same behavior
- There was also a race between an exiting thread abandoning one of its locked named mutexes and another thread releasing the last reference to it, fixed by using the creation/deletion process lock to synchronize

Fix for #34271 in master

Closes #28449 - probably doesn't fix the issue, but trying to enable it to see if it continues to fail

Fixes dotnet/runtime#34271 in 3.1
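
The rule described above for a process releasing its last reference can be modeled as a small decision function. This is only a sketch for reading the description: the real logic lives in the native PAL code (the stack trace points at mutex.cpp), every name below is hypothetical, and the unlocked case is an assumption because the description does not spell it out.

```csharp
using System;

public enum CloseOutcome
{
    ReleaseReference,       // assumed: mutex is not locked, the process just drops its reference
    AbandonLockAndRelease,  // locked by the closing thread: abandon the lock, drop the reference
    KeepImplicitReference   // locked by a different thread: extend the lifetime with an implicit ref
}

public static class NamedMutexCloseModel
{
    // lockOwnerThreadId is null when no thread in this process holds the lock.
    public static CloseOutcome OnLastReferenceClosed(int? lockOwnerThreadId, int closingThreadId)
    {
        if (lockOwnerThreadId == null)
        {
            return CloseOutcome.ReleaseReference;
        }

        return lockOwnerThreadId.Value == closingThreadId
            ? CloseOutcome.AbandonLockAndRelease
            : CloseOutcome.KeepImplicitReference;
    }
}
```

Per the description, an implicit reference goes away when the process references the mutex again or when the owning thread exits and abandons the lock; the model above does not cover those transitions.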

kouvel · 2020-05-15T18:26:37Z

@jburger and @pawelpabich, could you please share some more information about how this was showing up originally and how the mutex was used? I see from the linked issue OctopusDeploy/Issues#6287 that it was showing up while writing to logs, and the issue mentioned that a bug was fixed. I'm also curious whether anything was changed to work around the problem and whether you are still seeing the issue from time to time.

johnsimons · 2020-05-15T22:45:13Z

@kouvel we were using a named mutex for locking concurrent writes to files based on the filename.
We moved away from mutexes and replaced them with `ReaderWriterLockSlim`:

    public class NamedLocks
    {
        readonly Dictionary<string, RefCountedLock> refCountedLocks = new Dictionary<string, RefCountedLock>();

        public IDisposable LockFor(string name)
        {
            RefCountedLock refCountedLock;

            lock (refCountedLocks)
            {
                if (!refCountedLocks.TryGetValue(name, out refCountedLock))
                {
                    refCountedLock = new RefCountedLock(name, refCountedLocks);
                    refCountedLocks[name] = refCountedLock;
                }

                refCountedLock.Acquire();
            }

            refCountedLock.Enter();

            return refCountedLock;
        }

        public int Count()
        {
            lock (refCountedLocks)
            {
                return refCountedLocks.Count;
            }
        }

        class RefCountedLock : IDisposable
        {
            readonly string name;
            readonly Dictionary<string, RefCountedLock> refCountedLocks;
            readonly ReaderWriterLockSlim @lock;

            int numberOfRefs;

            public RefCountedLock(string name, Dictionary<string, RefCountedLock> refCountedLocks)
            {
                this.name = name;
                this.refCountedLocks = refCountedLocks;

                @lock = new ReaderWriterLockSlim();

                numberOfRefs = 0;
            }

            public void Acquire()
            {
                numberOfRefs++;
            }

            public void Enter()
            {
                @lock.EnterWriteLock();
            }

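            public void Dispose()
            {
                bool isLastReference;

                lock (refCountedLocks)
                {
                    numberOfRefs--;
                    // Decide under the dictionary lock; re-reading numberOfRefs
                    // after ExitWriteLock would race with the next holder.
                    isLastReference = numberOfRefs == 0;
                    if (isLastReference)
                    {
                        refCountedLocks.Remove(name);
                    }
                }

                @lock.ExitWriteLock();

                if (isLastReference)
                {
                    @lock.Dispose();
                }
            }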
        }
    }
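
For readers who want to see how such a per-name lock is used, here is a self-contained sketch. `NamedLocksSketch` is a condensed, hypothetical stand-in for the class above (same idea, fewer members), and the "server.log" name is only illustrative. It decides whether a caller holds the last reference while still under the dictionary lock.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public sealed class NamedLocksSketch
{
    private readonly Dictionary<string, Entry> entries = new Dictionary<string, Entry>();

    public IDisposable LockFor(string name)
    {
        Entry entry;
        lock (entries)
        {
            if (!entries.TryGetValue(name, out entry))
            {
                entry = new Entry(name, this);
                entries[name] = entry;
            }
            entry.Refs++; // counted under the dictionary lock
        }

        entry.Gate.EnterWriteLock(); // blocks while another caller holds this name
        return entry;
    }

    public int Count()
    {
        lock (entries) { return entries.Count; }
    }

    private sealed class Entry : IDisposable
    {
        public readonly ReaderWriterLockSlim Gate = new ReaderWriterLockSlim();
        public int Refs;
        private readonly string name;
        private readonly NamedLocksSketch owner;

        public Entry(string name, NamedLocksSketch owner)
        {
            this.name = name;
            this.owner = owner;
        }

        public void Dispose()
        {
            bool last;
            lock (owner.entries)
            {
                last = --Refs == 0;
                if (last) owner.entries.Remove(name);
            }

            Gate.ExitWriteLock();
            if (last) Gate.Dispose();
        }
    }
}

public static class NamedLocksDemo
{
    // Increments a plain int from many tasks; the per-name lock keeps it consistent.
    public static int CountTo(int iterations)
    {
        var locks = new NamedLocksSketch();
        int counter = 0;
        Parallel.For(0, iterations, _ =>
        {
            using (locks.LockFor("server.log"))
            {
                counter++;
            }
        });
        return counter;
    }
}
```

Unlike a named `Mutex`, this only synchronizes threads inside one process.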

kouvel · 2020-05-15T23:36:04Z

I see, thanks @johnsimons. Do you still need the ability to share the same lock across other processes, or is it just for synchronization within one process?

johnsimons · 2020-05-15T23:45:30Z

No, we only need it for the same process. It was just a convenience thing 😀

kouvel · 2021-05-08T18:35:00Z

Fixed in 5.0 by #36268

Dotnet-GitSync-Bot added the area-System.Threading and untriaged labels Mar 30, 2020

jburger changed the title ~~Parallel creation of Mutex with initialOwner: true can cause SIGSEGV on Ubuntu 19.04~~ Parallel creation of Mutex with initiallyOwned: true can cause SIGSEGV on Ubuntu 19.04 Mar 30, 2020

janvorli self-assigned this Mar 30, 2020

janvorli added the area-PAL-coreclr label and removed the area-System.Threading and untriaged labels Mar 30, 2020

janvorli added this to the 5.0 milestone Mar 30, 2020

pawelpabich mentioned this issue Apr 2, 2020

Server sometimes crashes while writing to the server log on Linux OctopusDeploy/Issues#6287

Closed

kouvel assigned kouvel and unassigned janvorli May 12, 2020

kouvel modified the milestones: 5.0, 3.1.x May 12, 2020

kouvel closed this as completed May 8, 2021

ghost locked as resolved and limited conversation to collaborators Jun 7, 2021
