Parallel creation of Mutex with initiallyOwned: true can cause SIGSEGV on Ubuntu 19.04 #34271

Closed
jburger opened this issue Mar 30, 2020 · 8 comments

jburger commented Mar 30, 2020

Creating System.Threading.Mutex instances in parallel can fail when initiallyOwned is true. The failure is a segmentation fault in libpthread.so, which appears to be handled in libcoreclr.so, but no managed exception is thrown and the process terminates. SIGABRT is listed as the stop reason for thread #1.

Steps to reproduce

  1. Create a netcoreapp3.1 console application with the following Program.cs code:
using System;
using System.Threading;
using System.Threading.Tasks;

namespace repro
{
    class Program
    {
        static void Main(string[] args)
        {
            try {
                var t = 10000;
                
                Parallel.For(1, t, (i) => {
                    CreateMutex("t" + i.ToString());
                    CreateMutex("t" + (i-1).ToString());
                });
            } catch (Exception exception) {
                Console.WriteLine(exception);
            }
            finally {
                Console.WriteLine("here");
            }
        }

        private static void CreateMutex(string name)
        {
            using (var mutex = new Mutex(true, name))
            {
                Console.WriteLine($"mutex: {name}");
            }
        }
    }
}
  2. dotnet build -c Release
  3. export COMPlus_DbgEnableMiniDump=1
  4. Run ./bin/Release/netcoreapp3.1/repro
  5. Observe an exit code of 139
  6. Change the Mutex ctor to use initiallyOwned: false and observe that the code works OK
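
The last step notes that passing initiallyOwned: false avoids the crash. As a minimal sketch of that workaround, assuming the goal is mutual exclusion by name, the mutex can be created unowned, acquired explicitly with WaitOne, and released before it is disposed. The documentation for this constructor also recommends false for named mutexes, because the caller only owns the mutex if the call actually created it. `NamedMutexSketch` and `RunLocked` are hypothetical names, not part of the original repro.

```csharp
using System;
using System.Threading;

static class NamedMutexSketch
{
    // Hypothetical helper: runs `action` while holding the named mutex.
    public static void RunLocked(string name, Action action)
    {
        // Create (or open) the named mutex without requesting ownership.
        using (var mutex = new Mutex(false, name))
        {
            var owned = false;
            try
            {
                try
                {
                    owned = mutex.WaitOne();
                }
                catch (AbandonedMutexException)
                {
                    // A previous owner exited without releasing the mutex.
                    // The wait still grants ownership to this thread.
                    owned = true;
                }

                action();
            }
            finally
            {
                // Release on the same thread that acquired, before Dispose runs.
                if (owned)
                {
                    mutex.ReleaseMutex();
                }
            }
        }
    }
}
```

In the repro, `CreateMutex(name)` could then become `NamedMutexSketch.RunLocked(name, () => Console.WriteLine($"mutex: {name}"))`. This sidesteps disposing a mutex that is still locked; it does not fix the underlying runtime bug.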

Expected behaviour

A managed exception is thrown when invalid Mutex access is attempted.

lldb output

When analyzing the resulting core dump in lldb, thread #1 shows something like this:

* thread #1, name = 'repro', stop reason = signal SIGABRT
  * frame #0: 0x00007f59550a191a libpthread.so.0`__waitpid(pid=17428, stat_loc=0x00007f59523c044c, options=0) at waitpid.c:30:10
    frame #1: 0x00007f59543900fd libcoreclr.so`PROCCreateCrashDump(argv=0x00007f595468e5a0) at process.cpp:3346:22
    frame #2: 0x00007f595435d95d libcoreclr.so`sigsegv_handler(int, siginfo_t*, void*) [inlined] invoke_previous_action(code=<unavailable>, siginfo=<unavailable>, context=<unavailable>, signalRestarts=true) at signal.cpp:304:5
    frame #3: 0x00007f595435d90f libcoreclr.so`sigsegv_handler(code=11, siginfo=0x00007f59523c0af0, context=0x00007f59523c09c0) at signal.cpp:501
    frame #4: 0x00007f59550a1f40 libpthread.so.0`___lldb_unnamed_symbol1$$libpthread.so.0 + 1
    frame #5: 0x00007f595509ae24 libpthread.so.0`__pthread_mutex_unlock_full(mutex=0x00007f595012a008, decr=1) at pthread_mutex_unlock.c:149:7
    frame #6: 0x00007f5954384ebe libcoreclr.so`NamedMutexProcessData::Close(bool, bool) [inlined] MutexHelpers::ReleaseLock(mutex=<unavailable>) at mutex.cpp:932:24
    frame #7: 0x00007f5954384eb6 libcoreclr.so`NamedMutexProcessData::Close(bool, bool) [inlined] NamedMutexProcessData::ActuallyReleaseLock() at mutex.cpp:1619
    frame #8: 0x00007f5954384e8b libcoreclr.so`NamedMutexProcessData::Close(bool, bool) [inlined] NamedMutexProcessData::Abandon() at mutex.cpp:1606
    frame #9: 0x00007f5954384e6e libcoreclr.so`NamedMutexProcessData::Close(this=0x00000000009d75f0, isAbruptShutdown=<unavailable>, releaseSharedData=true) at mutex.cpp:1294
    frame #10: 0x00007f59543821ff libcoreclr.so`SharedMemoryProcessDataHeader::Close(this=<unavailable>) at sharedmemory.cpp:927:17
    frame #11: 0x00007f5954381f8b libcoreclr.so`SharedMemoryProcessDataHeader::PalObject_Close(CorUnix::CPalThread*, CorUnix::IPalObject*, bool, bool) [inlined] SharedMemoryProcessDataHeader::~SharedMemoryProcessDataHeader(this=0x00000000009d7550) at sharedmemory.cpp:868:5
    frame #12: 0x00007f5954381f83 libcoreclr.so`SharedMemoryProcessDataHeader::PalObject_Close(CorUnix::CPalThread*, CorUnix::IPalObject*, bool, bool) [inlined] void CorUnix::InternalDelete<SharedMemoryProcessDataHeader>(p=0x00000000009d7550) at malloc.hpp:148
    frame #13: 0x00007f5954381f83 libcoreclr.so`SharedMemoryProcessDataHeader::PalObject_Close(CorUnix::CPalThread*, CorUnix::IPalObject*, bool, bool) [inlined] SharedMemoryProcessDataHeader::DecRefCount(this=0x00000000009d7550) at sharedmemory.cpp:1028
    frame #14: 0x00007f5954381f7e libcoreclr.so`SharedMemoryProcessDataHeader::PalObject_Close(thread=<unavailable>, object=<unavailable>, isShuttingDown=<unavailable>, cleanUpPalSharedState=<unavailable>) at sharedmemory.cpp:813
    frame #15: 0x00007f5954375c86 libcoreclr.so`CorUnix::CPalObjectBase::ReleaseReference(this=0x00000000008e60a0, pthr=0x00000000008578a0) at palobjbase.cpp:309:13
    frame #16: 0x00007f5954368787 libcoreclr.so`CorUnix::CSimpleHandleManager::FreeHandle(this=<unavailable>, pThread=0x00000000008578a0, h=<unavailable>) at handlemgr.cpp:253:15
    frame #17: 0x00007f59543681ce libcoreclr.so`::CloseHandle(HANDLE) [inlined] CorUnix::InternalCloseHandle(pThread=0x00000000008578a0, hObject=0x0000000000000498) at handleapi.cpp:312:38
    frame #18: 0x00007f5954368188 libcoreclr.so`::CloseHandle(hObject=0x0000000000000498) at handleapi.cpp:287

Seems like the coreclr SIGSEGV handler is being called, so I'm not sure why there is no managed exception.

OS Details

NAME="Ubuntu"
VERSION="19.04 (Disco Dingo)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 19.04"
VERSION_ID="19.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=disco
UBUNTU_CODENAME=disco

.NET details

.NET Core SDK (reflecting any global.json):
 Version:   3.1.201
 Commit:    b1768b4ae7

Runtime Environment:
 OS Name:     ubuntu
 OS Version:  19.04
 OS Platform: Linux
 RID:         ubuntu.19.04-x64
 Base Path:   /usr/share/dotnet/sdk/3.1.201/

Host (useful for support):
  Version: 3.1.3
  Commit:  4a9f85e9f8

.NET Core SDKs installed:
  2.2.207 [/usr/share/dotnet/sdk]
  2.2.402 [/usr/share/dotnet/sdk]
  3.0.103 [/usr/share/dotnet/sdk]
  3.1.201 [/usr/share/dotnet/sdk]

.NET Core runtimes installed:
  Microsoft.AspNetCore.All 2.2.8 [/usr/share/dotnet/shared/Microsoft.AspNetCore.All]
  Microsoft.AspNetCore.App 2.2.8 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 3.0.3 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 3.1.3 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 2.2.8 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 3.0.3 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 3.1.3 [/usr/share/dotnet/shared/Microsoft.NETCore.App]

To install additional .NET Core runtimes or SDKs:
  https://aka.ms/dotnet-download

Please let me know if there is any more information I can provide.

@Dotnet-GitSync-Bot added the area-System.Threading and untriaged labels Mar 30, 2020
@jburger changed the title from "Parallel creation of Mutex with initialOwner: true can cause SIGSEGV on Ubuntu 19.04" to "Parallel creation of Mutex with initiallyOwned: true can cause SIGSEGV on Ubuntu 19.04" Mar 30, 2020
@janvorli
Member

> so I'm not sure why there is no managed exception.

SIGSEGV is converted into a managed exception only if it happens in managed code or in a couple of thin helpers that are executed on behalf of managed code. Everywhere else, SIGSEGV represents a bug in the native runtime or in external shared libraries, so we fail fast instead. Converting it to a managed exception would be dangerous because we don't know what state the current thread was in. For example, if the SIGSEGV happened in code running under a lock, handling it would soon result in a deadlock.
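
For contrast, here is a minimal sketch of the case where a fault does become a managed exception: a null dereference in managed code. On Unix the runtime typically detects this through the same kind of hardware fault and raises NullReferenceException, which ordinary code can catch. `ManagedFaultSketch` is a hypothetical name used only for illustration.

```csharp
using System;

static class ManagedFaultSketch
{
    // Returns true if the null dereference surfaced as a catchable
    // NullReferenceException instead of terminating the process.
    public static bool NullDereferenceIsCatchable()
    {
        string text = null;
        try
        {
            // Faults in managed code, so the runtime can safely turn it
            // into a managed exception.
            return text.Length < 0;
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }
}
```

The crash in this issue happens inside pthread_mutex_unlock, which is native code called by the runtime itself, so the same conversion is not applied and the process fails fast.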

@janvorli
Member

I can confirm it repros on my Ubuntu 16.04 machine too.

@janvorli self-assigned this Mar 30, 2020
@janvorli added the area-PAL-coreclr label and removed the area-System.Threading and untriaged labels Mar 30, 2020
@janvorli added this to the 5.0 milestone Mar 30, 2020
@danmoseley
Member

Thanks @jburger for the excellent, clear bug report.

@kouvel assigned kouvel and unassigned janvorli May 12, 2020
@kouvel modified the milestones: 5.0, 3.1.x May 12, 2020
kouvel added a commit to kouvel/runtime that referenced this issue May 12, 2020
Below when I refer to "mutex" I'm referring to the underlying mutex object, not an instance of the `Mutex` class.
- When the last reference to a mutex was closed while some thread still held the lock, and a pthread mutex was in use, the runtime attempted to destroy the mutex, which is undefined behavior
- There doesn't seem to be a way to behave exactly like on Windows for this corner case, where the mutex is destroyed when the last reference to it is released, regardless of which process has the mutex locked and which process releases the last reference to it (they could be two different processes), including in cases of abrupt shutdown
- For this corner case I settled on what seems like a decent solution and compatible with older runtimes:
  - When a process releases its last reference to the mutex
    - If that mutex is locked by the same thread, the lock is abandoned and the process no longer references the mutex
    - If that mutex is locked by a different thread, the lifetime of the mutex is extended with an implicit ref. The implicit ref prevents this or other processes from attempting to destroy the mutex while it is locked. The implicit ref is removed in either of these cases:
      - The mutex gets another reference from within the same process
      - The thread that owns the lock exits and abandons the mutex, at which point that would be the last reference to the mutex and the process would not reference the mutex anymore
  - The implementation based on file locks is less restricted, but for consistency that implementation also follows the same behavior
- There was also a race between an exiting thread abandoning one of its locked named mutexes and another thread releasing the last reference to it, fixed by using the creation/deletion process lock to synchronize

Fix for dotnet#34271 in master
Closes dotnet#28449 - probably doesn't fix the issue, but trying to enable it to see if it continues to fail
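
The lifetime rules in the commit message above can be sketched as a small state model. This is a toy, single-process illustration under stated assumptions: every name in it (`NamedMutexLifetimeModel`, `AddRef`, `ReleaseRef`, `OnThreadExit`) is hypothetical, thread IDs are plain integers, and it is not the runtime's implementation, which lives in the native PAL.

```csharp
using System;

// Toy model of the "implicit ref" rule described above. Not thread-safe.
sealed class NamedMutexLifetimeModel
{
    int explicitRefs;        // references held by this process
    bool implicitRef;        // keeps the mutex alive while another thread holds the lock
    int? lockOwnerThreadId;  // null when the lock is not held

    public bool IsAbandoned { get; private set; }
    public bool IsReferencedByProcess => explicitRefs > 0 || implicitRef;

    public void AddRef()
    {
        explicitRefs++;
        implicitRef = false; // a new reference replaces the implicit one
    }

    public void Lock(int threadId)
    {
        if (lockOwnerThreadId != null)
        {
            throw new InvalidOperationException("already locked");
        }
        lockOwnerThreadId = threadId;
        IsAbandoned = false;
    }

    public void ReleaseRef(int callingThreadId)
    {
        if (explicitRefs == 0)
        {
            throw new InvalidOperationException("no reference to release");
        }
        explicitRefs--;
        if (explicitRefs > 0 || lockOwnerThreadId == null)
        {
            return; // not the last reference, or nothing is locked
        }
        if (lockOwnerThreadId == callingThreadId)
        {
            Abandon();          // same thread: abandon the lock
        }
        else
        {
            implicitRef = true; // different thread: extend the lifetime
        }
    }

    public void OnThreadExit(int threadId)
    {
        if (lockOwnerThreadId == threadId)
        {
            Abandon();
            implicitRef = false; // the exiting owner drops the implicit ref
        }
    }

    void Abandon()
    {
        lockOwnerThreadId = null;
        IsAbandoned = true;
    }
}
```

For example, if thread 1 holds the lock and thread 2 releases the last reference, the model keeps `IsReferencedByProcess` true until thread 1 exits, at which point the mutex is abandoned and no longer referenced.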
kouvel added a commit to kouvel/coreclr that referenced this issue May 12, 2020

Fixes dotnet/runtime#34271 in 3.1
kouvel added a commit that referenced this issue May 13, 2020
Fix Unix named mutex crash during some race conditions

Fix for #34271 in master
Closes #28449 - probably doesn't fix the issue, but trying to enable it to see if it continues to fail
@kouvel
Member

kouvel commented May 15, 2020

@jburger and @pawelpabich, could you please share some more information about how this was showing up originally and how the mutex was used? I see from the linked issue OctopusDeploy/Issues#6287 that it was showing up while writing to logs, and that issue mentions that a bug was fixed. I'm also curious whether anything was changed to work around the problem, and whether you are still seeing the issue from time to time.

@johnsimons

@kouvel we were using a named mutex to lock concurrent writes to files, keyed on the filename.
We moved away from mutexes and replaced them with ReaderWriterLockSlim:

    public class NamedLocks
    {
        readonly Dictionary<string, RefCountedLock> refCountedLocks = new Dictionary<string, RefCountedLock>();

        public IDisposable LockFor(string name)
        {
            RefCountedLock refCountedLock;

            lock (refCountedLocks)
            {
                if (!refCountedLocks.TryGetValue(name, out refCountedLock))
                {
                    refCountedLock = new RefCountedLock(name, refCountedLocks);
                    refCountedLocks[name] = refCountedLock;
                }

                refCountedLock.Acquire();
            }

            refCountedLock.Enter();

            return refCountedLock;
        }

        public int Count()
        {
            lock (refCountedLocks)
            {
                return refCountedLocks.Count;
            }
        }

        class RefCountedLock : IDisposable
        {
            readonly string name;
            readonly Dictionary<string, RefCountedLock> refCountedLocks;
            readonly ReaderWriterLockSlim @lock;

            int numberOfRefs;

            public RefCountedLock(string name, Dictionary<string, RefCountedLock> refCountedLocks)
            {
                this.name = name;
                this.refCountedLocks = refCountedLocks;

                @lock = new ReaderWriterLockSlim();

                numberOfRefs = 0;
            }

            public void Acquire()
            {
                numberOfRefs++;
            }

            public void Enter()
            {
                @lock.EnterWriteLock();
            }

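            public void Dispose()
            {
                bool isLastRef;

                lock (refCountedLocks)
                {
                    numberOfRefs--;
                    // Decide under the dictionary lock. Reading numberOfRefs after
                    // leaving it would race with Dispose calls from other holders.
                    isLastRef = numberOfRefs == 0;
                    if (isLastRef)
                    {
                        refCountedLocks.Remove(name);
                    }
                }

                @lock.ExitWriteLock();

                if (isLastRef)
                {
                    @lock.Dispose();
                }
            }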
        }
    }

@kouvel
Member

kouvel commented May 15, 2020

I see, thanks @johnsimons. Do you still need the ability to share the same lock across other processes, or is it just for synchronization within one process?

@johnsimons

No, we only need it for the same process. It was just a convenience thing 😀

@kouvel
Member

kouvel commented May 8, 2021

Fixed in 5.0 by #36268

@kouvel closed this as completed May 8, 2021
@ghost locked as resolved and limited conversation to collaborators Jun 7, 2021