<atomic>, <memory>, <execution>: make sure acquire and release are safe to use and start using them #1133

Open
AlexGuteniev opened this issue Aug 3, 2020 · 2 comments
Labels: performance (Must go faster)

AlexGuteniev (Contributor) commented Aug 3, 2020

Currently, memory_order_acquire and memory_order_release are considered unsafe here.
The problem is that critical sections can overlap in the following situation with mutexes or other synchronization objects:

T1: a.acquire(); a.release(); b.acquire(); b.release();
T2: b.acquire(); b.release(); a.acquire(); a.release();

A release can be reordered past a subsequent, unrelated acquire, so the critical sections overlap and a deadlock occurs.
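For illustration, here is a minimal sketch of that pattern (not the STL's code; the spinlock type and all names are hypothetical), where lock() is an acquire read-modify-write and unlock() is a release store:

    #include <atomic>
    #include <thread>

    struct spinlock { // hypothetical, for illustration only
        std::atomic<bool> locked{false};
        void lock() noexcept {
            while (locked.exchange(true, std::memory_order_acquire)) {
                // spin
            }
        }
        void unlock() noexcept {
            locked.store(false, std::memory_order_release);
        }
    };

    spinlock a, b;

    int main() {
        std::thread t1([] {
            a.lock(); a.unlock(); // critical section on a
            b.lock(); b.unlock(); // critical section on b
        });
        std::thread t2([] {
            b.lock(); b.unlock(); // critical section on b
            a.lock(); a.unlock(); // critical section on a
        });
        t1.join();
        t2.join();
        // The concern: if T1's a.unlock() (a release) may be delayed past its subsequent
        // b.lock() attempts (acquires on an unrelated object), and symmetrically for T2,
        // each thread could end up holding one lock while spinning forever on the other.
    }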
The current resolution is believed to be the following:

  1. An acquire should observe the result of a release within a finite time, so release operations cannot be reordered past an unbounded number of acquire attempts.
  2. In hardware, memory changes take time to propagate, but a relatively very small time, certainly not an infinite time.
  3. In software, the compiler either does not reorder such operations at all, or at least does not reorder them past a potentially unbounded number of other operations.

Unfortunately, point 1 is not what the Standard currently says, and points 2 and 3 have to be confirmed with the compiler vendors.


Until the status of acquire / release is clarified, seq_cst is currently used in some places, specifically (an acquire/release alternative to the spinlock in item 2 is sketched after this list):

  1. atomic_shared_ptr internal spinlock:
    if (!_Repptr.compare_exchange_weak(_Rep, (_Rep & _Ptr_value_mask) | _Locked_notify_needed)) {

    uintptr_t _Rep = _Repptr.exchange(reinterpret_cast<uintptr_t>(_Value));
  2. Non-lock-free atomic

    STL/stl/inc/atomic

    Lines 394 to 407 in 12c684b

    inline void _Atomic_lock_spinlock(long& _Spinlock) noexcept {
        while (_InterlockedExchange(&_Spinlock, 1)) {
            _YIELD_PROCESSOR();
        }
    }
    inline void _Atomic_unlock_spinlock(long& _Spinlock) noexcept {
    #if defined(_M_ARM) || defined(_M_ARM64)
        _Memory_barrier();
        __iso_volatile_store32(reinterpret_cast<int*>(&_Spinlock), 0);
        _Memory_barrier();
    #else // ^^^ ARM32/ARM64 hardware / x86/x64 hardware vvv
        _InterlockedExchange(&_Spinlock, 0);
    #endif // hardware
    }
  3. Parallel algorithms in <execution> (more than just this occurrence):
    _State.store(_New_state);
  4. memory_resource.cpp (multiple occurrences of each):
    memory_resource* const _Temp = __crt_interlocked_read_pointer(&_Default_resource);

    memory_resource* const _Temp = __crt_interlocked_exchange_pointer(&_Default_resource, _Resource);
  5. filesystem.cpp
    auto _Result = __crt_interlocked_read_pointer(_Cache);
    if (_Result) {
        return _Result;
    }
    const HMODULE _HMod = GetModuleHandleW(_Module);
    if (_HMod) {
        _Result = reinterpret_cast<_Fn_ptr>(GetProcAddress(_HMod, _Fn_name));
    }
    if (!_Result) {
        _Result = _Fallback;
    }
    __crt_interlocked_exchange_pointer(_Cache, _Result);
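If acquire/release turns out to be safe, the change in these places would take roughly the following shape. A minimal sketch (not the STL's code; the function names here are hypothetical) of the item 2 spinlock expressed with standard <atomic> and explicit orders, acquire on lock and release on unlock:

    #include <atomic>

    inline void _Lock_spinlock_acq(std::atomic<long>& _Spinlock) noexcept {
        // Acquire read-modify-write: reads/writes inside the critical section
        // cannot be reordered before taking the lock.
        while (_Spinlock.exchange(1, std::memory_order_acquire) != 0) {
            // spin; a real implementation would pause/yield here
        }
    }

    inline void _Unlock_spinlock_rel(std::atomic<long>& _Spinlock) noexcept {
        // Release store: reads/writes inside the critical section
        // cannot be reordered after releasing the lock.
        _Spinlock.store(0, std::memory_order_release);
    }

Whether this is actually safe is exactly what this issue is about: the release store above must become visible to other threads' acquire loops in finite time.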

Some places that are believed not to be affected by the issue still use acquire / release, specifically (the publication pattern they rely on is sketched after this list):

  1. shared_ptr external spinlock:

    STL/stl/src/atomic.cpp

    Lines 16 to 34 in 12c684b

    _CRTIMP2_PURE void __cdecl _Lock_shared_ptr_spin_lock() { // spin until _Shared_ptr_flag successfully set
    #ifdef _M_ARM
        while (_InterlockedExchange_acq(&_Shared_ptr_flag, 1)) {
            __yield();
        }
    #else // _M_ARM
        while (_interlockedbittestandset(&_Shared_ptr_flag, 0)) { // set bit 0
        }
    #endif // _M_ARM
    }

    _CRTIMP2_PURE void __cdecl _Unlock_shared_ptr_spin_lock() { // release previously obtained lock
    #ifdef _M_ARM
        __dmb(_ARM_BARRIER_ISH);
        __iso_volatile_store32(reinterpret_cast<volatile int*>(&_Shared_ptr_flag), 0);
    #else // _M_ARM
        _interlockedbittestandreset(&_Shared_ptr_flag, 0); // reset bit 0
    #endif // _M_ARM
    }
  2. <system_error>

    STL/stl/inc/system_error

    Lines 590 to 597 in 12c684b

    if (_Storage[0].load(memory_order_acquire) != 0) {
        return reinterpret_cast<_Ty&>(_Storage);
    }
    const _Ty _Target;
    const auto _Target_iter = reinterpret_cast<const uintptr_t*>(_STD addressof(_Target));
    _CSTD memcpy(_Storage + 1, _Target_iter + 1, sizeof(_Ty) - sizeof(uintptr_t));
    _Storage[0].store(_Target_iter[0], memory_order_release);
  3. atomic_wait.cpp

    STL/stl/src/atomic_wait.cpp

    Lines 147 to 152 in 12c684b

        _Wait_functions._Api_level.store(_Level, _STD memory_order_release);
        return _Level;
    }

    [[nodiscard]] __std_atomic_api_level _Acquire_wait_functions() noexcept {
        auto _Level = _Wait_functions._Api_level.load(_STD memory_order_acquire);
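The common shape of these uses is one-time publication of a value through a single atomic: a writer stores the data and then sets a flag with a release store, and a reader that observes the flag with an acquire load is guaranteed to also see the data. No second lock is acquired while spinning on another, which is why these uses are believed not to be affected by the cross-object reordering concern above. A minimal sketch of that pattern (hypothetical names, single writer assumed; not the STL's code):

    #include <atomic>

    struct _Lazy_value { // hypothetical, for illustration only
        std::atomic<int> _Ready{0};
        int _Payload = 0;

        void _Publish(int _Value) noexcept { // assumed to be called by one thread
            _Payload = _Value;                          // write the data first
            _Ready.store(1, std::memory_order_release); // then publish it with a release store
        }

        bool _Try_read(int& _Out) const noexcept {
            if (_Ready.load(std::memory_order_acquire) != 0) { // pairs with the release store
                _Out = _Payload; // guaranteed to observe the published data
                return true;
            }
            return false;
        }
    };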

The task is to confirm the situation with the compiler team and then decide on using memory_order_acquire / memory_order_release in the mentioned (and possibly unmentioned) preexisting code, as well as in new code.

Note also that the memory model implementation on ARM may change in the future; see #83, and also #488, #775, #1082.

AlexGuteniev added a commit to AlexGuteniev/STL that referenced this issue Aug 3, 2020
StephanTLavavej (Member) commented:

I would also want to make sure that any use of acq/rel doesn't lead to Independent Reads of Independent Writes (IRIW) problems, which full sequential consistency solves; some algorithms are affected by IRIW.
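For reference, a minimal sketch of the IRIW litmus test (illustrative only, not code from the STL): two threads store to independent atomics, and two reader threads read them in opposite orders.

    #include <atomic>
    #include <thread>

    std::atomic<int> x{0}, y{0};
    int r1, r2, r3, r4;

    int main() {
        std::thread w1([] { x.store(1, std::memory_order_seq_cst); });
        std::thread w2([] { y.store(1, std::memory_order_seq_cst); });
        std::thread rd1([] {
            r1 = x.load(std::memory_order_seq_cst);
            r2 = y.load(std::memory_order_seq_cst);
        });
        std::thread rd2([] {
            r3 = y.load(std::memory_order_seq_cst);
            r4 = x.load(std::memory_order_seq_cst);
        });
        w1.join(); w2.join(); rd1.join(); rd2.join();
        // With seq_cst on every operation, the outcome r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0
        // is forbidden: both readers must agree on a single total order of the two stores.
        // With only acquire/release operations, that outcome is allowed, i.e. the readers
        // may disagree about which independent write happened first.
    }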

StephanTLavavej added the performance (Must go faster) label Aug 8, 2020
BillyONeal (Member) commented:

@giroux said he is working on the memory model implications in a paper for C++23 (if the pandemic lets us have a C++23).
