<atomic>, <memory>, <execution>: make sure acquire and release are safe to use and start using them #1133

Open
AlexGuteniev opened this issue Aug 3, 2020 · 2 comments
Labels: performance (Must go faster)

AlexGuteniev (Contributor) commented Aug 3, 2020

Currently, memory_order_acquire and memory_order_release are considered unsafe here.
The problem is that critical sections can overlap in the following situation with mutexes or other synchronization objects:

T1: a.acquire(); a.release(); b.acquire(); b.release();
T2: b.acquire(); b.release(); a.acquire(); a.release();

A release can be reordered past a subsequent, unrelated acquire, so the critical sections overlap and a deadlock occurs.
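For illustration, here is a minimal sketch of that pattern (not the STL's code; the spinlock type and all names are hypothetical), where lock() is an acquire read-modify-write and unlock() is a release store:

    #include <atomic>
    #include <thread>

    struct spinlock { // hypothetical, for illustration only
        std::atomic<bool> locked{false};
        void lock() noexcept {
            while (locked.exchange(true, std::memory_order_acquire)) {
                // spin
            }
        }
        void unlock() noexcept {
            locked.store(false, std::memory_order_release);
        }
    };

    spinlock a, b;

    int main() {
        std::thread t1([] {
            a.lock(); a.unlock(); // critical section on a
            b.lock(); b.unlock(); // critical section on b
        });
        std::thread t2([] {
            b.lock(); b.unlock(); // critical section on b
            a.lock(); a.unlock(); // critical section on a
        });
        t1.join();
        t2.join();
        // The concern: if T1's a.unlock() (a release) may be delayed past its subsequent
        // b.lock() attempts (acquires on an unrelated object), and symmetrically for T2,
        // each thread could end up holding one lock while spinning forever on the other.
    }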
The current resolution is believed to be the following:

  1. An acquire should observe the result of a release within a finite time, so release operations cannot be reordered past an unbounded number of acquire attempts.
  2. In hardware, memory changes take time to propagate, but a relatively very small time, certainly not an infinite time.
  3. In software, the compiler either does not reorder such operations at all, or at least does not reorder them past a potentially unbounded number of other operations.

Unfortunately, point 1 is not what the Standard currently says, and points 2 and 3 have to be confirmed with the compiler vendors.


Until the status of acquire / release is clarified, seq_cst is currently used in some places, specifically (an acquire/release alternative to the spinlock in item 2 is sketched after this list):

  1. atomic_shared_ptr internal spinlock:
    if (!_Repptr.compare_exchange_weak(_Rep, (_Rep & _Ptr_value_mask) | _Locked_notify_needed)) {

    uintptr_t _Rep = _Repptr.exchange(reinterpret_cast<uintptr_t>(_Value));
  2. Non-lock-free atomic

    STL/stl/inc/atomic

    Lines 394 to 407 in 12c684b

    inline void _Atomic_lock_spinlock(long& _Spinlock) noexcept {
        while (_InterlockedExchange(&_Spinlock, 1)) {
            _YIELD_PROCESSOR();
        }
    }
    inline void _Atomic_unlock_spinlock(long& _Spinlock) noexcept {
    #if defined(_M_ARM) || defined(_M_ARM64)
        _Memory_barrier();
        __iso_volatile_store32(reinterpret_cast<int*>(&_Spinlock), 0);
        _Memory_barrier();
    #else // ^^^ ARM32/ARM64 hardware / x86/x64 hardware vvv
        _InterlockedExchange(&_Spinlock, 0);
    #endif // hardware
    }
  3. Parallel algorithms in <execution> (more than just this occurrence):
    _State.store(_New_state);
  4. memory_resource.cpp (multiple occurrences of each):
    memory_resource* const _Temp = __crt_interlocked_read_pointer(&_Default_resource);

    memory_resource* const _Temp = __crt_interlocked_exchange_pointer(&_Default_resource, _Resource);
  5. filesystem.cpp
    auto _Result = __crt_interlocked_read_pointer(_Cache);
    if (_Result) {
        return _Result;
    }
    const HMODULE _HMod = GetModuleHandleW(_Module);
    if (_HMod) {
        _Result = reinterpret_cast<_Fn_ptr>(GetProcAddress(_HMod, _Fn_name));
    }
    if (!_Result) {
        _Result = _Fallback;
    }
    __crt_interlocked_exchange_pointer(_Cache, _Result);
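If acquire/release turns out to be safe, the change in these places would take roughly the following shape. A minimal sketch (not the STL's code; the function names here are hypothetical) of the item 2 spinlock expressed with standard <atomic> and explicit orders, acquire on lock and release on unlock:

    #include <atomic>

    inline void _Lock_spinlock_acq(std::atomic<long>& _Spinlock) noexcept {
        // Acquire read-modify-write: reads/writes inside the critical section
        // cannot be reordered before taking the lock.
        while (_Spinlock.exchange(1, std::memory_order_acquire) != 0) {
            // spin; a real implementation would pause/yield here
        }
    }

    inline void _Unlock_spinlock_rel(std::atomic<long>& _Spinlock) noexcept {
        // Release store: reads/writes inside the critical section
        // cannot be reordered after releasing the lock.
        _Spinlock.store(0, std::memory_order_release);
    }

Whether this is actually safe is exactly what this issue is about: the release store above must become visible to other threads' acquire loops in finite time.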

Some places that are believed not to be affected by the issue still use acquire / release, specifically (the publication pattern they rely on is sketched after this list):

  1. shared_ptr external spinlock:

    STL/stl/src/atomic.cpp

    Lines 16 to 34 in 12c684b

    _CRTIMP2_PURE void __cdecl _Lock_shared_ptr_spin_lock() { // spin until _Shared_ptr_flag successfully set
    #ifdef _M_ARM
        while (_InterlockedExchange_acq(&_Shared_ptr_flag, 1)) {
            __yield();
        }
    #else // _M_ARM
        while (_interlockedbittestandset(&_Shared_ptr_flag, 0)) { // set bit 0
        }
    #endif // _M_ARM
    }

    _CRTIMP2_PURE void __cdecl _Unlock_shared_ptr_spin_lock() { // release previously obtained lock
    #ifdef _M_ARM
        __dmb(_ARM_BARRIER_ISH);
        __iso_volatile_store32(reinterpret_cast<volatile int*>(&_Shared_ptr_flag), 0);
    #else // _M_ARM
        _interlockedbittestandreset(&_Shared_ptr_flag, 0); // reset bit 0
    #endif // _M_ARM
    }
  2. <system_error>

    STL/stl/inc/system_error

    Lines 590 to 597 in 12c684b

    if (_Storage[0].load(memory_order_acquire) != 0) {
        return reinterpret_cast<_Ty&>(_Storage);
    }
    const _Ty _Target;
    const auto _Target_iter = reinterpret_cast<const uintptr_t*>(_STD addressof(_Target));
    _CSTD memcpy(_Storage + 1, _Target_iter + 1, sizeof(_Ty) - sizeof(uintptr_t));
    _Storage[0].store(_Target_iter[0], memory_order_release);
  3. atomic_wait.cpp

    STL/stl/src/atomic_wait.cpp

    Lines 147 to 152 in 12c684b

        _Wait_functions._Api_level.store(_Level, _STD memory_order_release);
        return _Level;
    }

    [[nodiscard]] __std_atomic_api_level _Acquire_wait_functions() noexcept {
        auto _Level = _Wait_functions._Api_level.load(_STD memory_order_acquire);
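The common shape of these uses is one-time publication of a value through a single atomic: a writer stores the data and then sets a flag with a release store, and a reader that observes the flag with an acquire load is guaranteed to also see the data. No second lock is acquired while spinning on another, which is why these uses are believed not to be affected by the cross-object reordering concern above. A minimal sketch of that pattern (hypothetical names, single writer assumed; not the STL's code):

    #include <atomic>

    struct _Lazy_value { // hypothetical, for illustration only
        std::atomic<int> _Ready{0};
        int _Payload = 0;

        void _Publish(int _Value) noexcept { // assumed to be called by one thread
            _Payload = _Value;                          // write the data first
            _Ready.store(1, std::memory_order_release); // then publish it with a release store
        }

        bool _Try_read(int& _Out) const noexcept {
            if (_Ready.load(std::memory_order_acquire) != 0) { // pairs with the release store
                _Out = _Payload; // guaranteed to observe the published data
                return true;
            }
            return false;
        }
    };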

The task is to confirm the situation with the compiler team and then decide on using memory_order_acquire / memory_order_release in the mentioned (and possibly unmentioned) preexisting code, as well as in new code.

Note also that the memory model implementation on ARM may change in the future; see #83, and also #488, #775, #1082.

AlexGuteniev added a commit to AlexGuteniev/STL that referenced this issue Aug 3, 2020
StephanTLavavej (Member) commented:

I would also want to make sure that any use of acq/rel doesn't lead to Independent Reads of Independent Writes (IRIW) problems, which full sequential consistency solves; some algorithms are affected by IRIW.
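For reference, a minimal sketch of the IRIW litmus test (illustrative only, not code from the STL): two threads store to independent atomics, and two reader threads read them in opposite orders.

    #include <atomic>
    #include <thread>

    std::atomic<int> x{0}, y{0};
    int r1, r2, r3, r4;

    int main() {
        std::thread w1([] { x.store(1, std::memory_order_seq_cst); });
        std::thread w2([] { y.store(1, std::memory_order_seq_cst); });
        std::thread rd1([] {
            r1 = x.load(std::memory_order_seq_cst);
            r2 = y.load(std::memory_order_seq_cst);
        });
        std::thread rd2([] {
            r3 = y.load(std::memory_order_seq_cst);
            r4 = x.load(std::memory_order_seq_cst);
        });
        w1.join(); w2.join(); rd1.join(); rd2.join();
        // With seq_cst on every operation, the outcome r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0
        // is forbidden: both readers must agree on a single total order of the two stores.
        // With only acquire/release operations, that outcome is allowed, i.e. the readers
        // may disagree about which independent write happened first.
    }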

StephanTLavavej added the performance (Must go faster) label Aug 8, 2020
BillyONeal (Member) commented:

@giroux said he is working on the memory model implications in a paper for C++23 (if the pandemic lets us have a C++23).
