
#651, for 64 bit type on x86 __iso_volatile_store64 #694

Merged
merged 50 commits into microsoft:master from AlexGuteniev:atomic_x86_store on Jun 30, 2020

Conversation

AlexGuteniev
Contributor

#651, for 64 bit type on x86 __iso_volatile_store64

without compiler barrier for memory_order_relaxed,
with compiler barrier for memory_order_release,
other orders unchanged.

@BillyONeal BillyONeal self-requested a review April 10, 2020 03:34
@AlexGuteniev
Contributor Author

AlexGuteniev commented Apr 10, 2020

Here's a reduced ICE: DevCom-986061

@AlexGuteniev
Contributor Author

Related: I don't understand the use of __ldrexd versus __iso_volatile_load64 on ARM.

I mean, if __ldrexd is better, why wouldn't the compiler emit for __iso_volatile_load64 whatever it emits for __ldrexd?

@StephanTLavavej StephanTLavavej added the performance Must go faster label Apr 12, 2020
@cbezault
Contributor

Totally possible __iso_volatile_load64 wasn't available on ARM (or was broken) when this was written.

@BillyONeal
Member

Totally possible __iso_volatile_load64 wasn't available on ARM (or was broken) when this was written.

ARM and ARM64 have always had the full complement of __iso_volatiles. Only x86 has ever had them missing.

@AlexGuteniev
Contributor Author

There is a difference in codegen: __ldrexd produces an ldrexd instruction, while __iso_volatile_load64 produces ldrd.
I cannot explain it; I don't understand ARM well enough. Until it is explained, I think __ldrexd should stay.

This program:

#include <intrin.h>

volatile long long var = 0;

long long f1()
{
    return __iso_volatile_load64(&var);
}


long long f2()
{
    return __ldrexd(&var);
}

int main()
{
    f1();
    f2();
}

Compiled with VS 2019 Preview:

D:\Temp>"C:\Program Files (x86)\Microsoft Visual Studio\2019\Preview\VC\Auxiliary\Build\vcvarsamd64_arm.bat"
**********************************************************************
** Visual Studio 2019 Developer Command Prompt v16.6.0-pre.2.1
** Copyright (c) 2020 Microsoft Corporation
**********************************************************************
[vcvarsall.bat] Environment initialized for: 'x64_arm'

D:\Temp>cl /O2 /FA repro.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 19.26.28720.3 for ARM
Copyright (C) Microsoft Corporation.  All rights reserved.

repro.cpp
Microsoft (R) Incremental Linker Version 14.26.28720.3
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:repro.exe
/machine:arm
repro.obj

Produces the following asm:

; Listing generated by Microsoft (R) Optimizing Compiler Version 19.26.28720.3 

	TTL	D:\Temp\repro.cpp
	THUMB
	.drectve
	DCB	"-defaultlib:LIBCMT "
	DCB	"-defaultlib:OLDNAMES "

	EXPORT	|?var@@3_JC| [ DATA ]			; var
	.bss
|?var@@3_JC| %	0x8					; var
	EXPORT	|?f1@@YA_JXZ|				; f1
	EXPORT	|?f2@@YA_JXZ|				; f2
	EXPORT	|main|
; Function compile flags: /Ogtpy
;	COMDAT main
.text$mn	SEGMENT

|main|	PROC
; File D:\Temp\repro.cpp
; Line 18
	movs        r0,#0
|$M4|
	bx          lr

	ENDP  ; |main|

; Function compile flags: /Ogtpy
;	COMDAT ?f2@@YA_JXZ
.text$mn	SEGMENT

|?f2@@YA_JXZ| PROC					; f2
; File D:\Temp\repro.cpp
; Line 13
	movw        r3,|?var@@3_JC|
	movt        r3,|?var@@3_JC|
	ldrexd      r0,r1,[r3]
|$M4|
; Line 14
	bx          lr

	ENDP  ; |?f2@@YA_JXZ|, f2

; Function compile flags: /Ogtpy
;	COMDAT ?f1@@YA_JXZ
.text$mn	SEGMENT

|?f1@@YA_JXZ| PROC					; f1
; File D:\Temp\repro.cpp
; Line 7
	movw        r3,|?var@@3_JC|
	movt        r3,|?var@@3_JC|
	ldrd        r0,r1,[r3]
|$M4|
; Line 8
	bx          lr

	ENDP  ; |?f1@@YA_JXZ|, f1

	END

@AlexGuteniev
Contributor Author

Regarding everything except __ldrexd, what would be the next step? Wait for DevCom-986061 resolution?

@StephanTLavavej
Member

We talked about this in our weekly meeting, and we believe that waiting for the ICE to be fixed is preferable over using inline assembly (which has been problematic in the past). I'll mark this PR as blocked.

@StephanTLavavej StephanTLavavej added the blocked Something is preventing work on this label Apr 15, 2020

@cbezault cbezault left a comment


This looks good to me.
My suggestions are weak suggestions, not strong ones, feel free to resolve if you don't think it's worth the effort.

stl/inc/atomic Outdated

#if defined(_M_ARM) || defined(_M_ARM64)
_Memory_barrier();
#else // ^^^ ARM32/ARM64 hardware / x86/x64 hardware vvv
Contributor

Is the implementation for x86/x64 likely to be safe on any other hypothetical architecture? Assumptions about architecture are what got us into the mess of mediocre ARM support we're in now.

Contributor Author

Good question. It is both unsafe and possibly inefficient where it is safe.

Contributor Author

So I've addressed this place.

But there are some places like here:

STL/stl/inc/atomic

Lines 489 to 495 in 4deaf6c

#if defined(_M_ARM) || defined(_M_ARM64)
_Memory_barrier();
__iso_volatile_store16(_Mem, _As_bytes);
_Memory_barrier();
#else // ^^^ ARM32/ARM64 hardware / x86/x64 hardware vvv
(void) _InterlockedExchange16(_Mem, _As_bytes);
#endif // hardware

I think either implementation would work (apart from _Memory_barrier() being undefined on x86/x64, though it could be defined). Still, for performance reasons it makes no sense to use the wrong implementation on either platform. Should compilation be broken for unknown platforms here as well?

Contributor

I'm more okay with it in this case. _InterlockedExchange16 will likely be defined on future architectures. Also you're not directly touching this so no reason to hold up the PR on that.




@StephanTLavavej StephanTLavavej left a comment


(@BillyONeal has offered to commit/test these minor changes to the GitHub and MSVC PRs; thanks Billy!)

@BillyONeal
Member

@AlexGuteniev Thank you for your contribution!

@BillyONeal BillyONeal merged commit bc077c6 into microsoft:master Jun 30, 2020
@AlexGuteniev AlexGuteniev deleted the atomic_x86_store branch June 30, 2020 05:21
StephanTLavavej added a commit to AlexGuteniev/STL that referenced this pull request Jul 2, 2020
StephanTLavavej added a commit that referenced this pull request Jul 3, 2020
This reverts commit ad1a26a which was later modified by #694.

Fixes #971.

Co-authored-by: Stephan T. Lavavej <[email protected]>
Successfully merging this pull request may close these issues.

<atomic>: use __iso_volatile_store64 on x86 if it is available