Fix a_store operation in atomic.h #403

wenyongh · 2023-03-22T01:46:19Z

The wasm opcodes generated by a_barrier mode are like below:

    atomic.fence
    local.get
    i32.const
    i32.store
    atomic.fence

Which use i32.store and are not atomic operations, and cause some
wasi-threads cases hang in WAMR:
bytecodealliance/wasm-micro-runtime#2024 (comment)

sbc100 · 2023-03-22T01:56:27Z

libc-top-half/musl/src/internal/atomic.h

@@ -182,13 +182,26 @@ static inline void a_dec(volatile int *p)
 #define a_store a_store
 static inline void a_store(volatile int *p, int v)
 {
+#ifdef __wasilibc_unmodified_upstream


Would it be better so not defined a_barrier in wasm32/atomic_arch.h if its not doing the right thing?

sbc100 · 2023-03-22T01:59:21Z

libc-top-half/musl/src/internal/atomic.h

+	 *   i32.const
+	 *   i32.store
+	 *   atomic.fence
+	 * which are not atomic operations, so we use a_swap instead.


Clearly the *p = v operation above is non-atomic since p and v are just normal C types (not atomic types), so that fact that the above sequence contains non-atomic operations should be be surprise right? We would expect that on all platforms.

I guess atomic.fence doesn't seem to be doing what the musl developers think that a_barrier is supposed to do? Do you know why?

Again, maybe better to just not defined a_barrier if we can't make it do what musl expects here?

Clearly the *p = v operation above is non-atomic since p and v are just normal C types (not atomic types), so that fact that the above sequence contains non-atomic operations should be be surprise right? We would expect that on all platforms.

maybe it's assumed to be compiled to an atomic-enough instruction. like mov on x86.

or maybe a_store doesn't require to be that atomic.

I guess atomic.fence doesn't seem to be doing what the musl developers think that a_barrier is supposed to do? Do you know why?

Again, maybe better to just not defined a_barrier if we can't make it do what musl expects here?

are you aware of any doc which explains what musl expects?
i'm not.

Yes, it is not easy to just use atomic.fence pairs to make the a_store an atomic operation, especially for interpreter, since interpreter needs to load and dispatch the opcodes one by one and pop/push operands when handling the opcodes, there will be many extra load/store/jump operations inserted among these operations, so it should be not atomic. For JIT/AOT mode, maybe the machine code generated can be atomic after some optimizations are applied, e.g. the code generated may be just like memory barrier + store instruction + memory barrier. Maybe it is why we found the hang issue only in interpreter mode, and haven't found the issue in AOT mode.

And yes, it is better to not define a_barrier, seems it is just converted into atomic.fence wasm opcode, and is not what must expects to do.

since interpreter needs to load and dispatch the opcodes one by one and pop/push operands when handling the opcodes

instruction fetch doesn't matter because it's read-only.
operand stack doesn't matter as it's local to the thread.
i don't see any fundamental differences between the interpreter and jit/aot here.

The AOT code is generated dynamically according to the input bytecode, the machine codes generated may be like memory barrier + store instruction + memory barrier, which ensures the memory ordering of access.
The interpreter is to interpret wasm opcode one by one, note that the machine code compiled consists lots of code pieces of opcode handler, these code pieces are un-ordered, for example, there may be code pieces of I32.STORE and ATOMIC.FENCE:

handler_of_I32.STORE: ... store instruction fetch next opcode and jump to its handler ... ... handler_of_ATOMIC.FENCE: memory barrier fetch next opcode and jump to its handler

Or:

handler_of_ATOMIC.FENCE: memory barrier fetch next opcode and jump to its handler ... ... handler_of_I32.STORE: ... store instruction fetch next opcode and jump to its handler

Note there is only one memory barrier instruction generated and there may be many other instructions between store and memory barrier instruction. The memory order cannot be promised, interpreter may run memory barrier first, then jump forward/backward to handler of I32.store, and then jump forward/backward again to handler of ATOMIC.FENCE.

Again, maybe better to just not defined a_barrier if we can't make it do what musl expects here?

Thanks @sbc100, I removed the a_barrier definition in wasm32/atomic_arch.h.

The AOT code is generated dynamically according to the input bytecode, the machine codes generated may be like memory barrier + store instruction + memory barrier, which ensures the memory ordering of access. The interpreter is to interpret wasm opcode one by one, note that the machine code compiled consists lots of code pieces of opcode handler, these code pieces are un-ordered, for example, there may be code pieces of I32.STORE and ATOMIC.FENCE:

handler_of_I32.STORE: ... store instruction fetch next opcode and jump to its handler ... ... handler_of_ATOMIC.FENCE: memory barrier fetch next opcode and jump to its handler

Or:

handler_of_ATOMIC.FENCE: memory barrier fetch next opcode and jump to its handler ... ... handler_of_I32.STORE: ... store instruction fetch next opcode and jump to its handler

Note there is only one memory barrier instruction generated and there may be many other instructions between store and memory barrier instruction. The memory order cannot be promised, interpreter may run memory barrier first, then jump forward/backward to handler of I32.store, and then jump forward/backward again to handler of ATOMIC.FENCE.

i don't understand your concern.
as far as the interpreter executes handler_of_I32.STORE and handler_of_ATOMIC.FENCE in the order as in the original wasm bytecode, it should be fine.

I am not sure, isn't the memory ordering related to the place of barrier inside the compiled code? And will it prevent the compiler from generation some re-ordered instructions?

memory barrier is cpu instruction which takes effect when it's executed.

TerrorJack · 2023-03-22T02:00:36Z

Out of curiosity, it seems a_swap isn't defined in https://github.com/WebAssembly/wasi-libc/blob/main/libc-top-half/musl/arch/wasm32/atomic_arch.h yet, but it seems to be an oversight? Since wasm atomic opcodes contain xchg which can be used to implement it. And there's also atomic store opcodes that can be used to directly implement a_store...

sbc100 · 2023-03-22T02:07:24Z

Out of curiosity, it seems a_swap isn't defined in https://github.com/WebAssembly/wasi-libc/blob/main/libc-top-half/musl/arch/wasm32/atomic_arch.h yet, but it seems to be an oversight? Since wasm atomic opcodes contain xchg which can be used to implement it. And there's also atomic store opcodes that can be used to directly implement a_store...

As a point of reference, in emscripten the atomic_arch.h stuff is implemented in terms of __c11_atomic_ intrinsics which allows them to be fairly generic and call back on clang/llvm to do that right thing:

https://github.com/emscripten-core/emscripten/blob/592aeca6a44b20ddeea6948fd9b5793bb8fc36de/system/lib/libc/musl/arch/emscripten/atomic_arch.h#L23-L27

TerrorJack · 2023-03-22T02:09:44Z

Yeah, I was thinking of the old gnu style __sync_swap and such, but C11 atomics definitely look better.

yamt · 2023-03-22T02:38:15Z

The wasm opcodes generated by a_barrier mode are like below:
    atomic.fence
    local.get
    i32.const
    i32.store
    atomic.fence
Which use i32.store and are not atomic operations, and cause some wasi-threads cases hang in WAMR: bytecodealliance/wasm-micro-runtime#2024 (comment)

i'm not sure how it can cause the hang.
do you mean that wamr's STORE_U32 happens to be compiled to a non-atomic store?

yamt · 2023-03-22T06:50:50Z

if my reading of the spec [1] is correct, it seems like a bug in wamr to me.

[1] WebAssembly/threads#197

wenyongh · 2023-03-22T09:27:48Z

Which use i32.store and are not atomic operations, and cause some wasi-threads cases hang in WAMR: bytecodealliance/wasm-micro-runtime#2024 (comment)

i'm not sure how it can cause the hang. do you mean that wamr's STORE_U32 happens to be compiled to a non-atomic store?

It is really complex to explain the whole process of pthread_barrier_wait, somewhat like:
Thread A is doing a_swap to take a lock, and if the lock is taken, it will enter the critical code; if not, it will run into atomic.wait to wait again.
Thread B's non-atomic i32.store 0 to the lock is omitted by the thread A's a_swap, a_swap re-writes 1 to the lock and returns fail, so thread A waits again, but since the lock is written to 1, it waits indefinitely for other thread to notify it. But thread B already finishes its task and doesn't notify it again, it also waits indefinitely.

wenyongh · 2023-03-22T09:34:23Z

if my reading of the spec [1] is correct, it seems like a bug in wamr to me.

[1] WebAssembly/threads#197

Wondering, why i32.store needs to take the lock too? It is non-atomic, and spec doesn't mention that, instead, spec mentions atomic store needs to be seq-cst order.

yamt · 2023-03-22T09:55:32Z

if my reading of the spec [1] is correct, it seems like a bug in wamr to me.
[1] WebAssembly/threads#197

Wondering, why i32.store needs to take the lock too? It is non-atomic, and spec doesn't mention that, instead, spec mentions atomic store needs to be seq-cst order.

as i stated in the linked issue, it's actually somehow stated "atomic" in the spec.
and these musl-internal locking mechanisms seem to rely on the property.
i agree it's a bit strange. that's why i opened the question issue in the spec repo.

yamt · 2023-03-22T10:07:28Z

Which use i32.store and are not atomic operations, and cause some wasi-threads cases hang in WAMR: bytecodealliance/wasm-micro-runtime#2024 (comment)

i'm not sure how it can cause the hang. do you mean that wamr's STORE_U32 happens to be compiled to a non-atomic store?

It is really complex to explain the whole process of pthread_barrier_wait, somewhat like: Thread A is doing a_swap to take a lock, and if the lock is taken, it will enter the critical code; if not, it will run into atomic.wait to wait again. Thread B's non-atomic i32.store 0 to the lock is omitted by the thread A's a_swap, a_swap re-writes 1 to the lock and returns fail, so thread A waits again, but since the lock is written to 1, it waits indefinitely for other thread to notify it. But thread B already finishes its task and doesn't notify it again, it also waits indefinitely.

i asked about STORE_U32 because i thought that as far as STORE_U32 is compiled into eg. x86 mov instruction it's still atomic enough.

but i noticed that it's actually more about the cmpxchg implementation than i32.store.
when cmpxchg is implemented with a lock as it is in wamr interpreter, i32.store can be executed in the middle of cmpxchg and effectively break the lock as you say.

while it can be fixed by using an atomic opcode instead of i32.store as in this PR,
i'd like to wait for clarification in the threads repo as the trick to implement unlock with non-atomic store
is somehow common. (note that x86 version of musl a_store is mov + memory barrier, which basically matches the current wasm version.)

g0djan · 2023-03-22T10:26:09Z

Seems like this PR should solve the same problem that I tried to investigate in #402

yamt · 2023-03-22T11:04:08Z

after reading clarifications by @conrad-watt in WebAssembly/threads#197 , i think this patch is fine.
(i feel it's more straightforward to define a_store in wasm atomic_arch.h though.)

g0djan · 2023-03-22T11:07:45Z

I checked this PR solely fixes #402 for me, but I still observe data race warnings between some LOAD/STORE operations.
@wenyongh thank you for the fix!

abrown

Looks like there is consensus that this is the right fix. I'll merge it and upstream it to wasi-sdk next week (along the lines of WebAssembly/wasi-sdk#412) but I had a quick question: should we wrap this in #ifdef __wasilibc_unmodified_upstream? My sense would be "no," since these WebAssembly arch files won't be in upstream MUSL, but I just wanted to check.

Thanks, @wenyongh!

wenyongh · 2023-03-24T02:09:59Z

Looks like there is consensus that this is the right fix. I'll merge it and upstream it to wasi-sdk next week (along the lines of WebAssembly/wasi-sdk#412) but I had a quick question: should we wrap this in #ifdef __wasilibc_unmodified_upstream? My sense would be "no," since these WebAssembly arch files won't be in upstream MUSL, but I just wanted to check.

Thanks, @wenyongh!

Welcome and thanks, @abrown!

From the README.md under libc-top-half, it mentions that Changes to upstream code are wrapped in preprocessor directives controlled by the macro __wasilibc_unmodified_upstream, my understanding is that the macro is only used when modifying the original musl source code, since the folder arch/wasm32 was newly added, it isn't the original source code of musl, we shouldn't need to add the macro wrapper:
https://github.com/bminor/musl/tree/master/arch

And instead, if it is needed, we should also add the wrapping for other macro definitions in arch/wasm/atomic_arch.h, like a_cas, a_clz_32 and so on.

WebAssembly/wasi-libc#403 fixed an issue with `a_barrier` that should be included in the next release of wasi-sdk. This change updates wasi-libc to the latest `HEAD` of `main` to include it.

Fix a_store operation in atomic.h

1e4222f

sbc100 reviewed Mar 22, 2023

View reviewed changes

wenyongh mentioned this pull request Mar 22, 2023

Fix atomic.wait, get wasi_ctx exit code and thread mgr issues bytecodealliance/wasm-micro-runtime#2024

Merged

restore a_store and remove defining a_barrier instead

214e215

wenyongh mentioned this pull request Mar 22, 2023

Implement sync primitives instead of using pthread ones bytecodealliance/wasm-micro-runtime#2028

Merged

conrad-watt mentioned this pull request Mar 22, 2023

Questions about non-atomic instructions like i32.store WebAssembly/threads#197

Open

yamt approved these changes Mar 22, 2023

View reviewed changes

g0djan mentioned this pull request Mar 22, 2023

Fix/Simplify the atomic.wait/nofity implementations bytecodealliance/wasm-micro-runtime#2044

Merged

abrown approved these changes Mar 23, 2023

View reviewed changes

abrown merged commit 1dfe5c3 into WebAssembly:main Mar 24, 2023

abrown mentioned this pull request Mar 27, 2023

wasi-libc: update to include atomic barrier fix WebAssembly/wasi-sdk#317

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix a_store operation in atomic.h #403

Fix a_store operation in atomic.h #403

wenyongh commented Mar 22, 2023

sbc100 Mar 22, 2023

sbc100 Mar 22, 2023

yamt Mar 22, 2023

wenyongh Mar 22, 2023

yamt Mar 22, 2023

wenyongh Mar 22, 2023

wenyongh Mar 22, 2023

yamt Mar 22, 2023

wenyongh Mar 22, 2023

yamt Mar 22, 2023

TerrorJack commented Mar 22, 2023

sbc100 commented Mar 22, 2023

TerrorJack commented Mar 22, 2023

yamt commented Mar 22, 2023

yamt commented Mar 22, 2023

wenyongh commented Mar 22, 2023

wenyongh commented Mar 22, 2023

yamt commented Mar 22, 2023

yamt commented Mar 22, 2023

g0djan commented Mar 22, 2023

yamt commented Mar 22, 2023

g0djan commented Mar 22, 2023

abrown left a comment

wenyongh commented Mar 24, 2023

Fix a_store operation in atomic.h #403

Fix a_store operation in atomic.h #403

Conversation

wenyongh commented Mar 22, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TerrorJack commented Mar 22, 2023

sbc100 commented Mar 22, 2023

TerrorJack commented Mar 22, 2023

yamt commented Mar 22, 2023

yamt commented Mar 22, 2023

wenyongh commented Mar 22, 2023

wenyongh commented Mar 22, 2023

yamt commented Mar 22, 2023

yamt commented Mar 22, 2023

g0djan commented Mar 22, 2023

yamt commented Mar 22, 2023

g0djan commented Mar 22, 2023

abrown left a comment

Choose a reason for hiding this comment

wenyongh commented Mar 24, 2023