Fix a_store operation in atomic.h #403
@@ -182,13 +182,26 @@ static inline void a_dec(volatile int *p)
#define a_store a_store
static inline void a_store(volatile int *p, int v)
{
#ifdef __wasilibc_unmodified_upstream
Would it be better to not define `a_barrier` in `wasm32/atomic_arch.h` if it's not doing the right thing?
* i32.const
* i32.store
* atomic.fence
* which are not atomic operations, so we use a_swap instead.
Clearly the `*p = v` operation above is non-atomic since `p` and `v` are just normal C types (not atomic types), so the fact that the above sequence contains non-atomic operations should be no surprise, right? We would expect that on all platforms.
I guess `atomic.fence` doesn't seem to be doing what the musl developers think that `a_barrier` is supposed to do? Do you know why?
Again, maybe better to just not define `a_barrier` if we can't make it do what musl expects here?
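The two `a_store` shapes being compared in this thread can be sketched as follows. This is a minimal illustration, not the wasi-libc code: the `a_swap` and `a_barrier` bodies are stand-ins built on the GCC/Clang `__atomic` builtins (an assumption), and the names `a_store_barrier`, `a_store_swap` and `check_store_variants` are hypothetical.

```c
#include <assert.h>

/* Stand-in for musl's a_swap: atomically exchange *p with v and return
 * the previous value. Using the __atomic builtin is an assumption made
 * for this sketch, not the wasi-libc implementation. */
static inline int a_swap(volatile int *p, int v)
{
	return __atomic_exchange_n(p, v, __ATOMIC_SEQ_CST);
}

/* Stand-in for a_barrier: a sequentially consistent fence. */
static inline void a_barrier(void)
{
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
}

/* Barrier-based variant: a plain (non-atomic) store bracketed by fences. */
static inline void a_store_barrier(volatile int *p, int v)
{
	a_barrier();
	*p = v;
	a_barrier();
}

/* Swap-based variant: one atomic read-modify-write; the old value is
 * simply discarded. */
static inline void a_store_swap(volatile int *p, int v)
{
	a_swap(p, v);
}

/* Single-threaded sanity check: both variants leave the same value.
 * It says nothing about behavior under contention. */
static int check_store_variants(void)
{
	volatile int x = 0, y = 0;
	a_store_barrier(&x, 42);
	a_store_swap(&y, 42);
	assert(x == 42 && y == 42);
	return x == y;
}
```

The difference only matters when other threads touch the same location: the swap-based variant is a single atomic operation at the source level, while the barrier-based variant depends on how the target treats the plain store between the fences.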
> Clearly the `*p = v` operation above is non-atomic since `p` and `v` are just normal C types (not atomic types), so the fact that the above sequence contains non-atomic operations should be no surprise, right? We would expect that on all platforms.

maybe it's assumed to be compiled to an atomic-enough instruction, like `mov` on x86.
or maybe `a_store` isn't required to be that atomic.

> I guess `atomic.fence` doesn't seem to be doing what the musl developers think that `a_barrier` is supposed to do? Do you know why?
> Again, maybe better to just not define `a_barrier` if we can't make it do what musl expects here?

are you aware of any doc which explains what musl expects?
i'm not.
Yes, it is not easy to make `a_store` an atomic operation just by using `atomic.fence` pairs, especially for the interpreter: since the interpreter needs to load and dispatch the opcodes one by one and pop/push operands when handling them, many extra load/store/jump operations are inserted among these operations, so it should not be atomic. For JIT/AOT mode, the generated machine code may be atomic after some optimizations are applied, e.g. the generated code may be just `memory barrier + store instruction + memory barrier`. Maybe that is why we found the hang issue only in interpreter mode and haven't found it in AOT mode.
And yes, it is better to not define `a_barrier`; it seems it is just converted into the `atomic.fence` wasm opcode, which is not what musl expects it to do.
> since interpreter needs to load and dispatch the opcodes one by one and pop/push operands when handling the opcodes

instruction fetch doesn't matter because it's read-only.
the operand stack doesn't matter as it's local to the thread.
i don't see any fundamental differences between the interpreter and jit/aot here.
The AOT code is generated dynamically from the input bytecode; the generated machine code may look like `memory barrier + store instruction + memory barrier`, which ensures the memory ordering of the access.
The interpreter interprets wasm opcodes one by one. Note that the compiled machine code of the interpreter consists of many opcode handler code pieces, and these pieces are unordered. For example, there may be code pieces for `I32.STORE` and `ATOMIC.FENCE`:
handler_of_I32.STORE:
...
store instruction
fetch next opcode and jump to its handler
...
...
handler_of_ATOMIC.FENCE:
memory barrier
fetch next opcode and jump to its handler
Or:
handler_of_ATOMIC.FENCE:
memory barrier
fetch next opcode and jump to its handler
...
...
handler_of_I32.STORE:
...
store instruction
fetch next opcode and jump to its handler
Note that only one `memory barrier` instruction is generated, and there may be many other instructions between the `store` and `memory barrier` instructions. The memory order cannot be guaranteed: the interpreter may run the `memory barrier` first, then jump forward/backward to the handler of I32.STORE, and then jump forward/backward again to the handler of ATOMIC.FENCE.
> Again, maybe better to just not define `a_barrier` if we can't make it do what musl expects here?

Thanks @sbc100, I removed the `a_barrier` definition in `wasm32/atomic_arch.h`.
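Why removing the arch-level `a_barrier` changes `a_store` can be sketched with the preprocessor pattern below. It is modeled on the shape of the generic `a_store` fallback as discussed in this thread and is an assumption, not a copy of the wasi-libc sources; `ARCH_DEFINES_A_BARRIER`, `swap_calls` and `swaps_per_store` are hypothetical names used only for illustration, and the `__atomic` builtins are stand-ins.

```c
#include <assert.h>

/* Hypothetical switch for this sketch: define ARCH_DEFINES_A_BARRIER to
 * mimic an arch header that provides its own a_barrier. */
#ifdef ARCH_DEFINES_A_BARRIER
#define a_barrier a_barrier
static inline void a_barrier(void)
{
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
}
#endif

static int swap_calls; /* counts a_swap calls, for illustration only */

/* Stand-in for a_swap built on a compiler builtin (assumption). */
static inline int a_swap(volatile int *p, int v)
{
	swap_calls++;
	return __atomic_exchange_n(p, v, __ATOMIC_SEQ_CST);
}

#ifndef a_store
#define a_store a_store
static inline void a_store(volatile int *p, int v)
{
#ifdef a_barrier
	/* "a_barrier mode": plain store between two barriers. */
	a_barrier();
	*p = v;
	a_barrier();
#else
	/* No arch a_barrier: fall back to an atomic exchange. */
	a_swap(p, v);
#endif
}
#endif

/* Returns how many times one a_store call went through a_swap. */
static int swaps_per_store(void)
{
	volatile int x = 0;
	int before = swap_calls;
	a_store(&x, 5);
	assert(x == 5);
	return swap_calls - before;
}
```

Built as written, without `ARCH_DEFINES_A_BARRIER`, each `a_store` goes through `a_swap` exactly once; building with `-DARCH_DEFINES_A_BARRIER` selects the barrier-based path instead.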
> The AOT code is generated dynamically according to the input bytecode, the machine codes generated may be like `memory barrier + store instruction + memory barrier`, which ensures the memory ordering of access. The interpreter is to interpret wasm opcode one by one, note that the machine code compiled consists lots of code pieces of opcode handler, these code pieces are un-ordered, for example, there may be code pieces of `I32.STORE` and `ATOMIC.FENCE`:
>
>     handler_of_I32.STORE:
>     ...
>     store instruction
>     fetch next opcode and jump to its handler
>     ...
>     ...
>     handler_of_ATOMIC.FENCE:
>     memory barrier
>     fetch next opcode and jump to its handler
>
> Or:
>
>     handler_of_ATOMIC.FENCE:
>     memory barrier
>     fetch next opcode and jump to its handler
>     ...
>     ...
>     handler_of_I32.STORE:
>     ...
>     store instruction
>     fetch next opcode and jump to its handler
>
> Note there is only one `memory barrier` instruction generated and there may be many other instructions between `store` and `memory barrier` instruction. The memory order cannot be promised, interpreter may run `memory barrier` first, then jump forward/backward to handler of I32.store, and then jump forward/backward again to handler of ATOMIC.FENCE.

i don't understand your concern.
as long as the interpreter executes `handler_of_I32.STORE` and `handler_of_ATOMIC.FENCE` in the same order as in the original wasm bytecode, it should be fine.
I am not sure; isn't the memory ordering related to the place of the barrier inside the compiled code? And will it prevent the compiler from generating some re-ordered instructions?
a memory barrier is a cpu instruction which takes effect when it's executed.
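The point about execution order versus handler layout can be illustrated with a toy switch-based interpreter. This is a hedged sketch, not WAMR's dispatch loop: the opcode enum, `struct insn`, `run`, `run_demo` and the trace buffer are all hypothetical, and the fence uses a GCC/Clang builtin as a stand-in. The fence handler is deliberately placed before the store handler in the source to show that placement does not decide when it runs.

```c
#include <assert.h>
#include <stddef.h>

enum opcode { OP_I32_CONST, OP_I32_STORE, OP_ATOMIC_FENCE, OP_END };

struct insn { enum opcode op; int imm; };

#define TRACE_MAX 8
static enum opcode trace[TRACE_MAX]; /* opcodes in the order they executed */
static size_t trace_len;

/* Toy interpreter: handlers run in bytecode order, whatever their
 * position inside this function. */
static void run(const struct insn *code, volatile int *mem)
{
	int stack[4];
	size_t sp = 0;
	trace_len = 0;
	for (const struct insn *pc = code; ; pc++) {
		if (trace_len < TRACE_MAX)
			trace[trace_len++] = pc->op;
		switch (pc->op) {
		case OP_ATOMIC_FENCE: /* handler placed first on purpose */
			__atomic_thread_fence(__ATOMIC_SEQ_CST);
			break;
		case OP_I32_STORE: {  /* pops the value, then the address */
			int v = stack[--sp];
			int addr = stack[--sp];
			mem[addr] = v;
			break;
		}
		case OP_I32_CONST:
			stack[sp++] = pc->imm;
			break;
		case OP_END:
			return;
		}
	}
}

/* Runs "const addr; const value; store; fence" and returns mem[0]. */
static int run_demo(void)
{
	static const struct insn prog[] = {
		{ OP_I32_CONST, 0 },    /* address: index 0 in mem */
		{ OP_I32_CONST, 7 },    /* value to store */
		{ OP_I32_STORE, 0 },
		{ OP_ATOMIC_FENCE, 0 },
		{ OP_END, 0 },
	};
	volatile int mem[1] = { 0 };
	run(prog, mem);
	/* The store executed before the fence, as in the bytecode. */
	assert(trace[2] == OP_I32_STORE && trace[3] == OP_ATOMIC_FENCE);
	return mem[0];
}
```

In this toy model the fence is executed after the store because the bytecode says so; whether that is sufficient for `a_store` in a real multi-threaded runtime is the separate question discussed in the rest of the thread.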
Out of curiosity, it seems |
As a point of reference, in emscripten the |
Yeah, I was thinking of the old gnu style |
i'm not sure how it can cause the hang. |
if my reading of the spec [1] is correct, it seems like a bug in wamr to me. |
It is really complex to explain the whole process of |
Wondering, why i32.store needs to take the lock too? It is non-atomic, and spec doesn't mention that, instead, spec mentions atomic store needs to be seq-cst order. |
as i stated in the linked issue, it's actually somehow stated "atomic" in the spec. |
i asked about STORE_U32 because i thought that as far as STORE_U32 is compiled into eg. x86 but i noticed that it's actually more about the cmpxchg implementation than while it can be fixed by using an atomic opcode instead of |
Seems like this PR should solve the same problem that I tried to investigate in #402 |
after reading clarifications by @conrad-watt in WebAssembly/threads#197 , i think this patch is fine. |
Looks like there is consensus that this is the right fix. I'll merge it and upstream it to wasi-sdk next week (along the lines of WebAssembly/wasi-sdk#412) but I had a quick question: should we wrap this in `#ifdef __wasilibc_unmodified_upstream`? My sense would be "no," since these WebAssembly `arch` files won't be in upstream MUSL, but I just wanted to check.
Thanks, @wenyongh!
Welcome and thanks, @abrown! From the README.md under libc-top-half, it mentions that And instead, if it is needed, we should also add the wrapping for other macro definitions in |
WebAssembly/wasi-libc#403 fixed an issue with `a_barrier` that should be included in the next release of wasi-sdk. This change updates wasi-libc to the latest `HEAD` of `main` to include it.
The wasm opcodes generated for a_store in a_barrier mode are i32.const, i32.store and atomic.fence.
These use i32.store and are not atomic operations, which causes some
wasi-threads cases to hang in WAMR:
bytecodealliance/wasm-micro-runtime#2024 (comment)