Implement sync primitives instead of using pthread ones #2028

eloparco · 2023-03-15T00:18:08Z

Attempt to replace the pthread sync primitives, since they seem to cause data races when running with the thread sanitizer. These data races are reported on concurrent load/store operations when running in classic interpreter mode.

In particular, this PR implements a mutex and a barrier to replace pthread_mutex_t and pthread_barrier_t in the tests that use threads.
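For context, a mutex of this kind can be built from a single atomic flag. The following is a minimal sketch assuming C11 `<stdatomic.h>`; the names `spin_mutex_t`, `spin_mutex_lock` and so on are hypothetical, and this is not the code in sync_primitives.h, which may block with wasm atomic wait/notify rather than spinning.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spin mutex: 0 = unlocked, 1 = locked. */
typedef struct {
    atomic_int locked;
} spin_mutex_t;

static void spin_mutex_init(spin_mutex_t *m)
{
    atomic_init(&m->locked, 0);
}

/* Try once to take the lock. Acquire ordering on success makes the
 * previous owner's writes visible to the new owner. */
static bool spin_mutex_trylock(spin_mutex_t *m)
{
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &m->locked, &expected, 1, memory_order_acquire, memory_order_relaxed);
}

static void spin_mutex_lock(spin_mutex_t *m)
{
    while (!spin_mutex_trylock(m)) {
        /* Spin on a relaxed atomic load (cheaper than repeated CAS)
         * until the lock looks free, then retry the CAS. */
        while (atomic_load_explicit(&m->locked, memory_order_relaxed) != 0)
            ;
    }
}

static void spin_mutex_unlock(spin_mutex_t *m)
{
    /* Unlocking a mutex that is not held is a caller bug. */
    assert(atomic_load_explicit(&m->locked, memory_order_relaxed) == 1);
    /* Release ordering publishes the critical section's writes. */
    atomic_store_explicit(&m->locked, 0, memory_order_release);
}
```

The acquire on lock and the release on unlock are what make writes done in one critical section visible to the next owner of the lock.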

core/iwasm/libraries/lib-wasi-threads/test/sync_primitives.h

eloparco · 2023-03-16T08:48:53Z

After this PR I don't see the load/store race conditions anymore. I need to fix the failure in the atomics specification test (which is breaking the CI) and then do some more testing.

g0djan · 2023-03-16T11:57:19Z

> After this PR I don't see the load/store race conditions anymore. I need to fix the failure on the specification test for atomics (that's breaking in the CI) and then do some more testing.

Me neither, no TSAN warnings

wenyongh · 2023-03-17T07:31:07Z

@eloparco I'm wondering why we replace pthread_barrier_wait with a self-implemented barrier_wait. Can we confirm whether the hang issue is caused by wasi-libc's pthread_barrier_wait or by the runtime? If not, could we keep both implementations and add a macro to control whether pthread_barrier_wait or the self-implemented barrier_wait is used?

BTW, I tried to debug the hang issue and found that the last thread didn't notify the others after it entered pthread_barrier_wait; it seems some memory was overwritten. I need to check more. One thing to confirm: the aux stack of the created thread is allocated beforehand in the bytecode, so it doesn't need to be allocated by the runtime, right?

eloparco · 2023-03-17T08:28:53Z

> @eloparco Wondering why we replace the pthread_barrier_wait with self-implemented barrier_wait, can we confirm that the hang issue is caused by wasi-libc pthread_barrier_wait or by runtime? If not, could we keep the implementation of both, and add a macro to control whether using it or self-implemented barrier_wait?

This PR is part of the investigation to understand whether the hang problems come from using the pthread primitives from wasi-libc. With the self-implemented mutex and barrier, the TSAN warnings and deadlocks disappear.
Yes, we can keep both behind a macro. I don't think it's worth running each test with both, though.

> BTW, I tried to debug the hang issue and found that the last thread didn't notify others after it entered pthread_barrier_wait, seems there is memory overwritten. Need to check more. One thing to confirm, the aux stack of the thread created is allocated previously in the bytecode, no need to be allocated by runtime, right?

I need to check, but could it just be a consequence of the data race, since data races cause undefined behavior?
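A self-implemented barrier can follow the same approach as the mutex. The sketch below is a hypothetical generation-counting barrier using C11 atomics (names such as `spin_barrier_wait` are illustrative, not taken from the PR). The last thread to arrive resets the counter so the barrier can be reused, then bumps a generation number that every other waiter is spinning on, which releases them all at once.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    int threshold;         /* number of threads that must arrive */
    atomic_int count;      /* threads arrived in the current generation */
    atomic_int generation; /* incremented each time the barrier opens */
} spin_barrier_t;

static void spin_barrier_init(spin_barrier_t *b, int threshold)
{
    assert(threshold > 0);
    b->threshold = threshold;
    atomic_init(&b->count, 0);
    atomic_init(&b->generation, 0);
}

/* Returns true in exactly one thread per generation (the last to arrive),
 * similar in spirit to PTHREAD_BARRIER_SERIAL_THREAD. */
static bool spin_barrier_wait(spin_barrier_t *b)
{
    /* Read the generation before arriving: it cannot advance without us. */
    int gen = atomic_load_explicit(&b->generation, memory_order_acquire);
    int arrived =
        atomic_fetch_add_explicit(&b->count, 1, memory_order_acq_rel) + 1;

    if (arrived == b->threshold) {
        /* Last thread: reset the counter for the next round first, then
         * open the barrier by bumping the generation (release). */
        atomic_store_explicit(&b->count, 0, memory_order_relaxed);
        atomic_fetch_add_explicit(&b->generation, 1, memory_order_release);
        return true;
    }

    /* Everyone else waits until the generation changes. */
    while (atomic_load_explicit(&b->generation, memory_order_acquire) == gen)
        ;
    return false;
}
```

Because waiters only ever compare against the generation they saw on entry, a thread that leaves the barrier and immediately re-enters it cannot be confused with a thread that is still waiting on the previous round.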

wenyongh · 2023-03-17T08:54:51Z

>> @eloparco Wondering why we replace the pthread_barrier_wait with self-implemented barrier_wait, can we confirm that the hang issue is caused by wasi-libc pthread_barrier_wait or by runtime? If not, could we keep the implementation of both, and add a macro to control whether using it or self-implemented barrier_wait?

> This PR is part of the investigation to understand if the hang problems are coming from the usage of pthread primitives from wasi-libc. Using the self-implemented mutex and barrier the TSAN warnings and deadlocks disappear. Yes, we can keep both using a macro, I don't think it's worth running each test with both.

Yes, no need to test both; keeping the pthread_barrier_wait version is just so we can reproduce and debug the issue.

>> BTW, I tried to debug the hang issue and found that the last thread didn't notify others after it entered pthread_barrier_wait, seems there is memory overwritten. Need to check more. One thing to confirm, the aux stack of the thread created is allocated previously in the bytecode, no need to be allocated by runtime, right?

> I need to check, but it may just be the consequence of the data race since they cause undefined behavior?

I am not sure. I tried to understand the pthread_barrier_wait source code and added comments below, in case you are interested too:
https://github.com/WebAssembly/wasi-libc/blob/main/libc-top-half/musl/src/thread/pthread_barrier_wait.c

        ...
	/* Otherwise we need a lock on the barrier object */
	while (a_swap(&b->_b_lock, 1))                   /* => wait and try to get the lock */
		__wait(&b->_b_lock, &b->_b_waiters, 1, 1);
	inst = b->_b_inst;                               /* => lock acquired; the operations below run under the lock */

	/* First thread to enter the barrier becomes the "instance owner" */
	if (!inst) {                                     /* => the first thread enters this branch to set up the instance; seems always OK */
		struct instance new_inst = { 0 };
		int spins = 200;
		b->_b_inst = inst = &new_inst;
		a_store(&b->_b_lock, 0);
		if (b->_b_waiters) __wake(&b->_b_lock, 1, 1);
		while (spins-- && !inst->finished)
			a_spin();
		a_inc(&inst->finished);
		while (inst->finished == 1)
#ifdef __wasilibc_unmodified_upstream
			__syscall(SYS_futex,&inst->finished,FUTEX_WAIT|FUTEX_PRIVATE,1,0) != -ENOSYS
			|| __syscall(SYS_futex,&inst->finished,FUTEX_WAIT,1,0);
#else
			__futexwait(&inst->finished, 1, 0);
#endif
		return PTHREAD_BARRIER_SERIAL_THREAD;
	}

	/* Last thread to enter the barrier wakes all non-instance-owners */
	if (++inst->count == limit) {                    /* => the last (fourth) thread should enter this branch; when it hangs, it never does */
		b->_b_inst = 0;
		a_store(&b->_b_lock, 0);
		if (b->_b_waiters) __wake(&b->_b_lock, 1, 1);
		a_store(&inst->last, 1);
		if (inst->waiters)
			__wake(&inst->last, -1, 1);
	} else {                                         /* => the second and third threads enter this branch; seems always OK */
		a_store(&b->_b_lock, 0);
		if (b->_b_waiters) __wake(&b->_b_lock, 1, 1);
		__wait(&inst->last, &inst->waiters, 0, 1);
	}

It's not easy to observe the behavior of the last thread: when I added more printf calls, it no longer hung.

eloparco · 2023-03-17T10:21:07Z

We can check the mutex lock/unlock code, since that one gives TSAN warnings as well and it should be easier to debug.

g0djan · 2023-03-17T10:41:33Z

@wenyongh printf debugging stopped working for me as well at some point. I wanted to use the WAMR source debugger so that I can attach when it gets stuck, but the lld patch is out of date. I opened #2035 for that.

g0djan · 2023-03-18T00:14:36Z

I didn't manage to build wasmtime with the thread sanitizer, but I did run the tests that hang on pthread_barrier_wait in WAMR: on wasmtime they don't hang after 10k runs, while the same .wasm binary easily hangs on WAMR after 500 iterations.

So in that case, the problem might not actually be with pthread_barrier.

wenyongh · 2023-03-18T00:47:41Z

> Didn't manage to build wasmtime with thread sanitizer but managed to run tests that hang for wamr on pthread_barrier_wait and on wasmtime it doesn't hang after 10k runs, the same .wasm binary file easily hangs. on wamr after 500 iterations

> So it might mean that problem is not with pthread_barrier actually in that case

Yes, I also found that it doesn't hang when running in WAMR AOT mode, so it shouldn't be related to pthread_barrier_wait.

The root cause might be the CPU runtime memory ordering:
https://en.wikipedia.org/wiki/Memory_ordering#Runtime_memory_ordering

For example, the data written by thread A may still be in a cache and not yet written to memory, so the data read by thread B from memory may be stale. We may need to add some memory barrier operations in the interpreter, like what AOT does. I tried adding some and it hung less often; I uploaded a rough patch, maybe you can investigate further:
mem_order.zip
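The kind of ordering such barriers enforce can be illustrated with a release/acquire pair in C11. This is a hypothetical sketch, not the contents of mem_order.zip: a writer publishes a payload behind a flag, and a reader that observes the flag is then guaranteed to see the payload. WebAssembly's atomic loads and stores are sequentially consistent, so an interpreter implementing them needs at least this much ordering (seq_cst accesses or explicit fences being the conservative choice).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state: a plain payload published through a flag. */
typedef struct {
    int payload;      /* plain, non-atomic data */
    atomic_int ready; /* 0 = not published yet, 1 = published */
} mailbox_t;

static void mailbox_init(mailbox_t *mb)
{
    mb->payload = 0;
    atomic_init(&mb->ready, 0);
}

/* Writer (thread A): the release store guarantees that the payload write
 * is visible to any thread that later reads ready == 1 with acquire. */
static void mailbox_publish(mailbox_t *mb, int value)
{
    mb->payload = value;
    atomic_store_explicit(&mb->ready, 1, memory_order_release);
}

/* Reader (thread B): returns false if nothing has been published yet.
 * The acquire load pairs with the release store above; with a relaxed
 * load (or a plain, racy int flag) B would not be guaranteed to see the
 * payload written before the flag. */
static bool mailbox_try_read(mailbox_t *mb, int *out)
{
    assert(out); /* caller must supply an output slot */
    if (atomic_load_explicit(&mb->ready, memory_order_acquire) == 0)
        return false;
    *out = mb->payload;
    return true;
}
```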

wenyongh

LGTM

g0djan · 2023-03-20T13:45:10Z

core/iwasm/libraries/lib-wasi-threads/test/common.h

+#if USE_PTHREAD_SYNC_PRIMITIVES != 0
+#include <pthread.h>
+#else
+#include "sync_primitives.h"


@wenyongh @eloparco

I'd like to use these primitives to make the other errors easier to find, but do we want them as the default?

Pros:

stable CI: if people break something, they can be sure it's because of their changes

Cons:

pthreads currently reveal problems that need to be fixed anyway, and we would hide them

Yes, we'd better use pthread_barrier_wait once the hang issue is resolved. But note that there will be some plain (non-atomic) load/store data races in the wasi-libc implementation, e.g. a_swap:
https://github.com/WebAssembly/wasi-libc/blob/main/libc-top-half/musl/src/internal/atomic.h#L106-L115

int old; do old = *p; while (a_cas(p, old, v) != old); return old;

Here old = *p is a plain, non-atomic load while a_cas(p, old, v) is atomic, so if two threads operate on the same address p, the sanitizer may report a data race.

> Yes, had better use the pthread_barrier_wait after the hang issue is resolved.

So should we merge this one as it is, then, until it gets fixed in wasi-libc?

> Here old = *p is not atomic, and a_cas(p, old, v) is atomic, if two threads operate on the same address p, data races may be reported by sanitizer.

Nice catch, that's probably what the thread sanitizer is complaining about for the load/store data races. It may be the same for other operations as well; I see a_fetch_add a few lines later doing something similar.
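In C11 terms, the reported race disappears if the initial read in the CAS loop is itself atomic. Below is a hypothetical sketch of the two loops discussed above (swap_via_cas and fetch_add_via_cas are illustrative names, not wasi-libc functions, and this is not necessarily how the wasi-libc fix works): it keeps musl's CAS-loop structure but replaces `old = *p` with an atomic load.

```c
#include <assert.h>
#include <stdatomic.h>

/* Race-free counterpart of the generic a_swap loop: the first read is an
 * atomic load, so a concurrent writer to *p is no longer a data race. */
static int swap_via_cas(atomic_int *p, int v)
{
    assert(p); /* p must point to an initialized atomic int */
    int old = atomic_load_explicit(p, memory_order_relaxed);
    /* On failure, atomic_compare_exchange_weak writes the current value of
     * *p into old, so the loop retries with fresh data. */
    while (!atomic_compare_exchange_weak(p, &old, v))
        ;
    return old;
}

/* Same pattern for the a_fetch_add loop. The addition is done in unsigned
 * arithmetic, as musl does, so overflow wraps instead of being UB. */
static int fetch_add_via_cas(atomic_int *p, int v)
{
    assert(p);
    int old = atomic_load_explicit(p, memory_order_relaxed);
    while (!atomic_compare_exchange_weak(
        p, &old, (int)((unsigned)old + (unsigned)v)))
        ;
    return old;
}
```

In ordinary C11 code, atomic_exchange and atomic_fetch_add do this in a single call; the loops are kept here only to mirror the structure of the wasi-libc code.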

loganek · 2023-03-22T09:51:42Z

How much effort would it be to fix wasi-libc? I would personally prefer to do it there rather than re-implementing those primitives: this is extra maintenance effort, and the WASI community won't benefit from your changes here since they're WAMR-internal.

wenyongh · 2023-03-22T10:02:14Z

> How much effort would be to fix wasi-libc? I personally would prefer to do it there rather than re-implementing those primitives; this is extra maintenance effort + WASI community won't benefit from your changes here as they're WAMR internal.

It's just a matter of removing one line in libc-top-half/musl/arch/wasm32/atomic_arch.h; I submitted a PR for it:
WebAssembly/wasi-libc#403

ttrenner · 2023-03-22T10:02:29Z

> How much effort would be to fix wasi-libc? I personally would prefer to do it there rather than re-implementing those primitives; this is extra maintenance effort + WASI community won't benefit from your changes here as they're WAMR internal.

The question can be raised in general: which things should be part of WASI and which should be part of WAMR. Often this results in a chicken-and-egg problem: no one implements something as long as it is not standardized, but on the other hand no one pushes for standardization if no one wants to have or implement it. So I would propose to go forward and push it, making it visible to the community (at the risk of some wasted effort).

loganek · 2023-03-22T10:16:22Z

>> How much effort would be to fix wasi-libc? I personally would prefer to do it there rather than re-implementing those primitives; this is extra maintenance effort + WASI community won't benefit from your changes here as they're WAMR internal.

> The question can be raised in general: which things should be part of wasi and which should be part of wamr. Often this results in a chicken & egg problem: no one is implementing it as long as it is not standardized, but on the other hand no one pushes standardization if no one wants it to have/implement it. So I would propose to go forward and push it/make it visible at the community (with the risk of doing some waste).

I don't think the question is which bits should be a part of wasi-libc and which should not be - those primitives are already part of wasi-libc, but they don't work (at least according to this PR). From what I understand, we re-implement those sync primitives to not use the (probably) broken ones from wasi-libc, but what I'm suggesting is that we should rather fix wasi-libc instead.

wenyongh · 2023-03-25T03:00:27Z

@eloparco The PR for wasi-libc that fixes the hang issue was merged. I think we can update CI to use that commit for both Ubuntu-20.04 and Ubuntu-22.04, and use pthread_barrier_wait by default here, so as to better test the wasi-threads cases. What's your opinion?
WebAssembly/wasi-libc@1dfe5c3

wenyongh · 2023-03-26T00:25:54Z

.github/workflows/compilation_on_android_ubuntu.yml

          git fetch https://github.com/WebAssembly/wasi-libc \
-            8f5275796a82f8ecfd0833a4f3f444fa37ed4546
+            1dfe5c302d1c5ab621f7abf04620fae92700fd22


We'd better also compile wasi-libc on Ubuntu-22.04 instead of using the pre-release wasi-sdk-thread version, since there is an issue in it:
Merge L337 to L346, and remove L366 and

I thought of that, but it would mean removing the use of the wasi-sdk 20 pre-release from the CI, basically reverting all the changes previously added in #2021. When the final wasi-sdk 20 release comes out, we would have to add those changes again.

I left the wasi-sdk 20 part as it is since in practice we didn't encounter problems on the main branch: the hanging issues, which are only reproducible by running each test many times, always went unnoticed in the CI.

Got it, so let's keep it.

wenyongh · 2023-03-26T00:26:14Z

.github/workflows/compilation_on_android_ubuntu.yml

          git fetch https://github.com/WebAssembly/wasi-libc \
-            8f5275796a82f8ecfd0833a4f3f444fa37ed4546
+            1dfe5c302d1c5ab621f7abf04620fae92700fd22


Same as above

…ytecodealliance#2028) Update wasi-libc version to resolve the hang issue when running wasi-threads cases. Implement custom sync primitives as a counterpart of `pthread_barrier_wait` to attempt to replace pthread sync primitives since they seem to cause data races when running with the thread sanitizer.

eloparco mentioned this pull request Mar 15, 2023

Fix atomic.wait, get wasi_ctx exit code and thread mgr issues #2024

Merged

g0djan reviewed Mar 15, 2023

core/iwasm/libraries/lib-wasi-threads/test/sync_primitives.h (outdated)

eloparco marked this pull request as ready for review March 15, 2023 23:06

eloparco force-pushed the eloparco/sync-primitives branch from e47659c to d8f1e88

hritikgupta reviewed Mar 16, 2023

core/iwasm/libraries/lib-wasi-threads/test/sync_primitives.h

hritikgupta reviewed Mar 16, 2023

core/iwasm/libraries/lib-wasi-threads/test/sync_primitives.h (outdated)

eloparco force-pushed the eloparco/sync-primitives branch 2 times, most recently from 6dd4b53 to 8ef3797 March 16, 2023 10:24

wenyongh approved these changes Mar 20, 2023

g0djan reviewed Mar 20, 2023

eloparco force-pushed the eloparco/sync-primitives branch from 0392599 to dae6923 March 26, 2023 00:11

eloparco added 5 commits March 26, 2023 00:13

feat: implement sync primitives instead of using pthread ones

c83b8b4

fix: notify all waiting threads

fcea51c

fix: reset barrier when threshold is reached

4f5ff0c

feat: use preprocessor option to decide pthread primitive implementation to use

1227d42

fix: remove outdated sample now part of the tests

c379e63

feat: use pthread primitives by default and update ci to use wasi-libc commit that fixes atomic operation

6b2999f

eloparco force-pushed the eloparco/sync-primitives branch from dae6923 to 6b2999f March 26, 2023 00:18

wenyongh reviewed Mar 26, 2023

wenyongh merged commit 0f73ce1 into bytecodealliance:main Mar 26, 2023
