Implement sync primitives instead of using pthread ones #2028

Merged
merged 6 commits into from
Mar 26, 2023

eloparco
Contributor

@eloparco eloparco commented Mar 15, 2023

Attempt to replace pthread sync primitives since they seem to cause data races when running with the thread sanitizer. Those data races appear as concurrent accesses to load/store operations when running in classic interpreter mode.

In particular, this PR implements mutex and barrier to replace pthread_mutex_t and pthread_barrier_t in the tests with threads.
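
For readers following along, a mutex of the kind described above can be sketched with C11 atomics. This is only a minimal illustration under stated assumptions, not the code added by this PR: the names sketch_mutex_t, sketch_mutex_lock, sketch_mutex_unlock and run_mutex_demo are hypothetical, the lock simply spins instead of sleeping, and pthread_create is used only to spawn native threads for the demo.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical spin-based mutex (an assumption, not the PR's implementation). */
typedef struct {
    atomic_int locked; /* 0 = free, 1 = held */
} sketch_mutex_t;

static void sketch_mutex_init(sketch_mutex_t *m)
{
    atomic_init(&m->locked, 0);
}

static void sketch_mutex_lock(sketch_mutex_t *m)
{
    /* Swap in 1; if the previous value was already 1, someone else holds it. */
    while (atomic_exchange_explicit(&m->locked, 1, memory_order_acquire)) {
        /* busy-wait */
    }
}

static void sketch_mutex_unlock(sketch_mutex_t *m)
{
    atomic_store_explicit(&m->locked, 0, memory_order_release);
}

enum { MUTEX_DEMO_THREADS = 4, MUTEX_DEMO_ITERATIONS = 1000 };

static sketch_mutex_t g_demo_mutex;
static int g_demo_counter; /* plain int, protected only by g_demo_mutex */

static void *mutex_demo_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < MUTEX_DEMO_ITERATIONS; i++) {
        sketch_mutex_lock(&g_demo_mutex);
        g_demo_counter++;
        sketch_mutex_unlock(&g_demo_mutex);
    }
    return NULL;
}

/* Runs the workers and returns the final counter value. */
int run_mutex_demo(void)
{
    pthread_t threads[MUTEX_DEMO_THREADS];

    sketch_mutex_init(&g_demo_mutex);
    g_demo_counter = 0;
    for (int i = 0; i < MUTEX_DEMO_THREADS; i++)
        pthread_create(&threads[i], NULL, mutex_demo_worker, NULL);
    for (int i = 0; i < MUTEX_DEMO_THREADS; i++)
        pthread_join(threads[i], NULL);
    return g_demo_counter;
}
```

If the lock works, run_mutex_demo() returns MUTEX_DEMO_THREADS * MUTEX_DEMO_ITERATIONS, and the increments are ordered by the acquire/release pair on the lock word.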

@eloparco eloparco marked this pull request as ready for review March 15, 2023 23:06
@eloparco eloparco force-pushed the eloparco/sync-primitives branch from e47659c to d8f1e88 Compare March 15, 2023 23:15
@eloparco
Contributor Author

After this PR I don't see the load/store race conditions anymore. I need to fix the failure on the specification test for atomics (that's breaking in the CI) and then do some more testing.

@eloparco eloparco force-pushed the eloparco/sync-primitives branch 2 times, most recently from 6dd4b53 to 8ef3797 Compare March 16, 2023 10:24
@g0djan
Contributor

g0djan commented Mar 16, 2023

After this PR I don't see the load/store race conditions anymore. I need to fix the failure on the specification test for atomics (that's breaking in the CI) and then do some more testing.

Me neither, no TSAN warnings

@wenyongh
Contributor

@eloparco Wondering why we replace the pthread_barrier_wait with self-implemented barrier_wait, can we confirm that the hang issue is caused by wasi-libc pthread_barrier_wait or by runtime? If not, could we keep the implementation of both, and add a macro to control whether using it or self-implemented barrier_wait?

BTW, I tried to debug the hang issue and found that the last thread didn't notify others after it entered pthread_barrier_wait, seems there is memory overwritten. Need to check more. One thing to confirm, the aux stack of the thread created is allocated previously in the bytecode, no need to be allocated by runtime, right?

@eloparco
Contributor Author

@eloparco Wondering why we replace the pthread_barrier_wait with self-implemented barrier_wait, can we confirm that the hang issue is caused by wasi-libc pthread_barrier_wait or by runtime? If not, could we keep the implementation of both, and add a macro to control whether using it or self-implemented barrier_wait?

This PR is part of the investigation to understand if the hang problems are coming from the usage of pthread primitives from wasi-libc. Using the self-implemented mutex and barrier the TSAN warnings and deadlocks disappear.
Yes, we can keep both using a macro, I don't think it's worth running each test with both.

BTW, I tried to debug the hang issue and found that the last thread didn't notify others after it entered pthread_barrier_wait, seems there is memory overwritten. Need to check more. One thing to confirm, the aux stack of the thread created is allocated previously in the bytecode, no need to be allocated by runtime, right?

I need to check, but it may just be the consequence of the data race since they cause undefined behavior?

@wenyongh
Contributor

@eloparco Wondering why we replace the pthread_barrier_wait with self-implemented barrier_wait, can we confirm that the hang issue is caused by wasi-libc pthread_barrier_wait or by runtime? If not, could we keep the implementation of both, and add a macro to control whether using it or self-implemented barrier_wait?

This PR is part of the investigation to understand if the hang problems are coming from the usage of pthread primitives from wasi-libc. Using the self-implemented mutex and barrier the TSAN warnings and deadlocks disappear. Yes, we can keep both using a macro, I don't think it's worth running each test with both.

Yes, no need to test both, keeping pthread_barrier_wait version is just to be able to reproduce the issue and debug it.
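
The macro switch discussed here could look roughly like the sketch below. USE_PTHREAD_SYNC_PRIMITIVES is the macro name that appears in this PR's diff; everything else (test_mutex_t, the test_mutex_* wrappers, locked_increment, and the spin-based fallback) is a hypothetical illustration rather than the actual test code.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Default to the self-implemented primitives unless the build says otherwise. */
#ifndef USE_PTHREAD_SYNC_PRIMITIVES
#define USE_PTHREAD_SYNC_PRIMITIVES 0
#endif

#if USE_PTHREAD_SYNC_PRIMITIVES != 0
#include <pthread.h>
/* pthread-backed variant, kept to reproduce and debug the original issue */
typedef pthread_mutex_t test_mutex_t;
static void test_mutex_init(test_mutex_t *m) { pthread_mutex_init(m, NULL); }
static void test_mutex_lock(test_mutex_t *m) { pthread_mutex_lock(m); }
static void test_mutex_unlock(test_mutex_t *m) { pthread_mutex_unlock(m); }
#else
/* hypothetical self-implemented variant: a simple spin lock */
typedef struct {
    atomic_int locked;
} test_mutex_t;
static void test_mutex_init(test_mutex_t *m) { atomic_init(&m->locked, 0); }
static void test_mutex_lock(test_mutex_t *m)
{
    while (atomic_exchange_explicit(&m->locked, 1, memory_order_acquire)) {
        /* busy-wait */
    }
}
static void test_mutex_unlock(test_mutex_t *m)
{
    atomic_store_explicit(&m->locked, 0, memory_order_release);
}
#endif

/* Test code only sees the test_mutex_* names, whichever variant is built. */
int locked_increment(int start, int times)
{
    test_mutex_t m;
    int value = start;

    test_mutex_init(&m);
    for (int i = 0; i < times; i++) {
        test_mutex_lock(&m);
        value++;
        test_mutex_unlock(&m);
    }
    return value;
}
```

Building with -DUSE_PTHREAD_SYNC_PRIMITIVES=1 would select the pthread variant without touching the test bodies.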

BTW, I tried to debug the hang issue and found that the last thread didn't notify others after it entered pthread_barrier_wait, seems there is memory overwritten. Need to check more. One thing to confirm, the aux stack of the thread created is allocated previously in the bytecode, no need to be allocated by runtime, right?

I need to check, but it may just be the consequence of the data race since they cause undefined behavior?

I am not sure. I tried to understand the pthread_barrier_wait source code and added comments below, in case you are interested too:
https://github.com/WebAssembly/wasi-libc/blob/main/libc-top-half/musl/src/thread/pthread_barrier_wait.c

	...
	/* Otherwise we need a lock on the barrier object */
	/* => wait and try to get the lock */
	while (a_swap(&b->_b_lock, 1))
		__wait(&b->_b_lock, &b->_b_waiters, 1, 1);
	/* => the lock is held, so the operations below are locked */
	inst = b->_b_inst;

	/* First thread to enter the barrier becomes the "instance owner" */
	/* => the first thread enters this branch to set up the instance, seems always OK */
	if (!inst) {
		struct instance new_inst = { 0 };
		int spins = 200;
		b->_b_inst = inst = &new_inst;
		a_store(&b->_b_lock, 0);
		if (b->_b_waiters) __wake(&b->_b_lock, 1, 1);
		while (spins-- && !inst->finished)
			a_spin();
		a_inc(&inst->finished);
		while (inst->finished == 1)
#ifdef __wasilibc_unmodified_upstream
			__syscall(SYS_futex,&inst->finished,FUTEX_WAIT|FUTEX_PRIVATE,1,0) != -ENOSYS
			|| __syscall(SYS_futex,&inst->finished,FUTEX_WAIT,1,0);
#else
			__futexwait(&inst->finished, 1, 0);
#endif
		return PTHREAD_BARRIER_SERIAL_THREAD;
	}

	/* Last thread to enter the barrier wakes all non-instance-owners */
	/* => the last (fourth) thread enters this branch.
	 *    Fails when it hangs: the thread didn't enter this branch */
	if (++inst->count == limit) {
		b->_b_inst = 0;
		a_store(&b->_b_lock, 0);
		if (b->_b_waiters) __wake(&b->_b_lock, 1, 1);
		a_store(&inst->last, 1);
		if (inst->waiters)
			__wake(&inst->last, -1, 1);
	} else {
		/* => the second and third threads enter this branch, seems always OK */
		a_store(&b->_b_lock, 0);
		if (b->_b_waiters) __wake(&b->_b_lock, 1, 1);
		__wait(&inst->last, &inst->waiters, 0, 1);
	}

It is not easy to observe the behavior of the last thread: when I added more printf calls, it didn't hang anymore.
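
As a point of comparison with the musl code above, a much simpler barrier can be sketched with two C11 atomic counters. This is a hedged illustration only: it spins instead of using futex wait/wake, it is not the barrier implemented in this PR, and the names sketch_barrier_t, sketch_barrier_wait and run_barrier_demo are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter-plus-generation barrier (an assumption, not musl's design). */
typedef struct {
    atomic_uint arrived;    /* threads that reached the barrier in this round */
    atomic_uint generation; /* bumped once per completed round */
    unsigned limit;         /* number of threads the barrier waits for */
} sketch_barrier_t;

static void sketch_barrier_init(sketch_barrier_t *b, unsigned limit)
{
    atomic_init(&b->arrived, 0u);
    atomic_init(&b->generation, 0u);
    b->limit = limit;
}

/* Returns 1 in the last thread to arrive, 0 in all the others. */
static int sketch_barrier_wait(sketch_barrier_t *b)
{
    unsigned gen = atomic_load_explicit(&b->generation, memory_order_acquire);
    unsigned prev = atomic_fetch_add_explicit(&b->arrived, 1u, memory_order_acq_rel);

    if (prev + 1 == b->limit) {
        /* Last arriver: reset the count, then release everyone by bumping the generation. */
        atomic_store_explicit(&b->arrived, 0u, memory_order_relaxed);
        atomic_fetch_add_explicit(&b->generation, 1u, memory_order_release);
        return 1;
    }
    /* Everyone else spins until the generation changes. */
    while (atomic_load_explicit(&b->generation, memory_order_acquire) == gen) {
        /* busy-wait */
    }
    return 0;
}

enum { BARRIER_DEMO_THREADS = 4 };

static sketch_barrier_t g_demo_barrier;
static int g_demo_slots[BARRIER_DEMO_THREADS];
static int g_demo_sums[BARRIER_DEMO_THREADS];

static void *barrier_demo_worker(void *arg)
{
    int id = (int)(intptr_t)arg;
    int sum = 0;

    g_demo_slots[id] = id + 1;            /* written before the barrier */
    sketch_barrier_wait(&g_demo_barrier);
    for (int i = 0; i < BARRIER_DEMO_THREADS; i++)
        sum += g_demo_slots[i];           /* read after the barrier */
    g_demo_sums[id] = sum;
    return NULL;
}

/* Returns how many threads observed every slot after the barrier. */
int run_barrier_demo(void)
{
    pthread_t threads[BARRIER_DEMO_THREADS];
    int complete = 0;

    sketch_barrier_init(&g_demo_barrier, BARRIER_DEMO_THREADS);
    for (int i = 0; i < BARRIER_DEMO_THREADS; i++)
        pthread_create(&threads[i], NULL, barrier_demo_worker, (void *)(intptr_t)i);
    for (int i = 0; i < BARRIER_DEMO_THREADS; i++)
        pthread_join(threads[i], NULL);
    for (int i = 0; i < BARRIER_DEMO_THREADS; i++)
        complete += (g_demo_sums[i] == 1 + 2 + 3 + 4);
    return complete;
}
```

With four threads, every worker should see the sum 10 after the barrier, so run_barrier_demo() is expected to return 4.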

@eloparco
Contributor Author

We can check the mutex lock/unlock code, since that one gives TSAN warnings as well and it should be easier to debug.

@g0djan
Contributor

g0djan commented Mar 17, 2023

@wenyongh prints stopped working for me as well at some point. I wanted to use the WAMR source debugger so I could attach when it gets stuck, but the lld patch is out of date. I opened #2035 for that.

@g0djan
Contributor

g0djan commented Mar 18, 2023

I didn't manage to build wasmtime with the thread sanitizer, but I managed to run the tests that hang on pthread_barrier_wait with WAMR. On wasmtime they don't hang after 10k runs, while the same .wasm binary file easily hangs on WAMR after 500 iterations.

So it might mean that problem is not with pthread_barrier actually in that case

@wenyongh
Contributor

I didn't manage to build wasmtime with the thread sanitizer, but I managed to run the tests that hang on pthread_barrier_wait with WAMR. On wasmtime they don't hang after 10k runs, while the same .wasm binary file easily hangs on WAMR after 500 iterations.

So it might mean that problem is not with pthread_barrier actually in that case

Yes, I also found that it doesn't hang when running in WAMR AOT mode, so it should not be related to pthread_barrier_wait.

The root cause might be the CPU runtime memory ordering:
https://en.wikipedia.org/wiki/Memory_ordering#Runtime_memory_ordering

For example, the data written by thread A is in cache but not actually written to memory yet, while thread B reads the data from memory and may get an invalid value. We may need to add some memory barrier operations in the interpreter, like AOT does. I tried adding some and it didn't hang as often. I uploaded a rough patch, maybe you can investigate further:
mem_order.zip
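
To make the memory-ordering idea concrete, here is a generic C11 sketch of publishing data from one thread to another with a release store and an acquire load. It is an assumption-laden illustration of the general technique only: it does not reproduce the contents of mem_order.zip or the barriers used by the interpreter or AOT code, and the names g_mp_payload, g_mp_ready and run_message_passing_demo are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int g_mp_payload;      /* plain data written by the producer */
static atomic_int g_mp_ready; /* flag that publishes the payload */

static void *mp_producer(void *arg)
{
    (void)arg;
    g_mp_payload = 42;
    /* Release: the payload write above cannot be reordered after this store. */
    atomic_store_explicit(&g_mp_ready, 1, memory_order_release);
    return NULL;
}

static void *mp_consumer(void *arg)
{
    int *observed = arg;

    /* Acquire: once the flag is seen, the payload write is visible too. */
    while (!atomic_load_explicit(&g_mp_ready, memory_order_acquire)) {
        /* busy-wait */
    }
    *observed = g_mp_payload;
    return NULL;
}

/* Returns the payload value seen by the consumer thread. */
int run_message_passing_demo(void)
{
    pthread_t producer, consumer;
    int observed = 0;

    g_mp_payload = 0;
    atomic_store(&g_mp_ready, 0);
    pthread_create(&consumer, NULL, mp_consumer, &observed);
    pthread_create(&producer, NULL, mp_producer, NULL);
    pthread_join(producer, NULL);
    pthread_join(consumer, NULL);
    return observed;
}
```

If both accesses to the flag were plain (non-atomic) loads and stores, the C memory model would not guarantee that the consumer sees the payload, which is the kind of problem the comment above is speculating about.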

Contributor

@wenyongh wenyongh left a comment

LGTM

#if USE_PTHREAD_SYNC_PRIMITIVES != 0
#include <pthread.h>
#else
#include "sync_primitives.h"
Contributor

@wenyongh @eloparco

I'd like to use these primitives to find the other errors more easily, but do we want to use them as the default?

Pros:

  • stable CI, if people break something they will be sure it's because of their changes

Cons:

  • right now the pthread primitives expose problems that need to be fixed anyway, and we would hide them

Contributor

Yes, we had better use pthread_barrier_wait after the hang issue is resolved. But note that there will be some normal load/store data races in the wasi-libc implementation, e.g. a_swap:
https://github.com/WebAssembly/wasi-libc/blob/main/libc-top-half/musl/src/internal/atomic.h#L106-L115

	int old;
	do old = *p;
	while (a_cas(p, old, v) != old);
	return old;

Here old = *p is not atomic while a_cas(p, old, v) is atomic, so if two threads operate on the same address p, the sanitizer may report data races.
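
For illustration, the same swap loop can be written in C11 so that the initial read is an explicit atomic load. This is a sketch under assumptions, not the wasi-libc code and not the fix submitted upstream: sketch_swap is a hypothetical name, and the C11 compare-exchange stands in for a_cas.

```c
#include <stdatomic.h>

/* Hypothetical C11 counterpart of a_swap: stores v and returns the old value. */
int sketch_swap(_Atomic int *p, int v)
{
    /* Explicit atomic load, so the sanitizer sees no plain read racing with the CAS. */
    int old = atomic_load_explicit(p, memory_order_relaxed);

    /* On failure, compare_exchange writes the current value back into old, then we retry. */
    while (!atomic_compare_exchange_weak(p, &old, v)) {
        /* retry */
    }
    return old;
}
```

In plain C11 the whole loop could also be replaced by a single atomic_exchange(p, v); the loop form is kept here only to mirror the structure quoted above.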

Contributor Author

Yes, we had better use pthread_barrier_wait after the hang issue is resolved.

So should we merge this one as it is then? Until we get it fixed in wasi-libc.

Here old = *p is not atomic while a_cas(p, old, v) is atomic, so if two threads operate on the same address p, the sanitizer may report data races.

Nice catch, that's probably what the thread sanitizer is complaining about for the load/store data races. It may be the same for other operations as well, I see the a_fetch_add a few lines later doing something similar.

@loganek
Collaborator

loganek commented Mar 22, 2023

How much effort would it be to fix wasi-libc? I personally would prefer to do it there rather than re-implementing those primitives; this is extra maintenance effort + WASI community won't benefit from your changes here as they're WAMR internal.

@wenyongh
Contributor

How much effort would it be to fix wasi-libc? I personally would prefer to do it there rather than re-implementing those primitives; this is extra maintenance effort + WASI community won't benefit from your changes here as they're WAMR internal.

It just requires removing a line in libc-top-half/musl/arch/wasm32/atomic_arch.h. I submitted a PR for it:
WebAssembly/wasi-libc#403

@ttrenner
Contributor

How much effort would it be to fix wasi-libc? I personally would prefer to do it there rather than re-implementing those primitives; this is extra maintenance effort + WASI community won't benefit from your changes here as they're WAMR internal.

The question can be raised in general: which things should be part of wasi and which should be part of wamr. Often this results in a chicken & egg problem: no one implements it as long as it is not standardized, but on the other hand no one pushes standardization if no one wants to have/implement it. So I would propose to go forward and push it/make it visible to the community (with the risk of doing some waste).

@loganek
Collaborator

loganek commented Mar 22, 2023

How much effort would it be to fix wasi-libc? I personally would prefer to do it there rather than re-implementing those primitives; this is extra maintenance effort + WASI community won't benefit from your changes here as they're WAMR internal.

The question can be raised in general: which things should be part of wasi and which should be part of wamr. Often this results in a chicken & egg problem: no one implements it as long as it is not standardized, but on the other hand no one pushes standardization if no one wants to have/implement it. So I would propose to go forward and push it/make it visible to the community (with the risk of doing some waste).

I don't think the question is which bits should be a part of wasi-libc and which should not be - those primitives are already part of wasi-libc, but they don't work (at least according to this PR). From what I understand, we re-implement those sync primitives to not use the (probably) broken ones from wasi-libc, but what I'm suggesting is that we should rather fix wasi-libc instead.

@wenyongh
Contributor

wenyongh commented Mar 25, 2023

@eloparco The PR for wasi-libc to fix the hang issue was merged, I think we may update CI to use that commit for both Ubuntu-20.04 and Ubuntu-22.04, and here use pthread_barrier_wait by default, so as to better test the wasi-threads cases. What's your opinion?
WebAssembly/wasi-libc@1dfe5c3

@eloparco eloparco force-pushed the eloparco/sync-primitives branch from 0392599 to dae6923 Compare March 26, 2023 00:11
@eloparco eloparco force-pushed the eloparco/sync-primitives branch from dae6923 to 6b2999f Compare March 26, 2023 00:18
  git fetch https://github.com/WebAssembly/wasi-libc \
- 8f5275796a82f8ecfd0833a4f3f444fa37ed4546
+ 1dfe5c302d1c5ab621f7abf04620fae92700fd22
Contributor

We had better also compile wasi-libc on Ubuntu-22.04 instead of using the pre-release wasi-sdk-thread version, since there is an issue in it:
Merge L337 to L346, and remove L366 and

Contributor Author

I thought of that, but it would mean that we now remove the usage of the wasi-sdk 20 pre-release from the CI, basically reverting all the changes previously added in #2021. When the wasi-sdk 20 final release comes out we will have to add those changes again.

I was leaving the wasi-sdk 20 part as it is since in practice we didn't encounter problems in the main branch: the hanging issues, only reproducible by running each test many times, always went unnoticed in the CI.

Contributor

Got it, so let's keep it.

  git fetch https://github.com/WebAssembly/wasi-libc \
- 8f5275796a82f8ecfd0833a4f3f444fa37ed4546
+ 1dfe5c302d1c5ab621f7abf04620fae92700fd22
Contributor

Same as above

@wenyongh wenyongh merged commit 0f73ce1 into bytecodealliance:main Mar 26, 2023
victoryang00 pushed a commit to victoryang00/wamr-aot-gc-checkpoint-restore that referenced this pull request May 27, 2024
…ytecodealliance#2028)

Update wasi-libc version to resolve the hang issue when running wasi-threads cases.

Implement custom sync primitives as a counterpart of `pthread_barrier_wait` to
attempt to replace pthread sync primitives since they seem to cause data races
when running with the thread sanitizer.