-
Notifications
You must be signed in to change notification settings - Fork 725
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
wasm2c: we need to pick standard C data structure implementations to finish thread proposal #2258
Comments
I'm definitely not an expert in wait/notify primitives. Would we be first non-Web Wasm engine to support this proposal? We might be blazing totally new trails in terms of implementing this outside the comforts of a JavaScript VM that already has mature implementations of wait/notify. If you want useful feedback on the design, it might be necessary to read a broader overview of your thinking first. :-) BTW: do you have a particular multi-threaded application in mind that it would be cool to support with wasm2c? (Is emscripten/LLVM already producing these instructions?) Off the top of my head it seems like we'd have a few options:
Is this sort of the landscape as you see it? |
Maybe for these advances features we could maybe generate C++ and use |
I think this maybe supported in wamr and wasmtime as well Wamr for example uses lists and hashmaps to implement this
Ah actually, I was deliberately avoiding this to keep things simple. Broadly speaking, await notify has to employ a mix of slower spinlock fallback strategies and platform specific fast APIs when available as used in the C++ standard library I think this is actually a bit of a distraction though. The TLDR here is that there is no way to implement these strategies in Wasm compliant way without hashmaps and lists. The full details in case you are curious, is that wasm specifies that there are no "spurious wakeups" which is bad design choice imho, as this means wasm implementations are on the hook to maintain datastructures that prevent spurious notifies (hence the hashmaps and lists, i.e. the association map from your earlier comment). In practice, spurious notifies are best handled a layer above, as there are often fast domain specific approaches to check if a wakeup is spurious. For example, the typical use of wait/notify is to indicate when a memory value has changed, so a spurious wakeup can be checked by re-reading the value at the memory location and comparing with the old problem (ABA problems notwithstanding)
i kinda just want pthread_create and pthread_join to work tbh :) there are libraries we want to sandbox which do use threadpools internally. In our research, we have previously seen this in video decoding libraries like libvpx
I think we are on the same page, but I'll document the caveat here just in case --- We definitely won't get to a place where we can fully rely on std::atomic for atomic operations like atomic add, sub etc.
We could potentially use C++ std::atomic_wait/std::atomic_notify for just the await/notify part of the spec. A pure C implementation currently requires the following
while the C++ implementation will include all of the above with C++ 11 and provides a builtin atomic wait/notify with C++ 20 Personally I don't have a strong preference between the C or C++11 version and I'm happy to go with either. I am not a fan of relying on C++20 yet, as many platforms don't have this yet, and Firefox doesn't use this yet. If we go with the C implementation, we would have to find external implementation of at least the map data structures (the others are reasonable enough to implement by hand if needed). What are your preferences @sbc100 @keithw ? |
Hmm, can you link to the Wasmtime / V8 / SpiderMonkey implementations? I'm curious what they do. (Wasmtime doesn't check the box at https://webassembly.org/roadmap/ so I assumed they don't support this one.) I don't quite understand the connection between preventing spurious wakeups and maintaining a dynamic data structure (like a map) -- doesn't std::atomic also prevent spurious wakeups without any ancillary data structure? (Point taken that std::atomic can't be used for add, sub, etc.) What is your thinking about whether this belongs in the generated code vs. creating a new runtime API surface and punting these dynamic data structures to the runtime? Right now the generated code never calls malloc or free (and doesn't maintain any dynamic global data structures at all). Tentatively it seems like it would be nice to maintain this -- you ok with that? |
The wasmtime implementation is here bytecodealliance/wasmtime#5255 (comment) They also rely on a map, but with conditional variables. Not sure if these cvs are backed by futex syscalls behind the scenes though. I haven't looked at the browser engine implementations though as these are likely to be heavily tied to JavaScript engines/APIs which we can't easily reuse.
So here is my train of thought. When using syscalls like futex, there is chance of spurious wakeups. If wasm allowed spurious wakeups, we could just do something like
But wasm doesn't allow for spurious wakeups. So we need some way to check if the wakeup is spurious. For this check, we can maintain an auxiliary data structure which has state that we can check. Here is a simple example
Depends on the implementation I think. See discussion around tables here https://github.com/ogiroux/atomic_wait In general though, my conclusion is that the table-free implementations are not possible if we want to meet all wasm criteria such as:
I don't have a preference where this code lives tbh as long as I can "git clone wabt", and use wasm2c to compile a wasm file which uses pthreads and have everything work as expected. So I'm fine adding some of this to the wasm runtime code in this repo versus generated code if y'all prefer that. Takeaway: whether implemented in the generated code or runtime code, the algorithms above need a map data structure (and a list as well to support multiple waiters) to store state to prevent spurious wakeups. We should either pick a C data structure library and include that as a dependency of wasm2c (sort of like how the simde library is pulled in as a dependency to implement wasm simd), or go with the c++11 impl. |
I think if you're amenable to having this be a runtime responsibility, and having our own runtime implementation require C++20 and basically just shell out to std::atomic_wait / atomic_notify_x, that might be the best path to happiness for everybody -- it seems like a lot less code to worry about maintaining/testing/reviewing than if we have to implement this stuff from scratch. :-/ I did find an article on the (seemingly bleeding-edge) implementation of atomic_wait/notify in libstdc++. They agree with you that an ancillary table is required to efficiently wait on a 64-bit value on Linux. It looks like they're using a strategy without dynamic data structures (map or list) -- they seem to statically declare a table of 16 waiters (futex or pthreads condition variable), and they use a dead-simple hash function (basically just modulo) to map addresses to the 16 waiters, and that seems to be basically it as far as data structures. It looks like each |
BTW, isn't it also UB in C any time that two threads load/store from an overlapping memory region unless both operations are atomic? Are we okay with this, or... what does the Wasm proposal say is supposed to happen if two agents load/store from the same shared memory if at least one of them isn't using atomic operations? |
The Wasm proposal says you'll get very weak, defined behaviour (roughly, all possible interleavings of all bytes of all racing writes may be observed by each racing read). Naively translating Wasm non-atomics back to C non-atomics would theoretically leave open the possibility for non-Wasm-conforming behaviours to be observed when the C is compiled natively. A more conservative translation would be to convert Wasm non-atomics to C11 relaxed atomics if the memory is shared. |
Hi all, I just randomly found this issue which seemed to be looking for implementations of wait/notify outside browser. Just in case it helps, here is an implementation in Go https://github.com/wasilibs/wazero/blob/main/internal/wasm/memory.go#L298 Indeed, the core data structures are a map of lists. Synchronization is with channels, which we're lucky to have in Go. We're currently holding off on merging this officially as the threads proposal settles (hence "finish thread proposal" in this issue title catching my eye, I understand this is about wasm2c specifically :) ), but linking just in case it can help. Note that while I don't know if this approach is optimal, it is more or less the same implementation as the semi-official Go semaphore implementation (golang.org/x/sync/semaphore) so I suspect it's at least reasonable. |
Thanks conrad-watt and anuraaga for the info :) I think based on the above, I agree we can tentatively start with the C++ 20 runtime implementation as a stopgap --- this will allow us to see what the implementation looks like and play with real code. However, if/when this code is actively used in Firefox, this use case would likely need either a C++11 or C implementation of the same (and thus need to interact with the OS specific futex APIs) . But we can cross the bridge when we get to it. |
@keithw @sbc100 I've been looking at finishing the wait/notify part of the thread atomics proposal (https://github.com/WebAssembly/threads/blob/main/proposals/threads/Overview.md) and after going through a few different implementations, I have come to the conclusion that we would definitely need a map and list data structures we can use from C. Especially for the map data structure, I want to avoid rolling my own and wanted to see if we could figure out a standard C data structure library we can pull into wasm2c's output. Please let me know your thoughts on this.
The text was updated successfully, but these errors were encountered: