[Asyncify/Bysyncify] Mechanism to exclude indirect calls in certain functions #2218
Comments
We have an option already to ignore all indirect calls. If the indirect call type is enough to determine what matters and what doesn't, that would be interesting to know as well; I can add optimizations for that. But in a big enough project with similar indirect call types all around, that won't help much, of course. Overall, we may need a mechanism like this, although I've been hoping it's not necessary, as the overhead is much lower in my tests compared to Asyncify/Emterpretify, and these hand-crafted lists are error-prone and add complexity. But it's possible we still need such a thing.
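The type-based idea mentioned above can be sketched as follows. This is a hypothetical illustration, not Binaryen's actual analysis: only functions in the indirect-call table whose signature matches the call site can be its targets, so if none of those candidates can unwind, that indirect call would not need instrumentation. All names and signatures below are made up.

```python
# Hypothetical sketch: decide whether an indirect call needs Asyncify-style
# instrumentation based only on the call's type signature. Only table
# entries with a matching signature can be targets of that call, so if no
# matching entry can unwind, the call site is safe to leave alone.

def indirect_call_needs_instrumentation(call_sig, table_entries, may_unwind):
    """call_sig: signature of the indirect call site.
    table_entries: dict of function name -> signature for functions in the
    indirect-call table.  may_unwind: set of names that can unwind."""
    candidates = [f for f, sig in table_entries.items() if sig == call_sig]
    return any(f in may_unwind for f in candidates)

# Illustrative table: an allocation-counting callback and a sleeping one.
table = {"count_alloc": "(i32)->()", "sleep_cb": "()->()"}
unwinders = {"sleep_cb"}

print(indirect_call_needs_instrumentation("(i32)->()", table, unwinders))  # False
print(indirect_call_needs_instrumentation("()->()", table, unwinders))     # True
```

As the comment notes, this only helps when signatures actually partition the targets; if unwinding and non-unwinding callbacks share a type, the analysis must stay conservative.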
Oh, another question - which optimization level was the overhead measured on? Unoptimized builds with Bysyncify will be very large and slow, but optimized builds should be pretty reasonable (except for pathological cases of massive functions, etc.).
Yes, that's too coarse grained unfortunately. We do often have indirect calls on the stack (it's a programming language after all, so users can do whatever they want ;)). For example, we know that gmp, mpfr, and LLVM are never on the unwind stack, but they do provide callbacks to count allocations, which are indirect calls, so they end up getting transformed all the way through.
Unfortunately, I don't think this is sufficient :( - there are just too many possible targets in the code base.
Yes, agreed. However, one additional point I should make is that when we generate wasm from Julia proper (as opposed to the runtime), we do have this information at the language level, so we can just plumb it through. At that point, it wouldn't be a hand-maintained source annotation, but just another semantic from the high-level language that gets plumbed through (since the high-level language can reason about any possible side effects for a given call site, even if it ends up as an indirect call).
I tried both optimized and unoptimized builds, but the bysyncify version of even the optimized build was noticeably slower than even the unoptimized build without it. (The whole thing itself is fairly slow at the moment, since the JIT is off and I didn't cross compile the Julia system image, so everything is interpreted - that's to be worked on, but in the meantime it makes any slowdown super noticeable.)
Thanks @Keno, yeah, that makes sense. First I'd like to finish investigating the perf issue in that PR - I am hoping that when optimized, the overhead will decrease from 300% to just 25%. If so, then this becomes less urgent, but still interesting. As for how to do this, there isn't a great way to annotate functions through the entire pipeline, sadly. One option might be to have a …
I think the problem with this approach is that inlining is no longer semantically correct (because a call to the magic function will end up in the outer function, which may have indirect calls that unwind). I do still think plumbing something through the stack might be worthwhile. In the particular case I was looking at, there's an indirect call in the allocator that gets picked up, so basically every function in the system gets transformed, despite most not needing it. I do agree it would be a bit odd to, say, add a flag to the wasm object format specifically for bysyncify, but perhaps we could simply emit the llvm.global.annotations array into a separate section of the wasm file and have an appropriate contract with bysyncify to understand that. I feel like such a mechanism could be generically useful for exploration as wasm evolves.
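The "separate section" idea above maps naturally onto wasm custom sections, which the spec defines as section id 0, a LEB128 byte size, a length-prefixed UTF-8 name, and arbitrary payload. A rough sketch of appending such a section (the section name "bysyncify.no-unwind" and the payload layout are hypothetical, invented for illustration):

```python
# Sketch: append a custom section carrying a "never unwinds" function list
# to a wasm binary. Custom section format per the wasm spec:
#   0x00 (section id), LEB128 size, LEB128 name length + name bytes, payload.
# The section name and payload encoding here are hypothetical.

def uleb128(n):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        out.append(byte | (0x80 if n else 0))
        if not n:
            return bytes(out)

def custom_section(name, payload):
    name_bytes = name.encode("utf-8")
    body = uleb128(len(name_bytes)) + name_bytes + payload
    return b"\x00" + uleb128(len(body)) + body

# Minimal empty module: magic number + version.
module = b"\x00asm\x01\x00\x00\x00"
names = ["gmp_alloc_cb", "mpfr_alloc_cb"]           # illustrative names
payload = b"\x00".join(n.encode("utf-8") for n in names)
module += custom_section("bysyncify.no-unwind", payload)
```

A consumer (the bysyncify pass, under the "contract" suggested above) would scan for that section name and treat the listed functions as never unwinding; unknown custom sections are ignored by engines, so the scheme degrades safely.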
Talking to @dschuff and @jgravelle-google, there are some ideas about how we could pass metadata through from clang and LLVM IR all the way to wasm and binaryen. But there is no actual plan or work yet.
Ok, after getting data from Julia, DosBox, and Doom 3, it definitely looks like we want some sort of manual list mechanism, as indirect calls are too much of a problem (I did notice Julia has a bunch of …). In the long term some source-level mechanism would be good, as discussed above. That'll take work and time, though. In the short term, just a plain list of functions, like ….

The whitelist option would mean "only modify these functions, literally nothing else"; the blacklist option would mean "don't modify these specific functions, otherwise do what you think is right". Would the whitelist be enough for everyone? It sounded on the mailing list like that might be the case for DosBox and Doom 3.
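The two list semantics proposed above can be sketched as a single decision function (function names and the exact CLI surface are illustrative, not Binaryen's actual interface):

```python
# Sketch of the proposed list semantics:
#   whitelist: ONLY the listed functions are instrumented, nothing else;
#   blacklist: listed functions are assumed to never unwind; for everything
#              else, defer to the analysis.

def should_instrument(name, analysis_says_yes, whitelist=None, blacklist=None):
    if whitelist is not None and blacklist is not None:
        raise ValueError("only one of whitelist/blacklist may be given")
    if whitelist is not None:
        return name in whitelist
    if blacklist is not None:
        return name not in blacklist and analysis_says_yes
    return analysis_says_yes

# Illustrative names:
print(should_instrument("main", True, whitelist={"do_sleep"}))             # False
print(should_instrument("alloc_count", True, blacklist={"alloc_count"}))   # False
print(should_instrument("main", True, blacklist={"alloc_count"}))          # True
```

This makes the asymmetry concrete: a whitelist overrides the analysis entirely, while a blacklist only carves exceptions out of it, which is why the two cannot sensibly be combined.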
I don't think a whitelist is sufficient for us, though a blacklist would be: we just have a few specific problematic functions that do indirect calls but that we know cannot task switch, while many other functions can be on the stack during a task switch and we don't want to have to list them all.
Makes sense, thanks. Ok, I'll work to add both types of lists. |
The blacklist means "functions here are to be ignored and not instrumented; we can assume they never unwind." The whitelist means "only these functions, and no others, can unwind."

I had hoped such lists would not be necessary, since Asyncify's overhead is much smaller than the old Asyncify and Emterpreter, but as projects have noticed, the overhead to size and speed is still significant. The lists give power users a way to reduce any unnecessary overhead.

A slightly tricky thing is escaping of names: we escape names from the names section (see #2261 #1646). The lists arrive in human-readable format, so we escape them before comparing to the internal escaped names. To enable that I refactored wasm-binary a little bit to provide the escaping logic, cc @yurydelendik

If both lists are specified, an error is shown (since that is meaningless). If a name appears in a list that is not in the module, we show a warning, which will hopefully help people debug typos etc. I had hoped to make this an error, but the problem is that due to inlining etc. a single list will not always work for both unoptimized and optimized builds (a function may vanish when optimizing, due to duplicate function elimination or inlining).

Fixes #2218.
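The escaping concern above can be illustrated with a small sketch. The exact escaping scheme is an assumption here (replacing bytes outside a safe identifier set with a backslash and two hex digits, similar in spirit to what Binaryen's wasm-binary helper does); the point is only that list entries must pass through the same escaping as internal names before comparison.

```python
# Hedged sketch: user-supplied list entries are human-readable, but internal
# names may be stored escaped. The escape scheme below (bytes outside
# [A-Za-z0-9_] become '\' + two hex digits) is an assumption for
# illustration, not Binaryen's exact algorithm.

SAFE = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_")

def escape(name):
    return "".join(c if c in SAFE else "\\%02x" % ord(c) for c in name)

internal_names = {escape("std::vector<int>::size()")}  # stored escaped

user_entry = "std::vector<int>::size()"                 # arrives readable
print(escape(user_entry) in internal_names)  # True: escape, then compare
print(user_entry in internal_names)          # False: raw form never matches
```

Comparing without escaping would silently fail to match any C++ name containing `:`, `<`, or `(`, which is exactly the class of typo-like bug the warning on unmatched names is meant to surface.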
A slight complication here is that we need function names for the lists. This extends the current symbol-map mechanism to keep around function names in the wasm until the very end of the optimization pipeline. (There are still risks with inlining and duplicate function elimination changing things, though, which can't be solved without adding some new special function annotation.) Uses WebAssembly/binaryen#2218
In investigating the performance of bysyncify (for JuliaLang/julia#32532), I'm noticing a significant number of functions being instrumented that could never appear on the stack during an unwind. The reason these functions are being instrumented is that they use indirect calls. However, the call targets for these indirect calls are limited, and none of the call targets can possibly unwind. It would be useful to be able to exclude these from the bysyncify transform. In the short term, a command line argument to emcc and bysyncify is probably sufficient, though longer term it would be nice to be able to annotate such functions in the C source (since that's a more natural place to keep this knowledge in sync). I don't know the wasm object file format well enough to know if there is some mechanism we could piggyback on to plumb this information through (cc @sbc100 for thoughts on that).
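The over-instrumentation described here follows from the conservative whole-call-graph propagation: any function containing an indirect call is assumed able to unwind, and that property propagates to every transitive caller. A sketch with an invented call graph (all names illustrative) shows how one indirect call in the allocator taints nearly everything:

```python
# Sketch of the conservative analysis behind the reported blow-up: functions
# with indirect calls are assumed to unwind, and the assumption propagates
# up the call graph to a fixed point. Call graph and names are illustrative.

def compute_instrumented(calls, has_indirect_call):
    """calls: dict caller -> set of direct callees.
    has_indirect_call: set of functions containing an indirect call.
    Returns the set of functions the transform must instrument."""
    instrumented = set(has_indirect_call)
    changed = True
    while changed:  # fixed-point iteration: propagate to callers
        changed = False
        for caller, callees in calls.items():
            if caller not in instrumented and callees & instrumented:
                instrumented.add(caller)
                changed = True
    return instrumented

calls = {
    "main":   {"run_vm", "print"},
    "run_vm": {"eval", "malloc"},
    "eval":   {"malloc"},
    "print":  set(),
}
# 'malloc' has one indirect call (an allocation-counting callback), so
# everything that reaches it gets instrumented; only 'print' escapes.
print(sorted(compute_instrumented(calls, {"malloc"})))
# ['eval', 'main', 'malloc', 'run_vm']
```

A per-function exclusion (list entry or source annotation) for `malloc` here would break the chain at its root, which is why a small blacklist can remove most of the overhead.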