Consider optimizations for heavily-nested scenarios #75

Open
lukewagner opened this issue Aug 27, 2024 · 3 comments

Comments

lukewagner commented Aug 27, 2024

Once this proposal gets to Phase 3 and we can study the performance of realistic workloads, I think it's important for this group to measure a workload that, e.g., uses stack-switching for both generators and green-threading, where there are stacks of the form:

| green-thread-handler | generator-handler 0 | ... | generator-handler N | suspend to green-thread-handler |

Because generator nesting depth is controlled by the user program, this N can be arbitrarily large (e.g., generators can be called recursively), and a naive implementation will require an O(N) search when switching between green threads.
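
To make that cost concrete, here is a minimal C++ sketch of the naive search, assuming a simple linked representation of suspended frames; the `StackFrame` type and the `find_switch_handler` name are purely illustrative and not the proposal's semantics or any engine's actual data structures.

```cpp
#include <optional>

// Hypothetical engine-internal representation: each stack frame records the
// tag of the (on $e switch) handler clause it is running under, i.e. the
// clause of the resume/switch that entered it.
struct StackFrame {
  std::optional<int> guard_tag;  // tag of the (on $e switch) clause guarding this frame, if any
  StackFrame* parent = nullptr;  // the frame whose resume/switch entered us
};

// A naive `switch $e` searches from the active frame outward for the nearest
// frame guarded by $e. With N nested generator handlers between the active
// frame and the green-thread handler, this walk is O(N).
StackFrame* find_switch_handler(StackFrame* active, int tag) {
  for (StackFrame* s = active; s != nullptr; s = s->parent) {
    if (s->guard_tag == tag) return s;  // the switch transfers to this handler's resume site
  }
  return nullptr;  // no matching handler: the switch traps
}
```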

Assuming that the cost of this O(N) search isn't somehow amortized away, it seems like we should work out a predictable optimization that engines can perform so that, in practice, suspension to the green-thread-handler is O(1) (as a native engine would achieve). Although optimizing the O(N) case in general doesn't seem possible, identifying a subset of cases that covers the above scenario does seem eminently possible. However, I think it's important for us to have a proper shared discussion about what this predictable optimization is before declaring this feature finished, so that (1) we can publish it in some form and continue to claim that wasm has predictable performance, and (2) if there is some small tweak or addition to the proposal that would make the optimization significantly simpler or more effective, we still have the opportunity to make that tweak.

lukewagner commented Aug 28, 2024

Just to seed the discussion with a concrete idea, here's a simple optimization that might be sufficient to cover the generator+green-thread case:

In the execution context, maintain a possibly-empty, 1-element "switch-cache" with two fields -- a tag $e and a pointer to the innermost handler with (on $e ...), where the ... must be switch -- by doing the following (a minimal code sketch follows the list):

  • When entering wasm from the host, clear the switch-cache.
  • When performing a resume:
    • If there is a switch handler, set the cache before resuming the continuation and clear the cache if the continuation returns. (EDIT: wrong, see next comment)
    • If there is a non-switch handler whose tag matches the tag in the switch-cache: clear the switch-cache.
  • When performing a suspend or throw: clear the switch-cache if the stack search hits the handler in the switch-cache.
  • When performing a switch:
    • If the switch tag matches the tag in the switch-cache: use the handler in the switch-cache.
    • Otherwise, perform a stack search as usual and fill the switch-cache with the results of the search.
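
For concreteness, here is a minimal sketch of that bookkeeping, reusing the hypothetical `StackFrame`/`find_switch_handler` types from the earlier sketch. The `SwitchCache`/`ExecutionContext` types and the hook names are illustrative assumptions (and each resume is simplified to carry at most one handler clause of each kind); this is not the proposal's specification or any engine's API.

```cpp
struct SwitchCache {
  std::optional<int> tag;         // tag $e of the cached (on $e switch) handler
  StackFrame* handler = nullptr;  // innermost frame guarded by that handler

  void clear() { tag.reset(); handler = nullptr; }
  bool matches(int t) const { return tag.has_value() && *tag == t; }
};

struct ExecutionContext {
  SwitchCache cache;

  // Rule 1: entering wasm from the host clears the switch-cache.
  void on_host_entry() { cache.clear(); }

  // Rule 2: performing a resume, as stated above (the next comment notes this
  // is only correct when the resumed continuation contains no deeper switch
  // handler for the same tag).
  void on_resume(StackFrame* entered_frame,
                 std::optional<int> switch_clause_tag,
                 std::optional<int> suspend_clause_tag) {
    if (switch_clause_tag)                 // resume carries (on $e switch)
      cache = {*switch_clause_tag, entered_frame};
    if (suspend_clause_tag && cache.matches(*suspend_clause_tag))
      cache.clear();                       // non-switch handler for the cached tag
  }
  void on_continuation_return() { cache.clear(); }

  // Rule 3: a suspend or throw whose stack search reaches the cached handler
  // invalidates the cache.
  void on_suspend_or_throw(StackFrame* handler_found) {
    if (handler_found == cache.handler) cache.clear();
  }

  // Rule 4: switch hits the cache with a single tag comparison; otherwise it
  // falls back to the O(N) search and refills the cache.
  StackFrame* on_switch(StackFrame* active, int tag) {
    if (cache.matches(tag)) return cache.handler;       // O(1) fast path
    StackFrame* h = find_switch_handler(active, tag);   // O(N) slow path
    if (h != nullptr) cache = {tag, h};
    return h;
  }
};
```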

Obviously there are more complex scenarios (with multiple nested switch handlers with different tags, or resume used to indirectly switch) that wouldn't be covered by this optimization, and there are extensions to the optimization (with more overhead to maintain) that would handle such cases, so there's an interesting engineering tradeoff to be made. The difference between this optimization kicking in or not can be the difference between aggregate O(N) and O(N²) behavior, so I think it's important that we clearly spell out what optimizations a toolchain can rely on since, if I'm a toolchain author, this may significantly impact the compilation strategy I choose. If I can't rely on any such optimization, I may not even want to use directly-nested handlers at all (even when it would otherwise be the natural way to do it) if I also want to allow green threading and don't want O(N²) worst-case behavior.

tlively commented Aug 28, 2024

If there is a switch handler, set the cache before resuming the continuation and clear the cache if the continuation returns.

This would only be correct if the resumed continuation did not include a deeper switch handler for the same tag. We have the same problem when switching to a continuation. To maintain the invariant that the cache always points to the innermost switch handler, we could record the innermost switch handler on each continuation and use it to populate the cache on switch and resume. Alternatively, we could simply record whether each continuation contains a switch handler and leave the cache cleared when switching to a continuation that contains one.
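
Here is a hedged sketch of the first option (recording the innermost switch handler on each continuation), continuing the hypothetical types from the sketches above; the `Continuation` layout and the function name are assumptions for illustration only.

```cpp
// Hypothetical sketch only: a suspended continuation remembers the innermost
// (on $e switch) handler contained in its own stack chain, so that entering it
// can repopulate the cache instead of guessing.
struct Continuation {
  StackFrame* base = nullptr;  // outermost frame of the suspended stack chain
  SwitchCache inner_switch;    // innermost switch handler inside the chain,
                               // recorded when the continuation was suspended
                               // (empty if the chain contains none)
};

// On resume or switch, repopulate the cache from what the entered continuation
// recorded, so the cache always describes the innermost switch handler.
void populate_cache_on_enter(ExecutionContext& ctx, const Continuation& k,
                             std::optional<int> switch_clause_tag) {
  if (k.inner_switch.handler != nullptr) {
    // A switch handler inside the continuation is deeper than any clause being
    // installed by this resume, so it is the one the cache must describe.
    ctx.cache = k.inner_switch;
  } else if (switch_clause_tag) {
    // The chain contains no switch handler, so the newly installed
    // (on $e switch) clause is the innermost one.
    ctx.cache = {*switch_clause_tag, k.base};
  } else {
    ctx.cache.clear();  // nothing to cache
  }
}
```

The simpler alternative mentioned above (recording only whether the chain contains a switch handler) would replace `inner_switch` with a boolean and simply clear the cache whenever that flag is set.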

We should definitely have spec tests that exercise these edge cases.

If there is a non-switch handler whose tag matches the tag in the switch-cache: clear the switch-cache.

This part shouldn't be necessary since switch handlers and suspend handlers with the same tag do not interfere with each other. (See #78)

lukewagner commented Aug 28, 2024

This would only be correct if the resumed continuation did not include a deeper switch handler for the same tag.

Oops, right; I was only thinking of resuming a fresh continuation; thanks! (All the more reason to have these optimizations well-described and well-tested...) But yeah, saving this cache on the continuation itself makes sense as a fix.

This part shouldn't be necessary since switch handlers and suspend handlers with the same tag do not interfere with each other. (See #78)

Oh fantastic; I was hoping for exactly that, had been skimming the explainer to see whether or not it was the case, and was thinking it might be a tweak to suggest so we could avoid the overhead of guarding for it all the time in this optimization.

dhil pushed a commit to dhil/wasm-stack-switching that referenced this issue Jan 15, 2025