-
Notifications
You must be signed in to change notification settings - Fork 695
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: First-Class Stacks #1360
Comments
On first glance this seems more low level than #1359, which generally seems like a good direction. Can you give a summary of the major differences? "a process for extending stacks with additional stack frame".. this is the least clear to me. Generally a stack receives a new stack frame on every function call.. what is the run-time cost of this "extension"? Does it have benefits (in terms of needed to pre-allocate less memory)? |
I should say that multi-shot continuations make programming with resources quite difficult. When a programmer writes a function, there is an implicit assumption that the function will return exactly once (either normally or exceptionally). Developers write code that defensively guards against these two cases. With multi-shot continuations, the functions may return more than once and this can break code that has nothing to do with continuations. Consider the hypothetical example,
|
@aardappel I'll try to write that summary up for you shortly. As for extension, it lets you add a stack frame to a stack that is not being executed. I don't know if that answers your question though. @kayceesrk Your concern is one of the reasons why we chose not to offer direct support for multi-shot continuations or stack duplication. That is, programs do not have to worry about their stacks being arbitrarily duplicated. But for programs that do want multi-shot continuations, they can combine stack inspection and stack extension to build duplicates of (just) their own stack frames. So if you want the functionality, you can implement it for yourself, but only for yourself. |
I'm not clear on why |
I also don't understand the semantic difference between the |
Yes. We've found a number of more advanced examples have multiple receiving events, and we didn't find that types would save much (if any) performance, for reasons similar to call tags (switching on the event tag is easy, and then you can turn it into an exception if you don't recognize it). I also am writing up an example right now that happens to make use of the dynamic scoping of handlers.
Can you clarify what this similar function you are envisioning does? |
Hm, well, the example you have catches two different exceptions at that point, so I retract my earlier comment. It is different from the equivalent function. |
@aardappel This proposal is certainly lower level than that of #1359. I've written up a translation of #1359 into this proposal in soil-initiative/stacks#10. The translation should illustrate both the versatility of this proposal and how #1359's instructions are each a combination of lower-level operations, some of which are not obviously cheap. I can give a summary of it and the takeaways later, but at the moment I need to go to bed. This was a surprisingly fun exercise that I probably should not have let keep me up so late 😄 |
IIUC
where
I suppose you are planning to show how something like this works in your example. |
I see your reasoning, but the comparison is imprecise. For example, shift from shift/reset involves performing a stack inspection to find the closest containing reset in the evaluation context. That reset then informs you what stack to switch to. Notice that this involves two steps: find the stack to switch to, and then switch to it. This is generally true for control operators in surface-level languages, and it is likewise true for #1359. But this proposal is much lower lever, and as such we have made those steps separate, and we have found that applications make use of them in a wide variety of manners. The issue with composability you allude to has to do with how the stack is found. As you say, shift/reset has the problem that its semantics for finding the stack often finds the wrong stack (i.e. features interfere). But our proposal, being low level, does not prescribe in any way how the stack is found. It is up to the application to specify that process, and we have found that they do so in a wide variety of manners. Part of the misunderstanding is my fault though. In the linked detailed write-up, I make clear that this is designed to complement a design for stack inspection (such as in #1356), which is the low-level feature used by most control operators to implement their stack-finding process. The design in #1356, and used by our examples, repurposes call tags in a manner that ensures composability in much the same way that the event in #1356 ensures composability. If you want to understand this in more detail, I recommend reading the write up on how to implement composable stacks in this proposal provided here. Or, if you're up for a deep dive, you can read the write up on how to implement all of #1356 in this proposal here. |
Thanks for the response @RossTate.
I'll have a read through this. |
@RossTate Thanks for https://github.com/soil-initiative/stacks/pull/10/files?short_path=4d5ad29#diff-4d5ad29d317b0ffd003c55a1b691f51e. The first thing I wonder is the It seems obvious that your proposal is more low level / has smaller/simpler primitives, though typically if you express 2 language features with the other, they can both end up looking verbose from the perspective of the other :) |
@aardappel Unfortunately, I am not quite sure what you are asking (as in, I can read it multiple ways, each of which is sensible, but among which I cannot tell which is your intention). Do you mind clarifying? If it helps in the meanwhile, I've added a second translation of #1359 to the pull request, this time using thread-local storage as was suggested here (supposing such an extension were made to WebAssembly).
Certainly. I tried to emphasize the points in the translation that are meaningful rather than those that are just shuffling. However, there do seem to be some important features missing from #1359. For example, this concern about the inability to inspect continuations, say for linear-memory GC roots, has yet to be addressed. So it seems likely that translating this proposal into #1359, while likely possible, would be unlikely to be faithfully represent a realistic implementation. |
@RossTate apologies, I am simply trying to get a sense of to what extend your translation of the #1359 primitives reflects the required semantics of #1359 or is simply an "impedance mismatch" between the two proposals. In particular, whether the use of a |
Ah, okay. First, to make sure we're on the same page, I wanna check you didn't miss the last sentence of this note:
So So I suspect what you're noticing with |
Hi @RossTate, thanks for the presentation today. I liked the fact that the proposal is no longer tied to stack inspection. I had the following observations regarding the two kinds of switch instructions At this point, the main differences between the two proposals are:
Regarding affine use of stackrefs, you'd mentioned the idea of Am I missing anything else that's important? |
I just want to say thanks for that great summary of the differences, @kayceesrk! |
@kayceesrk would #1359 benefit equally from the implementation optimisations which #1360 aims to achieve through ensuring that stackrefs are unique/affine? Or are there fewer synchronization issues in #1359 because of the stack-of-stacks structure? |
As Ross mentioned in the call, using up the continuation is cheap compared to the cost of stack switching. A continuation is just a ref that points to the stack, which gets overwritten to null when the continuation is resumed such that subsequent resumptions are not possible. No synchronisation is necessary since there isn't any parallelism in Wasm (yet). Even when there is, it only needs a CAS. As for whether fewer synchronizations are necessary, yes, #1359 will require fewer since it only creates a single continuation even for a deeply nested stack-of-stacks, whereas #1360 will need to allocate one continuation per a level of nesting. In practice though, we've seen that the depth of the handler stack is quite small, and the difference shouldn't matter. Certainly, useful examples such as async/await and generators have small nesting depth. |
Actually, I would argue that the cost of stack switch in #1360 is low. We have tried to get it to where it is comparable to the cost of a function call. In fact, it should be possible to switch stacks with a very small number (<10?) x64 instructions. |
Thanks @kayceesrk! I'm glad you made the talk. I think your summary hits a number of the big points which I'll try to refine/append here:
I don't find these two particularly analogous.
Cool!
Just to echo @kayceesrk (at least that's what I think I'm doing), I think the optimizations would apply to both proposals roughly equally in practice. That's part of way I didn't emphasize it in the presentation—I didn't want to accidentally give the impression it was a meaningful difference. @kayceesrk Have you had a chance to look at the translation in soil-initiative/stacks#10? I'm still happy to meet up to go over it with you. |
I'm not sure I understand the special linear type handling for |
Thanks for the great question, @Brion. The JS API has coercions to and from WebAssembly types. I would expect that the coercion from a |
Ok. I thought you were not using stack inspection, but turns out I was wrong. I am a little bit confused about why you would want to do this by hops or frame by frame. Isn't a
In that case, do you need two primitives, or can you just squash it into one? Does
Still haven't had a chance. Apologies. I'll find time to do this in the coming week or two. Another question I had is regarding the use of exnref for discriminating the reason for context switch. Would you need this at all? Couldn't you just encode that as part of the value that would be passed between the stacks when switching (through some other mechanism)? Are your slides from the talk available somewhere? I'd like to understand how values are passed when switching stacks. |
We are not using stack inspection to implement stack switching. But we would use stack inspection in combination with stack switching to implement #1359. In particular, because
Yes, per the two translations I gave, but either translation requires additional infrastructure (i.e. an additional
Yes. The difference is
No worries, though it would help a lot in determining if #1360 serves your needs just as well as #1359, or if there is some opportunity for further improvement we should incorporate.
We use exceptions but not
Sorry, I was waiting for the meeting notes to be added, but I've gone ahead and added them via WebAssembly/meetings#633. |
There are a number of control-flow patterns that go beyond the classic "last-in first-out" form. Coroutining, iterators based on yield, asynchronous I/O and, more exotically, delimited continuations, are all examples of non-linear control flow. These patterns are becoming increasingly important in applications that are intended to be dynamically responsive.
While it is possible to emulate many of these control-flow patterns using standard core WASM features, there is significant cost in doing so — in terms of run-time efficiency, code size, and composability.
At the same time, the space of control-flow patterns is sufficiently diverse that it may not be appropriate to design special mechanisms for each of them. Even focusing on coroutining, for example, there are at least two common patterns of coroutines—so-called symmetric and asymmetric coroutines—with significantly different implementation requirements. Furthermore, the details of how data is exchanged between coroutines also varies greatly; reflecting choices made at the language-design level as well as application design.
This proposal focuses on a suite of low-level mechanisms that would allow language implementers to build different variations on cooperative multi-tasking. Specifically, we focus on enabling the execution of WebAssembly programs on multiple stacks that a WebAssembly application can switch between.
Although important, we do not directly provide the mechanisms necessary to exchange messages between coroutines, nor do we "take a stand" on issues such as whether to support asymmetric vs. symmetric coroutines. Instead we provide the primitives that—in conjunction with other functionality WebAssembly already provides—enables the language implementer to develop their own mechanisms. We do, however, establish the framework for communicating the status of a coroutine when switching to another stack. That status must encode whether the coroutine is terminating and may encode values as part of that event.
A detailed presentation of this proposal can be found here, including detailed examples of how it can be used to support cooperative lightweight threads, asynchronous I/O, and (coming soon) multi-shot tagged delimited continuations. Here we (@fgmccabe and I) present a summary of the key ideas.
Stacks
The proposal introduces a single type:
stackref
. Astackref
is a reference to a complete stack. In particular, the stack never returns and is equipped with a "grounding" frame generated by the host that determines what to do if ever the stack completes execution through other means (e.g. traps). Depending on how the stack was created, that grounding frame might terminate the thread worker the stack is running on or might cause the next event in the event loop to be processed. Details aside, the point is the stack is self contained and can be executed on its own.In terms of implementation, a
stackref
references the leaf or active frame of the stack, rather than its root. This is represented by an instruction pointer paired with a stack pointer.Stack Switching
The key instruction in the proposal is
stack.switch
:stack.switch $event : [t* stackref] -> unreachable
event $event : [t* stackref]
This instruction transfers control to the designated stack. The event is repurposed from the exception-handling proposal. It is used to convey a message to the target stack. The
t*
values are the payload of that message plus astackref
. Thestackref
in the message is the reference to the stack that was just switched from, which is now in a suspended state.So if a lightweight thread (implemented as a
stackref
) wanted to yield control to another lightweight thread, it would do so with the following (where we usecatch $yielding $yielded
as shorthand for "catch only$resuming
exceptions and branch to$resumed
):The output type of
stack.switch
isunreachable
. That is because the instruction leaves the current stack in a suspended state, and when control is transferred back, the conveyed message is conceptually thrown as an exception from that point. In practice, we expect engines to optimize for this pattern and only throw an exception (i.e. start an unwinding stack walk) if there isn't a statically known catcher for the event.Notice that this design provides extremely fast switches between stacks. One essentially just swaps the instruction-pointer and stack-pointer registers with the registers storing the instruction pointer and stack pointer of the target
stackref
.Stack Construction
Because the host needs contextual information to create a grounding frame (e.g. is this module instance running on a thread worker or on the event loop), we provide no instruction for creating stacks, instead expecting modules to import a function for creating stacks from the environment, e.g.
(import "host" "new_stack" (func $new_stack (result stackref)))
.Instead, we provide a process for extending stacks with additional stack frames. This is particularly useful for initializing a new stack, but can also be used to implement (at the application-level) multi-shot continuations.
For this, we need a way to describe stack frames, and in particular stack frames that are ready to be switched to receive events. The following is an example of a
func
one could use to describe the initial stack frame of a lightweight thread:Notice the instruction
stack.start
. This is used only bystack.extend
(below) to say where execution should start when the added (suspended) stack frame is resumed, and it can only be preceded by event-handling setup instructions (likeblock
andtry
). So in this example, the thread is set up to so that the first time it is resumed it starts to$do_work
. The implementation of which may in turn call$yield_to
, but eventually all the work gets done, in which case the lightweight thread switches control to the thread manager, handing off the result of its work.To extend a stack, say to add
$thread_root
to a newly created stack, one usesstack.extend
:stack.extend $func : [t* stackref] -> unreachable
func $func (param t*)
Stack Destruction
This example also illustrates how one can implement their own thread-aborting mechanism. Given a
stackref
for a lightweight thread, you can send it an$aborting
message. That message is not handled by$yield_to
, and as such will be thrown and cause the stack to be unwound, until eventually$thread_rout
is reached, in which case it will transfer control back to the stack that initiated the aborting (which in turn willdrop
thestackref
in the payload).Stack Redirection
Many applications "compose" stacks together. This composition, however, is an illusion. The stack components are of course distinct entities, but stack walks are redirected from one stack to another. In this manner, a WebAssembly application can run its entire body on its own stack, but have any exceptions thrown by host code called within that body be redirected to the host stack that initiated the application in the first place. Changing this redirection lets an application conceptually detach its stack from the current host stack, e.g. the current event handler, and attach it to another host stack, e.g. the handler for a promise. A full account of how to support asynchronous I/O with stack redirection can be found here.
Stack-walk redirection is supported by the following instruction:
stack.redirect $local instr* end : [ti*] -> [to*]
local $local : stackref
instr* : [ti*] -> [to*]
Whenever a stack walk (say due to an exception being thrown or due to a stack inspection) reaches
stack.redirect
, the walk is redirected to thestackref
in the local variable$local
at the time.Summary
With a new
stackref
type, a newrout
construct, and a few new instructions, this proposal enables applications to treat stacks as first-class values. Every instruction is constant time. The proposal is designed to make use of the existing exception-handling infrastructure, and is designed to compose well with stack inspection. The proposal is independent of garbage collection (e.g. introduces no cycles), and the only implicit memory management is that a stack must be implicitly freed when itsstackref
is implicitly dropped. But with the combination of these proposals, we have conducted case studies suggesting that this proposal can implement a wide variety of features in the same manner as many existing industry implementations. The only significant shortcoming we have found so far is that stack duplication, needed for multi-shot continuations, is substantially more complicated than would be necessary if the language had complete trusted control over the runtime.The text was updated successfully, but these errors were encountered: