Typed continuations to model stacks #1359
Is this identical to what you presented in the past @rossberg ? (If not, a summary of the changes might be useful.)
@kripken, yes, up to some renaming and minor tweaks.
Got it, thanks @rossberg.
Is there another proposal on the way on how to support applications of stack inspection, e.g. two-phase exception handling, resumable exceptions, C#/Python-style generators, stack tracing, linear-memory garbage collection, stack duplication?
The new function type
@RossTate Doesn't this support those, if indirectly, since the exceptions are resumable?
@taralx I believe you're right that all of those features can be implemented in terms of this proposal. However, for many of those features, allocating an entire new stack would be overkill. Proposals for stack inspection, two-phase unwinding, etc. are essentially special cases of the feature proposed here that can be implemented more efficiently.
Yes, and on top of that, this proposal requires running code just to find out if an inspection site is relevant to the inspection at hand. So I'm wondering if interaction with (more efficient) stack inspection has been taken into account in this design.
Putting stack inspection aside, the lightweight-threads example seems to be a relatively inefficient implementation of lightweight threads. Ideally, switching between threads just involves a few instructions to switch contexts. But in this proposal, switching threads involves a stack walk to find the relevant
That's a good point. But you cannot actually get an evtref for an unknown suspension either, since each handler is guarded by a list of known events. The idea here was that a handler has no business interacting with an unknown suspension -- that would destroy composition. But maybe there are use cases for a handle-all.
I don't think so, since validation already has to be able to treat multiple definitions of the same function type as compatible. Function types are structural, even in the MVP. FWIW, there is a similar question regarding

So generally speaking, we have to decide whether we want to avoid synthesising, at the price of more type annotations and checking on instructions like these, or if we're fine with synthesising such types (which may require slightly more work, primarily in the spec).
It does, and more importantly, it does so in a composable fashion -- an important point I forgot to include (added). I.e., you can use multiple of these features together at the same time, even if they are implemented separately. @tlively is correct however that there is one special case where creating a new stack can be overkill, namely when the continuation does not escape the handler. In practice, though, handling some of these cases without a stack would require the introduction of either fragmented stack frames, display chains, or scope restrictions that essentially make it equivalent to just calling another function. It is worth investigating if adding those mechanisms is a relevant win over implementing them in user space.
That is not correct. Handlers are associated with resume points, not branches. Consequently, a handler is only present "at the bottom" of each non-suspended stack, and there is no need to walk an actual stack. Moreover, each resume declares a list of tags it handles. So transferring control to a handler can e.g. be implemented by an indirect jump through a (thread-local) side table associating each event tag with its active handler.
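The side-table dispatch described above can be sketched as follows. This is an illustration in Python, not engine code, and all names (`handler_table`, `install`, `suspend`) are invented for this sketch:

```python
# Illustration (not engine code) of the dispatch scheme described above:
# a thread-local side table maps each event tag to its innermost active
# handler, so transferring control is a table lookup plus an indirect
# jump -- no stack walk is needed.

handler_table = {}  # event tag -> stack of active handlers (innermost last)

def install(tag, handler):
    handler_table.setdefault(tag, []).append(handler)

def uninstall(tag):
    handler_table[tag].pop()

def suspend(tag, payload):
    # O(1) dispatch straight to the innermost active handler for this tag
    return handler_table[tag][-1](payload)

install("yield", lambda v: v + 1)
result = suspend("yield", 41)  # dispatches without walking any stack
```

The key point is that installing and uninstalling a handler touches only the table entries for the tags it declares, so dispatch cost is independent of stack depth.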
Yes, and the same is true for every other suggestion that has been made for directly supporting these features. Only the suggestion of implementing shadow stacks within WebAssembly without direct support is not composable.
Let's work through the first such "special case" that was discussed four months ago: linear-memory garbage collection. Runtimes and programs with their own (linear-)memory management generally have some way to inspect the stack to find and later update the GC roots on demand. These programs now want to compile to WebAssembly and want some way to inspect the stack on demand to find and update these roots (albeit in a safer manner per WebAssembly's security requirements). Your proposal is to allocate a new stack at every point someone would want to perform such inspection. For these applications, nearly every stack frame has such a root and therefore such a need. So your proposal is essentially to have these applications allocate every stack frame on its own separate stack. @aardappel, as someone who has been particularly advocating for this feature, do you think this would serve your needs? Now the second "special case" that was discussed was stack tracing. All the most-used garbage-collected languages have some means for collecting/examining the stack trace (sometimes even lazily using an interactive stack walk) during execution even in production mode (not just debug mode). Again, nearly every frame would need to provide stack-trace information, so again your proposal would have all these languages allocate every stack frame on its own separate stack.
Ah, missed this change. There's a bug though. Oh, and another (unrelated) bug:
This assumes a particular implementation strategy and infrastructure, essentially baking in decisions that would often be made at the application level (e.g. should this dynamically scoped variable be implemented using a stack walk or using thread-local storage). If we're going to assume there is always the infrastructure for thread-local storage, it might be a good idea to incorporate the feature directly into WebAssembly, since it's useful for a lot more than just continuations. Regardless, this means that in the examples above every call and return (in the surface language) would have to update thread-local storage. In general, every suspend/resume, forward, and throw will have to update thread-local storage, which common implementations of lightweight threads do not need.
When you presented this proposal six months ago, it was really great. You provided a lot of great ideas and garnered a lot of interest for an unconventional yet useful feature. That sparked a lot of productive conversations within the CG. Those conversations explored the design space, identifying weaknesses in the specifics you presented and then searching for alterations and extensions that would address those weaknesses. That is exactly what the effect of a good presentation should be. But this design has barely changed to incorporate those many related conversations within the group over the months since.
This design seems especially well suited to implement first-class stack-full asymmetric co-routines, as it maps relatively 1:1 onto their implementation needs. One place where it is more general than such a co-routine feature is the use of event tags, which essentially allow one to yield not just from the current co-routine, but from any of the parents, presumably suspending a chain of co-routines in the process. It is not quite clear to me when this is useful, and I could imagine it having an implementation cost. I guess this gives us the flexibility of continuations where each stack frame has its own lifetime. The machinery also seems quite heavyweight for typical uses of "generators", which tend to function more like fancy for-loops than anything else. There would have to be some kind of guarantee that it is easy to collapse this functionality (no dynamic allocations, re-use of the same stack, etc.) for this to be feasible. Leaving this up to the cleverness of the engine makes it useless. That holds even more if this is to be useful for stack inspection like @RossTate mentions. Even if we can specify that in simple cases these instructions translate to something close to no-ops, it seems like it would generate much bigger binaries than the simpler stack inspection proposal. To me, the whole point of having something like a stack inspection feature is that it is more efficient than a shadow stack, so if there's even a small chance that it isn't (e.g. requires a dynamic allocation) then it will not be used. So as it stands, this feature seems limited in its applicability, unless its generality can be subdivided into semantically more restrictive uses with different implementation requirements. That is counter-intuitive maybe: the more general a feature is, the more things it can represent. But potentially the fewer features it can represent efficiently, which is what matters the most for Wasm, I feel.
Another consideration is how to inspect (efficiency aside) the delimited continuations. For example, a garbage-collecting lightweight-thread manager needs to collect (and later update) the roots across all the lightweight threads. Given a particular
I think, for fundamental performance/implementation reasons, we should expect that there are two features we need:
I think this proposal speaks to 1, whereas stack iteration proposals speak to 2. They should obviously be designed to compose, but as far as I can tell from a perf/impl POV, I don't think it's practical to assume that one proposal is going to satisfy all our use cases and achieve all of our goals here. I also think the relative priorities of these features are different: there are several high-priority use cases for 1, whereas at least one wasm engine has specifically voiced an intention to delay 2 until we have wasm GC in place.
Yeah, a common thread across feedback we got for #1340 was to prioritize first-class stacks over stack inspection. My presentation last week was to gauge whether stack inspection should still be a feature in the pipeline, and based on online and offline feedback the answer seems to be likely yes (provided it comes out after GC). This is important to know for designing first-class stacks because, just as stack inspection can be combined with unwinding to support two-phase exception handling, stack inspection can be combined with (restricted) detaching or redirecting to support (one-shot) delimited continuations. So if you want a "unifying view on stack-related mechanisms", you need to take stack inspection into account, which this proposal has not. |
No, it is not. It is one of the main innovations of effect handlers, and the reason why PL folks got excited about them, that they are a compositional mechanism for defining effects, including control effects.
I have not been proposing that. The only thing I have mentioned is that effect handlers can express this form of stack inspection (and they immediately explain how that would interact with other control features). But obviously, they would only be suitable as an actual strategy if we were able to reliably optimise the special case I mentioned -- which may or may not be the case in this setup. If not, there needs to be a more explicit alternative that can be.
Why? I don't think it does.
Ah thanks, you're right, I mixed up signatures. Fixed.
We could do that. But it's not clear whether it would be all that useful. The idea is that cont.throw is used to "destroy" a stack, when it is not supposed to be completed anymore. At least with the use cases listed, it shouldn't throw back to the handler anymore in that situation. But maybe there are some odd use cases? This is one of the details that needs figuring out.
It doesn't assume, it just points out one possible implementation strategy. It may not even be the best one, as fast implementations have used other strategies.
I honestly don't know what you mean. AFAICT, there hasn't been much discussion of this proposal. There have been various discussions about various ideas, and occasionally I pointed out the connection. Correct me if I'm wrong, but none of these recent ideas has been presented as a substitute for this one, so its utility still stands. Whether it is sufficient to also subsume some of the others (efficiently) is a separate discussion to be had.
Yes, well, the nice thing about this proposal is that it can express it, even if not efficiently, but thereby at least suggesting a canonical semantics for integrating such a feature.
Ah, it may look like it at first glance, but it's not at all specific to or optimised for co-routines. Its heritage is (algebraic) effect handlers, which are a principled way of expressing and composing almost any sort of state or control effect.
Ah, the ability to nest resumptions is what makes this mechanism composable. It is what allows each control abstraction to be implemented separately, e.g., you can run an async function in a green thread and still yield to the scheduler without worrying about intermediate handlers. That significantly simplifies implementations, as well as enabling the use of control abstractions in a heterogeneous setting with multiple interacting languages.
It is much less heavyweight than you might think. We have reason to believe that implementing generators this way can be just as or even more efficient on average than the common approach of constructing a state machine, which often involves additional local control flow to reenter the previous state and requires copying live locals to the heap at every yield, or allocating them there in the first place. Plus, it immediately allows deep generators (yielding from a nested call) without any additional overhead.
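The "deep generator" point can be illustrated with a Python analogy (illustrative only; `leaves` is an invented example, not proposal code): with real stack switching, a yield from a nested call just suspends the whole stack, with no state-machine transformation of any of the intermediate frames.

```python
# A Python analogy for "deep" generators: values are yielded out of an
# arbitrarily deep chain of calls, with no automaton transformation of
# the callers. `yield from` plays the role of the transparent forwarding
# that direct stack switching gives for free.

def leaves(tree):
    """Yield leaf values from an arbitrarily nested list."""
    if isinstance(tree, list):
        for child in tree:
            yield from leaves(child)  # deep yield through a nested call
    else:
        yield tree

flat = list(leaves([1, [2, [3, 4]], 5]))
```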
The design outlined above is similar to that of effect handlers in Multicore OCaml. In practice, the overhead of implementing a range of abstractions from generators to lightweight threads to async/await on top of effect handlers in Multicore OCaml is close to zero.
There is a standard way of implementing this, which Multicore OCaml does, as do Go, GHC Haskell, Fibers in OpenJDK, and other languages that use lightweight threads. Essentially, the continuations (blocked and ready-to-run goroutines in Go, blocked lightweight threads in GHC) become a distinguished object in the GC, and the GC knows how to scan the stack associated with the continuation. The suspended continuations are marked as regular objects would be. For any suspended continuation reachable from the GC roots, the GC scans the stack associated with the continuation in the same way it would scan the current program stack.
@kayceesrk That comment was meant to be in the context of modules that implement their own garbage collector in linear memory, though it's my fault for not making that clear. So my point was that a module-implemented garbage collector would not be able to perform the scans you describe.
Effect handlers are a surface-level language feature. Depending on specifics of the handlers, they can be compiled in a variety of ways. While continuations are always a viable option for any handler, they are generally by far the least efficient option. Even the body of work you are basing this on has significant discussion of analyses and optimizations for identifying and eliminating common special cases, but many of those optimized forms are not expressible in this proposal. So what's problematic about this proposal is that it's based on high-level concepts but does not provide the corresponding low-level primitives that those concepts are often implemented with.
WebAssembly/exception-handling#105 was written up to describe these low-level primitives. You were explicitly informed here that it could express algebraic effects, but you never engaged in that conversation. #1340 was a Phase 0 proposal to explore those ideas. But in that conversation concerns were raised about security, prioritization, efficiency, and complexity. The presentation I gave last week was following up on these concerns regarding stack inspection, which we needed to gauge in order to determine how best to address the concerns regarding first-class stacks.
This more explicit alternative has already been discussed a fair amount. It is stack inspection, e.g. #1340 and #1356. Notice that your

So this proposal is two iterations behind others that were heavily inspired by it, but which have since broken its concepts down into smaller parts that can be combined more flexibly, and which have already developed answers to many of the open questions in this proposal.
Will this proposal be compatible with the changes I proposed in WebAssembly/exception-handling#123, namely, removal of
@aheejin From what I can tell, it should be compatible with those changes.
soil-initiative/stacks#10 now has two translations of this proposal in terms of #1360, one using stack inspection and one using thread-local storage (assuming such a thing exists in the host and is made accessible to WebAssembly). Each implementation strategy has its strengths and weaknesses, and different effects might be better expressed with one or with the other (the translations compose so that different effects can in fact make different choices even within the same application). One of the problems with the proposal here is that it does not let the application pick its own implementation strategy; instead it bakes in a high-level feature and leaves it to the engine to decide/guess which strategy to use, much like how the JVM bakes in interface-method dispatch.
The green threads scheduler example shown by @rossberg at the 2020-08-04 meeting appears with the addition of exception handling to catch exceptions from the

On x86_64, where we have native stack switching, we add
Updated OP to match the version I presented today, removing the separate evtref type and br_on_evt instruction. Also removed the cont.forward instruction, since it may not be needed and its semantics seems to be confusing.
@kayceesrk Thanks for the talk! It was very clear and well presented. In your talk, you showed some assembly code on how you implemented algebraic effects in Multicore OCaml. I feel like a goal of a WebAssembly design for stacks should be to enable WebAssembly programs to essentially write those assembly-level operations themselves, provided the functionality can be provided in a way that is similarly composable. Based on your experience with Multicore OCaml, what would be the key primitives for enabling that? In the previous CG meeting it was suggested that these continuations would be used for effects that would have a handler in roughly every stack frame. The implementation strategy you presented involved a linear walk through effect handlers, which suggests that having frequent effect handlers would affect performance. Have you run experiments in the context of such effects? If not, and you would like a concrete example to try, make a pass through the source code so that every function call has a handler indicating what file and line number the function call is on (or was on in the original source code) so that effect handlers can be used to implement stack tracing (turning off any optimizations you have to eliminate continuation allocations). I'd be interested to hear what the relative performance of the original and the modified programs is.
@rossberg Those were much appreciated improvements!
The straightforward low-level implementation of shallow handlers is to allocate the stack for the handlee/child without the handler code on the stack and then to specify the handling code in the current/parent stack. That's exactly what the current proposal does. To encode deep handlers with shallow handlers, you make sure to always run the continuation with the relevant handler around it, which again is exactly how the current proposal indirectly supports deep handlers. The current proposal isn't particularly lower-level; it has baked in shallow handlers specifically, though it happens to do so in WebAssembly. On the other hand, direct support for deep handlers would permit the handling code to be part of (the root of) the allocated stack.
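The deep-via-shallow encoding described above can be sketched in Python (an analogy with invented names, not proposal code): a shallow handler runs a computation only up to its next suspension, and a deep handler is recovered by re-wrapping every resumption with the same handler.

```python
# Sketch of the encoding: `step` is the shallow handler primitive (run
# until the next suspension or completion); `handle_deep` recovers a
# deep handler by re-applying `step` around each resumption.

def step(gen, value=None):
    """Shallow: run `gen` until its next suspension or until it completes."""
    try:
        return ("suspended", gen.send(value))
    except StopIteration as done:
        return ("done", done.value)

def handle_deep(gen, on_event):
    """Deep: handle every suspension by re-wrapping with the same handler."""
    value = None
    while True:
        status, payload = step(gen, value)
        if status == "done":
            return payload
        value = on_event(payload)  # resume with the handler's answer

def computation():
    a = yield 1   # suspend with event payload 1
    b = yield 2   # suspend with event payload 2
    return a + b

result = handle_deep(computation(), lambda e: e * 10)
```

Note that the deep handler here is just a loop over the shallow primitive, which is why the encoding costs only the re-installation of the handler at each resumption.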
My understanding regarding high-level effect handlers is the following. @slindley, @dhil and @kayceesrk, please correct me if I'm wrong. See Daniel & Sam's paper for more details.
If Multicore OCaml used shallow handlers, then the only additional cost would be a single additional store to update the handler function slot in the stack at resumption. Multicore OCaml uses deep handlers since that is what you want almost all of the time for expressing concurrency (async I/O, generators). Choosing deep handlers leads to succinct high-level programs.
Okay, so the answer seems to be that you opted not to directly support deep handlers because you were willing to accept the overhead of the encoding into shallow handlers.
Yes, and there's a low-level mechanism that optimizes for deep handlers but enables efficient encoding of shallow handlers. It sounds like this is a better fit for common applications of effect handlers. Such an alternative is also a better fit for languages that allow the prompt/tag to be determined dynamically. It's also easier to extend with optimizations for tail resumption and continuation elimination. Why was such an option not considered?
You seem to be making a lot of claims about an alternative design. Do you have any empirical evidence for any of them?
Well, no one has any empirical evidence for any of these designs. But I can say that #1360 supports optimizations for tail-resumptive effect handlers (meaning that it can express the more optimized lowered code that tail-resumptive effect handlers permit), and as such it can eliminate the 226 stack switches in @kayceesrk's example for generators, depending on how the generated values are consumed. In particular, generators are very often used by

The trick is that #1360 breaks down algebraic effects into lower-level components, namely stack inspection and stack creation/switching (without a trampoline). The "perform operation" step translates to an (eager) stack inspection (

#1360 also allows optimizations for more mixed patterns. For example, some effect handlers are tail-resumptive on some control-flow paths but not all. In these cases, #1360 lets you allocate a new stack in case the continuation escapes the handler. But #1360 also lets the stack mark at the root of the allocated stack execute the handler before any stack switch is performed. If control goes down a tail-resumptive branch, then the mark simply "answers" with whatever the handler provides. Only when control goes down a non-tail-resumptive branch is control switched to the parent stack (extended to execute the remaining computation). The proposal here does not support these optimizations.

Furthermore, notice that it's important, especially in the final case with mixed tail resumption, for the handler to be executed on the newly allocated stack rather than the parent stack, like in deep handlers. So the orientation of the current proposal around shallow handlers makes it difficult to add support for such optimizations in the future.
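The tail-resumptive special case discussed above can be sketched as follows. This is a hedged Python illustration with invented names (`perform`, `handlers`), not code from either proposal: when a handler immediately resumes with an answer, "performing" the operation reduces to a lookup and an ordinary call, so no continuation is captured and no stack switch happens.

```python
# Sketch: a tail-resumptive handler answers the operation in place.
# The handler list models the dynamically scoped handler chain; the
# walk is the (eager) stack inspection, and no stack is allocated.

def perform(handlers, tag, payload):
    for h_tag, h_fun in reversed(handlers):  # innermost handler first
        if h_tag == tag:
            return h_fun(payload)            # answer in place, no switch
    raise RuntimeError("unhandled operation: " + tag)

# a reader-style effect: 'ask' is answered without suspending anything
handlers = [("ask", lambda _unused: 42)]
answer = perform(handlers, "ask", None)
```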
Note: I realized that we might be using the term "generator" differently, and after looking at the documentation for Multicore OCaml I indeed found that was the case. So I edited the above comment to give a more nuanced and detailed answer that bridges the difference in terminology.
The example of generators in today's presentation appeared to be precisely the "for each" pattern for which all stack allocation and switching can be eliminated (even with "deep" yields, i.e. no need to employ an automaton transformation). This proposal cannot express that optimized implementation strategy, though. Can you explain why you do not want the optimized implementation strategy to be expressible?
To discuss the finer points of this proposal and #1360, I think it would be helpful if both could be formulated in terms of a "global model of the world" that includes a notion of suspended stacks, the currently executing stack, and how they relate to each other. It looks to me like both proposals show formal semantics only from the "viewpoint" of a single thread. The consequence of that is that, for the most interesting instructions, we need to fall back to informal language, because the semantics cannot be expressed completely in the single-threaded execution model.
@sabine The semantics of both proposals are expressible and implementable in a single-threaded execution model. Could you clarify why you believe they aren't? I feel like that will uproot a miscommunication that hopefully will provide the insight you're looking for.
@RossTate Even if the semantics can be expressed single-threaded, should it be called out that it is not expected that the continuations can transfer between threads? That makes it explicit that work-stealing schedulers on different threads won't be able to grab a continuation. It was mentioned in the meetings, but it probably should be more explicit than just the meeting notes. It precludes the work stealing that Erlang's BEAM does, so we won't be able to do that in Lumen.
Certainly the CG should be made aware that the applications of both proposals to multi-threading are still limited by the multi-threading model of WebAssembly and the embedder. |
@RossTate Certainly, the semantics must be implementable in a single-threaded execution model, and thus, it must be expressible in a single-threaded execution model. I think what I'm having trouble with is the use of big-step semantics and the use of informal explanations to describe the core parts of stack-switch semantics (and it's very likely that I also need to read some papers that you are all familiar with). For example,
I can see that both proposals follow the regular WebAssembly specification conventions from https://webassembly.github.io/spec/core/exec/conventions.html, where it says "The execution rules also assume the presence of an implicit stack that is modified by pushing or popping values, labels, and frames.". I have this feeling that stack-switch semantics would need fewer informal explanations if we could use a formalism that has an explicit model of the stack (which we could then reuse to express what a suspended stack looks like). Using small-step semantics with an explicit stack, we might be able to formulate mathematically what happens on suspend, resume, and the more expressible primitives from #1360. I might be wrong, though, and expressing what happens in small-step semantics is not trivial or is not as helpful as I think it would be.
@sabine, the semantics of continuations can certainly be described in a small-step semantics. We actually had that written down for an earlier version of this proposal, but haven't adapted it yet to the current incarnation. Roughly, a continuation/stack is represented by an evaluation context E, and there would be a new administrative instruction for representing handlers on the stack. It's very similar to how you model exceptions, with the continuation just capturing and reifying the evaluation context you have thrown across, and resume reentering it. Not sure that would be too helpful to most people at the current stage, though.
One standard way of explicitly reifying stacks is using a CEK machine. @dhil has a WIP extension of the Wasm semantics with effect handlers, which I assume is modelled as a CEK machine.
@rossberg is the earlier (outdated) version of the small-step semantics available somewhere?
Ah, thanks for the clarification @sabine! The link from @rossberg above should be useful for understanding the evaluation-context approach. @kayceesrk mentions the CEK machine, which is another approach. It makes the stack an explicit part of the machine state, much like the heap. So if the stack of the machine state looks like (cons K K'), i.e. a stack K' with a pointer to another stack K on top, then a stack switch changes the stack of the machine state to look like (cons K' K). The
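The (cons K K') switch step described above can be rendered as a toy Python function (the list-of-stacks representation is invented for this sketch): the machine state carries an explicit stack of stacks, and a switch swaps the running stack with the suspended one on top.

```python
# Toy rendering of the switch step: K is the suspended stack sitting on
# top of the currently running stack K'; after the switch, K runs and
# K' becomes the suspended parent, i.e. (cons K K') -> (cons K' K).

def switch(stacks):
    k, k_prime = stacks        # K suspended on top, K' currently running
    return [k_prime, k]        # after the switch, K runs and K' waits

state = switch(["K", "K'"])
```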
Saw this OOPSLA '20 paper from a retweet by @yaahc of @AndrewCMyers about bidirectional effect handlers. I'm not good at reading the formal semantics in PL papers yet, so going off just the prose, these sections caught my interest related to WASM's proposed usage of effect handlers:

Multicore OCAML's async/await + exception interaction

The part about Multicore OCAML seems relevant since the WASM Stacks Group has been using Multicore OCAML as an example working implementation, so will we also need a special

Using normal effect handlers to emulate bidirectional

The paper also includes implementation test cases for the situation where we think the effect handlers will need to also produce effects (such as exceptions from

Accidental handling

The paper proves a lot of goodness of bidirectionality because of the type system, so I'm not sure it would actually apply to ours, since it seems that, like the Exception proposal already, we will have some sort of tag system for effect handlers and won't actually be able to show that all effects are handled, to allow mixing different languages together. The paper did make me wonder how we keep the effect handlers from swallowing the wrong effects without all the types the paper outlines, which also involve lifetimes like in Rust. Does this even apply to us?

Again, I don't normally read PL papers, so I could be completely misunderstanding how this paper relates to WASM's effect handlers, but I thought I should bring it up since it seemed on-topic.
Oh cool! Glad to see Yizhou has kept this line of work going at his new position. There's a lot to digest from this paper. To keep from fragmenting the discussion too much, I'll focus just on the second item you raised. Interestingly, the measurements they provided are more relevant to Stack Inspection (#1356). The reason is that both evaluations are done on an implementation of tail-resumptive algebraic effects that does no stack switching, and whereas this proposal (#1359) does not support such an implementation strategy, #1356 does with its
Following up, the second big thing in this paper is its use of lexical scoping. This proposal, as well as Stack Inspection, uses dynamic scoping: one walks up the stack to find the handler for a given effectful operation. This has its advantages (in particular, it works well in settings that are untyped in terms of effects) but also its disadvantages, such as accidentally finding a different handler than what the programmer intended, as described in the paper. To implement lexically scoped handlers, you pass the handler to the effectful function as an explicit low-level argument. The value you are handing off is in general a closure. But the stuff they do with lifetimes ensures that the closure's lifetime never exceeds the lifetime of its stack frame. This enables them to use stack-allocated closures rather than heap-allocated closures. This implementation technique corresponds to the higher-order parameters discussed in WebAssembly/exception-handling#105.
Motivation
Wasm currently lacks any support for multiple stacks of execution and switching between them.
That prevents it from efficiently representing most control flow abstractions that appear in many modern programming languages, such as
In the spirit of Wasm as a low-level engine, it's highly undesirable to add a multitude of ad-hoc features for each of these mechanisms individually.
Instead, it's preferable to identify a sufficiently general mechanism for managing execution stacks that allows expressing all of them.
More concretely, Wasm needs a way to represent, create, and switch between stacks.
Challenges
In order to be able to express features like green threads and others, switching stacks must be possible at any stack depth, not just at the bottom of the stack (“deep coroutines”).
In order to be able to express features like generators, it must be possible to pass values (back and forth!) when switching to a different stack.
In order to be able to validate the usage of stacks, both stacks and the values being passed when switching must be properly typed.
In order to inform the semantics of the interaction with other control flow, esp exceptions, it is desirable to have a semantic framework that can explain all such constructs.
In order for multiple of the above features to be usable at the same time, they must be expressible in a composable fashion that allows arbitrary nesting.
The mechanism must not require copying or moving stack frames, since some engines could not easily do that due to the heterogeneous nature of their stacks.
It must not be possible to create cyclic references to stacks, since that would require GC.
Proposal
One known way to address the above challenges is to view stacks slightly more abstractly, through the notion of a delimited continuation.
Broadly speaking, that represents “the rest of” a computation (a continuation) “up to” a certain point (delimited).
Concretely, that certain point is the bottom of a specific stack, while the stack frames on the stack essentially describe the remaining tasks to be completed.
In particular, we can apply known ways to type such continuations, which is not a problem with an immediately obvious solution otherwise.
One specific way of typing continuations and the values communicated back and forth is by following the approach taken by so-called effect handlers, one modern way of representing delimited continuations, which essentially are exception handlers with resumption, where both throw and resume can carry along arguments.
Like with asymmetric coroutines, switching between stacks then occurs through a pair of resume and suspend instructions.
The suspend instruction “throws” a value akin to an exception — we call it an event below — which the resume handles.
The event's tag, like a regular exception tag, determines the types of values passed from suspend to resume point, but also those passed back.
Details
The first central addition is a new form of resumable exception, a.k.a. event.
Such an event is declared with a respective new form of exception definition
(in the binary format this is an exception definition with a certain flag bit):
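Concretely, such a declaration might mirror an exception definition with an added result clause (a sketch, not normative syntax; `$evt`, `tp*`, and `tr*` follow the conventions used below):

```
(event $evt (param tp*) (result tr*))
```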
Unlike a regular exception, events can have return values.
The other central addition is a new (reference) type of continuations:

```
(cont $ft)
```

where `$ft` is a type index denoting a function type `[t1*] -> [t2*]`. Intuitively, this describes a suspended stack that is resumable with values of types `t1*` and will ultimately terminate with values of types `t2*`.

A new stack is created with a dedicated instruction, which takes a function reference to run on the new stack. Execution does not start until the continuation is resumed.
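A plausible shape for this instruction, here called `cont.new` by analogy with the `cont.resume` and `cont.suspend` instructions described below (the name and exact typing are a sketch):

```
cont.new : [(ref $ft)] -> [(cont $ft)]
```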
The instruction `cont.resume` resumes such a continuation by passing it the expected arguments `t1*`. Once the computation terminates, it returns with `t2*`. If the computation suspends before termination (see the next instruction) with one of the event tags listed, then execution branches to the respective label `$handler`. This happens without unwinding the stack. The label receives the event arguments `tp*` and a continuation of type `(cont $ft')`, where `$ft'` is the function type `[tr*] -> [t2*]` of the next continuation.

Continuations are single-shot, that is, they cannot be resumed more than once. An attempt to resume an already resumed continuation traps.
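Putting these pieces together, `cont.resume` could be typed roughly along these lines, with one handler clause per event tag (sketched notation; `$ft` is the continuation's function type):

```
cont.resume (event $evt $handler)* : [t1* (cont $ft)] -> [t2*]
```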
A computation running on its own stack (i.e., initiated as a continuation) can suspend itself with the following instruction:

```
cont.suspend $evt : [tp*] -> [tr*]
```

where `$evt` is an event of respective type `[tp*] -> [tr*]`. Essentially, this passes control back to the parent stack.
That is, like `throw`, the exception takes arguments `tp*` and transfers control to the innermost `cont.resume`, or rather, its handler label, respectively. But unlike regular exceptions, execution can also be resumed, which returns here with values of type `tr*`. As described above, the innermost active handler matching the event tag receives the event's `tp*` and a new continuation. Resuming this continuation (with `cont.resume`) will therefore pass back `tr*` to the originating `cont.suspend`.

Resumption can nest, i.e., a continuation may itself resume another continuation. Consequently, an event may generally pass through multiple parent stacks before hitting a matching resume handler. The same chain of stacks is maintained when resuming.
Note how the exception tag ties together the types of the values passed in both directions between a matching suspend and resume.
Different suspension points within the same computation can use different exception tags and thereby pass different types of values back and forth between parent and child stack.
Continuations are first-class values that are allowed to escape a handler.
That way, a stack can be kept live to be resumed at a later point.
The stack’s lifetime ends when the final resume terminates regularly (or with an exception, see below).
Managing the lifetime of continuations/stacks and evtrefs generally requires reference counting in the VM.
However, it’s impossible for them to form cycles, so general garbage collection is not needed.
Finally, there is an instruction that aborts a continuation by injecting the (regular) exception `$exn` with arguments of type `te*` at the suspension point. This is a way to explicitly unwind the stack associated with the continuation.
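Such an instruction might be written `cont.throw`, by analogy with the other `cont.*` instructions (name and typing are a sketch):

```
cont.throw $exn : [te* (cont $ft)] -> [t2*]
```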
In practice, a compiler should make sure that every continuation is used linearly, i.e., either resumed or aborted with a throw.
However, there is no simple way to enforce this in a type system, so Wasm validation won’t be able to check this constraint.
Example: A simple generator
To see how these mechanisms compose, consider encoding a generator enumerating i64 integers from 0 up until a certain condition is met.
The consumer can signal back this condition by returning a bool (i32) to the generator.
This can be expressed by the following event.
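With events as resumable exceptions that carry values in both directions, this could be declared roughly as (illustrative syntax):

```
(event $yield (param i64) (result i32))
```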
Here is the actual generator:
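A generator along these lines might look as follows (a sketch; names and exact syntax are illustrative, assuming an event `$yield` of type `[i64] -> [i32]`):

```
(func $gen
  (local $i i64)
  (loop $l
    ;; yield the current value; a non-zero answer means "stop"
    (if (cont.suspend $yield (local.get $i))
      (then (return)))
    (local.set $i (i64.add (local.get $i) (i64.const 1)))
    (br $l)
  )
)
```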
Here is one possible consumer, which runs the generator until a given max value is reached:
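One way such a consumer might look, assuming the generator function is named `$gen` and a `$yield` event of type `[i64] -> [i32]` (a sketch; the two continuation types reflect that the generator starts as `[] -> []` but, once suspended on `$yield`, resumes as `[i32] -> []`):

```
(type $gt (func))              ;; type of the generator function
(type $kt (func (param i32)))  ;; type of the generator after suspending

(func $consume (param $max i64)
  (local $n i64)
  (local $k (cont $kt))
  (block $done
    ;; start the generator on a fresh stack
    (block $first (result i64 (cont $kt))
      (cont.resume (event $yield $first) (cont.new (ref.func $gen)))
      (br $done)                  ;; generator ran to completion
    )
    (local.set $k)
    (local.set $n)
    (loop $l
      (block $h (result i64 (cont $kt))
        ;; answer 1 (stop) once $max has been reached
        (cont.resume (event $yield $h)
          (i64.ge_u (local.get $n) (local.get $max))
          (local.get $k))
        (br $done)                ;; generator ran to completion
      )
      (local.set $k)
      (local.set $n)
      (br $l)
    )
  )
)
```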
Note how the handler `$h` takes both the i64 argument produced by the generator and the next continuation.

Example: A simple thread scheduler
For a more interesting example, consider a simple scheduler for green threads.
We define two events with which a thread can signal the scheduler to either yield or spawn a new thread:
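These might be declared roughly as follows (a sketch; names are illustrative):

```
(type $proc (func))
(event $yield)                      ;; [] -> []
(event $spawn (param (ref $proc))) ;; [(ref $proc)] -> []
```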
where `$proc` is the type of a thread's main function, taking no parameters and returning no results.
We further assume that there is a global variable encoding a thread queue in the form of a list of continuations:
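For instance (pseudo declaration; `list-of` is not actual Wasm syntax):

```
(global $queue (list-of (cont $proc)))
```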
The exact representation does not matter, so `list-of` is just pseudocode. All we need is three obvious operations on this queue, ignoring their details:
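Their assumed signatures (bodies elided; names illustrative):

```
(func $enqueue (param (cont $proc)) ...)
(func $dequeue (result (cont $proc)) ...)
(func $queue_empty (result i32) ...)
```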
Given these prerequisites, a simple scheduler loop can be implemented as follows:
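A sketch of such a loop, assuming helper functions `$enqueue`, `$dequeue`, and `$queue_empty` on a queue of `(cont $proc)` values, events `$yield` and `$spawn` as above, and the `cont.*` instructions described earlier (exact syntax is illustrative):

```
(func $scheduler (param $main (ref $proc))
  (call $enqueue (cont.new (local.get $main)))
  (loop $l
    (if (call $queue_empty) (then (return)))  ;; no runnable threads left
    (block $on_yield (result (cont $proc))
      (block $on_spawn (result (ref $proc) (cont $proc))
        (cont.resume (event $yield $on_yield) (event $spawn $on_spawn)
          (call $dequeue))
        (br $l)                  ;; thread ran to completion
      )
      ;; $spawn handler: stack holds the new thread's function reference
      ;; and the spawner's continuation
      (call $enqueue)            ;; re-enqueue the spawner
      (cont.new)                 ;; wrap the new thread's function
      (call $enqueue)            ;; and enqueue it too
      (br $l)
    )
    ;; $yield handler: move the suspended thread to the back of the queue
    (call $enqueue)
    (br $l)
  )
)
```

Note that switching threads here always goes through the scheduler's stack: a thread suspends to its handler, which then resumes the next thread from the queue.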
TODO: More examples.
Interaction with regular exceptions
Regular exceptions and events are disjoint.
An exception handler will not catch events and vice versa.
If resuming a continuation throws a regular exception that is uncaught on its stack then it propagates through the active resume, unwinding the stack encapsulated in the continuation and ending its lifetime.
Yielding an event on the other hand does not trigger exception handlers between the suspend and the resume point.
They become active again when the continuation resumes and execution thereby switches back to their stack.
Note that this dichotomy could, in principle, be used to implement exceptions with two-phase unwind: the first phase is represented by an event, whose handler then injects back a proper exception to unwind. However, such an implementation would be fairly expensive, since every exception handler would have to run on its own stack.
It is still useful as a semantic explanation of what ought to happen, and how two-phase unwind would interact with single-phase exceptions.