-
Notifications
You must be signed in to change notification settings - Fork 694
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Understanding the post-drop block/br type system #736
Comments
To be specific, those examples are from the tests in the test repo's |
Which is identical to the spec repo's branch of the same name. |
Are you talking about the |
They are from |
Ah, I see.
AIUI, the idea is that if a
This one doesn't make sense in isolation. Is there supposed to be a
|
(The first example is from Ok, thanks, the unreachable's type was confusing me perhaps. If I follow you, then it looks like the type system has gotten more complex and now an additional internal type is needed? What I mean is that before, binaryen had
Those types appear to no longer be sufficient, as if I understand things now, then
Is that correct? |
For the stack machine, we made the decision not to typecheck unreachable code. That's not really an undue burden on the V8 and SM implementations, since they already modeled reachability for SSA renaming. For binaryen, I assume this is an additional thing to check. I'm not entirely sure that one can easily model the reachability requirement with types alone. Instead it is state attached to each block and to the current "stack". |
I see. In that case, maybe it makes more sense for binaryen to just jump to that state, instead of trying to get to where binary-0xc is right now? (which looks like post-drop but pre-full stack machine) Unless that's far off. @rossberg-chromium, when do you expect ml-proto's binary-0xc branch to get to the final state of full stack machine / not typechecking unreachable code etc.? |
(Or is there already another branch somewhere that is at that point?) |
0xC will be updated to be the full stack machine over the next few days. |
(in the design repo) |
I think I was confused about this too, not sure what the differences a "full stack machine" means for the type system. |
Some details are available in this slide deck: https://docs.google.com/presentation/d/1dRsN5lKY60d3IOILi4bttJXOX4ge-2tA1PaEX-d5So0/edit#slide=id.p But I think @rossberg-chromium 's rules will make it easier to see. |
@titzer Interesting slide thank you. It claims to support multiple values, which is any area of concern for me. So this seems to extend the break operators to be able to pick a number of values from the top of the stack and retain these at the exit of the target block while discarding the other values in the block? The problem is that it is not possible to access the multiple values from a call. For example it is not possible to drop the top-of-stack value to expose the next value. Perhaps the proposal is to remove that restriction on It is not even possible to store these values in local variables as the Fwiw I believe that any re-arranging of values on the values stack is compatible with an expression based presentation. This can be demonstrated with a The other show stopper is that the 's-expression syntax breaks down for stack machine', or in other words it is not presentable in an expression based format. There might be ways to work around that with canonicalization, but I have not worked through it all. So it still appears to me that the proposal is not tenable and might not meet the expectations of the web community. It really does not seem a big burden to keep the |
While I don't fully understand every detail of the slide deck (I can't even tell who wrote or uploaded it) I think I get the gist of it. The slides note that "void-in-the-middle is not consistent with the standard interpretation of an AST, without additional blocks". That's true, yet nothing prevents us from constructing a tree from the postorder and calling it an AST. I think being able to display expressions using infix notation will be a great feature of Wasm even if it moves away from an AST interpretation. So in a comment in #697 I've explored a continuum of potential notations from "tall-postorder" to "wide-tree". |
@kripken, the well-typedness of your original examples depends on the surrounding expression: if the block is put into a void context then the But with a full stack machine this becomes a moot point, so you might want to skip implementing the intermediate step. The stack machine implementation of the spec is currently on the |
@binji, the "explicit drop" changes are the biggest part, as far as type system changes for implementations are concerned. The rest towards "full stack machine" should in fact be a (minor) simplification for those that did not materialise an AST: mostly, the validator no longer has to remember when voids are "pushed on the stack". It also does not need to check syntactic arities anymore. For the spec, the move to a stack machine is quite a fundamental change, however. The |
@titzer, don't worry, unreachability is straightforward to model with typing rules. Already done. ;) |
@rossberg-chromium: ok, thanks, I guess I'll plan to go straight to the I still worry about having to understand the new type system from the tests (which has always been a tricky process), so I might wait until the spec is written, unless we think that will be far off? |
@rossberg-chromium: I am also concerned about the full rewrite of the core of your interpreter. Is that due to something specific in your implementation, or would you expect all existing interpreters to need rewrites? |
@kripken, yes, the difference is that ml-proto defined everything in terms of the expression AST, and that has vanished. Browser implementations did not actually build an AST, so the new semantics is a closer match to what they already do. As Ben mentioned in his presentation, it was a 200 loc change in V8, and done in like 2 days (including a complete impl of multi-returns). In a nutshell, the accumulative changes in typing over 0xB are:
None of this requires any new type. It's just an error if a block leaves more than one thing on the stack (in the MVP). In addition, we discussed getting rid of polymorphism by not typechecking unreachable code, where unreachability can be defined as follows in terms of the flat operator sequence:
There are nicer ways to describe this in terms of a special bottom type, which is pretty much the |
Just to be clear, it was a ~200 line change to the decoder; the rest of the changes to V8 were about 1500 lines, mostly to tests and the interpreter. The change is here: https://codereview.chromium.org/2176653002/ |
Thanks for the details @rossberg-chromium! So this sounds worrying. Much like ml-proto, binaryen defines everything in terms of an AST. At this point the binaryen optimizer is a non-trivial amount of code (includes a register allocator, duplicate function eliminator, and many other passes), and the binaryen utilities like s2wasm and asm2wasm are likewise non-trivial in size and are a core part of the toolchain story for WebAssembly. If all that needs to be rewritten because it's AST-based then it's a serious matter. But I feel like I don't have a full grasp on the issue yet. If this is just validation, I'd be happy to rewrite binaryen's binary and s-expression validators from scratch, that's not too bad. But if valid WebAssembly code can't be represented, transformed, and optimized the way the binaryen AST currently is, then that would justify considering scrapping binaryen, in which case we would need to figure out other solutions for the toolchain functionality it currently provides - worrying in the MVP timeframe. |
On the ml-proto
Is that due to loop no longer having a label to break out? |
@kripken, yes. The branch targets the outer loop. |
@kripken, re your previous question: it should be fairly straightforward to keep using an AST in binaryen if you are willing to build one that it is a superset of actual Wasm, i.e., adds a form of let-node. Luke and Dan have suggested this as well. For ml-proto that wasn't an option because it is supposed to directly mirror the actual language as defined (and that definition has changed). Tools have more leeway in using custom ILs. |
@rossberg-chromium: I'm a little confused though because e.g. About the AST issue, I'm still not sure what adding a let node would look like in terms of compiling to and from wasm, especially keeping that lossless. Also unsure about code-size optimizations with such a construct (if the AST is "unnatural" then proper opts might be hard). But I guess I'll implement it when I get to the tests that need it and it'll be more concrete for me. Hopefully it won't be a problem. |
Another question, I see in
Is that the new flat stack machine text format? Is it intended to be mixable with the old notation as in the second function (no ambiguity issues that the parens would have prevented)? |
It looks like you are allowing non-parenned operations to pop parenned ones, e.g.
I assume that the (Of course I think it's interesting to investigate a mixed-mode text format for the future; my comment is about the changes to the s-expression format for the test suite today.) |
@rossberg-chromium It can be a type error in a s-exp text format, not necessarily the one being proposed, and that is my point. @kripken might not choose your text format even if using the same code. There is type information for the number of results expected for a function, in the function signature. |
@rossberg-chromium: I see, and I guess the only way to do a return without a value is by |
@kripken I doubt it would be The point @rossberg-chromium is making might also be that the s-exp no longer has much syntax, that arguments push to the stack and counts are not checked except for the wasm encoding being valid, but there seems no reason that binaryen has to follow. For example I presume that the following is valid in the @rossberg-chromium proposal, but there is no reason for the text format parser to accept this as valid and I don't think the test suit should include such ill-formed syntax for tests.
|
@kripken, no, it's still just S-exprs are just syntax sugar, i.e., macro-like things without inherent semantics. So Wasm itself can't check anything about them (the type system doesn't know anything about them) and thus would (have to) allow weird abuses. But some kind of linter could check for them. @JSStats, your example is currently invalid, because binary operators still only allow two S-expr operands. The only way to abuse the syntax is by putting void-expressions somewhere, e.g.:
which of course is no better. |
@rossberg-chromium Ok, thank you. Problematic but perhaps valid:
Problematic and perhaps invalid in some contexts:
I wonder if the text parser needs to know some type information. Binaryen might still need an AST in order to catch such problems and to be able to report them. |
@JSStats, the MVP does not yet support multiple returns, so you can't currently write these examples. If a "parser" needs type information then it is no longer just a parser. It is in fact a translator from one (external) language into another (internal) language. There is nothing wrong with that for many use cases (including Binaryen), but technically you are dealing with two different languages at that point. The stack machine's S-expr interpretation as "macros" is intentionally dumb-as-bread because we don't want to get into the business of specifying two languages for Wasm plus a compilation function between them. |
@rossberg-chromium Ok, I see your point, and that does seem to support abandoning the s-exp format for a testing format, and just using a sequence of opcodes plus perhaps some white space conventions to make it clear how blocks nest. My only concern then is that wasm development might not recognize the constraint of supporting a one-to-one translator to a familiar text format, and there might be no demonstration or testing that this is possible. |
@rossberg-chromium "S-exprs are just syntax sugar, i.e., macro-like things without inherent semantics. So Wasm itself can't check anything about them (the type system doesn't know anything about them) and thus would (have to) allow weird abuses." I didn't fully appreciate what that meant until now. Hasn't this change made s-expressions much worse than before the stack change? |
@rossberg-chromium Actually, wait, I still don't understand this. Are you saying that in
the |
@kripken, yes, that definitely has made them worse, in the sense that they have much looser meaning and checking now. But unfortunately, everything else would require defining another language. The example does not require |
That seems confusing to me, because when |
@kripken The example |
@kripken, I am not sure I understand what you are saying. But it's worth noting that, once we generalise to multi-values post-MVP, any number of operators/expressions in a block is a candidate for consumption, e.g., @JSStats, no, this example has always been valid, through the polymorphic result type of |
@rossberg-chromium Yes, missed that. But The unreachable code still needs to be valid opcodes, it's just not typed checked? But binaryen need not accept
Perhaps a reason to do so is to prevent |
I am trying to understand the control flow implied by block, loop, br, br_if and end. It is br and br_if that I am not sure I've understood (from the binary encoding description). Because of my uncertainty of br and br_if, I'm also doubting whether I've understood what end does. I've done my best to understand them by creating different tests and compare the outputs. I've created a pdf document containing how I believe the control flow to be affected by the different statements. Could you verify that this is indeed correct or not? |
@kf2093 Slides look right, but that is not the half of it. The |
Once I figure out the control flow I was going to further dig deeper into the affects on the value stack.
There are two ends in this case. I am guessing there is an implicit block for each function code although there is not an outer explicit block wasm instruction. I think from my slides what I am trying to confirm is that for a |
@kf2093, you're reading is correct: a br targeting a block is like a break, a br targeting a loop is like a continue. However, please note that the issue tracker is not the right forum for asking basic questions about the Wasm design. |
Sorry about that! Where's the best place to ask? (Hopefully I can contribute more to the project once I get up to scratch) |
You may want to take a look at
https://docs.google.com/document/d/1CieRxPy3Fp62LQdtWfhymikb_veZI7S9MnuCZw7biII/edit
…On Tue, Jan 17, 2017 at 1:25 PM, kf2093 ***@***.***> wrote:
Sorry about that! Where's the best place to ask? (Hopefully I can
contribute more to the project once I get up to scratch)
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#736 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ALnq1KC5en5ILk39qTmToMjq8AsYrcuMks5rTLM7gaJpZM4JZ2oM>
.
|
I've understood the control flow aspects to br/br_if/loop/block/end. I hope this next question is more along the lines of what can be discussed here. For an interpreter a branch instruction to a Should interpreters be perform a first pass when loading in wasm to resolve |
@kf2093, you are commenting on an old issue about the details of the static validation rules for blocks. So I'm afraid that question still is off-topic. |
@kf2093 Yes, wasm is not optimized for interpreters, and the code needs to be validated before running it, so at least one pass will be needed and during that pass you could add information on branch target locations or transform it to another representation. There are some interpreter implementations to look at, e.g. the spec interpreter, the v8 source, and in binaryen. It seems fine to me to use the relative control stack indexes to reference the target labels: efficient use of the control stack and minimal redundant information in the encoding. |
The original question seems to have been clarified long ago. Feel free to reopen if not. |
Looks like the drop changes in 0xc change the block/br type system significantly. I've been trying to figure out what's going on from the new spec tests, but can't seem to. Is this documented somewhere?
For example, one source of puzzlement is that this should be invalid
Intuitively that seems like it should be valid, as no value is sent to the block? So I can only guess it is invalid because the type of unreachable is unacceptable here. Yet, on the other hand this should be valid,
So somehow the type of
br
is different from the type ofunreachable
? That seems odd as both halt control flow.The text was updated successfully, but these errors were encountered: