LES as the text format #697
Comments
I think this is interesting, and I find a lot of the rationale compelling. In particular simple parsing and familiarity. What is the status of LES? Is it used anywhere? The familiarity argument also applies to JS, though - what all current web devs are used to. But, that does come with harder parsing in the general case. |
@kripken Thank you for saying so. LES is "beta" and in its second major iteration (LESv2), and unfortunately has only two confirmed users that I know of - which, on the plus side, means that changes can still be made to suit tastes of CG members. I'm ready to accept a variety of changes as long as the parser stays language-agnostic and simple (for reference, the current grammar is 239 lines for the parser and 221 lines for the lexer, including syntax tree construction and error handling but excluding token interpretation and helper methods.) Fun fact: my LES parser is written in LES. We know that the JS parser isn't appropriate for Wasm (there are obvious problems like the lack of types, and less obvious problems like the inability to use arbitrary characters as identifiers, and the lack of separate signed and unsigned operators), and the complexity of its parser is inappropriately high, especially given that Wasm's text format is tangential rather than central to the standard. |
Note that JS may well get optional types, similar to TypeScript which is basically a preview of them. It does make sense I think for wasm notation to be similar to JS where possible, and the TypeScript/JS declarations for types is one possible area for that - |
I'm assuming that won't be anytime soon, though |
Glad someone is exploring this. It still does not seem to have the type inference so how can expressions work as well without a change to the binary encoding? |
I recently posted in my blog about potential changes to LES, some designed for Wasm and some not. @JSStats: LES is only a syntax, so type inference is outside its scope. Could you give me an example of the problem you're talking about? |
@qwertie The text format being proposed uses 'Infix syntax for arithmetic, with simple overloading.' Without a change to the binary encoding this needs some type inference to present, and would that be a part of the proposal here too? |
@JSStats If I understand your question, then yes - LES would use the same kind of "type inference" as almost every programming language. Note that my design strategy so far is to take community proposals (chiefly @sunfishcode's) and apply those proposals to LES. Thus, general questions like that tend to have the same answer here as they would in #704. |
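The kind of "type inference" discussed above can be illustrated with a minimal sketch. It assumes a tiny hypothetical tree (`Local`, `Const`, `BinOp`) and an illustrative opcode table; none of these names come from LES or from any Wasm proposal. It only shows how an overloaded infix `+` could be resolved to a typed opcode from the operand types, without changing the binary encoding.

```python
from dataclasses import dataclass

# Hypothetical node types, invented for this illustration only.
@dataclass
class Local:
    name: str
    type: str   # one of "i32", "i64", "f32", "f64"

@dataclass
class Const:
    value: object
    type: str

@dataclass
class BinOp:
    op: str     # overloaded infix operator such as "+"
    left: object
    right: object

# Illustrative mapping from infix operator to opcode suffix (an assumption).
OPCODE_NAMES = {"+": "add", "-": "sub", "*": "mul"}

def infer(node):
    """Return (type, s-expression text) for a node, bottom-up."""
    if isinstance(node, Local):
        return node.type, f"(get_local ${node.name})"
    if isinstance(node, Const):
        return node.type, f"({node.type}.const {node.value})"
    if isinstance(node, BinOp):
        left_type, left_text = infer(node.left)
        right_type, right_text = infer(node.right)
        if left_type != right_type:
            # No implicit conversions: both operands must already agree.
            raise TypeError(f"operand types differ: {left_type} vs {right_type}")
        opcode = f"{left_type}.{OPCODE_NAMES[node.op]}"
        return left_type, f"({opcode} {left_text} {right_text})"
    raise TypeError(f"unknown node: {node!r}")
```

For example, `infer(BinOp("+", Local("x", "i32"), Const(1, "i32")))` picks `i32.add` because both operands are `i32`; mixing `i32` and `f64` operands raises an error instead of guessing.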
After some communication between implementers, we've decided to focus on having all browsers be able to display linear opcodes for the MVP timeframe. Moving to Discussion. |
@flagxor What is meant by "display linear opcodes"? |
@JSStats although I liked the design as it was before, and although the new design in the slide show isn't a perfect fit to expression notation, the new design (with multiple values) still could allow expression notation in most cases (as you have noted), so it's not necessarily a big loss in terms of presentation - but @flagxor seems to imply they want to drop expression/infix notation in all cases, a proposal to which I would be opposed. I'm torn. In terms of end-user comprehensibility, an AST is best. Flat assembly code is very tall and hard to follow; the new proposal isn't totally flat, but the same principle applies. On the other hand, the new stack-machine interpretation appears to have (minor) performance and size advantages and it's understood that concision and speed are top priorities. Perhaps the biggest advantage of the new interpretation is thought to be multi-value support, but there are so many ways to skin that cat, it's hard to say the new proposal is the best. |
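As a rough illustration of using expression notation opportunistically for stack-machine code, the sketch below folds a linear opcode sequence back into nested form. The `ARITY` table, the `nest` function, and the opcode names are assumptions made for this example, not part of any proposal in this thread. Where nesting would reorder evaluation, the sketch gives up rather than guess; a real printer would presumably fall back to linear form at that point.

```python
# opcode -> (operands popped, results pushed); an illustrative subset only.
ARITY = {
    "i32.const": (0, 1),
    "get_local": (0, 1),
    "i32.add": (2, 1),
    "i32.mul": (2, 1),
    "set_local": (1, 0),
    "drop": (1, 0),
}

def nest(instrs):
    """Fold a list of (opcode, immediate_or_None) into nested statement strings."""
    stack = []       # expression texts that are waiting for a consumer
    statements = []  # finished top-level statements, in order
    for op, imm in instrs:
        pops, pushes = ARITY[op]
        if len(stack) < pops:
            raise ValueError(f"stack underflow at {op}")
        split = len(stack) - pops
        args = stack[split:]
        del stack[split:]
        head = op if imm is None else f"{op} {imm}"
        expr = "(" + " ".join([head] + args) + ")"
        if pushes == 1:
            stack.append(expr)
        else:
            if stack:
                # Values pushed earlier would be printed after this statement,
                # which would change evaluation order; nesting is not safe here.
                raise NotImplementedError("needs linear form at this point")
            statements.append(expr)
    statements.extend(stack)  # values left on the stack at the end
    return statements
```

For instance, the linear sequence `get_local $x`, `i32.const 2`, `i32.mul`, `set_local $y` folds into the single nested statement `(set_local $y (i32.mul (get_local $x) (i32.const 2)))`.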
The slides say that "S-expression and Textual infix notation breaks down for stack machine", but it's not a fatal issue, because we can still use expression notation opportunistically. Considering an example from the slides:
The text format could allow a linear form. In LESv3 (keeping in mind that it's undecided whether to use semicolons or not):
But whether via s-expressions or something else, most code could still be expressed in a mostly nested way. As an s-expression it would be
It could even be a one-liner and use type inference:
But I wouldn't recommend this, as the meaning seems non-obvious and one would expect Let's try another one:
I assume that
Or in "maximally tall" notation:
But more likely we'd want to use a compact form
Note: "pop 2" might be something slightly different from the original
Here the arity of |
@flagxor @JSStats @qwertie The Forth language has a good 45-year history of efficiently expressing "stack machine" code. Here is a relevant proposal, "PAF: A portable assembly language": http://www.complang.tuwien.ac.at/anton/euroforth/ef13/papers/ertl-paf.pdf |
@JSStats I don't think this is the appropriate thread to discuss that issue. |
@drom I guess PAF resembles the new Wasm design in many ways. Are you aware of any features/benefits of PAF that the new Wasm doesn't have? Exception handling is the only one I notice. The syntax is entirely unfamiliar. |
There are two stages here. First, the move to drop/tee, which has some clear perf benefits. At that point things are equivalent to both an AST (that has explicit drop) and a stack machine (that has some requirements for structure). The second stage is the move away from an AST to a full stack machine ("relaxation from 0xc" in the slides), which may have minor perf benefits but also minor downsides. There isn't actual data on perf that I am aware of; I would guess there is no significant change. The main benefit appears to be future support for multi-values. |
FWIW I was the author of the slides. I implemented the full stack machine in V8 (patch here: https://codereview.chromium.org/2176653002/), including full multi-value support in both the interpreter and compiler. So I am 100% confident that multi-values work :-) I was planning to propose the full stack machine in an issue. It's mostly not a single PR, since the stack machine does not directly affect the binary format, but documentation such as the AstSemantics (to be renamed ExecutionSemantics I presume), and obviously implementations. |
Now that Wasm is moving toward a stack machine, I think that LES is now more relevant than ever, because there seems to be a need for not one wasm language but two: one for wasm itself, and another for the pseudo-wasm AST used by producers (i.e. binaryen). Both languages would support expressions, but each would interpret them differently. In the "official" wasm, While only one Wasm variation is expected so far, it's not hard to imagine others:
A general-purpose syntax like LESv3 allows people to do all this without touching the parser, let alone writing a new one. I was disappointed that no one commented on my ideas for LESv3, but I did get around to writing a parser + unit tests. In doing so, my ideas evolved slightly beyond the blog post.
Bikeshedders Wanted
So, I've prepared a new post that explains my design choices for LESv3 and requests your opinions on matters I'm unsure about. I’d much rather have your opinion now than after I’ve written separate parsers for multiple languages! |
Text format proposal. Closing this in favor of the proposal. |
I would like to propose the next version of LES to be used as the WebAssembly text format. More specifically:
This gives the CG some freedom to make some changes to LES and not others. Specifically, any elements that make sense only in WebAssembly (e.g. keywords for wasm opcodes) would not be permitted, but changes such as tweaks to operator precedence, handling of semicolons, the grammar of LES "superexpressions", or the name used for "infinity", are fine.
Why use LES for WebAssembly?
Fewer parsers will be needed in the future: in today's world, every new language needs a new parser to be written, bikeshedded, and specified from scratch. While LES is not sufficient for all languages, it will be enough for some (and in particular it is sufficient for Wasm, which has weaker reasons than most languages to use a custom grammar); being part of Wasm encourages use of LES elsewhere (edit: consider DSLs and advanced search boxes. I wrote some GIS software that did formulas & searches, so used an ANTLR-based parser. But why are people still writing custom parsers for expressions? There should be a standard - so here it is.)
More importantly, learning curves decrease: imagine a future world in which a user has already used LES as a programming language or a data language. She wants to write Wasm assembly for the first time, and it's easier to learn, because the syntax is already familiar. She may have many concepts to learn regarding the semantics of Wasm, but at least the syntax is easy. Conversely, she may learn Wasm first and another LES-based language later. Either way, the same benefit accrues.
By analogy we may compare LES with punctuation and grammar in natural language. When I learn a new language, I have a lot of new words to learn, of course. Luckily, most languages share the same meaning for punctuation: I don't have to re-learn a new version of the comma, dash, parentheses and so on. But grammar is another story: it always varies between languages, sometimes dramatically. Similarly, different programming languages will always vary in concepts and idioms, but it isn't necessary for every language to also have different punctuation and grammar - and even most of the words could stay the same. LES is the only language (AFAIK) that takes the kind of standardization you see in s-expressions and applies it in the Algol-family space.
Various things become easier: let's say I want to analyze some C++ code... from within my Rust or Python code. But all the C++ parsers are written in C/C++! See the problem? We don't have easy ways to cross arbitrary language barriers today. Now imagine a world after Wasm MVP: people will want to process Wasm text (and binaries) from many different languages. Because it is hard to cross language barriers, the way this will happen is that many different people will write their own readers and writers for their language of choice. If the text format isn't something generic like LES, the story ends here with folks merrily parsing Wasm. But if the format is generic, the results are more interesting:
easier to parse correctly than YAML, and is built atop better primitives), so
some people will start using it as a data format. It's especially good for data with
embedded code (e.g. build systems). Network effects are crucial here: no one wants
to use an obscure data format, so it must be used first in a major standard like
WebAssembly.
building compilers, converting code between languages, and bridging language
barriers (both human and machine). Some compilers today can dump their AST in
an XML format, which can allow a program in a different language to pick up that
code and do something with it. But LES is far more compact and just better at
representing code than XML is. So LES makes it easier to talk about, work with,
and reason about syntax trees in the same way s-expressions do. However, LES is
completely unknown today. WebAssembly's role here (should you choose to accept
it) would be to create awareness of LES and cause people to write LES parsers
for numerous languages. This may lead more people to get involved in writing
language-processing tools than in the world we have now, where you're lucky if
you can even get an XML representation of a given language. This in turn should
facilitate the creation of interoperability and code-conversion tools,
ultimately improving interoperability between languages.
programming languages, and I want to use WebAssembly semantics as a "common core"
for such tasks. I expect this to be easier to do if WebAssembly uses LES.
I'm concerned that some would simply say "this is out of scope". But I don't think this will strain the dev process. So why not try it? Others might argue along the lines of "we shouldn't use LES because I really prefer foo: instead of :foo for labels." I would urge those people to look at the bigger picture. Thanks for your consideration!
What does Wasm+LES look like?
To show you how Wasm could be encoded in LES, I have created a PR to show how Dan Gohman's strawman would need to change to be LES-compatible.