LES as the text format #697

qwertie · 2016-05-24T05:04:46Z

I would like to propose the next version of LES to be used as the WebAssembly text format. More specifically:

I propose that the Wasm text format be compatible with the next version of LES, in the sense of being a subset of it, such that a "full" LES parser would not reject anything that is valid Wasm text.
For the MVP, those that want to parse Wasm text will be able to choose whether to use a custom parser dedicated to Wasm or a generic LES parser.
My proposed syntax (link at bottom) is not compatible with the current LES specification, but LES is in beta and can still be tweaked to Wasm's needs. Based on the Wasm text format, a third version of LES (LESv3) will be drafted before the end of 2016.

This gives the CG some freedom to make some changes to LES and not others. Specifically, any elements that make sense only in WebAssembly (e.g. keywords for wasm opcodes) would not be permitted, but changes such as tweaks to operator precedence, handling of semicolons, the grammar of LES "superexpressions", or the name used for "infinity", are fine.

Why use LES for WebAssembly?

Fewer parsers will be needed in the future: in today's world, every new language needs a new parser to be written, bikeshedded, and specified from scratch. While LES is not sufficient for all languages, it will be enough for some (and in particular it is sufficient for Wasm, which has weaker reasons than most languages to use a custom grammar); being part of Wasm encourages use of LES elsewhere (edit: consider DSLs and advanced search boxes. I wrote some GIS software that did formulas & searches, so used an ANTLR-based parser. But why are people still writing custom parsers for expressions? There should be a standard - so here it is.)
More importanty, learning curves decrease: imagine a future world in which a user has already used LES as a programming language or a data language. She wants to write Wasm assembly for the first time, and it's easier to learn, because the syntax is already familiar. She may have many concepts to learn regarding the semantics of Wasm, but at least the syntax is easy. Conversely, she may learn Wasm first and another LES-based language later. Either way, the same benefit accrues.

By analogy we may compare LES with punctuation and grammar in natural language. When I learn a new language, I have a lot of new words to learn, of course. Luckily, most languages share the same meaning for punctuation: I don't have to re-learn a new version of the comma, dash, parentheses and so on. But grammar is another story: it always varies between languages, sometimes dramatically. Similarly, different programming languages will always vary in concepts and idioms, but it isn't necessary for every language to also have different punctuation and grammar - and even most of the words could stay the same. LES is the only language (AFAIK) that takes the kind of standardization you see in s-expressions and applies it in the Algol-family space.
Various things become easier: let's say I want to analyze some C++ code... from within my Rust or Python code. But all the C++ parsers are written in C/C++! See the problem? We don't have easy ways to cross arbitrary language barriers today. Now imagine a world after Wasm MVP: people will want to process Wasm text (and binaries) from many different languages. Because it is hard to cross language barriers, the way this will happen is that many different people will write their own readers and writers for their language of choice. If the text format isn't something generic like LES, the story ends here with folks merrily parsing Wasm. But if the format is generic, the results are more interesting:
- LES typically represents data more compactly than XML and JSON (and I assert it's
  easier to parse correctly than YAML, and is built atop better primitives), so
  some people will start using it as a data format. It's especially good for data with
  embedded code (e.g. build systems). Network effects are crucial here: no one wants
  to use an obscure data format, so it must be used first in a major standard like
  WebAssembly.
- LES and Loyc trees were designed to help with a variety of tasks related to
  building compilers, converting code between languages, and bridging language
  barriers (both human and machine). Some compilers today can dump their AST in
  an XML format, which can allow a program in a different language to pick up that
  code and do something with it. But LES is far more compact and just better at
  representing code than XML is. So LES makes it easier to talk about, work with,
  and reason about syntax trees in the same way s-expressions do. However, LES is
  completely unknown today. WebAssembly's role here (should you choose to accept
  it) would be to create awareness of LES and cause people to write LES parsers
  for numerous languages. This may lead more people to get involved in writing
  language-processing tools than in the world we have now, where you're lucky if
  you can even get an XML representation of a given language. This in turn should
  facilitate the creation of interoperability and code-conversion tools,
  ultimately improving interoperability between languages.
- On a personal note, I want to build a library for converting code between
  programming languages, and I want to use WebAssembly semantics as a "common core"
  for such tasks. I expect this to be easier to do if WebAssembly uses LES.

I'm concerned that some would simply say "this is out of scope". But I don't think this will strain the dev process. So why not try it? Others might argue along the lines of "we shouldn't use LES because I really prefer foo: instead of :foo for labels." I would urge those people to look at the bigger picture. Thanks for your consideration!

What does Wasm+LES look like?

To show you how Wasm could be encoded in LES, I have created a PR to show how Dan Gohman's strawman would need to change to be LES-compatible.

The text was updated successfully, but these errors were encountered:

kripken · 2016-05-24T16:37:41Z

I think this is interesting, and I find a lot of the rationale compelling. In particular simple parsing and familiarity.

What is the status of LES? Is it used anywhere?

The familiarity argument also applies to JS, though - what all current web devs are used to. But, that does come with harder parsing in the general case.

qwertie · 2016-05-24T19:40:52Z

(edited) @kripken Thank you for saying so. LES is "beta" and in its second major iteration (LESv2), and unfortunately has only two confirmed users that I know of - which, on the plus side, means that changes can still be made to suit tastes of CG members. I'm ready to accept a variety of changes as long as the parser stays language-agnostic and simple (for reference, the current grammar is 239 lines for the parser and 221 lines for the lexer, including syntax tree construction and error handling but excluding token interpretation and helper methods.) Fun fact: my LES parser is written in LES.

We know that the JS parser isn't appropriate for Wasm (there are obvious problems like the lack of types, and less obvious problems like the inability to use arbitrary characters as identifiers, and the lack of separate signed and unsigned operators), and the complexity of its parser is inappropriately high, especially given that Wasm's text format is tangential rather than central to the standard.

kripken · 2016-05-24T21:34:30Z

Note that JS may well get optional types, similar to TypeScript which is basically a preview of them. It does makes sense I think for wasm notation to be similar to JS where possible, and the TypeScript/JS declarations for types is one possible area for that - x : i32, etc. It's nicer if languages on the web look more consistent, or at least avoid arbitrary differences.

wanderer · 2016-05-26T19:24:42Z

Note that JS may well get optional types

i'm assuming that wont be anytime soon though

ghost · 2016-06-14T00:39:51Z

Glad someone is exploring this. It still does not seem to have the type inference so how can expressions work as well without a change to the binary encoding?

qwertie · 2016-06-14T00:45:32Z

I recently posted in my blog about potential changes to LES, some designed for Wasm and some not.

@JSStats: LES is only a syntax, so type inference is outside its scope. Could you give me an example of the problem you're talking about?

ghost · 2016-06-14T01:33:33Z

@qwertie The text format being proposed uses 'Infix syntax for arithmetic, with simple overloading.' Without a change to the binary encoding this needs some type inference to present, and would that be a part of the proposal here too?

qwertie · 2016-06-14T04:31:31Z

@JSStats If I understand your question, then yes - LES would use the same kind of "type inference" as almost every programming language. Note that my design strategy so far is to take community proposals (chiefly @sunfishcode's) and apply those proposals to LES. Thus, general questions like that tend to have the same answer here as they would in #704.

flagxor · 2016-07-30T04:41:53Z

After some communication between implementers, we've decided to focus on having all browsers be able to display linear opcodes for the MVP timeframe. Moving to Discussion.

qwertie · 2016-08-02T02:14:11Z

@flagxor What is meant by "display linear opcodes"?

qwertie · 2016-08-02T05:13:30Z

@JSStats although I liked the design as it was before, and although the new design in the slide show isn't a perfect fit to expression notation, the new design (with multiple values) still could allow expression notation in most cases (as you have noted), so it's not necessarily a big loss in terms of presentation - but @flagxor seems to imply they want to drop expression/infix notation in all cases, a proposal to which I would be opposed.

I'm torn. In terms of end-user comprehensibility, an AST is best. Flat assembly code is very tall and hard to follow; the new proposal isn't totally flat, but the same principle applies. On the other hand, the new stack-machine interpretation appears to have (minor) performance and size advantages and it's understood that concision and speed are top priorities. Perhaps the biggest advantage of the new interpretation is thought to be multi-value support, but there are so many ways to skin that cat, it's hard to say the new proposal is the best.

qwertie · 2016-08-02T05:36:33Z

The slides say that "S-expression and Textual infix notation breaks down for stack machine", but it's not a fatal issue, because we can still use expression notation opportunistically.

Considering an example from the slides:

block
i32.const 4
f64.const 5.5
call $foo
i64.add
end

The text format could allow a linear form. In LESv3 (keeping in mind that it's undecided whether to use semicolons or not):

{
    4;
    5.5;
    $foo(pop 2); // use 2 values that were pushed earlier as arguments
    i64.add;
}

But whether via s-expressions or something else, most code could still be expressed in a mostly nested way. As an s-expression it would be (block (call $foo (i32.const 4) (f64.const 5.5)) i64.add). In LESv3, we might write

{
    $foo(4, 5.5);
    i64.add;
}

It could even be a one-liner and use type inference:

{ +$foo(4, 5.5); }

But I wouldn't recommend this, as the meaning seems non-obvious and one would expect -$foo() to be a negation, not a subtraction.

Let's try another one:

block
i32.const 4
f64.const 5.5
get_local 0
br_if[arity=2] 0
i32.const 6
f64.const 7.5
end

I assume that br_if[arity=2] 0 means "pop one number and ~~if that number is nonzero~~ pop 2 items, and if that number is nonzero, break from block 0 which is the innermost block". Linearly in LESv3 this might be

{
   4; 5.5;
   'br 0 (pop 2) 'if $0; 
   6; 7.5;
}

Or in "maximally tall" notation:

{
   4;
   5.5;
   $0;
   'br_if 0 (pop 2);
   6;
   7.5;
}

But more likely we'd want to use a compact form

{
   // branch from block 0 with values {4; 5.5} if $0, where $0 means `get_local 0`
   'br 0 (4; 5.5) 'if $0; 
   6; 7.5;
}

Note: "pop 2" might be something slightly different from the original arity=2, in that you could also write this:

{
   4;
   'br 0 (pop 1; 5.5) 'if $0; 
   6; 7.5;
}

Here the arity of br_if is still 2, but the first is popped by "pop 1" and the second is specified inline. In this way the text format could support a continuum between a very compact expression language and a very tall "linear" language.

drom · 2016-08-02T08:16:00Z

@flagxor @JSStats @qwertie Forth language has good 45year history of efficiently expressing "stack machine" code. And here is relevant proposal for "PAF: A portable assembly language" proposal: http://www.complang.tuwien.ac.at/anton/euroforth/ef13/papers/ertl-paf.pdf

qwertie · 2016-08-02T11:11:35Z

@JSStats I don't think this is the appropriate thread to discuss that issue.

ghost · 2016-08-02T11:48:16Z

@qwertie Ok, sorry, lets move the discussion on the text format to #704 and keep this as LES for whatever format.

qwertie · 2016-08-02T13:06:51Z

@drom I guess PAF resembles the new Wasm design in many ways. Are you aware of any features/benefits of PAF that the new Wasm doesn't have? Exception handling is the only one I notice. The syntax is entirely unfamiliar.

kripken · 2016-08-02T17:02:07Z

@qwertie

On the other hand, the new stack-machine interpretation appears to have (minor) performance and size advantages

There are two stages here. First, the move to drop/tee, which has some clear perf benefits. At that point things are equivalent to both an AST (that has explicit drop) and a stack machine (that has some requirements for structure).

The second stage is the move away from an AST to a full stack machine ("relaxation from 0xc" in the slides), which may have minor perf benefits but also minor downsides. There isn't actual data on perf that I am aware of; I would guess there is no significant change. The main benefit appears to be future support for multi-values.

titzer · 2016-08-02T17:55:03Z

FWIW I was the author of the slides.

I implemented the full stack machine in V8 (patch here: https://codereview.chromium.org/2176653002/), including full multi-value support in both the interpreter and compiler. So I am 100% confident that multi-values work :-)
@rossberg-chromium implemented the stack machine in the stack branch (https://github.com/WebAssembly/spec/tree/stack) of the spec.

I was planning to propose the full stack machine in an issue. It's mostly not a single PR, since the stack machine does not directly affect the binary format, but documentation such as the AstSemantics (to be renamed ExecutionSemantics I presume), and obviously implementations.

qwertie · 2016-08-21T16:01:01Z

Now that Wasm is moving toward a stack machine, I think that LES is now more relevant than ever, because there seems to be a need for not one wasm language but two: one for wasm itself, and another for the pseudo-wasm AST used by producers (i.e. binaryen). Both languages would support expressions, but each would interpret them differently. In the "official" wasm, $a() + $b() could be syntactic sugar for the sequence $a(); $b(); i32.add if both functions return i32, whereas in binaryen, the same text would represent the AST i32.add($a(), $b()).

While only one Wasm variation is expected so far, it's not hard to imagine others:

People will need to prototype new Wasm features.
I'm interested in making some kind of higher-level language using wasm's AST variant as a semantic foundation.

A general-purpose syntax like LESv3 allows people to do all this without touching the parser, let alone writing a new one. I was disappointed that no one commented on my ideas for LESv3, but I did get around to writing a parser + unit tests. In doing so, my ideas evolved slightly beyond the blog post.

Bikeshedders Wanted

So, I've prepared a new post that explains my design choices for LESv3 and requests your opinions on matters I'm unsure about. I’d much rather have your opinion now than after I’ve written separate parsers for multiple languages!

jfbastien · 2017-05-11T18:05:22Z

Text format proposal. Closing this in favor of the proposal.

qwertie mentioned this issue May 24, 2016

Strawman counterproposal sunfishcode/design#3

Closed

sunfishcode mentioned this issue May 25, 2016

Strawman counterproposal (fixed) sunfishcode/design#8

Closed

qwertie mentioned this issue Jun 14, 2016

A JS-style Text Format #704

Closed

sunfishcode added the text format label Jul 8, 2016

sunfishcode modified the milestone: MVP Jul 8, 2016

flagxor modified the milestones: Discussion, MVP Jul 30, 2016

qwertie mentioned this issue Aug 2, 2016

Understanding the post-drop block/br type system #736

Closed

qwertie mentioned this issue Feb 11, 2017

UTF-8 decoding of import/export names in JS #970

Closed

jfbastien closed this as completed May 11, 2017

qwertie mentioned this issue Nov 13, 2021

measure performance of other variable-length integer encoding schemes #601

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LES as the text format #697

LES as the text format #697

qwertie commented May 24, 2016 •

edited

Loading

kripken commented May 24, 2016

qwertie commented May 24, 2016 •

edited

Loading

kripken commented May 24, 2016

wanderer commented May 26, 2016

ghost commented Jun 14, 2016

qwertie commented Jun 14, 2016

ghost commented Jun 14, 2016

qwertie commented Jun 14, 2016

flagxor commented Jul 30, 2016

qwertie commented Aug 2, 2016

qwertie commented Aug 2, 2016 •

edited

Loading

qwertie commented Aug 2, 2016 •

edited

Loading

drom commented Aug 2, 2016 •

edited

Loading

qwertie commented Aug 2, 2016 •

edited

Loading

ghost commented Aug 2, 2016

qwertie commented Aug 2, 2016

kripken commented Aug 2, 2016

titzer commented Aug 2, 2016

qwertie commented Aug 21, 2016

jfbastien commented May 11, 2017

LES as the text format #697

LES as the text format #697

Comments

qwertie commented May 24, 2016 • edited Loading

Why use LES for WebAssembly?

What does Wasm+LES look like?

kripken commented May 24, 2016

qwertie commented May 24, 2016 • edited Loading

kripken commented May 24, 2016

wanderer commented May 26, 2016

ghost commented Jun 14, 2016

qwertie commented Jun 14, 2016

ghost commented Jun 14, 2016

qwertie commented Jun 14, 2016

flagxor commented Jul 30, 2016

qwertie commented Aug 2, 2016

qwertie commented Aug 2, 2016 • edited Loading

qwertie commented Aug 2, 2016 • edited Loading

drom commented Aug 2, 2016 • edited Loading

qwertie commented Aug 2, 2016 • edited Loading

ghost commented Aug 2, 2016

qwertie commented Aug 2, 2016

kripken commented Aug 2, 2016

titzer commented Aug 2, 2016

qwertie commented Aug 21, 2016

Bikeshedders Wanted

jfbastien commented May 11, 2017

qwertie commented May 24, 2016 •

edited

Loading

qwertie commented May 24, 2016 •

edited

Loading

qwertie commented Aug 2, 2016 •

edited

Loading

qwertie commented Aug 2, 2016 •

edited

Loading

drom commented Aug 2, 2016 •

edited

Loading

qwertie commented Aug 2, 2016 •

edited

Loading