WebAssembly support #503

pdubroy · 2025-04-16T13:52:29Z

Over the years we've had a few requests for a way to use Ohm from other languages — e.g. #348 mentions Dart, Rust, and Swift. Go and Python have also come up in private discussions. This proposal discusses one way we could support that: by implementing the equivalent of the Grammar.match in WebAssembly.

Goals

The primary goal is to be able to produce a WebAssembly module (.wasm) from an existing Ohm grammar. The Wasm module would export a match function, which could be used just like the existing Grammar.match method.

At minimum, we also need an API (or ABI) for walking the parse tree.

If possible, we'd also like to go a bit further:

support for incremental parsing
a well-defined way of either (a) compiling semantic actions to WebAssembly, or (b) writing semantic actions in the host language, using the parse tree API/ABI.
(maybe) replacing the existing Ohm.js core with the WebAsembly version

Non-goals

We won't support the full Ohm API from WebAssembly (trace, etc.) We assume that it's acceptable to run JavaScript code to produce the WebAssembly module.

Initial Plan

MVP (~2 weeks):

Support a subset of Ohm grammars with pure PEG features only (no left recursion, no parameterized rules, etc.)
- Implement compilation to Wasm for a very minimal set of features, e.g. only terminals and rule applications. No memoization.
- Add tests that use the grammar from JS and Go.
- Add support for pure-PEG features: ranges, any, repetition, lookahead, sequence, choice
Add initial support for memoization (simple, not focused on performance yet)
- At minimum, we need a reasonable hash map implementation - maybe in AssemblyScript or Rust no_std.
Implement some initial performance benchmarks. Probably using the pure-PEG ES5 grammar from our SLE17 paper.
First cut at semantic actions in Go.
- Unclear what this should look like. Importing a table per operation?
Parameterized rules
Direct left recursion

Post MVP:

Beyond:

Open questions

For runtime code (e.g. managing memo table, potentially indirect left recursion handling, etc.), what language should we use? AssemblyScript? Rust no_std?
Memory management: should we try Wasm GC? Some kind of arena allocation strategy?
- Needed for: memo table, CST nodes, rule stack?
Are we ok with making some breaking changes to the Ohm API? Supporting only direct left recursion would allow us to significantly simplify the implementation.

The text was updated successfully, but these errors were encountered:

alexwarth · 2025-04-16T17:24:23Z

Hi @pdubroy,

Overall this plan looks great to me, and reflects much of what we discussed yesterday.

(1) Re: the goals,

(maybe) replacing the existing Ohm.js core with the WebAsembly version

I'm going to push for this one to go from a maybe to a definitely. For compatibility with existing grammars, it would then be important to implement the whitespace-skipping for syntactic rules and grammar inheritance. (I don't expect either of these things to be tricky, it's just more work that would need to get done.)

(2)

At minimum, we need a reasonable hash map implementation - maybe in AssemblyScript or Rust no_std.

Not necessarily. It's common for Smalltalk VMs to implement polymorphic inline caches with a small list of (class, method implentation) tuples per call site. This list is traversed linearly: Is the receiver an instance of this class? If so, then that's the implementation of the method. If not, is the receiver an instance of the second class? (And so on.) I remember Eliot Miranda pointing out that this was way faster than using a hash map.

Assuming only a small number of rules is applied at any particular position, this scheme could work very well.

(3)

Are we ok with making some breaking changes to the Ohm API? Supporting only direct left recursion would allow us to significantly simplify the implementation.

Like I said yesterday, I'm all for getting nixing indirect left recursion -- it adds a lot of complexity to the implementation and on the user's side, it's difficult to think about and rarely necessary.

I also think that we could do a better job of designing an API for writing semantic actions now than we did back in the day. (We've had lots of experience using it in Typescript, and it's easy to see several opportunities for improvement.) But this doesn't have to be a breaking change -- for a while, we could support both the old style and a better alternative. The old stuff could be deprecated and eventually go away.

Excited about this!

-- Alex

pdubroy · 2025-04-16T20:00:25Z

It's common for Smalltalk VMs to implement polymorphic inline caches with a small list of (class, method implentation) tuples per call site. This list is traversed linearly: Is the receiver an instance of this class? If so, then that's the implementation of the method. If not, is the receiver an instance of the second class? (And so on.) I remember Eliot Miranda pointing out that this was way faster than using a hash map.

Right! You mentioned this yesterday. You're right, there may be simpler options too. Although a true inline cache unfortunately can't be done in Wasm, because the code isn't writeable. But still, you're right that we might not even need a hash table.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WebAssembly support #503

WebAssembly support #503

pdubroy commented Apr 16, 2025 •

edited by alexwarth

Loading

alexwarth commented Apr 16, 2025 •

edited

Loading

pdubroy commented Apr 16, 2025

WebAssembly support #503

WebAssembly support #503

Comments

pdubroy commented Apr 16, 2025 • edited by alexwarth Loading

Goals

Non-goals

Initial Plan

Open questions

alexwarth commented Apr 16, 2025 • edited Loading

pdubroy commented Apr 16, 2025

pdubroy commented Apr 16, 2025 •

edited by alexwarth

Loading

alexwarth commented Apr 16, 2025 •

edited

Loading