Skip to content

WebAssembly support #503

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
14 tasks
pdubroy opened this issue Apr 16, 2025 · 2 comments
Open
14 tasks

WebAssembly support #503

pdubroy opened this issue Apr 16, 2025 · 2 comments

Comments

@pdubroy
Copy link
Contributor

pdubroy commented Apr 16, 2025

Over the years we've had a few requests for a way to use Ohm from other languages — e.g. #348 mentions Dart, Rust, and Swift. Go and Python have also come up in private discussions. This proposal discusses one way we could support that: by implementing the equivalent of the Grammar.match in WebAssembly.

Goals

The primary goal is to be able to produce a WebAssembly module (.wasm) from an existing Ohm grammar. The Wasm module would export a match function, which could be used just like the existing Grammar.match method.

At minimum, we also need an API (or ABI) for walking the parse tree.

If possible, we'd also like to go a bit further:

  • support for incremental parsing
  • a well-defined way of either (a) compiling semantic actions to WebAssembly, or (b) writing semantic actions in the host language, using the parse tree API/ABI.
  • (maybe) replacing the existing Ohm.js core with the WebAsembly version

Non-goals

We won't support the full Ohm API from WebAssembly (trace, etc.) We assume that it's acceptable to run JavaScript code to produce the WebAssembly module.

Initial Plan

MVP (~2 weeks):

  • Support a subset of Ohm grammars with pure PEG features only (no left recursion, no parameterized rules, etc.)
    • Implement compilation to Wasm for a very minimal set of features, e.g. only terminals and rule applications. No memoization.
    • Add tests that use the grammar from JS and Go.
    • Add support for pure-PEG features: ranges, any, repetition, lookahead, sequence, choice
  • Add initial support for memoization (simple, not focused on performance yet)
    • At minimum, we need a reasonable hash map implementation - maybe in AssemblyScript or Rust no_std.
  • Implement some initial performance benchmarks. Probably using the pure-PEG ES5 grammar from our SLE17 paper.
  • First cut at semantic actions in Go.
    • Unclear what this should look like. Importing a table per operation?
  • Parameterized rules
  • Direct left recursion

Post MVP:

  • Grammar inheritance
  • Space skipping logic
  • Incremental parsing
  • Extend Ohm CLI to make it easy to create Wasm bundles
  • error handling
  • implementing trace?

Beyond:

Open questions

  • For runtime code (e.g. managing memo table, potentially indirect left recursion handling, etc.), what language should we use? AssemblyScript? Rust no_std?
  • Memory management: should we try Wasm GC? Some kind of arena allocation strategy?
    • Needed for: memo table, CST nodes, rule stack?
  • Are we ok with making some breaking changes to the Ohm API? Supporting only direct left recursion would allow us to significantly simplify the implementation.
@alexwarth
Copy link
Contributor

alexwarth commented Apr 16, 2025

Hi @pdubroy,

Overall this plan looks great to me, and reflects much of what we discussed yesterday.

(1) Re: the goals,

(maybe) replacing the existing Ohm.js core with the WebAsembly version

I'm going to push for this one to go from a maybe to a definitely. For compatibility with existing grammars, it would then be important to implement the whitespace-skipping for syntactic rules and grammar inheritance. (I don't expect either of these things to be tricky, it's just more work that would need to get done.)

(2)

At minimum, we need a reasonable hash map implementation - maybe in AssemblyScript or Rust no_std.

Not necessarily. It's common for Smalltalk VMs to implement polymorphic inline caches with a small list of (class, method implentation) tuples per call site. This list is traversed linearly: Is the receiver an instance of this class? If so, then that's the implementation of the method. If not, is the receiver an instance of the second class? (And so on.) I remember Eliot Miranda pointing out that this was way faster than using a hash map.

Assuming only a small number of rules is applied at any particular position, this scheme could work very well.

(3)

Are we ok with making some breaking changes to the Ohm API? Supporting only direct left recursion would allow us to significantly simplify the implementation.

Like I said yesterday, I'm all for getting nixing indirect left recursion -- it adds a lot of complexity to the implementation and on the user's side, it's difficult to think about and rarely necessary.

I also think that we could do a better job of designing an API for writing semantic actions now than we did back in the day. (We've had lots of experience using it in Typescript, and it's easy to see several opportunities for improvement.) But this doesn't have to be a breaking change -- for a while, we could support both the old style and a better alternative. The old stuff could be deprecated and eventually go away.

Excited about this!

-- Alex

@pdubroy
Copy link
Contributor Author

pdubroy commented Apr 16, 2025

It's common for Smalltalk VMs to implement polymorphic inline caches with a small list of (class, method implentation) tuples per call site. This list is traversed linearly: Is the receiver an instance of this class? If so, then that's the implementation of the method. If not, is the receiver an instance of the second class? (And so on.) I remember Eliot Miranda pointing out that this was way faster than using a hash map.

Right! You mentioned this yesterday. You're right, there may be simpler options too. Although a true inline cache unfortunately can't be done in Wasm, because the code isn't writeable. But still, you're right that we might not even need a hash table.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants