Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add wasm32v1-none support #541

Closed
wants to merge 2 commits into from
Closed

Conversation

daxpedda
Copy link
Contributor

@daxpedda daxpedda commented Nov 21, 2024

Changes:

  • Add default-features = false to the js-sys and wasm-bindgen-test dependencies, enabling no_std support.
  • The crate feature std now enabled std crate features of wasm-bindgen, js-sys and wasm-bindgen-test.
  • Use any(target_os = "unknown", target_os = "none") instead of target_os = "unknown".
  • thread_local! is only supported with std, so a fallback to static with once_cell::unsync::Lazy is added. This is safe because Wasm doesn't support threads without target_feature = "atomics".
  • When detecting target_feature = "atomics" without feature = "std", use #[thread_local].
  • Now wasm32-unknown-unknown is properly supported without pulling in std.

I was unable to add actual no_std tests without some ugly modifications to it. Especially because of the whole harness = false thing. Let me know how you want me to proceed.

This depends on rustwasm/wasm-bindgen#4277 and a new wasm-bindgen release.
Was released.

@daxpedda daxpedda force-pushed the wasm32v1-none branch 4 times, most recently from 6002c05 to 48d38ce Compare November 21, 2024 11:56
@daxpedda
Copy link
Contributor Author

Current CI failure is unrelated.

@newpavlov
Copy link
Member

The CI failures are fixed in #542.

@daxpedda daxpedda force-pushed the wasm32v1-none branch 4 times, most recently from f18f3b3 to ff7a200 Compare November 25, 2024 07:48
@daxpedda
Copy link
Contributor Author

Hm, CI still has an unrelated failure.

@newpavlov
Copy link
Member

newpavlov commented Nov 25, 2024

Yes, see rust-lang/rust#133401. We may disable the GNU/Hurd job since it's not the first time the std breaks this target.

UPD: Fixed in #543.

@daxpedda daxpedda force-pushed the wasm32v1-none branch 2 times, most recently from 84af741 to 604082a Compare November 29, 2024 17:35
@daxpedda daxpedda marked this pull request as ready for review November 29, 2024 17:43
@daxpedda
Copy link
Contributor Author

Rebased after #551 was merged.

@newpavlov
Copy link
Member

newpavlov commented Dec 2, 2024

How expensive is the getrandom_init call in practice? Maybe we should just skip the thread-local dance and obtain fresh RngSource on each call? As a small optimization we could cache the source kind in an atomic static.

Also, is it common to use one (non-trivial) WASM file in both Web and Node environments simultaneously, i.e. maybe we should split wasm_js into wasm_web_js and wasm_node_js?

@daxpedda
Copy link
Contributor Author

daxpedda commented Dec 4, 2024

How expensive is the getrandom_init call in practice? Maybe we should just skip the thread-local dance and obtain fresh RngSource on each call? As a small optimization we could cache the source kind in an atomic static.

For Web that seems reasonable. NodeJS seems to be quite involved and I would argue that a thread-local is justified.

EDIT: I will do some tests and get back to you.

Also, is it common to use one (non-trivial) WASM file in both Web and Node environments simultaneously, i.e. maybe we should split wasm_js into wasm_web_js and wasm_node_js?

AFAIK it is common to do it together. I'm not NodeJS expert, but whats strange to me is that the implementation is different to begin with.

@newpavlov
Copy link
Member

newpavlov commented Dec 4, 2024

I've opened #557 to explore how such simplification could look like. Could you measure whether there is a measurable performance impact or not?

@daxpedda
Copy link
Contributor Author

daxpedda commented Dec 4, 2024

I've opened #557 to explore how such simplification could look like. Could you measure whether there is a measurable performance impact or not?

I did some local benchmarking with Firefox and Chrome and I end up with around 40ns. Its only one FFI call after all, js_sys::global() is cached with a thread_local internally.

NodeJS is around 3 calls (I replaced the "crypto" string with a #[wasm_bindgen(static_string)]). However its generally a bit slower then more up-to-date browsers and ends up around 340ns.

However, I didn't dig into the specifics on why the NodeJS implementation is so special, the API should be the same. My local testing shows that NodeJS works just fine with the browser API.

  • Browser: 40ns
  • NodeJS: 340ns
  • NodeJS with Browser API: 45ns

So all in all I think this is acceptable. I will add the implementation to that effect to this PR tonight.

I also had a couple of optimizations in mind:

  • Replace the "crypto" string in the NodeJS implementation with a #[wasm_bindgen(static_string)] cached JsString.
  • Get rid of the Uint8Array when not using shared memory.
  • Cache the Crypto object via a #[wasm_bindgen(thread_local)]. This should basically put us back to the custom thread-local implementation before falling back to the current implementation, which I'm still unsure who it is for actually.

@newpavlov
Copy link
Member

newpavlov commented Dec 4, 2024

I didn't dig into the specifics on why the NodeJS implementation is so special, the API should be the same.

It's probably because a full Web Crypto support was added only in Node.js 19 (2022-10-18) and getRandomValues() was techincally supported since 17.4 (2022-01-18).

We had releases with WASM support a fair time before that, so we just kept the code as-is without considering whether the fallback is needed or not. I guess we can just drop the alternative Node.js path completely for the v0.3 release.

I will add the implementation to that effect to this PR tonight.

We also can continue with #557. I will leave the choice to you.

Get rid of the Uint8Array when not using shared memory.

Can you demonstrate how this detection could look like?

@daxpedda
Copy link
Contributor Author

daxpedda commented Dec 4, 2024

I will add the implementation to that effect to this PR tonight.

We also can continue with #557. I will leave the choice to you.

Sure, I'm a fan of splitting things up.

Get rid of the Uint8Array when not using shared memory.

Can you demonstrate how this detection could look like?

We can simply use wasm_bindgen::memory() to determine if the WebAssembly.Memory.buffer is an ArrayBuffer or a SharedArrayBuffer. Afterwards we can easily cache it in a static atomic.

@daxpedda
Copy link
Contributor Author

daxpedda commented Dec 4, 2024

Closing in favor of #557.

@daxpedda daxpedda closed this Dec 4, 2024
This was referenced Dec 5, 2024
newpavlov pushed a commit that referenced this pull request Dec 5, 2024
In #557 we forgot to port the `no_std` support from #541.

This also adds running Clippy for `wasm32v1-none`.
newpavlov pushed a commit that referenced this pull request Dec 9, 2024
As discussed in #541. This PR adds the following improvements:
- Caching of the global `Crypto` object.
- Detecting if our Wasm memory is based on a `SharedArrayBuffer`. If
not, we can copy bytes directly into our memory instead of having to go
through JS. This saves allocating the buffer in JS and copying the bytes
into Wasm memory. This is also the most common path. `SharedArrayBuffer`
requires `target_feature = "atomics"`, which is unstable and requires
Rust nightly. See
#559 (comment)
for full context.
- The atomic path only creates a sub-array when necessary, potentially
saving another FFI call.
- The atomic path will now allocate an `Uint8Array` with the minimum
amount of bytes necessary instead of a fixed size.
- The maximum chunk size for the non-atomic path and the maximum
`Uint8Array` size for the atomic paths have been increased to 65536
bytes: the maximum allowed buffer size for `Crypto.getRandomValues()`.

All in all this should give a performance improvement of ~5% to ~500%
depending on the amount of requested bytes and which path is taken. See
#559 (comment)
for some benchmark results.

This spawned a bunch of improvements and fixes in `wasm-bindgen` that
are being used here:
- rustwasm/wasm-bindgen#4315
- rustwasm/wasm-bindgen#4316
- rustwasm/wasm-bindgen#4318
- rustwasm/wasm-bindgen#4319
- rustwasm/wasm-bindgen#4340
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants