
Migrate h64 to BigInts, add h32 and h64 streaming APIs, plus performance improvements #26

Merged: 12 commits into jungomi:main, Jan 14, 2022

Conversation

@marcusdarmstrong (Contributor)

Resolves #9 and #25.

This PR includes changes to significantly improve the performance of the existing APIs (mostly as discussed in #25), as well as changes to migrate the h64 APIs to use BigInts and add new streaming APIs for both h32 and h64.

Open Questions

I'll start with my "open questions". I've made choices here for the purposes of posting the PR, but I'd describe them as the shakiest portion of the changeset, and they accordingly warrant some discussion.

  1. string behavior: I've maintained the original API's type contract of returning strings from the h32 and h64 functions; however, I don't feel great about that. The fact that these APIs take strings is somewhat orthogonal to the idea that they should return them, and returning strings incurs a performance cost in cases where the raw numeric hash value would be an appropriate result of a string hash. That said, there are certainly ergonomics to consider, as most users would expect hex string return values from a hash. I've also zero-padded the output in the general case, again to align with my perception of user expectations (see the sketch after this list).
  2. BigInt/number asymmetry between 32-bit and 64-bit APIs: Since the 32-bit API can express its seed/value via a number, I've retained the existing API contracts there; however, there's an argument to be made that we could/should use BigInts for both, to provide a completely consistent API between the algorithms.
  3. Hash<T>.update accepts string | Uint8Array, rather than separate update/updateRaw APIs: this matches the Node crypto API, but diverges from the h*/h*Raw dichotomy. Removing the branching might provide a performance improvement, so I'm certainly open to either option.
  4. There's currently no changelog in the package. I'd be happy to add one as a part of this PR, but if you prefer to manage that information some other way, that's fine of course.
  5. I didn't do the semver bump here, but obviously that's easy if desired.
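To make the zero-padding in item 1 concrete, here's a quick sketch using the well-known empty-string test vector (XXH32 of "" with seed 0 is 0x02cc5d05):

// With this PR's contract, h32 returns a zero-padded hex string:
h32(""); // "02cc5d05" (8 chars, rather than the unpadded "2cc5d05")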

Downsides

  1. Bundle size increase: The streaming APIs added a significant amount of WebAssembly to the package, and the JS size increased as well to support them. Theoretically this could be trimmed down a bit with some work (for example, by removing the wasm module's ability to hash from an arbitrary point in the linear memory, since it's always invoked from 0), but that seemed like an unwarranted tradeoff to me, and it's doubtful we could ever keep the expanded set of functionality within the original bundlesize spec.
  2. Requires bulk memory operations, BigInts, and TextEncoder.encodeInto: These changes use some new APIs and change the support target of the library accordingly. Bulk memory operations are available in Chrome >= 75, Firefox >= 79, Safari >= 15, and Node >= 12.5 (behind a flag). Wasm BigInt support is available in Chrome >= 85, Firefox >= 78, Safari >= 14.1, and Node >= 15. TextEncoder.encodeInto is available in Chrome >= 74, Firefox >= 66, Safari >= 14.1, and Node >= 12.11.

The end result there is that the practical library target is now:

  • Chrome >= 85
  • Firefox >= 79
  • Safari >= 15
  • Node >= 15
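For clarity, a rough feature-detection sketch for the JS-visible requirements (hypothetical, not part of the library; wasm bulk-memory support can't be cheaply detected without a WebAssembly.validate probe, so it's omitted here):

// Checks for BigInt, TextEncoder.encodeInto, and WebAssembly itself.
const supported =
  typeof BigInt === "function" &&
  typeof TextEncoder === "function" &&
  typeof TextEncoder.prototype.encodeInto === "function" &&
  typeof WebAssembly === "object";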

Benchmarks

Benchmarks run on a 2019 x64 MacBook Pro running Node 17.3.0. xxhash-wasm-1 is obviously the version from this PR, while xxhash-wasm is 0.4.2.

xxhash-wasm#h32 x 850,309 ops/sec ±3.17% (74 runs sampled)
xxhash-wasm#h64 x 726,678 ops/sec ±3.66% (79 runs sampled)
xxhash-wasm#h64Raw x 2,826,606 ops/sec ±0.95% (94 runs sampled)
xxhash-wasm-1#h32 x 3,901,259 ops/sec ±1.22% (92 runs sampled)
xxhash-wasm-1#h64 x 3,459,188 ops/sec ±1.19% (90 runs sampled)
xxhash-wasm-1#h64Raw x 9,977,528 ops/sec ±1.20% (90 runs sampled)
Benchmark 1 bytes - Fastest is xxhash-wasm-1#h64Raw

xxhash-wasm#h32 x 1,110,898 ops/sec ±4.64% (75 runs sampled)
xxhash-wasm#h64 x 731,355 ops/sec ±3.18% (80 runs sampled)
xxhash-wasm#h64Raw x 1,073,810 ops/sec ±1.50% (87 runs sampled)
xxhash-wasm-1#h32 x 3,931,655 ops/sec ±1.02% (88 runs sampled)
xxhash-wasm-1#h64 x 3,505,964 ops/sec ±1.02% (91 runs sampled)
xxhash-wasm-1#h64Raw x 10,219,203 ops/sec ±0.96% (94 runs sampled)
Benchmark 10 bytes - Fastest is xxhash-wasm-1#h64Raw

xxhash-wasm#h32 x 846,448 ops/sec ±3.35% (76 runs sampled)
xxhash-wasm#h64 x 586,251 ops/sec ±3.37% (80 runs sampled)
xxhash-wasm#h64Raw x 155,676 ops/sec ±1.15% (92 runs sampled)
xxhash-wasm-1#h32 x 2,056,868 ops/sec ±2.52% (91 runs sampled)
xxhash-wasm-1#h64 x 3,249,041 ops/sec ±1.31% (92 runs sampled)
xxhash-wasm-1#h64Raw x 8,250,841 ops/sec ±2.24% (88 runs sampled)
Benchmark 100 bytes - Fastest is xxhash-wasm-1#h64Raw

xxhash-wasm#h32 x 642,905 ops/sec ±2.57% (73 runs sampled)
xxhash-wasm#h64 x 604,470 ops/sec ±2.29% (73 runs sampled)
xxhash-wasm#h64Raw x 16,618 ops/sec ±0.93% (95 runs sampled)
xxhash-wasm-1#h32 x 2,323,392 ops/sec ±1.75% (91 runs sampled)
xxhash-wasm-1#h64 x 2,552,064 ops/sec ±0.85% (94 runs sampled)
xxhash-wasm-1#h64Raw x 5,657,834 ops/sec ±1.47% (95 runs sampled)
Benchmark 1000 bytes - Fastest is xxhash-wasm-1#h64Raw

xxhash-wasm#h32 x 165,309 ops/sec ±1.97% (86 runs sampled)
xxhash-wasm#h64 x 171,561 ops/sec ±2.38% (88 runs sampled)
xxhash-wasm#h64Raw x 1,644 ops/sec ±1.17% (92 runs sampled)
xxhash-wasm-1#h32 x 548,383 ops/sec ±1.03% (94 runs sampled)
xxhash-wasm-1#h64 x 831,613 ops/sec ±1.12% (95 runs sampled)
xxhash-wasm-1#h64Raw x 1,162,121 ops/sec ±0.96% (93 runs sampled)
Benchmark 10000 bytes - Fastest is xxhash-wasm-1#h64Raw

xxhash-wasm#h32 x 19,001 ops/sec ±3.24% (83 runs sampled)
xxhash-wasm#h64 x 22,422 ops/sec ±1.44% (87 runs sampled)
xxhash-wasm#h64Raw x 166 ops/sec ±1.13% (85 runs sampled)
xxhash-wasm-1#h32 x 60,206 ops/sec ±0.84% (90 runs sampled)
xxhash-wasm-1#h64 x 98,119 ops/sec ±0.75% (94 runs sampled)
xxhash-wasm-1#h64Raw x 118,755 ops/sec ±0.88% (96 runs sampled)
Benchmark 100000 bytes - Fastest is xxhash-wasm-1#h64Raw

xxhash-wasm#h32 x 1,832 ops/sec ±1.71% (83 runs sampled)
xxhash-wasm#h64 x 2,152 ops/sec ±1.28% (87 runs sampled)
xxhash-wasm#h64Raw x 16.18 ops/sec ±1.53% (44 runs sampled)
xxhash-wasm-1#h32 x 5,665 ops/sec ±0.83% (94 runs sampled)
xxhash-wasm-1#h64 x 8,587 ops/sec ±0.99% (90 runs sampled)
xxhash-wasm-1#h64Raw x 10,593 ops/sec ±1.06% (92 runs sampled)
Benchmark 1000000 bytes - Fastest is xxhash-wasm-1#h64Raw

xxhash-wasm#h32 x 142 ops/sec ±0.58% (80 runs sampled)
xxhash-wasm#h64 x 156 ops/sec ±0.59% (79 runs sampled)
xxhash-wasm#h64Raw x 0.83 ops/sec ±1.97% (7 runs sampled)
xxhash-wasm-1#h32 x 334 ops/sec ±1.12% (83 runs sampled)
xxhash-wasm-1#h64 x 429 ops/sec ±0.99% (89 runs sampled)
xxhash-wasm-1#h64Raw x 588 ops/sec ±0.77% (86 runs sampled)
Benchmark 10000000 bytes - Fastest is xxhash-wasm-1#h64Raw

xxhash-wasm#h32 x 12.73 ops/sec ±3.03% (36 runs sampled)
xxhash-wasm#h64 x 14.02 ops/sec ±0.49% (37 runs sampled)
xxhash-wasm#h64Raw x 0.03 ops/sec ±0.51% (5 runs sampled)
xxhash-wasm-1#h32 x 28.65 ops/sec ±0.54% (51 runs sampled)
xxhash-wasm-1#h64 x 35.64 ops/sec ±0.50% (62 runs sampled)
xxhash-wasm-1#h64Raw x 44.88 ops/sec ±0.28% (59 runs sampled)
Benchmark 100000000 bytes - Fastest is xxhash-wasm-1#h64Raw

Change Overview

  1. h64 now uses BigInts for both seeds and return values from wasm. The seed is now communicated via an argument rather than memory writes, and the Raw variant now directly returns a BigInt rather than a Uint8Array.
  2. The Uint8Array providing access to the wasm memory buffer is now reused and only reallocated on memory growth.
  3. string encoding has been moved to TextEncoder.encodeInto, after a bunch of benchmarking. In #25 I had noted large performance differences between Buffer.from and TextEncoder; it turns out those differences are mostly ARM-specific (my cited benchmark runs in that discussion are from an M1 MacBook Pro). While Buffer.from does still outperform TextEncoder.encode on small inputs, eliminating the second memcpy involved in those APIs by using TextEncoder.encodeInto to encode directly into the wasm linear memory yields substantial improvements, as documented above (the pattern is sketched just after this list). The downside is that we have to overprovision the wasm memory to account for the risk that we're encoding high-byte-size code points; should that extra memory use be a concern for a particular use case, the Raw APIs provide a reasonable means of handling it.
  4. I've added two new methods to the primary exports: create32 and create64, which allow creation of stateful hashes for streaming hash applications (a usage sketch follows below). These hashers close over an xxh state object, which we push into the wasm memory whenever we need to update or digest the hash and read back out afterwards. The WAT is a direct port of the C implementation of xxh, done very manually.
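To illustrate point 3, the encodeInto pattern looks roughly like this (illustrative names, not the library's actual internals):

const encoder = new TextEncoder();

function writeStringToWasm(memory, str) {
  // Overprovision: a UTF-16 code unit can encode to at most 3 UTF-8
  // bytes (surrogate pairs: 2 units -> 4 bytes), so str.length * 3
  // covers the worst case.
  const needed = str.length * 3;
  if (memory.buffer.byteLength < needed) {
    memory.grow(Math.ceil((needed - memory.buffer.byteLength) / 65536));
  }
  // The view must be recreated after growth, since growing detaches
  // the old ArrayBuffer.
  const view = new Uint8Array(memory.buffer);
  const { written } = encoder.encodeInto(str, view);
  return written; // byte length to hand to the wasm hash function
}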

In general, the JS code is extremely divergent from the previous version, so it's probably easiest to review as if it were entirely new code.
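For reference, usage of the new streaming API looks roughly like this (a sketch, inside an async context; digest() mirrors the return type of the corresponding one-shot API):

import xxhash from "xxhash-wasm";

const { create64 } = await xxhash();

const hasher = create64(); // optionally create64(seed), seed as a BigInt
hasher.update("some data"); // update accepts string | Uint8Array
hasher.update(new Uint8Array([1, 2, 3]));
const hash = hasher.digest();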

Please let me know what adjustments you'd like to see and I'll happily get on them ASAP.

@jungomi (Owner) commented Jan 13, 2022

Thank you for your work.

The CI is failing because it has Node 12 and 14 in there, but with the requirement now being 15, these need to be removed. While they are still in Maintenance LTS, it is okay to move on and only support the active LTS (Node 16) and the current release. At least locally, though, I had an issue installing the dependencies on Node 17, because iltorb (needed by bundlesize) could not be compiled. Until that is sorted out, it's probably okay to only have 16 in CI.

For the test coverage, there is one branch that is not covered:

----------|---------|----------|---------|---------|-------------------
File      | % Stmts | % Branch | % Funcs | % Lines | Uncovered Line #s
----------|---------|----------|---------|---------|-------------------
All files |   93.75 |       90 |     100 |   93.75 |
 index.js |   93.75 |       90 |     100 |   93.75 | 87-89
----------|---------|----------|---------|---------|-------------------

https://github.com/marcusdarmstrong/xxhash-wasm/blob/16ce66b99df755f8ee7b1bbd6c8bba1daf6bb0cb/src/index.js#L87-L89

That's for the streaming API with raw inputs (Uint8Array). Adding test cases to verify that both outputs are the same would be good.

Also for the streaming API it would be nice to have a short usage example in the Readme.

Regarding your open questions:

  1. string behavior: I completely agree with you, but I do think this strikes a nice balance between user expectations and performance. Maybe that should just be emphasised in the Readme, potentially with a separate section.
  2. BigInt/number asymmetry between 32-bit and 64-bit APIs: I'm fairly indifferent about that. One point I'm wondering about: if you pass a number to the 64-bit API, is it still handled correctly and just limits the maximum value, or are the bytes misinterpreted by the WASM (as in, a float in memory interpreted as a long)? If it just works but is limited, I don't really mind, but otherwise it could also just handle both, which should hopefully just be a conversion. Frankly, it is up to you what you would prefer.
  3. Hash<T>.update accepts string | Uint8Array, rather than separate update/updateRaw APIs: That's fine. I wouldn't worry about the branching; in that context it is completely insignificant.
  4. There's currently no changelog in the package: I usually put the change notes in the releases (https://github.com/jungomi/xxhash-wasm/releases), or they used to be just in the tags before I really used the releases. And now the package will be published automatically when a new release is created, so all new releases will have the change notes. If you want, we could also add a CHANGELOG.md with all of them in one place.
  5. I didn't do the semver bump here: I tend to only bump the version just before the release. If it had a regular release schedule I would prefer to bump it straight after the release.

About the downsides:

  1. Bundle size increase: It is still very small and the features definitely warrant that.
  2. Requires bulk memory operations, BigInts, and TextEncoder.encodeInto: They are fairly new, but at some point we need to move forward, and this seems like a good point to do it, considering the improvements and that they have at least been supported for a little while. Maybe it would be good to also put these requirements in the Readme and suggest v0.4.2 for older environments, which cannot take advantage of these performance improvements.

Besides that, it looks good to me and the improvements are fantastic, particularly for the 64-bit version.

@marcusdarmstrong (Contributor, Author)

if you pass a number to the 64-bit API is it still handled correctly and just limits the maximum value or are the bytes misinterpreted by the WASM (as in float in memory is interpreted as a long)?

Passing a number as a BigInt seed is a runtime error: TypeError: Cannot convert 1 to a BigInt. My preference would be to avoid the conversion, as constructing a new BigInt is an allocation and as such involves some legitimate overhead.
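For illustration (with h64 being the bound export):

h64("input", 1n); // ok: seed passed as a BigInt literal
h64("input", 1); // TypeError: Cannot convert 1 to a BigInt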

I've added the relevant test cases, removed the no-longer-compatible CI targets, and updated the readme with 1. a streaming example, 2. a mention of the limitations of the string API, and 3. a discussion of engine requirements. After some thought, I've also added two new API methods here (feedback of course welcome): h32String and h64String, which provide raw numeric hashes with the performance of the encodeInto API. Leaving an ergonomic default but providing access to the best possible performance case seemed like the best option to me. There's unfortunately no way to emulate the encodeInto performance via the Raw methods.

@jungomi (Owner) commented Jan 13, 2022

if you pass a number to the 64-bit API is it still handled correctly and just limits the maximum value or are the bytes misinterpreted by the WASM (as in float in memory is interpreted as a long)?

Passing a number as a BigInt seed is a runtime error: TypeError: Cannot convert 1 to a BigInt. My preference would be to avoid the conversion, as constructing a new BigInt is an allocation and as such involves some legitimate overhead.

Ah okay, then it's fine as it is.

The h32String/h64String methods are definitely a nice solution, but the names are counter-intuitive, because one would kind of expect them to work the other way around. That brings up the question of whether the default methods (h32/h64) should actually return strings, or whether the string version should be the additional one. In terms of API design, it's certainly more elegant to have the default return numbers and have a convenience method to directly convert to a string.
If users expect a string output from the default methods (probably mostly due to backwards compatibility), I think h**String should be renamed to something that better indicates the difference, namely that it returns a number instead of a string.
I'm personally in favour of having the default methods return a number and adding convenience methods for string outputs; the only thing I'm worried about is whether these changes will get caught, because if the TypeScript types are not checked, it could lead to silent misbehaviour (most likely with string concatenation), since there will not be any runtime error from it.
There are other breaking changes and it's a new major version, so I guess it doesn't really matter. What do you think?

Otherwise the rest looks fine to me.

@marcusdarmstrong (Contributor, Author)

I'm personally in favour of having the default methods return a number and adding convenience methods for string outputs; the only thing I'm worried about is whether these changes will get caught, because if the TypeScript types are not checked, it could lead to silent misbehaviour (most likely with string concatenation), since there will not be any runtime error from it.
There are other breaking changes and it's a new major version, so I guess it doesn't really matter. What do you think?

I completely agree. I think the risk of such a structural change going unnoticed on a major version update is quite minor, particularly if it's highlighted in the release notes.

As far as naming goes, how about...

h64(string[, BigInt]): BigInt
h64Raw(Uint8Array[, BigInt]): BigInt
h64Hex(string[, BigInt]): string

@jungomi (Owner) commented Jan 14, 2022

h64Hex: I don't associate Hex with string. I think it's easiest to keep it close to the usual string conversion naming, either h64ToString to suggest the .toString(), or maybe h64AsString. The fact that it's converted to a zero-padded hex representation is the implementation detail, which is what would be expected from a .toString() if there were a native hex number type.
I also thought your current implementation with h64String was fine, just with the names swapped to match what would be my understanding of them.

@marcusdarmstrong (Contributor, Author)

Works for me, I'll go with h64ToString().
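For reference, the settled 64-bit surface then looks like (the sketch from above with the rename applied):

h64(string[, BigInt]): BigInt
h64Raw(Uint8Array[, BigInt]): BigInt
h64ToString(string[, BigInt]): string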

@jungomi (Owner) left a comment

Last couple of remarks, just for the README

@jungomi merged commit 8a24745 into jungomi:main on Jan 14, 2022
@jungomi (Owner) commented Jan 14, 2022

Thank you very much for your contributions. I'll be looking to prepare the release so that v1.0.0 will be published in the next few days.

@extremeheat

Regarding the switch from two numbers to a BigInt for 64-bit seeds: my understanding is that BigInts are much slower for almost all operations performed on them, so is it safe to assume that the performance impact of the call overhead here has been tested to be negligible?

@marcusdarmstrong (Contributor, Author) commented Feb 7, 2022

In short, yes. I did a detailed performance evaluation for every change included in this PR. In practice, there's very little in the way of actual operations that need to be performed on a BigInt in the relevant code paths: the math is all done on i64 types inside the wasm and then boxed into a BigInt on the way out. The only actual BigInt operation is the u64-izing, which, while it has overhead, is significantly cheaper than the hi/lo munging previously in use.
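A sketch of that single operation (presumably via BigInt.asUintN or equivalent):

// wasm i64 results surface in JS as signed BigInts; reinterpreting
// them as unsigned 64-bit values is the only BigInt math involved.
const signed = -1n; // e.g. an all-ones i64 returned from wasm
const unsigned = BigInt.asUintN(64, signed); // 18446744073709551615n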
