RFC: Transition Cranelift and Wasmtime to new x86-64 backend by default. #10

Merged Mar 26, 2021 (4 commits)

Conversation

cfallin (Member) commented Mar 12, 2021

Rendered

This RFC proposes to switch the default Cranelift compiler backend for the cranelift-codegen crate (i.e., use of Cranelift as a library) and for Wasmtime from the old x86-64 backend to the new x86-64 backend. It details how we should decide, and what steps we should take to build confidence, ensure backwards compatibility, and switch over gradually.


1. How much compile-time and runtime performance degradation should we
accept, if any, before moving forward with this transition?
A Member commented on this question:

To help guide the answer, how about another question: given the performance analysis we have done on the new backend, it would be interesting to get a rough idea of the speedups we can reach, and at what cost. If we think we can get, say, a 10% speedup easily (famous last words, but imagine one or two months of work), then it becomes easier to accept a degradation in the short term.

cfallin (Member, Author) commented Mar 18, 2021

I spent some time today running Sightglass and looking at the results. The following numbers are from 184a039 in my new-backend branch vs. current main, which is 5fecdfa. All time measurements are on Linux/x86-64 on a Ryzen 3900X @ 3.8GHz, with 3 iterations; times are arithmetic means, and speedups are summarized with a geometric mean:

Compilation time

|           benchmark|    old backend (ns)|    new backend (ns)| speedup|
|--------------------|--------------------|--------------------|--------|
|       blake3-scalar|       68972585.0 ns|      146270137.0 ns|  0.472x|
|         blake3-simd|       17564151.7 ns|       32353641.7 ns|  0.543x|
|                noop|        9321450.7 ns|       17151835.7 ns|  0.543x|
|      pulldown-cmark|      144998015.3 ns|      300286274.7 ns|  0.483x|
|  shootout-ackermann|       42469345.0 ns|       49661814.3 ns|  0.855x|
|     shootout-base64|       41245124.3 ns|       44928123.3 ns|  0.918x|
|      shootout-ctype|       39737061.7 ns|       43569184.7 ns|  0.912x|
|    shootout-ed25519|       60436223.0 ns|      179996537.0 ns|  0.336x|
|       shootout-fib2|       39346114.7 ns|       40560157.7 ns|  0.970x|
|      shootout-gimli|        3267325.7 ns|        5164884.3 ns|  0.633x|
|   shootout-heapsort|       10727494.3 ns|       16877485.3 ns|  0.636x|
|     shootout-keccak|       11902052.0 ns|       23934554.3 ns|  0.497x|
|     shootout-matrix|       41642608.7 ns|       43212095.3 ns|  0.964x|
|    shootout-memmove|       10944073.7 ns|       15006035.0 ns|  0.729x|
|    shootout-minicsv|        5230805.0 ns|        8374252.0 ns|  0.625x|
| shootout-nestedloop|       39301089.0 ns|       41444058.7 ns|  0.948x|
|     shootout-random|       37677048.0 ns|       39413521.7 ns|  0.956x|
|  shootout-ratelimit|       42176894.7 ns|       42664704.3 ns|  0.989x|
|    shootout-seqhash|       10257756.3 ns|       13729952.7 ns|  0.747x|
|      shootout-sieve|       38677786.7 ns|       41630312.3 ns|  0.929x|
|     shootout-switch|     1246419944.3 ns|       45154181.7 ns| 27.604x|
|  shootout-xblabla20|       10846952.7 ns|       15154782.7 ns|  0.716x|
|  shootout-xchacha20|       10538490.7 ns|       13492446.7 ns|  0.781x|
|--------------------|--------------------|--------------------|--------|
Geomean: 0.828489

Execution time

|           benchmark|    old backend (ns)|    new backend (ns)| speedup|
|--------------------|--------------------|--------------------|--------|
|       blake3-scalar|         124991.7 ns|         161477.0 ns|  0.774x|
|         blake3-simd|         191938.3 ns|         268173.7 ns|  0.716x|
|                noop|             92.3 ns|             69.0 ns|  1.338x|
|      pulldown-cmark|        2915551.0 ns|        2601125.3 ns|  1.121x|
|  shootout-ackermann|            125.7 ns|            142.3 ns|  0.883x|
|     shootout-base64|      120473405.7 ns|      118650169.7 ns|  1.015x|
|      shootout-ctype|      327814007.0 ns|      288553357.7 ns|  1.136x|
|    shootout-ed25519|     3285211842.3 ns|     3355830768.3 ns|  0.979x|
|       shootout-fib2|     1288032348.7 ns|     1121754822.3 ns|  1.148x|
|      shootout-gimli|        3953612.7 ns|        3842533.3 ns|  1.029x|
|   shootout-heapsort|      941783516.7 ns|      905448764.0 ns|  1.040x|
|     shootout-keccak|       11200864.7 ns|       11193635.0 ns|  1.001x|
|     shootout-matrix|      138431299.7 ns|       94843781.7 ns|  1.460x|
|    shootout-memmove|      478948004.0 ns|      312724823.7 ns|  1.532x|
|    shootout-minicsv|      719886639.3 ns|      553181699.7 ns|  1.301x|
| shootout-nestedloop|        9017709.0 ns|        7310325.0 ns|  1.234x|
|     shootout-random|      141097573.0 ns|      141046267.3 ns|  1.000x|
|  shootout-ratelimit|       15340715.7 ns|       14238039.7 ns|  1.077x|
|    shootout-seqhash|     3305572758.3 ns|     3652199454.7 ns|  0.905x|
|      shootout-sieve|      339462468.7 ns|      432978618.7 ns|  0.784x|
|     shootout-switch|       45572014.0 ns|       36769437.7 ns|  1.239x|
|  shootout-xblabla20|        1620894.0 ns|        2381350.3 ns|  0.681x|
|  shootout-xchacha20|        3260006.0 ns|        2719482.7 ns|  1.199x|
|--------------------|--------------------|--------------------|--------|
Geomean: 1.046229
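The per-suite geomeans above are geometric means of the per-benchmark speedup ratios. As a minimal sketch (using made-up ratios, not the table's values), the computation looks like this:

```rust
/// Geometric mean of a slice of speedup ratios: exp(mean(ln(r))).
/// This is the standard way to summarize ratios, since it treats a 2x
/// speedup and a 0.5x slowdown as exactly offsetting.
fn geometric_mean(ratios: &[f64]) -> f64 {
    let log_sum: f64 = ratios.iter().map(|r| r.ln()).sum();
    (log_sum / ratios.len() as f64).exp()
}

fn main() {
    // Hypothetical per-benchmark speedups (old time / new time).
    let speedups = [0.5, 1.0, 2.0];
    let g = geometric_mean(&speedups);
    // 0.5x and 2.0x cancel, so the geomean is 1.0.
    assert!((g - 1.0).abs() < 1e-9);
    println!("geomean = {g:.3}");
}
```

Note that an arithmetic mean of the same three ratios would report 1.17x, overstating the result; that is why speedups are aggregated geometrically here.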

Thoughts

This is somewhat surprising to me; earlier measurements showed different results. For example, when I last measured Lucet's benchmark suite on Dec 1, I recorded a 20% geomean performance improvement on that suite. I've also generally measured compile time to improve relative to the old backend (about 1.5-2x faster last summer), though I haven't done a measurement recently.

So, something seems to have regressed; all the more motivation for continuous benchmarking. I haven't tried to bisect the shift.

That said, we still see a modest gain in execution speed on most benchmarks: a geomean of 4.6% better. Compilation time that is 18% slower is a little more worrying; a quick look with perf shows a large share of time spent in regalloc.rs. I'm currently looking at options for register-allocator improvements and am hoping to have something in the next few weeks.

I would propose that we still move forward with the transition, but this increases the urgency with which I'm working on this!

cfallin (Member, Author) commented Mar 25, 2021

Motion to finalize with a disposition to merge

There has not been much discussion here, but there also seem to be few objections. Given that runtime performance is generally positive, the compatibility check-boxes are complete, and the maintenance story improves greatly once this moves forward, I'm proposing that we merge this RFC.

Stakeholders sign-off

Tagging all employees of Bytecode Alliance-affiliated companies who have committed to the Wasmtime or Lucet repos in the last three months, plus anyone who has given feedback on this PR as a stakeholder.

Fastly

Intel

IBM

Mozilla

Other Stakeholders

jlb6740 commented Mar 26, 2021

I think the RFC looks great. Considering there have been no alarms about missing features so far from those who have made the switch, step 5 can be done after the switch, before the removal of the legacy backend.

cfallin (Member, Author) commented Mar 26, 2021

Entering Final Call Period

https://github.com/bytecodealliance/rfcs/blob/main/accepted/rfc-process.md#making-a-decision-merge-or-close

Once any stakeholder from a different group has signed off, the RFC will move into a 10 calendar day final comment period (FCP), long enough to ensure that other stakeholders have at least a full business week to respond.

The FCP will end on Mon Apr 5.

uweigand (Member)

I agree as well.

cfallin (Member, Author) commented Mar 26, 2021

Thanks all! As we actually have a signoff from every group of stakeholders, it looks like we can close the FCP early and go ahead with the merge:

https://github.com/bytecodealliance/rfcs/blob/main/accepted/rfc-process.md#making-a-decision-merge-or-close

Finally, the RFC is automatically merged or closed if either:

  • The FCP elapses without any objections.
  • A stakeholder from each group has signed off, short-cutting the waiting period.

@cfallin cfallin merged commit 03dff33 into bytecodealliance:main Mar 26, 2021
@cfallin cfallin deleted the cranelift-backend-transition branch March 26, 2021 16:16
cfallin added a commit to cfallin/wasmtime that referenced this pull request Mar 26, 2021
As per bytecodealliance/rfcs#10, the first step in our transition to the
new Cranelift x86-64 backend by default in Wasmtime is to switch our
fuzzers and monitor for any breakage.

This PR moves all Wasmtime fuzz targets over to use the new backend. The
differential old-vs-new-backend target still works by dynamically
selecting both backends (the build feature just changes the default,
which is what all the other fuzz targets use).

We should watch for ~a week or so, I think, on oss-fuzz, to make sure
this sticks without issue before moving further.
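The differential old-vs-new target described above boils down to a simple oracle: run the same input through both backends and require agreement. A hedged sketch of that idea, with placeholder names (`Backend`, `execute`, and `differential_check` are illustrative stand-ins, not Wasmtime's actual fuzzing API):

```rust
/// Which code-generation backend to use (stand-in for the real selection
/// mechanism, which in Wasmtime is a Cargo feature / runtime choice).
#[derive(Clone, Copy)]
enum Backend {
    Legacy,
    New,
}

/// Stand-in for "compile and run this input with the given backend".
/// In the real fuzz target, each arm would drive a full codegen pipeline;
/// here both arms are placeholders that model agreeing backends.
fn execute(backend: Backend, input: u32) -> u32 {
    match backend {
        Backend::Legacy => input.wrapping_add(1),
        Backend::New => input.wrapping_add(1),
    }
}

/// Differential oracle: any observable divergence between the two
/// backends on the same input is a bug in one of them.
fn differential_check(input: u32) -> bool {
    execute(Backend::Legacy, input) == execute(Backend::New, input)
}

fn main() {
    assert!(differential_check(41));
}
```

The value of keeping this target alive during the transition is that the legacy backend, already well fuzzed, acts as a reference implementation while the new backend soaks on oss-fuzz.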
cfallin added a commit to cfallin/oss-fuzz that referenced this pull request Mar 26, 2021
In bytecodealliance/rfcs#10, we have outlined a process by which we're
switching to a new compiler backend by default. The first step in this
process is to switch our fuzzing targets to use the new backend and wait
for any issues.

This PR adds the Cargo feature that enables the new backend in all
fuzzing targets.
inferno-chromium pushed a commit to google/oss-fuzz that referenced this pull request Mar 26, 2021 (same commit message as above)