RFC: Transition Cranelift and Wasmtime to new x86-64 backend by default. #10

Merged Mar 26, 2021 (4 commits)

Conversation

cfallin (Member) commented Mar 12, 2021

Rendered

This RFC proposes to switch the default Cranelift compiler backend for the cranelift-codegen crate (i.e., use of Cranelift as a library) and for Wasmtime from the old x86-64 backend to the new x86-64 backend. It details how we should decide, and what steps we should take to build confidence, ensure backwards compatibility, and switch over gradually.


1. How much compile-time and runtime performance degradation should we
accept, if any, before moving forward with this transition?
A Member commented on this question:

To help guide the answer, how about another question: given the performance analysis we have done on the new backend, it would be interesting to get a rough idea of the speedups we can reach, and at what cost. If we think we can get, say, a 10% speedup easily (famous last words, but imagine one or two months of work), then it becomes easier to accept a degradation in the short term.

cfallin (Member, Author) commented Mar 18, 2021

I spent some time today running Sightglass and looking at the results. The following numbers are from 184a039 in my new-backend branch vs. current main, which is 5fecdfa. All time measurements are on Linux/x86-64 on a Ryzen 3900X @ 3.8GHz, with 3 iterations; times are arithmetic means, and speedups are summarized with a geometric mean:

Compilation time

|           benchmark|    old backend (ns)|    new backend (ns)| speedup|
|--------------------|--------------------|--------------------|--------|
|       blake3-scalar|       68972585.0 ns|      146270137.0 ns|  0.472x|
|         blake3-simd|       17564151.7 ns|       32353641.7 ns|  0.543x|
|                noop|        9321450.7 ns|       17151835.7 ns|  0.543x|
|      pulldown-cmark|      144998015.3 ns|      300286274.7 ns|  0.483x|
|  shootout-ackermann|       42469345.0 ns|       49661814.3 ns|  0.855x|
|     shootout-base64|       41245124.3 ns|       44928123.3 ns|  0.918x|
|      shootout-ctype|       39737061.7 ns|       43569184.7 ns|  0.912x|
|    shootout-ed25519|       60436223.0 ns|      179996537.0 ns|  0.336x|
|       shootout-fib2|       39346114.7 ns|       40560157.7 ns|  0.970x|
|      shootout-gimli|        3267325.7 ns|        5164884.3 ns|  0.633x|
|   shootout-heapsort|       10727494.3 ns|       16877485.3 ns|  0.636x|
|     shootout-keccak|       11902052.0 ns|       23934554.3 ns|  0.497x|
|     shootout-matrix|       41642608.7 ns|       43212095.3 ns|  0.964x|
|    shootout-memmove|       10944073.7 ns|       15006035.0 ns|  0.729x|
|    shootout-minicsv|        5230805.0 ns|        8374252.0 ns|  0.625x|
| shootout-nestedloop|       39301089.0 ns|       41444058.7 ns|  0.948x|
|     shootout-random|       37677048.0 ns|       39413521.7 ns|  0.956x|
|  shootout-ratelimit|       42176894.7 ns|       42664704.3 ns|  0.989x|
|    shootout-seqhash|       10257756.3 ns|       13729952.7 ns|  0.747x|
|      shootout-sieve|       38677786.7 ns|       41630312.3 ns|  0.929x|
|     shootout-switch|     1246419944.3 ns|       45154181.7 ns| 27.604x|
|  shootout-xblabla20|       10846952.7 ns|       15154782.7 ns|  0.716x|
|  shootout-xchacha20|       10538490.7 ns|       13492446.7 ns|  0.781x|
|--------------------|--------------------|--------------------|--------|
Geomean: 0.828489

Execution time

|           benchmark|    old backend (ns)|    new backend (ns)| speedup|
|--------------------|--------------------|--------------------|--------|
|       blake3-scalar|         124991.7 ns|         161477.0 ns|  0.774x|
|         blake3-simd|         191938.3 ns|         268173.7 ns|  0.716x|
|                noop|             92.3 ns|             69.0 ns|  1.338x|
|      pulldown-cmark|        2915551.0 ns|        2601125.3 ns|  1.121x|
|  shootout-ackermann|            125.7 ns|            142.3 ns|  0.883x|
|     shootout-base64|      120473405.7 ns|      118650169.7 ns|  1.015x|
|      shootout-ctype|      327814007.0 ns|      288553357.7 ns|  1.136x|
|    shootout-ed25519|     3285211842.3 ns|     3355830768.3 ns|  0.979x|
|       shootout-fib2|     1288032348.7 ns|     1121754822.3 ns|  1.148x|
|      shootout-gimli|        3953612.7 ns|        3842533.3 ns|  1.029x|
|   shootout-heapsort|      941783516.7 ns|      905448764.0 ns|  1.040x|
|     shootout-keccak|       11200864.7 ns|       11193635.0 ns|  1.001x|
|     shootout-matrix|      138431299.7 ns|       94843781.7 ns|  1.460x|
|    shootout-memmove|      478948004.0 ns|      312724823.7 ns|  1.532x|
|    shootout-minicsv|      719886639.3 ns|      553181699.7 ns|  1.301x|
| shootout-nestedloop|        9017709.0 ns|        7310325.0 ns|  1.234x|
|     shootout-random|      141097573.0 ns|      141046267.3 ns|  1.000x|
|  shootout-ratelimit|       15340715.7 ns|       14238039.7 ns|  1.077x|
|    shootout-seqhash|     3305572758.3 ns|     3652199454.7 ns|  0.905x|
|      shootout-sieve|      339462468.7 ns|      432978618.7 ns|  0.784x|
|     shootout-switch|       45572014.0 ns|       36769437.7 ns|  1.239x|
|  shootout-xblabla20|        1620894.0 ns|        2381350.3 ns|  0.681x|
|  shootout-xchacha20|        3260006.0 ns|        2719482.7 ns|  1.199x|
|--------------------|--------------------|--------------------|--------|
Geomean: 1.046229
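The per-suite geomeans above are geometric means of the per-benchmark speedup ratios. As a minimal sketch (using made-up ratios, not the table's values), the computation looks like this:

```rust
/// Geometric mean of a slice of speedup ratios: exp(mean(ln(r))).
/// This is the standard way to summarize ratios, since it treats a 2x
/// speedup and a 0.5x slowdown as exactly offsetting.
fn geometric_mean(ratios: &[f64]) -> f64 {
    let log_sum: f64 = ratios.iter().map(|r| r.ln()).sum();
    (log_sum / ratios.len() as f64).exp()
}

fn main() {
    // Hypothetical per-benchmark speedups (old time / new time).
    let speedups = [0.5, 1.0, 2.0];
    let g = geometric_mean(&speedups);
    // 0.5x and 2.0x cancel, so the geomean is 1.0.
    assert!((g - 1.0).abs() < 1e-9);
    println!("geomean = {g:.3}");
}
```

Note that an arithmetic mean of the same three ratios would report 1.17x, overstating the result; that is why speedups are aggregated geometrically here.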

Thoughts

This is somewhat surprising to me; earlier measurements showed different results. For example, when I last measured Lucet's benchmark suite on Dec 1, I recorded a 20% geomean performance improvement on that suite. I've also generally measured compile time to improve relative to the old backend (about 1.5-2x faster last summer), though I haven't done a measurement recently.

So, something seems to have regressed; all the more motivation for continuous benchmarking. I haven't tried to bisect the shift.

That said, we still see a modest gain in execution speed on most benchmarks: a geomean of 4.6% better. Compilation time that is 18% slower is a little more worrying; a quick look with perf shows a large share of time spent in regalloc.rs. I'm currently looking at options for register-allocator improvements and am hoping to have something in the next few weeks.

I would propose that we still move forward with the transition, but this increases the urgency with which I'm working on this!

cfallin (Member, Author) commented Mar 25, 2021

Motion to finalize with a disposition to merge

There has not been much discussion here, but there also seem to be few objections. Given that runtime performance is generally positive, the compatibility check-boxes are complete, and the maintenance story improves greatly once this moves forward, I'm proposing that we merge this RFC.

Stakeholders sign-off

Tagging all employees of Bytecode Alliance-affiliated companies who have committed to the Wasmtime or Lucet repos in the last three months, plus anyone who has given feedback on this PR as a stakeholder.

Fastly

Intel

IBM

Mozilla

Other Stakeholders

jlb6740 commented Mar 26, 2021

I think the RFC looks great. Considering there have been no alarms about missing features so far from those who have made the switch, step 5 can be done after the switch, before the removal of the legacy backend.

cfallin (Member, Author) commented Mar 26, 2021

Entering Final Call Period

https://github.com/bytecodealliance/rfcs/blob/main/accepted/rfc-process.md#making-a-decision-merge-or-close

Once any stakeholder from a different group has signed off, the RFC will move into a 10 calendar day final comment period (FCP), long enough to ensure that other stakeholders have at least a full business week to respond.

The FCP will end on Mon Apr 5.

uweigand (Member)

I agree as well.

cfallin (Member, Author) commented Mar 26, 2021

Thanks all! As we actually have a signoff from every group of stakeholders, it looks like we can close the FCP early and go ahead with the merge:

https://github.com/bytecodealliance/rfcs/blob/main/accepted/rfc-process.md#making-a-decision-merge-or-close

Finally, the RFC is automatically merged or closed if either:

  • The FCP elapses without any objections.
  • A stakeholder from each group has signed off, short-cutting the waiting period.

@cfallin cfallin merged commit 03dff33 into bytecodealliance:main Mar 26, 2021
@cfallin cfallin deleted the cranelift-backend-transition branch March 26, 2021 16:16
cfallin added a commit to cfallin/wasmtime that referenced this pull request Mar 26, 2021
As per bytecodealliance/rfcs#10, the first step in our transition to the
new Cranelift x86-64 backend by default in Wasmtime is to switch our
fuzzers and monitor for any breakage.

This PR moves all Wasmtime fuzz targets over to use the new backend. The
differential old-vs-new-backend target still works by dynamically
selecting both backends (the build feature just changes the default,
which is what all the other fuzz targets use).

We should watch for ~a week or so, I think, on oss-fuzz, to make sure
this sticks without issue before moving further.
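The differential old-vs-new target described above boils down to a simple oracle: run the same input through both backends and require agreement. A hedged sketch of that idea, with placeholder names (`Backend`, `execute`, and `differential_check` are illustrative stand-ins, not Wasmtime's actual fuzzing API):

```rust
/// Which code-generation backend to use (stand-in for the real selection
/// mechanism, which in Wasmtime is a Cargo feature / runtime choice).
#[derive(Clone, Copy)]
enum Backend {
    Legacy,
    New,
}

/// Stand-in for "compile and run this input with the given backend".
/// In the real fuzz target, each arm would drive a full codegen pipeline;
/// here both arms are placeholders that model agreeing backends.
fn execute(backend: Backend, input: u32) -> u32 {
    match backend {
        Backend::Legacy => input.wrapping_add(1),
        Backend::New => input.wrapping_add(1),
    }
}

/// Differential oracle: any observable divergence between the two
/// backends on the same input is a bug in one of them.
fn differential_check(input: u32) -> bool {
    execute(Backend::Legacy, input) == execute(Backend::New, input)
}

fn main() {
    assert!(differential_check(41));
}
```

The value of keeping this target alive during the transition is that the legacy backend, already well fuzzed, acts as a reference implementation while the new backend soaks on oss-fuzz.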
cfallin added a commit to cfallin/oss-fuzz that referenced this pull request Mar 26, 2021
In bytecodealliance/rfcs#10, we have outlined a process by which we're
switching to a new compiler backend by default. The first step in this
process is to switch our fuzzing targets to use the new backend and wait
for any issues.

This PR adds the Cargo feature that enables the new backend in all
fuzzing targets.
inferno-chromium pushed a commit to google/oss-fuzz that referenced this pull request Mar 26, 2021 (same commit message as above)