-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/compile: intrinsify math/bits.RotateLeft{32,64} on all architectures #31265
Comments
cc @benshi001 for arm |
Kindly paging @bmkessler who might also be interested in this issue. |
Change https://golang.org/cl/182359 mentions this issue: |
Change https://golang.org/cl/188697 mentions this issue: |
Change https://golang.org/cl/189277 mentions this issue: |
This CL optimizes math.bits.RotateLeft32 to inline "MOVW Rx@>Ry, Rd" on ARM. The benchmark results of math/bits show some improvements. name old time/op new time/op delta RotateLeft-4 9.42ns ± 0% 6.91ns ± 0% -26.66% (p=0.000 n=40+33) RotateLeft8-4 8.79ns ± 0% 8.79ns ± 0% -0.04% (p=0.000 n=40+31) RotateLeft16-4 8.79ns ± 0% 8.79ns ± 0% -0.04% (p=0.000 n=40+32) RotateLeft32-4 8.16ns ± 0% 7.54ns ± 0% -7.68% (p=0.000 n=40+40) RotateLeft64-4 15.7ns ± 0% 15.7ns ± 0% ~ (all equal) updates #31265 Change-Id: I77bc1c2c702d5323fc7cad5264a8e2d5666bf712 Reviewed-on: https://go-review.googlesource.com/c/go/+/188697 Run-TryBot: Ben Shi <[email protected]> TryBot-Result: Gobot Gobot <[email protected]> Reviewed-by: Cherry Zhang <[email protected]>
wasm has 32-bit versions of all integer operations. This change lowers RotateLeft32 to i32.rotl on wasm and intrinsifies the math/bits call. Benchmarking on amd64 under node.js this is ~25% faster. node v10.15.3/amd64 name old time/op new time/op delta RotateLeft 8.37ns ± 1% 8.28ns ± 0% -1.05% (p=0.029 n=4+4) RotateLeft8 11.9ns ± 1% 11.8ns ± 0% ~ (p=0.167 n=5+5) RotateLeft16 11.8ns ± 0% 11.8ns ± 0% ~ (all equal) RotateLeft32 11.9ns ± 1% 8.7ns ± 0% -26.32% (p=0.008 n=5+5) RotateLeft64 8.31ns ± 1% 8.43ns ± 2% ~ (p=0.063 n=5+5) Updates #31265 Change-Id: I5b8e155978faeea536c4f6427ac9564d2f096a46 Reviewed-on: https://go-review.googlesource.com/c/go/+/182359 Run-TryBot: Brian Kessler <[email protected]> TryBot-Result: Gobot Gobot <[email protected]> Reviewed-by: Richard Musiol <[email protected]>
This CL optimizes math.bits.RotateLeft32 to inline "MOVW Rx@>Ry, Rd" on ARM. The benchmark results of math/bits show some improvements. name old time/op new time/op delta RotateLeft-4 9.42ns ± 0% 6.91ns ± 0% -26.66% (p=0.000 n=40+33) RotateLeft8-4 8.79ns ± 0% 8.79ns ± 0% -0.04% (p=0.000 n=40+31) RotateLeft16-4 8.79ns ± 0% 8.79ns ± 0% -0.04% (p=0.000 n=40+32) RotateLeft32-4 8.16ns ± 0% 7.54ns ± 0% -7.68% (p=0.000 n=40+40) RotateLeft64-4 15.7ns ± 0% 15.7ns ± 0% ~ (all equal) updates golang#31265 Change-Id: I77bc1c2c702d5323fc7cad5264a8e2d5666bf712 Reviewed-on: https://go-review.googlesource.com/c/go/+/188697 Run-TryBot: Ben Shi <[email protected]> TryBot-Result: Gobot Gobot <[email protected]> Reviewed-by: Cherry Zhang <[email protected]>
wasm has 32-bit versions of all integer operations. This change lowers RotateLeft32 to i32.rotl on wasm and intrinsifies the math/bits call. Benchmarking on amd64 under node.js this is ~25% faster. node v10.15.3/amd64 name old time/op new time/op delta RotateLeft 8.37ns ± 1% 8.28ns ± 0% -1.05% (p=0.029 n=4+4) RotateLeft8 11.9ns ± 1% 11.8ns ± 0% ~ (p=0.167 n=5+5) RotateLeft16 11.8ns ± 0% 11.8ns ± 0% ~ (all equal) RotateLeft32 11.9ns ± 1% 8.7ns ± 0% -26.32% (p=0.008 n=5+5) RotateLeft64 8.31ns ± 1% 8.43ns ± 2% ~ (p=0.063 n=5+5) Updates golang#31265 Change-Id: I5b8e155978faeea536c4f6427ac9564d2f096a46 Reviewed-on: https://go-review.googlesource.com/c/go/+/182359 Run-TryBot: Brian Kessler <[email protected]> TryBot-Result: Gobot Gobot <[email protected]> Reviewed-by: Richard Musiol <[email protected]>
This CL optimizes math.bits.RotateLeft32 to inline "MOVW Rx@>Ry, Rd" on ARM. The benchmark results of math/bits show some improvements. name old time/op new time/op delta RotateLeft-4 9.42ns ± 0% 6.91ns ± 0% -26.66% (p=0.000 n=40+33) RotateLeft8-4 8.79ns ± 0% 8.79ns ± 0% -0.04% (p=0.000 n=40+31) RotateLeft16-4 8.79ns ± 0% 8.79ns ± 0% -0.04% (p=0.000 n=40+32) RotateLeft32-4 8.16ns ± 0% 7.54ns ± 0% -7.68% (p=0.000 n=40+40) RotateLeft64-4 15.7ns ± 0% 15.7ns ± 0% ~ (all equal) updates golang#31265 Change-Id: I77bc1c2c702d5323fc7cad5264a8e2d5666bf712 Reviewed-on: https://go-review.googlesource.com/c/go/+/188697 Run-TryBot: Ben Shi <[email protected]> TryBot-Result: Gobot Gobot <[email protected]> Reviewed-by: Cherry Zhang <[email protected]>
wasm has 32-bit versions of all integer operations. This change lowers RotateLeft32 to i32.rotl on wasm and intrinsifies the math/bits call. Benchmarking on amd64 under node.js this is ~25% faster. node v10.15.3/amd64 name old time/op new time/op delta RotateLeft 8.37ns ± 1% 8.28ns ± 0% -1.05% (p=0.029 n=4+4) RotateLeft8 11.9ns ± 1% 11.8ns ± 0% ~ (p=0.167 n=5+5) RotateLeft16 11.8ns ± 0% 11.8ns ± 0% ~ (all equal) RotateLeft32 11.9ns ± 1% 8.7ns ± 0% -26.32% (p=0.008 n=5+5) RotateLeft64 8.31ns ± 1% 8.43ns ± 2% ~ (p=0.063 n=5+5) Updates golang#31265 Change-Id: I5b8e155978faeea536c4f6427ac9564d2f096a46 Reviewed-on: https://go-review.googlesource.com/c/go/+/182359 Run-TryBot: Brian Kessler <[email protected]> TryBot-Result: Gobot Gobot <[email protected]> Reviewed-by: Richard Musiol <[email protected]>
Alright, and doing a checkup here on the statuses of the various GOARCHes
From the table above, we are currently lacking for the GOARCH variants of mips* and ppc64le. |
CL 163760 should cover both ppc64 and ppc64le, right? On MIPS, the rotate instruction is rather new. We still support older machines. I guess we could do something based on GOMIPS(64) value, but we need to introduce the values first. Two shifts + an OR doesn't sounds too bad. |
CL 163760 handles both ppc64le and ppc64. When ppc64x is specified, that
means both.
…On Thu, Oct 17, 2019 at 9:07 AM cherrymui ***@***.***> wrote:
CL 163760 should cover both ppc64 and ppc64le, right?
On MIPS, the rotate instruction is rather new. We still support older
machines. I guess we could do something based on GOMIPS(64) value, but we
need to introduce the values first. Two shifts + an OR doesn't sounds too
bad.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#31265?email_source=notifications&email_token=ACH7BDATPMDS3OGUPJ25KW3QPBWS3A5CNFSM4HDWR4J2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEBQGSYY#issuecomment-543189347>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACH7BDCQYEPWOO4KKTS3ZF3QPBWS3ANCNFSM4HDWR4JQ>
.
|
The minimal requirement of MIPS64X is MIPS III, which is introduced in 1991 according to Wikipedia. I'm thinking update Go minimal requirement of MIPS64X to MIPS64 R2 (not MIPS II) as MIPS32X do. And we can adding compile flag
r2 = release 2 For the future MIPS64 r6, which is whole new instruction set. We already had |
#21536 (comment) reminded me about this. Currently the intrinsic is only implemented on amd64, arm64, ppc64 and s390x. We should enable it everywhere. It will need tests in
tests/codegen
. Could be a good beginner compiler issue to tackle.CLs so far:
The text was updated successfully, but these errors were encountered: