Add a fast processor-native bitshift function #52828
base: master
Conversation
In Julia, bitshifting a bitinteger x by more bits than are present in x causes the result to be zero. Also, negative bitshifts are supported. This might semantically be more correct (and the former also matches LLVM's definition of bitshifts), but it does not correspond to either x86 or AArch64 behaviour. The result is that Julia's bitshifts are not optimised to a single instruction. In contrast, in the C language, bitshifting by more than the bitwidth is UB, which allows the compiler to assume it never happens, and optimise the shift to a single instruction. This commit adds the wrapping shift functions >>%, >>>% and <<%, which overflow if the shift is too high or negative. The overflow behaviour is explicitly not stated, but the implemented behaviour matches the native behaviour of x86 and AArch64. This commit requires JuliaSyntax support, which will be implemented in a parallel PR to JuliaSyntax.
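To make the current semantics concrete (illustrative REPL session, not part of the diff):

julia> UInt8(1) << 9   # shifting past the bitwidth gives zero under current semantics
0x00

julia> UInt8(2) << -1  # negative shift counts shift in the opposite direction
0x01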
What's the test? |
@giordano I've used this code to test:

function foo(x::Union{Int8, UInt8, Int16, UInt16}, y::Base.BitInteger)
Core.Intrinsics.lshr_int(Core.Intrinsics.zext_int(UInt32, x), (y % UInt32) & 0x1f) % typeof(x)
end
function foo(x::T, y::Base.BitInteger) where {T <: Union{Int32, UInt32, UInt64, Int64, UInt128, Int128}}
Core.Intrinsics.lshr_int(x, (y % UInt32) & (8*sizeof(T) - 1)) % T
end
function qux(x::Union{Int8, UInt8, Int16, UInt16}, y::Base.BitInteger)
Core.Intrinsics.shl_int(Core.Intrinsics.zext_int(UInt32, x), (y % UInt32) & 0x1f) % typeof(x)
end
function qux(x::T, y::Base.BitInteger) where {T <: Union{Int32, UInt32, UInt64, Int64, UInt128, Int128}}
Core.Intrinsics.shl_int(x, (y % UInt32) & (8*sizeof(T) - 1)) % T
end
function bar(x::Union{Int8, Int16}, y::Base.BitInteger)
Core.Intrinsics.ashr_int(Core.Intrinsics.sext_int(UInt32, x), (y % UInt32) & 0x1f) % typeof(x)
end
function bar(x::T, y::Base.BitInteger) where {T <: Union{Int32, Int64, Int128}}
Core.Intrinsics.ashr_int(x, (y % UInt32) & (8*sizeof(T) - 1)) % T
end
function bar(x::Union{UInt8, UInt16}, y::Base.BitInteger)
Core.Intrinsics.lshr_int(Core.Intrinsics.zext_int(UInt32, x), (y % UInt32) & 0x1f) % typeof(x)
end
function bar(x::T, y::Base.BitInteger) where {T <: Union{UInt32, UInt64, UInt128}}
Core.Intrinsics.lshr_int(x, (y % UInt32) & (8*sizeof(T) - 1)) % T
end
for T1 in Base.BitInteger64_types
for T2 in Base.BitInteger64_types
for f in [foo, bar, qux]
print(T1, " ", T2, " ", f)
io = IOBuffer()
code_native(io, f, (T1, T2); dump_module=false, debuginfo=:none, raw=true)
s = collect(eachline(IOBuffer(String(take!(io)))))
filter!(s) do i
ss = lstrip(i)
all(!startswith(ss, j) for j in [r"mov\s", r"pop\s", r"nop\s", r"ret\s?", r"push\s"])
end
for i in s
println(i)
end
end
end
end

It prints the emitted instructions for each type, filtering out the trivial ones like moves, pushes, and returns. Then I go through the list and make sure every combination only emits a shift. |
With your unmodified code I get
With more cleanup code
for T1 in Base.BitInteger64_types
for T2 in Base.BitInteger64_types
for f in [foo, bar, qux]
print(T1, " ", T2, " ", f)
io = IOBuffer()
code_native(io, f, (T1, T2); dump_module=false, debuginfo=:none, raw=true)
s = collect(eachline(IOBuffer(String(take!(io)))))
filter!(s) do i
ss = lstrip(i)
all(!startswith(ss, j) for j in [r"ret\s?", r"(mov|ldr|ldp|stp)\s"])
end
for i in s
println(i)
end
end
end
end

I get this simplified output:
Note that there are some |
I'm in favor. It's very doable to implement these for yourself for any given argument type, but annoying to implement generically as you've done here, so having the guaranteed wrapping versions seems good. |
The docstrings here are pretty confusing to me — it's not immediately clear what "overflowing" means or when it'd happen in this context. In the analogy to +% and friends, overflow has a well-defined wrapping result; here it doesn't. Using the word overflowing as a jargony stand-in for "undefined behavior" seems not great. |
When you said "wrapping shift," I initially thought you meant a rotation (something like bitrotate).

My best understanding is that this is a bitshift with undefined behavior when attempting to shift by a negative amount or by at least the bitwidth. The rest of my post is written assuming my above guess is correct, although I think most of my arguments remain relevant even if not.

"Overflowing bit shift" does not appear to be standard terminology. A Google search does not turn up any specific operation by that name. The search does find a few discussions on overflow in bit shifts (and the UB that follows), but in any case this is a bad description. Given my persistent fuzziness on what this operator actually does, the docstring needs a significant change of terminology and explanation. If this exposes UB, it should explicitly mention "undefined behavior" and the conditions for invoking/avoiding it within the docstring.

The current implementation only supports Base.BitInteger arguments. |
The purpose of the functions is to provide bitshifting with zero overhead. In particular, I want the semantics to be:
I don't think "undefined behaviour" is a good term. People associate it with nasal demons and compilers doing terrible things like deleting whole functions, and your entire program being considered invalid such that anything goes. This is very much not the case with invalid bitshift values here. We guarantee the function returns in an orderly fashion, although the returned value is unspecified. I've changed the docstrings to be more precise, but I've avoided the term "undefined behaviour".
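Roughly the kind of docstring wording this implies (an illustrative sketch, not the PR's actual docstring):

    x >>% n

    Right bit shift with native overflow semantics. For 0 <= n < 8*sizeof(x), this is identical to x >> n.
    For any other n, the call still returns normally, but the returned value is unspecified (in practice,
    whatever the platform's native shift instruction produces); only the return type is guaranteed.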
I'm not sure. I use it a lot in my own code, but I don't really have a sense of how common this is. If I'm atypical, it makes sense to preserve the
I disagree. This function is all about doing bit operations as efficiently as possible, by matching the CPU's native instructions. It makes no sense to provide a generic implementation; that would only mislead people into thinking there is an efficient implementation for their custom integer types. I won't die on that hill, but I really think that if you write code so generic that you don't know for sure that your data is a |
I think we want this, at least since Julia itself uses << and >>, e.g. in base/hamt.jl, so Base itself could use it. The point is that this is faster, right? Probably OK to export (but not to change the current definitions). |
I understand all that — and I have wanted/written the functionality myself! — but neither the term "overflowing shift" nor the name >>% conveys it. I suppose there's the question of what "pure mathematical bitshifting" is, but in Julia we say it's equivalent to flooring division or multiplication by powers of two. The fact that our processors happily overflow and do "modular arithmetic" up to a point is just quirky. And it's annoying to need to know that quirk to get good performance. But I don't think it's the same as |
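(Spelling out that mathematical equivalence with a quick REPL check:)

julia> (-7 >> 1, fld(-7, 2))  # arithmetic right shift matches flooring division by 2^n
(-4, -4)

julia> (3 << 2, 3 * 2^2)      # left shift matches multiplication by 2^n
(12, 12)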
With the new explanation, this seems (to me) like a clear case of UB. UB is used to assert that certain values will not appear in certain situations so that useful optimizations can be made. If those values actually occur, then UB permits the compiler to ignore the consequences. If it were on the hook to handle the consequences predictably, then optimization would be impossible and UB would be useless. UB is not something to be ashamed of, but it is definitely something to be made aware of, and definitely the terminology "undefined behavior" should be used so that users can understand the consequences of abuse. I don't see any difference between the UB here and the UB exposed in unsafe_trunc.

Some languages (C, I think?) define the overflow of signed integer addition to be UB. This is useful because the compiler can always assume that the sum of two nonnegative values is nonnegative. Julia instead defines signed integer addition overflow to return the result of modular arithmetic. In practice, those other languages also give that same result (that's what the hardware does, after all), but the compiler can assume extra properties about that result thanks to the UB. Those properties might be wrong if the non-UB conditions are violated, so the compiler might make optimizations that result in "incorrect" results in UB situations.

Any program written with the proposed
From the stated criteria
I'm absolutely in favor of having generic fallbacks to ordinary shifts for Integer. |
My reasoning is more superficial than principled, but I agree with this conclusion, primarily because I don't like the aesthetics of >>%.
not to derail the discussion, but what do you mean by this? I have certainly seen performance improvements in my own code by replacing |
Thinking more about it, you're right. So yes, let's call them unsafe_shl, unsafe_lshr and unsafe_ashr. Although:
This is NOT what I propose. I don't want the compiler to do whatever in the presence of an invalid shift value. I want it to return an arbitrary value of the correct type (in practice, whatever the platform's native shift instruction does). In this particular case, I don't see any advantage of declaring invalid bitshift values to be UB, and these associations with the term "undefined behaviour" are exactly why I think we should not choose it. If there are good performance reasons to have something result in UB, so be it. But here, I just don't see that being the case. |
FWIW, since this will presumably use the LLVM freeze to prevent any UB from propagating, I don't think |
NOTE: someone else has offered a more nuanced view of
No compiler has ever actually attempted to summon nasal demons (or if they have tried, no one has ever reported one being successful). The compiler must ensure the result is correct when the non-UB conditions are satisfied. It is not required to fulfill any specific semantic in UB situations. It will almost certainly decide against printing the complete works of William Shakespeare when it encounters UB, because that is more work than it has to do. It will also not go out of its way to insert an error path for values that you promised it would never see, because that's just more work. In all likelihood, the compiler will see that it gets the correct result for valid inputs via an unchecked shift, and it will quietly ignore the fact that it doesn't know what will happen if it uses invalid values, because UB says it doesn't have to care.

Is it important that this be semantically guaranteed not to throw an error? Wouldn't it be convenient if the user were informed that they are breaking the promise they made and that the output of the program might be garbage? They won't actually get that notice, but they wouldn't be mad if they did. They'd try to go fix the mistake rather than try to silence the error. What use is a program that doesn't produce the correct result?

EDIT: Thanks to the following poster for pointing out one aspect of UB that I consistently manage to neglect, which does draw a slight wedge between the desired semantics here and the unrestricted UB I've been advocating. Although I'm still not totally certain that UB is entirely unreasonable here, I can acknowledge the possible merits of requiring hardware-native behavior rather than compiler-level UB on invalid shifts. |
@mikmoore this isn't terribly unlike our discussion on software-checked integer division. It's true that LLVM's shift instructions are UB for out-of-range shift amounts.

So what is this operation called? |
I'm a bit confused as to what this proposes. Because a wrapping shift doesn't map to an LLVM instruction, is this then something that behaves directly like the LLVM shift/C shift, where it's just UB?
|
I've re-titled this to hopefully better capture the intent here ("wrapping" confused me, too) — the goal as I see it is to have some |
So the thing is, currently this maps to the LLVM call that has UB (which we might not care about). LLVM also has the funnel shifts (which do quite cool things but I'm not sure if that's what we want) |
Just
|
I feel like this PR is headed in the wrong direction. I'd like for these operations to be well-defined and safe, but do what most CPUs already do, which is to discard all but the low bits of the shift argument (i.e., shift modulo the bit width). Unfortunately, that means that it won't be possible to shift by 64 bits, but so be it; perhaps that's what these operations should do. |
I also think the original names of <<%, >>% and >>>% were good. Here's what the definitions would look like:

<<%(n::T, k::Integer) where {T<:Base.BitInteger} = n << ((k % UInt8) % UInt8(8*sizeof(T)))
>>%(n::T, k::Integer) where {T<:Base.BitInteger} = n >> ((k % UInt8) % UInt8(8*sizeof(T)))
>>>%(n::T, k::Integer) where {T<:Base.BitInteger} = n >>> ((k % UInt8) % UInt8(8*sizeof(T)))

And now you can see where the modulus in the name comes from. If we had defined |
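Working a couple of values through those definitions by hand (the infix syntax assumes the companion JuliaSyntax PR):

julia> 0xff >>% 9    # (9 % UInt8) mod 8 == 1, so this shifts right by 1
0x7f

julia> 0x0f <<% -7   # (-7 % UInt8) == 0xf9, and 0xf9 mod 8 == 1, so this shifts left by 1
0x1e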
Unfortunately, that implementation doesn't map exactly onto x86 shift instructions; instead it compiles to an extra masking instruction followed by the shift. IIUC this is because of two annoying factors:
So, for 8-bit integers, if we modulo by 8, an extra and is emitted. We can get around it by extending the 8-bit integer to a 32-bit integer, shifting modulo 32, then truncating back to 8 bits. That produces better code, but it's icky that we forcibly do 32-bit bitshifts on 8-bit integers. Is there a way to get around LLVM's limitation and just produce the raw shift instruction? |
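A minimal sketch of that widening workaround (shl8 is a made-up name; the intrinsics are the same ones used in the test code above):

function shl8(x::UInt8, n::UInt8)
    x32 = Core.Intrinsics.zext_int(UInt32, x)  # widen to 32 bits
    (x32 << (n & 0x1f)) % UInt8                # shift mod 32, truncate back to 8 bits
end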
Just wanting to make sure that things stay in perspective: Can someone provide a vague remark on the potential performance improvement this may render to a useful calculation (i.e., not a nanobenchmark that simply does a bunch of shifts without motivation)? I don't need a literal benchmark, just hoping someone can at least assert a vague figure. My understanding is that the best "safe" version (modulo-shift) results in 2 instructions instead of the more-desirable 1 (but improved from ~4 for known-sign shifts or ~11 for unknown shifts), but is this really a bottleneck in practice? AND instructions are among the cheapest available on a processor. Are there useful situations where the computational density of non-constant shifts is so high as to make an extra AND per shift a meaningful performance loss? More than 10-20%? If there is real performance that we're missing and wanting here, then we probably aren't alone and perhaps this warrants an upstream issue requesting that LLVM expose the desired semantics? At some point, that may be easier than trying to hack something. P.S.: With the semantic that |
Btw, most constant bitshifts inside of functions do get compiled to a single instruction. It's the dynamic ones that need some coaxing to not have the guards around them. |
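For example, a constant shift already optimizes fully:

julia> code_native(x -> x >> 3, (Int64,); debuginfo=:none)  # on x86-64, essentially a lone sar plus the usual mov/ret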
I'm honestly not too concerned about the extra and instruction.
So my inclination is to define the operations consistently the way I did and let the |
Rust does call this a wrapping shift (wrapping_shl/wrapping_shr), and I like that naming. The binary GCD algorithm from #30674 makes for a decent testbed.

Definitions:
@noinline function gcd(<<, >>, a::T, b::T) where {T}
@noinline throw1(a, b) = throw(OverflowError("gcd($a, $b) overflows"))
a == 0 && return abs(b)
b == 0 && return abs(a)
za = trailing_zeros(a)
zb = trailing_zeros(b)
k = min(za, zb)
u = unsigned(abs(a >> za))
v = unsigned(abs(b >> zb))
while u != v
if u > v
u, v = v, u
end
v -= u
v >>= trailing_zeros(v)
end
r = u << k
# T(r) would throw InexactError; we want OverflowError instead
r > typemax(T) && throw1(a, b)
r % T
end
x <<ᵐ n = x << (n & (sizeof(x)*8-1))
x >>ᵐ n = x >> (n & (sizeof(x)*8-1))
using UnsafeAssume
x <<ᵃ n = (unsafe_assume_condition(n >= 0); unsafe_assume_condition(n < sizeof(x)*8); x << n)
x >>ᵃ n = (unsafe_assume_condition(n >= 0); unsafe_assume_condition(n < sizeof(x)*8); x >> n)

For Int64 they generate the exact same native code on my ARM M1. But for Int16 and Int8 the assume versions perform better by skipping the superfluous mask:

julia> A = rand(Int64, 100_000);
julia> @btime (s = 0; @inbounds for n = 1:length($A)-1 s += gcd(<<, >>, $A[n], $A[n+1]) end; s);
10.641 ms (0 allocations: 0 bytes)
julia> @btime (s = 0; @inbounds for n = 1:length($A)-1 s += gcd(<<ᵐ, >>ᵐ, $A[n], $A[n+1]) end; s);
9.244 ms (0 allocations: 0 bytes)
julia> @btime (s = 0; @inbounds for n = 1:length($A)-1 s += gcd(<<ᵃ, >>ᵃ, $A[n], $A[n+1]) end; s);
9.244 ms (0 allocations: 0 bytes)
julia> A = rand(Int16, 100_000);
julia> @btime (s = 0; @inbounds for n = 1:length($A)-1 s += gcd(<<, >>, $A[n], $A[n+1]) end; s);
3.166 ms (0 allocations: 0 bytes)
julia> @btime (s = 0; @inbounds for n = 1:length($A)-1 s += gcd(<<ᵐ, >>ᵐ, $A[n], $A[n+1]) end; s);
3.155 ms (0 allocations: 0 bytes)
julia> @btime (s = 0; @inbounds for n = 1:length($A)-1 s += gcd(<<ᵃ, >>ᵃ, $A[n], $A[n+1]) end; s);
2.899 ms (0 allocations: 0 bytes)
julia> A = rand(Int8, 100_000);
julia> @btime (s = 0; @inbounds for n = 1:length($A)-1 s += gcd(<<, >>, $A[n], $A[n+1]) end; s);
1.498 ms (0 allocations: 0 bytes)
julia> @btime (s = 0; @inbounds for n = 1:length($A)-1 s += gcd(<<ᵐ, >>ᵐ, $A[n], $A[n+1]) end; s);
1.484 ms (0 allocations: 0 bytes)
julia> @btime (s = 0; @inbounds for n = 1:length($A)-1 s += gcd(<<ᵃ, >>ᵃ, $A[n], $A[n+1]) end; s);
1.396 ms (0 allocations: 0 bytes) |
One more benchmark: Same as Matt's above, but with two differences:
function f(f1::F1, f2::F2, A) where {F1, F2}
s = 0
@inbounds for n in 1:length(A)-1
s += gcd(f1, f2, A[n], A[n+1])
end
s
end
function ⪢(a, n)
if sizeof(a) == 4 || sizeof(a) == 8
return a >> (n & (8*sizeof(a)-1))
else
a2 = Core.Intrinsics.sext_int(UInt32, a)
(a2 >> (n & 31)) % typeof(a)
end
end
function ⪡(a, n)
if sizeof(a) == 4 || sizeof(a) == 8
return a << (n & (8*sizeof(a)-1))
else
a2 = Core.Intrinsics.zext_int(UInt32, a)
(a2 << (n & 31)) % typeof(a)
end
end
So, somehow even faster than the unsafe assume one, despite it being safe, and about 12% faster than the default bitshifts. |
Because everything is terrible, I see the opposite behavior on an M1:

preamble:
julia> @inline function gcd(<<, >>, a::T, b::T) where {T}
@noinline throw1(a, b) = throw(OverflowError("gcd($a, $b) overflows"))
a == 0 && return abs(b)
b == 0 && return abs(a)
za = trailing_zeros(a)
zb = trailing_zeros(b)
k = min(za, zb)
u = unsigned(abs(a >> za))
v = unsigned(abs(b >> zb))
while u != v
if u > v
u, v = v, u
end
v -= u
v >>= trailing_zeros(v)
end
r = u << k
# T(r) would throw InexactError; we want OverflowError instead
r > typemax(T) && throw1(a, b)
r % T
end
x <<ᵐ n = x << (n & (sizeof(x)*8-1))
x >>ᵐ n = x >> (n & (sizeof(x)*8-1))
using UnsafeAssume
x <<ᵃ n = (unsafe_assume_condition(n >= 0); unsafe_assume_condition(n < sizeof(x)*8); x << n)
x >>ᵃ n = (unsafe_assume_condition(n >= 0); unsafe_assume_condition(n < sizeof(x)*8); x >> n)
>>ᵃ (generic function with 1 method)
julia> function f(f1::F1, f2::F2, A) where {F1, F2}
s = 0
@inbounds for n in 1:length(A)-1
s += gcd(f1, f2, A[n], A[n+1])
end
s
end
f (generic function with 1 method)
julia> function ⪢(a, n)
if sizeof(a) == 4 || sizeof(a) == 8
return a >> (n & (8*sizeof(a)-1))
else
a2 = Core.Intrinsics.sext_int(UInt32, a)
(a2 >> (n & 31)) % typeof(a)
end
end
⪢ (generic function with 1 method)
julia> function ⪡(a, n)
if sizeof(a) == 4 || sizeof(a) == 8
return a << (n & (8*sizeof(a)-1))
else
a2 = Core.Intrinsics.zext_int(UInt32, a)
(a2 << (n & 31)) % typeof(a)
end
end
⪡ (generic function with 1 method)
julia> using BenchmarkTools
julia> A = rand(Int32, 100_000);
julia> @btime f(<<, >>, A);
5.133 ms (1 allocation: 16 bytes)
julia> @btime f(<<ᵐ, >>ᵐ, A);
4.671 ms (1 allocation: 16 bytes)
julia> @btime f(<<ᵃ, >>ᵃ, A);
4.672 ms (1 allocation: 16 bytes)
julia> @btime f(⪡, ⪢, A);
4.672 ms (1 allocation: 16 bytes)

julia> A = rand(Int16, 100_000);
julia> @btime f(<<, >>, A);
3.021 ms (1 allocation: 16 bytes)
julia> @btime f(<<ᵐ, >>ᵐ, A);
3.092 ms (1 allocation: 16 bytes)
julia> @btime f(<<ᵃ, >>ᵃ, A);
2.771 ms (1 allocation: 16 bytes)
julia> @btime f(⪡, ⪢, A);
3.050 ms (1 allocation: 16 bytes)
julia> A = rand(Int8, 100_000);
julia> @btime f(<<, >>, A);
1.363 ms (1 allocation: 16 bytes)
julia> @btime f(<<ᵐ, >>ᵐ, A);
1.343 ms (1 allocation: 16 bytes)
julia> @btime f(<<ᵃ, >>ᵃ, A);
1.257 ms (1 allocation: 16 bytes)
julia> @btime f(⪡, ⪢, A);
1.382 ms (1 allocation: 16 bytes)

It's also worth checking the SIMD-ability of these operators. The GCD example doesn't SIMD. And of course
And M1 does have |
Good discussion. I just want to chime in to agree that we should avoid UB, and that this should not be considered an "unsafe" function. I agree with Matt on the |
Ok, so what should we call these then? They're not unsafe (at least we don't want them to be), but they're also not modular. So what are they then? |
Seems closest to |
Crystal seems to call them unsafe_shl and unsafe_shr. |
So in triage we were discussing this, and we came to the conclusion that the behaviour should follow what LLVM defines, but with a freeze operation after, which should eliminate the UB on the operation and move it to only returning an unspecified value |
That would seem to imply a name of |
So this is what this might look like for one of the types (though this should be implemented as an intrinsic):

function shl(x::Int64, n::Int64)
Base.llvmcall(
""" %3 = shl i64 %0, %1
%4 = freeze i64 %3
ret i64 %4""", Int64, Tuple{Int64, Int64},x,n)
end
function shr(x::Int64, n::Int64)
Base.llvmcall(
""" %3 = ashr i64 %0, %1
%4 = freeze i64 %3
ret i64 %4""", Int64, Tuple{Int64, Int64}, x, n)
end
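For completeness, a logical (zero-fill) right shift would presumably follow the same pattern (a sketch along the same lines, not from the PR):

function lshr(x::UInt64, n::UInt64)
    Base.llvmcall(
        """ %3 = lshr i64 %0, %1
        %4 = freeze i64 %3
        ret i64 %4""", UInt64, Tuple{UInt64, UInt64}, x, n)
end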
|
In that case, fastmath is indeed a good name. It should be noted that that version is not eligible for constant folding (unlike this PR), as the computation is not consistent (it is not pure) |
Every instance of "fast math" in Julia up to now (and in many languages) has referred exclusively to IEEE754 floating point values, so there's a bit of a name collision there. The same goes for the @fastmath macro. Regardless of name and code location, I absolutely definitely would not want this to be affiliated with @fastmath. |
Spitballing, I wonder if there's a new idiom to build here... Imagine the following ways to specify the "flavors" of an operation:

safe{+}(a, b)
unsafe{<<}(x, n)
fast{sin}(x) |
[I like that we are thinking of a solution, but maybe it should live in a package. Is it strictly needed for Julia itself? We want to have a way, and document it at least, by pointing to a package, or a safe/optimal solution/type, such as UInt6; see below.]

At the JuliaSyntax issue:
First, that's wrong, it can't overflow, but it got me thinking about what checks are needed, and a possible solution:
Could we not leverage the type system, with a bit-shift type? 1 >> int_shift(1) could get rid of all the checks. EDIT: I see we would need UInt3 actually too, and everything up to UInt6 for Int128... Zig has such types (and a few other languages do), maybe (also) for this same reason.
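A rough sketch of that idea (ShiftAmount and int_shift are made-up names; whether the per-shift guards actually disappear depends on inlining and LLVM's value-range analysis):

struct ShiftAmount
    n::UInt8
    function ShiftAmount(n::Integer)
        0 <= n <= 63 || throw(DomainError(n, "shift amount must be in 0:63"))
        new(n % UInt8)
    end
end
int_shift(n::Integer) = ShiftAmount(n)
Base.:>>(x::Int64, s::ShiftAmount) = x >> (s.n & 0x3f)  # mask is a provable no-op that keeps the shift branch-free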
|
I know I'm late to the party here, but I wanted to chime in briefly and say that my gut reaction is that we should absolutely not add new operator syntax for this. Special syntax is an expense which is subtly imposed on all Julia users, so it needs to meet a high bar of being fairly useful to a broad range of users, or being extremely useful to a narrower set of users. I don't think this operation meets that bar. The rest of this seems great - we should absolutely teach the compiler what it needs to know about this operation, and make it possible for people who need it to use it. Let's just use a normal function name for this (maybe with |
I just wanted to point out julia/stdlib/Random/src/XoshiroSimd.jl, lines 25 to 31 (at commit dc34428).
|
In Julia, bitshifting a B-bit integer x by B or more bits causes the result to be zero(x). Also, negative bitshifts are supported, e.g. x << -1. This might semantically be sensible, but it does not correspond to either x86 or AArch64 behaviour. The result is that Julia's bitshifts are not optimised to a single instruction, which makes them unnecessarily slow in some performance sensitive contexts. In contrast, in the C language, bitshifting by more than the bitwidth is undefined behaviour, which allows the compiler to assume it never happens, and optimise the shift to a single assembly instruction.
The difference of one CPU instruction vs a handful may seem trivial, but in performance sensitive code this can really matter, e.g. #30674 and attractivechaos/plb2#48.
This commit adds the functions unsafe_ashr, unsafe_lshr and unsafe_shl. The goal of these functions is to compile to single shift instructions in order to be as fast as possible.

Decisions
1. Semantics when the shift is too high
What happens in the CPU when you shift x >> n, where x is a B-bit integer and n >= B? Let's call these "overflowing shifts". As far as I can tell, on x86, AArch64 and RISC-V - so, basically all instruction sets that matter - only the lower 5 bits of n are used for 8-32 bit integers, and only the lower 6 bits for 64-bit integers.
Note that this implies that when x and n are 8-bit integers, masking n by 0x07 does NOT correspond to the native shift instruction - it should be masked by 0x1f. I'm not 100% certain about that - all the documentation I can find simply assumes 32-bit operands.
So, what options do we have?
a) Native behaviour
Here, we just do what the CPU does when it encounters overflowing bitshifts. In particular, we shift with n % max(32, 8*sizeof(x)). That's a weird rule, but really, IMO, no more weird than how signed overflow wraps from e.g. 127 to -128. This has maximal performance, but the semantics are weirdly complex and unpedagogical on overflow.
If we take this approach, we might want to keep the documented behaviour simple by formally promising only that the return type is correct, making no promises about the returned value.
c) Shift with n % (8*sizeof(x))
On x86, this produces optimal code for 64 and 32 bits, and is a little slower on 8 and 16 bit integers (the performance is somewhere between the native behaviour and the current shifting behaviour). The advantage here is that it's semantically simpler than the solution above.
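For the logical right shift, this option could be spelled as follows (one possible definition, echoing the masked versions earlier in the thread, not necessarily the final implementation):

unsafe_lshr(x::T, n::Integer) where {T<:Base.BitInteger} = x >>> (n & (8*sizeof(T) - 1))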
2. What should the name be for e.g. the equivalent of >>?

>>%
Pro: It's short, it looks like >>, and it's an infix operator, which makes it much more readable in complex bitshifting code.
Con: It takes up valuable ASCII real estate, and the proposed semantics of >>% vs >> are different from +% vs + as proposed by Keno, which might be confusing.

unsafe_ashr
The unsafe_ prefix is a nice way to warn that the resulting value may be arbitrary (if we go with that behaviour), similarly to unsafe_trunc. However, it may give misleading connotations of e.g. the memory unsafety that other unsafe_ functions can cause. It's also long and annoying to use in bit hacking compared with infix operators.

3. Should we have a fallback definition (::Integer, ::Integer)?
Pro: It makes generic code possible with these operations, and it makes it less annoying for users, who don't have to define their own implementations.
Con: A processor-native bitshift only really makes sense for BitIntegers, and adding a generic function is semantically misleading.

My recommendations:
No generic Integer definition, since the purpose of this function is native bitshifting, which doesn't exist for generic objects, only bit integers.

Closes #50225