Verify Intel intrinsics against upstream definitions #251
Conversation
I’ll take a look at this tomorrow
…On Fri 22. Dec 2017 at 21:58, Alex Crichton ***@***.***> wrote:
This commit adds a new crate for testing that the intrinsics listed in this
crate do indeed match the upstream definition of each intrinsic. A
pre-downloaded XML description of all Intel intrinsics is checked in, which is
then parsed in the stdsimd-verify crate to verify that everything we write
down matches the upstream definitions.
Currently the checks are pretty loose to get this compiling, but a few
intrinsics were fixed as a result. For example:
- _mm256_extract_epi8 - AVX2 intrinsic erroneously listed under AVX
- _mm256_extract_epi16 - AVX2 intrinsic erroneously listed under AVX
- _mm256_extract_epi32 - AVX2 intrinsic erroneously listed under AVX
- _mm256_extract_epi64 - AVX2 intrinsic erroneously listed under AVX
- _mm_tzcnt_32 - erroneously had u32 in the name
- _mm_tzcnt_64 - erroneously had u64 in the name
- _mm_cvtsi64_si128 - erroneously available on 32-bit platforms
- _mm_cvtsi64x_si128 - erroneously available on 32-bit platforms
- _mm_cvtsi128_si64 - erroneously available on 32-bit platforms
- _mm_cvtsi128_si64x - erroneously available on 32-bit platforms
- _mm_extract_epi64 - erroneously available on 32-bit platforms
- _mm_insert_epi64 - erroneously available on 32-bit platforms
- _mm256_extract_epi16 - erroneously returned i32 instead of i16
- _mm256_extract_epi8 - erroneously returned i32 instead of i8
- _mm_shuffle_ps - the mask argument was erroneously i32 instead of u32
- _popcnt32 - the signedness of the argument and return were flipped
- _popcnt64 - the signedness of the argument was flipped and the argument
  was too large bit-wise
- _mm_tzcnt_32 - the return value's sign was flipped
- _mm_tzcnt_64 - the return value's sign was flipped
- A good number of intrinsics used imm8: i8 or imm8: u8 instead of the
  imm8: i32 which Intel was using (we were also internally inconsistent).
- A number of intrinsics working with __m64 were instead working with
  i64/u64, so they're now corrected to operate on the vector types instead.
Currently the verifications performed are:
- Each name in Rust is defined in the XML document
- The arguments/return values all agree.
- The CPUID features listed in the XML document are all enabled in
  Rust as well.
The type matching right now is pretty loose and has a lot of questionable
changes. Future commits will touch these up to be more strict and require
closer adherence to Intel's own types. Otherwise types like i32x8 (or any
integers with 256 bits) all match up to __m256i right now, although this may
want to change in the future.
Finally, we're also not testing the instruction listed in the XML right now.
There's a huge number of discrepancies between the instruction listed in the
XML and the instruction listed in assert_instr, and those'll need to be taken
care of in a future commit.
Closes #240
You can view, comment on, or merge this pull request online at:
#251
Commit Summary
- Verify Intel intrinsics against upstream definitions
File Changes
- *M* .travis.yml <https://github.com/rust-lang-nursery/stdsimd/pull/251/files#diff-0> (1)
- *M* Cargo.toml <https://github.com/rust-lang-nursery/stdsimd/pull/251/files#diff-1> (1)
- *M* ci/run.sh <https://github.com/rust-lang-nursery/stdsimd/pull/251/files#diff-2> (4)
- *M* coresimd/src/x86/i586/abm.rs <https://github.com/rust-lang-nursery/stdsimd/pull/251/files#diff-3> (16)
- *M* coresimd/src/x86/i586/avx.rs <https://github.com/rust-lang-nursery/stdsimd/pull/251/files#diff-4> (165)
- *M* coresimd/src/x86/i586/avx2.rs <https://github.com/rust-lang-nursery/stdsimd/pull/251/files#diff-5> (146)
- *M* coresimd/src/x86/i586/bmi.rs <https://github.com/rust-lang-nursery/stdsimd/pull/251/files#diff-6> (8)
- *M* coresimd/src/x86/i586/sse.rs <https://github.com/rust-lang-nursery/stdsimd/pull/251/files#diff-7> (43)
- *M* coresimd/src/x86/i586/sse2.rs <https://github.com/rust-lang-nursery/stdsimd/pull/251/files#diff-8> (7)
- *M* coresimd/src/x86/i586/sse41.rs <https://github.com/rust-lang-nursery/stdsimd/pull/251/files#diff-9> (20)
- *M* coresimd/src/x86/i586/sse42.rs <https://github.com/rust-lang-nursery/stdsimd/pull/251/files#diff-10> (60)
- *M* coresimd/src/x86/i586/xsave.rs <https://github.com/rust-lang-nursery/stdsimd/pull/251/files#diff-11> (14)
- *M* coresimd/src/x86/i686/mmx.rs <https://github.com/rust-lang-nursery/stdsimd/pull/251/files#diff-12> (2)
- *M* coresimd/src/x86/i686/sse.rs <https://github.com/rust-lang-nursery/stdsimd/pull/251/files#diff-13> (16)
- *M* coresimd/src/x86/i686/sse2.rs <https://github.com/rust-lang-nursery/stdsimd/pull/251/files#diff-14> (66)
- *M* coresimd/src/x86/i686/sse41.rs <https://github.com/rust-lang-nursery/stdsimd/pull/251/files#diff-15> (44)
- *M* coresimd/src/x86/i686/ssse3.rs <https://github.com/rust-lang-nursery/stdsimd/pull/251/files#diff-16> (2)
- *M* coresimd/src/x86/x86_64/mod.rs <https://github.com/rust-lang-nursery/stdsimd/pull/251/files#diff-17> (3)
- *M* coresimd/src/x86/x86_64/sse2.rs <https://github.com/rust-lang-nursery/stdsimd/pull/251/files#diff-18> (46)
- *A* coresimd/src/x86/x86_64/sse41.rs <https://github.com/rust-lang-nursery/stdsimd/pull/251/files#diff-19> (49)
- *M* coresimd/src/x86/x86_64/xsave.rs <https://github.com/rust-lang-nursery/stdsimd/pull/251/files#diff-20> (12)
- *A* stdsimd-verify/.gitattributes <https://github.com/rust-lang-nursery/stdsimd/pull/251/files#diff-21> (1)
- *A* stdsimd-verify/Cargo.toml <https://github.com/rust-lang-nursery/stdsimd/pull/251/files#diff-22> (18)
- *A* stdsimd-verify/build.rs <https://github.com/rust-lang-nursery/stdsimd/pull/251/files#diff-23> (24)
- *A* stdsimd-verify/src/lib.rs <https://github.com/rust-lang-nursery/stdsimd/pull/251/files#diff-24> (248)
- *A* stdsimd-verify/tests/x86.rs <https://github.com/rust-lang-nursery/stdsimd/pull/251/files#diff-25> (288)
- *A* stdsimd-verify/x86.xml <https://github.com/rust-lang-nursery/stdsimd/pull/251/files#diff-26> (134861)
Patch Links:
- https://github.com/rust-lang-nursery/stdsimd/pull/251.patch
- https://github.com/rust-lang-nursery/stdsimd/pull/251.diff
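For readers skimming the thread, the shape of the verification the new crate performs can be sketched with a toy stand-in. Everything below (the `Intel` struct, the mock database, the loose type table) is made up for illustration; the real stdsimd-verify crate parses the checked-in x86.xml and the Rust sources instead:

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-in for what stdsimd-verify extracts
// from Intel's XML for each intrinsic.
struct Intel {
    cpuid: &'static str, // CPUID feature listed in the XML
    ret: &'static str,   // return type as the XML spells it
}

fn main() {
    // A tiny mock of the upstream XML database.
    let mut upstream = HashMap::new();
    upstream.insert("_popcnt32", Intel { cpuid: "popcnt", ret: "int" });

    // What our Rust source claims for the same intrinsic.
    let (name, target_feature, ret) = ("_popcnt32", "popcnt", "i32");

    // Check 1: every Rust intrinsic must exist upstream.
    let intel = upstream.get(name).expect("not defined by Intel");
    // Check 2: the XML's CPUID feature must match our #[target_feature].
    assert_eq!(intel.cpuid, target_feature);
    // Check 3: a deliberately loose type equivalence, in the spirit of
    // the PR's "pretty loose" matching (e.g. C `int` equates to `i32`).
    let equates = matches!((intel.ret, ret), ("int", "i32") | ("__int64", "i64"));
    assert!(equates, "type mismatch for {}", name);
    println!("{} verified", name);
}
```

The real checks run over every intrinsic and fail the test suite on the first discrepancy, which is how the list of fixes above was found.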
Force-pushed from be1b710 to 7b5f300
@@ -44,16 +44,16 @@ pub unsafe fn _lzcnt_u64(x: u64) -> u64 {
#[inline(always)]
#[target_feature = "+popcnt"]
#[cfg_attr(test, assert_instr(popcnt))]
pub unsafe fn _popcnt32(x: u32) -> u32 {
x.count_ones()
This was on purpose: bit-manipulation intrinsics prefer unsigned integers, like the ones in std do. Whatever we decide to do, we should choose a guideline and follow it consistently (many other intrinsics that take __m128i take for example u32x4 instead of i32x4).
We should open an issue for this.
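For reference on the std point being made here: the standard library's bit-manipulation methods operate on and return unsigned counts (count_ones returns u32 on every integer type, signed or not). A quick check:

```rust
fn main() {
    let x: u32 = 0b1011_0010;
    // count_ones / leading_zeros / trailing_zeros all return u32 in std.
    let ones: u32 = x.count_ones();
    assert_eq!(ones, 4);

    // Even on signed integers the count itself is unsigned.
    let y: i32 = -1; // all 32 bits set in two's complement
    let ones_signed: u32 = y.count_ones();
    assert_eq!(ones_signed, 32);
    println!("ok");
}
```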
@@ -607,77 +607,77 @@ pub unsafe fn _mm256_xor_ps(a: f32x8, b: f32x8) -> f32x8 {
}

/// Equal (ordered, non-signaling)
pub const _CMP_EQ_OQ: u8 = 0x00;
The unsigned part was on purpose (bit flags consistently used unsigned integers); why we use 8 instead of 32 bits here, I don't know.
Whatever we do, we should do it consistently, and have a guideline for it.
So IIRC, as mentioned below, arguments that can only be 8-bit long, 16-bit long, etc. were made _8 (e.g. i8) on purpose. This happens on most intrinsics taking an imm8 or imm16.
The Intel intrinsics use int/long/... very loosely here, so this was part of offering better types by default.
@@ -1138,7 +1096,7 @@ pub unsafe fn _mm_permute_pd(a: f64x2, imm8: i32) -> f64x2 {
#[inline(always)]
#[target_feature = "+avx"]
#[cfg_attr(test, assert_instr(vperm2f128, imm8 = 0x5))]
pub unsafe fn _mm256_permute2f128_ps(a: f32x8, b: f32x8, imm8: i8) -> f32x8 {
Arguments that can only be 8 bits long, like this imm8, were made i8 or u8 on purpose.
@@ -27,8 +27,8 @@ pub unsafe fn _bextr_u32(a: u32, start: u32, len: u32) -> u32 {
#[target_feature = "+bmi"]
#[cfg_attr(test, assert_instr(bextr))]
#[cfg(not(target_arch = "x86"))]
pub unsafe fn _bextr_u64(a: u64, start: u64, len: u64) -> u64 {
_bextr2_u64(a, (start & 0xff_u64) | ((len & 0xff_u64) << 8_u64))
pub unsafe fn _bextr_u64(a: u64, start: u32, len: u32) -> u64 {
These should probably be u8 instead of u32, since they can only take values in the [0, 64) range.
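To make the range concern concrete, here is a software model of BEXTR's documented semantics (extract len bits starting at bit start). bextr_u64 below is a local helper written for illustration, not the intrinsic itself; it mirrors the (start & 0xff) | ((len & 0xff) << 8) control-word packing visible in the diff, and shows that only the low 8 bits of each field matter, hence the suggestion that u8 would suffice:

```rust
// Software model of BEXTR: not the hardware intrinsic, just its
// documented semantics, to illustrate the meaningful value ranges.
fn bextr_u64(a: u64, start: u32, len: u32) -> u64 {
    let start = (start & 0xff) as u64; // only the low 8 bits are used
    let len = (len & 0xff) as u64;     // likewise for the length field
    if start >= 64 || len == 0 {
        return 0;
    }
    let shifted = a >> start;
    if len >= 64 {
        shifted
    } else {
        shifted & ((1u64 << len) - 1)
    }
}

fn main() {
    // Extract 8 bits starting at bit 4 of 0xABCD: bits 4..12 are 0xBC.
    assert_eq!(bextr_u64(0xABCD, 4, 8), 0xBC);
    // Both fields are effectively in [0, 256), so u8 could express them.
    assert_eq!(bextr_u64(u64::MAX, 0, 64), u64::MAX);
    println!("ok");
}
```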
#[inline(always)]
#[target_feature = "+bmi"]
#[cfg_attr(test, assert_instr(tzcnt))]
pub unsafe fn _tzcnt_u16(x: u16) -> u16 {
GCC and clang define these:
#define _tzcnt_u16(a) (__tzcnt_u16((a)))
E.g. see here: https://clang.llvm.org/doxygen/bmiintrin_8h_source.html
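For what it's worth, the scalar behavior behind these wrappers is plain trailing-zero counting, which safe Rust already exposes. A small sketch of the semantics, including the zero-input case, where TZCNT yields the operand width:

```rust
fn main() {
    // trailing_zeros is the software analogue of TZCNT.
    assert_eq!(0b1000u16.trailing_zeros(), 3);
    assert_eq!(0b0101u32.trailing_zeros(), 0);
    // On a zero input TZCNT returns the operand width, and so does std:
    assert_eq!(0u16.trailing_zeros(), 16);
    assert_eq!(0u64.trailing_zeros(), 64);
    println!("ok");
}
```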
Perhaps, but this isn't on Intel's intrinsic list, which is what this crate is for.
Yep, the Intel intrinsic list only defines _tzcnt_u{32,64}.
MSVC does not define it here: https://msdn.microsoft.com/en-us/library/hh977023.aspx
and the Intel C++ compiler doesn't seem to define it either. The GCC definition is here: https://github.com/gcc-mirror/gcc/blob/master/gcc/config/i386/bmiintrin.h#L38M
Perhaps, but this isn't on Intel's intrinsic list, which is what this crate is for
Intel Intrinsics list doesn't define many x86 intrinsics either (e.g. all of the AMD ones).
Anyways, I am fine with removing it, just saying that two widely used C and C++ compilers offer it, which is why it was offered here.
@@ -163,8 +153,8 @@ pub unsafe fn _tzcnt_u64(x: u64) -> u64 {
#[inline(always)]
#[target_feature = "+bmi"]
#[cfg_attr(test, assert_instr(tzcnt))]
pub unsafe fn _mm_tzcnt_u32(x: u32) -> u32 {
x.trailing_zeros()
pub unsafe fn _mm_tzcnt_32(x: u32) -> i32 {
The renaming is correct, but these should return an unsigned integer (it returns a size). This integer is in the range [0, 32), so per what we were doing elsewhere it should be a u8, but I have opened an issue to clarify that.
Also, clang, GCC, and the Intel intrinsic list also define
#define _tzcnt_u32(a) (__tzcnt_u32((a)))
and in this module the non-_mm-prefixed versions are missing (so the typo might have come from here).
@@ -173,8 +163,8 @@ pub unsafe fn _mm_tzcnt_u32(x: u32) -> u32 {
#[inline(always)]
#[target_feature = "+bmi"]
#[cfg_attr(test, assert_instr(tzcnt))]
pub unsafe fn _mm_tzcnt_u64(x: u64) -> u64 {
x.trailing_zeros() as u64
pub unsafe fn _mm_tzcnt_64(x: u64) -> i64 {
Same as above: the non-_mm version is missing, and the size and signedness of the result do not match what we were doing everywhere else.
@@ -55,8 +55,8 @@ pub unsafe fn _bzhi_u32(a: u32, index: u32) -> u32 {
#[target_feature = "+bmi2"]
#[cfg_attr(test, assert_instr(bzhi))]
#[cfg(not(target_arch = "x86"))]
pub unsafe fn _bzhi_u64(a: u64, index: u64) -> u64 {
x86_bmi2_bzhi_64(a, index)
pub unsafe fn _bzhi_u64(a: u64, index: u32) -> u64 {
index is in the range [0, 64), so the size of the index might be wrong.
@@ -764,7 +764,7 @@ pub unsafe fn _mm_setzero_ps() -> f32x4 {
#[inline(always)]
#[target_feature = "+sse"]
#[cfg_attr(test, assert_instr(shufps, mask = 3))]
pub unsafe fn _mm_shuffle_ps(a: f32x4, b: f32x4, mask: i32) -> f32x4 {
pub unsafe fn _mm_shuffle_ps(a: f32x4, b: f32x4, mask: u32) -> f32x4 {
It seems that in general the Intel intrinsics "try" to use unsigned integers for bit manipulations, because that is what makes sense in C++ due to unsigned integers having modulo arithmetic (but signed integers having undefined behavior on overflow).
The Intel intrinsics don't do this consistently (probably because some of them are 20 years old and this guideline wasn't clear then), but we should probably follow what the new intrinsics do.
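The modulo-vs-UB distinction referenced here can be shown directly in Rust, where wrap-around is opt-in rather than implicit. A small illustration:

```rust
fn main() {
    // In C, unsigned overflow is defined: UINT_MAX + 1u wraps to 0
    // (arithmetic modulo 2^32). Rust panics on overflow in debug builds,
    // but offers explicit wrapping operations with the same semantics:
    assert_eq!(u32::MAX.wrapping_add(1), 0);
    assert_eq!(0u32.wrapping_sub(1), u32::MAX);
    // Signed wrap-around is UB in C, but in Rust it's available
    // explicitly and well-defined:
    assert_eq!(i32::MAX.wrapping_add(1), i32::MIN);
    println!("ok");
}
```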
coresimd/src/x86/i586/sse.rs
Outdated
@@ -906,7 +906,7 @@ pub unsafe fn _mm_movemask_ps(a: f32x4) -> i32 {
assert_instr(unpcklps))]
// TODO: This function is actually not limited to floats, but that's what
// what matches the C type most closely: (__m128, *const __m64) -> __m128
pub unsafe fn _mm_loadh_pi(a: f32x4, p: *const f32) -> f32x4 {
pub unsafe fn _mm_loadh_pi(a: f32x4, p: *const u8x8) -> f32x4 {
I think this should be p: *const f32x2, because this loads two floats from p into the upper two floats of a.
Also, in case we can't agree on that, then I think _m64 would have been better than u8x8.
coresimd/src/x86/i586/sse.rs
Outdated
@@ -957,7 +957,7 @@ pub unsafe fn _mm_loadh_pi(a: f32x4, p: *const f32) -> f32x4 {
#[cfg_attr(all(test, target_arch = "x86", not(target_feature = "sse2")),
assert_instr(movss))]
// TODO: Like _mm_loadh_pi, this also isn't limited to floats.
pub unsafe fn _mm_loadl_pi(a: f32x4, p: *const f32) -> f32x4 {
pub unsafe fn _mm_loadl_pi(a: f32x4, p: *const u8x8) -> f32x4 {
Same here: p: *const f32x2.
coresimd/src/x86/i586/sse.rs
Outdated
@@ -1070,14 +1070,14 @@ pub unsafe fn _mm_loadr_ps(p: *const f32) -> f32x4 {
// On i586 (no SSE2) it just generates plain MOV instructions.
#[cfg_attr(all(test, any(target_arch = "x86_64", target_feature = "sse2")),
assert_instr(movhpd))]
pub unsafe fn _mm_storeh_pi(p: *mut u64, a: f32x4) {
pub unsafe fn _mm_storeh_pi(p: *mut u8x8, a: f32x4) {
Same here; we should make the storeh functions consistent with the loadh ones. That is: p: *mut f32x2.
coresimd/src/x86/i586/sse.rs
Outdated
assert_instr(movnti))]
pub unsafe fn _mm_stream_pi(mem_addr: *mut i64, a: i64) {
// #[cfg_attr(test, assert_instr(movntq))] // FIXME: llvm codegens wrong instr
pub unsafe fn _mm_stream_pi(mem_addr: *mut i8x8, a: i8x8) {
We should be using _m64 here (which would probably fix the FIXME, since that is the magic x86_mmx type that represents integers in MMX registers).
@@ -61,7 +61,7 @@ pub unsafe fn _mm_blendv_epi8(a: i8x16, b: i8x16, mask: i8x16) -> i8x16 {
#[inline(always)]
#[target_feature = "+sse4.1"]
#[cfg_attr(test, assert_instr(pblendw, imm8 = 0xF0))]
pub unsafe fn _mm_blend_epi16(a: i16x8, b: i16x8, imm8: u8) -> i16x8 {
pub unsafe fn _mm_blend_epi16(a: i16x8, b: i16x8, imm8: i32) -> i16x8 {
Since this is an imm8 from a bitmask, I think it is more consistent with what we were previously doing to use a u8 (because it's a bitmask, and because of its value range).
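For context on what the imm8 encodes here: it is a lane-select bitmask, where bit i picks lane i from b. A scalar sketch of that selection (blend_epi16 below is an illustrative helper modeling the documented semantics, not the intrinsic itself):

```rust
// Scalar model of a blend with an 8-bit immediate mask: bit i of
// `imm8` chooses b[i], otherwise a[i] is kept.
fn blend_epi16(a: [i16; 8], b: [i16; 8], imm8: u8) -> [i16; 8] {
    let mut out = [0i16; 8];
    for i in 0..8 {
        out[i] = if imm8 & (1 << i) != 0 { b[i] } else { a[i] };
    }
    out
}

fn main() {
    let a = [0i16; 8];
    let b = [1i16; 8];
    // 0xF0 selects the upper four lanes from b.
    assert_eq!(blend_epi16(a, b, 0xF0), [0, 0, 0, 0, 1, 1, 1, 1]);
    println!("ok");
}
```

All eight meaningful mask values fit in a u8, which is the value-range argument being made above.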
@@ -91,7 +91,7 @@ pub unsafe fn _mm_blendv_ps(a: f32x4, b: f32x4, mask: f32x4) -> f32x4 {
#[inline(always)]
#[target_feature = "+sse4.1"]
#[cfg_attr(test, assert_instr(blendpd, imm2 = 0b10))]
pub unsafe fn _mm_blend_pd(a: f64x2, b: f64x2, imm2: u8) -> f64x2 {
pub unsafe fn _mm_blend_pd(a: f64x2, b: f64x2, imm2: i32) -> f64x2 {
The Intel Intrinsics Guide calls the last argument imm8 (so u8).
@@ -8,49 +8,49 @@ use stdsimd_test::assert_instr;
use v128::*;

/// String contains unsigned 8-bit characters *(Default)*
pub const _SIDD_UBYTE_OPS: i8 = 0b0000_0000;
pub const _SIDD_UBYTE_OPS: i32 = 0b0000_0000;
These should have been u8 instead of i8.
@@ -36,7 +36,7 @@ extern "C" {
#[inline(always)]
#[target_feature = "+xsave"]
#[cfg_attr(test, assert_instr(xsave))]
pub unsafe fn _xsave(mem_addr: *mut u8, save_mask: u64) -> () {
pub unsafe fn _xsave(mem_addr: *mut u8, save_mask: u64) {
clippy should have complained about these; it would be nice to know why it didn't and file an issue upstream.
// no particular instruction to test
pub unsafe fn _mm_movepi64_pi64(a: i64x2) -> i64 {
a.extract(0)
// #[cfg_attr(test, assert_instr(movdq2q))] // FIXME: llvm codegens wrong instr?
We should go through all FIXMEs and TODOs of the x86 intrinsics before stabilization. There was one before that said "add test on windows" with little explanation about exactly what was broken there.
coresimd/src/x86/i686/sse2.rs
Outdated
pub unsafe fn _mm_movepi64_pi64(a: i64x2) -> i64 {
a.extract(0)
// #[cfg_attr(test, assert_instr(movdq2q))] // FIXME: llvm codegens wrong instr?
pub unsafe fn _mm_movepi64_pi64(a: i64x2) -> i8x8 {
This is wrong; this function returns a single 64-bit integer. It should have been __m64 instead of i64, though (probably __m64 wasn't implemented back then).
We should go over all intrinsics that return __m64 and make sure that we don't make them worse by returning something else. In particular, chaining one intrinsic that returns __m64 with one that takes __m64 as an argument probably generates assembly that operates directly on that MMX register. However, u8x8 is not an x86_mmx LLVM type, so by using a different type that does not translate to x86_mmx we might be incurring a cost that the C compilers don't have.
coresimd/src/x86/i686/sse2.rs
Outdated
pub unsafe fn _mm_movpi64_epi64(a: i64) -> i64x2 {
i64x2::new(a, 0)
// #[cfg_attr(test, assert_instr(movq2dq))] // FIXME: llvm codegens wrong instr?
pub unsafe fn _mm_movpi64_epi64(a: i8x8) -> i64x2 {
This function should take a __m64 instead of an i64 or i8x8 type. This would probably fix the FIXME.
@@ -47,7 +47,7 @@ pub unsafe fn _mm_shuffle_pi8(a: u8x8, b: u8x8) -> u8x8 {
#[inline(always)]
#[target_feature = "+ssse3"]
#[cfg_attr(test, assert_instr(palignr, n = 15))]
pub unsafe fn _mm_alignr_pi8(a: u8x8, b: u8x8, n: u8) -> u8x8 {
pub unsafe fn _mm_alignr_pi8(a: u8x8, b: u8x8, n: i32) -> u8x8 {
n: u8 was by design here as well.
for intrinsic in data.intrinsics.iter() {
    // This intrinsic has multiple definitions in the XML, so just ignore it.
    if intrinsic.name == "_mm_prefetch" {
        continue
The _mm_prefetch intrinsic does something different if the target has AVX-512, SSE, or not even SSE. I think it is OK to ignore these for now (AVX-512 overloads some intrinsics to have slightly different behavior).
// Verify that all `#[target_feature]` annotations are correct, ensuring
// that we've actually enabled the right instruction set for this
// intrinsic.
Could we check here that we have put the intrinsic in the right module (e.g. by saving the name of the module in which the intrinsics are implemented)? There were some AVX2 intrinsics in the AVX module.
Perhaps, but I'll leave that to a future commit.
.collect::<String>();

// Normalize `bmi1` to `bmi` as apparently that's what we're calling
// it.
We might want to change the name of the target feature here to match Intel's name before stabilization.
_ => panic!("failed to equate: `{}` and {:?} for {}", intel, t, intrinsic),
}
}
Nice work here.
The three main points that worry me a bit are:
- this equates int to a 32-bit integer, but an int in C can be a 64-bit integer (and actually is on some platforms). I think assuming int to mean 32-bit integer is fine, but
- Intel often writes that the argument is an 8-bit integer type, even though the argument type is int, or the specification mentions that the value is at most N, meaning that 2-bit, 4-bit, 8-bit, or 16-bit integer types might make more sense to represent these,
- this assumes that int maps to a signed integer, but in many intrinsics Intel uses int to denote a count or a bit index, which can never be negative.
but an int in C can be a 64-bit integer (and actually is on some platforms).
Not on Intel AFAIK.
It doesn't depend on the CPU vendor, but on the platform (which must conform to the C standard). For example:
- on x86_64-unknown-linux-gnu, ints are 32-bit and longs are 64-bit,
- on x86_64-pc-windows-msvc, ints and longs are 32-bit and long longs are 64-bit, and
- on x86_64-cray-linux-gnu, ints are also 64-bit.
All of these conform to the C standard, and there are x86_64-cray-linux-gnu targets with both Intel and AMD CPUs.
Er, what I mean is that this doesn't matter for us. Rust doesn't have any platforms where i32 isn't int, so we can just cross this bridge when we get there. There's no real need to add hundreds of lines of "portability code" that aren't actually buying us anything.
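The "i32 is int" claim can at least be spot-checked through the FFI type aliases; a small sketch (the c_int assertion holds on every target Rust currently ships, which is the point being made):

```rust
use std::os::raw::{c_int, c_long, c_longlong};

fn main() {
    // On every target Rust currently supports, C's `int` is 32 bits...
    assert_eq!(std::mem::size_of::<c_int>(), 4);
    // ...while `long` varies (4 bytes on Windows, 8 on 64-bit Unix), and
    // `long long` is always 8 bytes. This is why fixed-width i32/i64 in
    // the intrinsic signatures are unambiguous where the C names aren't.
    let long = std::mem::size_of::<c_long>();
    assert!(long == 4 || long == 8);
    assert_eq!(std::mem::size_of::<c_longlong>(), 8);
    println!("ok");
}
```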
So this looks awesome and it has surfaced lots of bugs and inconsistencies in the conventions that we were trying to follow. I think we should clarify whether we want to keep following these conventions and improve this a bit to enforce them:
In general, as the libs team has discussed SIMD over time, it's becoming more and more apparent that we're not going to deviate at all from Intel's upstream definitions. I'd personally prefer to not have special cases when reading Intel's own definitions and trying to bend over backwards to map those to our own "possibly more correct" types. If we can come up with a strong technical argument for why we'd make such a transformation I think that makes sense, but "we happened to do it yesterday" I don't think is sufficient.
I'll be ok with:
That is, we have the same intrinsic API that C does, but higher-level libraries and users can use portable
The points 1. and 2. are non-controversial because that's what C does and this is a low-level API, so we can deliver those "soon". The point 3 would be a convenience for libraries like
All in all, I'll agree to all of this, but it's a huge change API-wise, and it hasn't been discussed much in the open, I think. The quickest thing would be to create an issue here to discuss it, but this is already at a point where we (or I) can write an RFC about this, so maybe we should do that instead. In the mean time I don't know what to do with this PR; I think we should merge it, but changing all those APIs is a huge breaking change.
We probably should just add an
Basically we will need to review every single intrinsic, and motivate why it uses one type instead of another. That's an absurd ton of work. From the pov of "we want to stabilize this someday", trying to deviate from the Intel intrinsics in any way makes this whole thing un-reviewable/un-RFC'able.
IIRC when you added
@BurntSushi help
It's worth noting, however, that the ambiguity of Intel's vector types isn't necessarily duplicated across all platforms. Having any of the AArch64 intrinsics take an
I'd be willing to steward a package containing the more stringently-typed definitions of these intrinsics, if that helps with the decision at all. Perhaps we could have a solution similar to the
I maintain 2 of the 5 packages which depend on stdsimd at the moment, and I'd recommend breaking the API as much as possible while we're still figuring this stuff out and few people actually use the crate.
Force-pushed from 4c3473b to 134792b
Yeah that's what I'd imagine as well, the "more portable" types would probably be much more laggard in stabilization as well.
Agreed it's a big change! We'd want to be somewhat careful about it I think and probably yes have a bit of discussion first. FWIW though I don't consider this library having any stability guarantees right now though.
I'd rather not, we have yet to even see from a distance a realistic platform for Rust where this is true. This has lots of complications when moving into libcore as well, let's just not.
Correct, this is why "hey it's the same as intel" is so crucial to be able to rubber-stamp APIs and be confident in such a stamp
Certainly! Something like
That'd be great! It's certainly something we always wanted to see on top of a library like this!
I plan on addressing some of the feedback and then merging. I don't think we'll break much usage by switching the types from things like
Force-pushed from c884f4e to d8dc256
I'm in favor of this. I basically don't think "breakage" at this point is something we should be concerning ourselves with. This crate is still very clearly in an experimental phase while we iron out the API. Folks using it today should be doing it with that knowledge, and with the knowledge that their experience should help inform us of future API decisions. We should semver bump and move on. :) I don't have too many other thoughts (other than the fact that this PR is awesome). I am slowly coming around to @alexcrichton's view that we should revert our efforts to provide more stringent types on the vector intrinsics. I am pretty disappointed myself since we put so much work into it, but I suppose it's just a sunk cost. I tend to think that having these be checked in an automated fashion is a pretty huge advantage. With that said, I do think we should have a contingency; i.e., the stabilization effort should lay out plans for what to do in the event that we did indeed get some intrinsics wrong.
@alexcrichton So all in all I'd say this is good to go once it's green again. The only thing that is a bit sketchy is switching so many APIs to use
I think we should just provide all vector types in
The
Implementing these types needs unstable intrinsics that we would prefer never to stabilize, and we need to implement both "weak" and "strong" vector types in
Given that they are already implemented, and that they currently work with each other (this is not hard), I don't currently see what we would gain from splitting the types into a different crate. In particular, because
Maybe my views on this will change as we implement more ARM, MIPS, and PowerPC intrinsics in
Also, for stabilizing the intrinsics of a particular platform we only need to stabilize the vector types that the platform uses. That is, to stabilize the
Well, it's not 100% sunk: we learned something. Once the RFC for this arrives, somebody is going to ask "why aren't the Intel intrinsics more strongly typed?" and now we have a pretty good answer for that ("they were more strongly typed for a while; it did not work out").
Agreed.
Oh I think I moved them over to
Force-pushed from 88ee2fe to 601f76d
This commit adds a new crate for testing that the intrinsics listed in this crate do indeed match the upstream definition of each intrinsic. A pre-downloaded XML description of all Intel intrinsics is checked in, which is then parsed in the `stdsimd-verify` crate to verify that everything we write down matches the upstream definitions.

Currently the checks are pretty loose to get this compiling, but a few intrinsics were fixed as a result. For example:

* `_mm256_extract_epi8` - AVX2 intrinsic erroneously listed under AVX
* `_mm256_extract_epi16` - AVX2 intrinsic erroneously listed under AVX
* `_mm256_extract_epi32` - AVX2 intrinsic erroneously listed under AVX
* `_mm256_extract_epi64` - AVX2 intrinsic erroneously listed under AVX
* `_mm_tzcnt_32` - erroneously had `u32` in the name
* `_mm_tzcnt_64` - erroneously had `u64` in the name
* `_mm_cvtsi64_si128` - erroneously available on 32-bit platforms
* `_mm_cvtsi64x_si128` - erroneously available on 32-bit platforms
* `_mm_cvtsi128_si64` - erroneously available on 32-bit platforms
* `_mm_cvtsi128_si64x` - erroneously available on 32-bit platforms
* `_mm_extract_epi64` - erroneously available on 32-bit platforms
* `_mm_insert_epi64` - erroneously available on 32-bit platforms
* `_mm256_extract_epi16` - erroneously returned `i32` instead of `i16`
* `_mm256_extract_epi8` - erroneously returned `i32` instead of `i8`
* `_mm_shuffle_ps` - the mask argument was erroneously `i32` instead of `u32`
* `_popcnt32` - the signedness of the argument and return value were flipped
* `_popcnt64` - the signedness of the argument was flipped and the argument was too large bit-wise
* `_mm_tzcnt_32` - the return value's sign was flipped
* `_mm_tzcnt_64` - the return value's sign was flipped
* A good number of intrinsics used `imm8: i8` or `imm8: u8` instead of the `imm8: i32` that Intel uses (we were also internally inconsistent).
* A number of intrinsics working with `__m64` were instead working with `i64`/`u64`, so they're now corrected to operate with the vector types instead.

Currently the verifications performed are:

* Each name in Rust is defined in the XML document.
* The arguments/return values all agree.
* The CPUID features listed in the XML document are all enabled in Rust as well.

The type matching right now is pretty loose and has a lot of questionable changes. Future commits will touch these up to be more strict and require closer adherence to Intel's own types. Otherwise, types like `i32x8` (or any integer vector with 256 bits) all match up to `__m256i` right now, although this may want to change in the future.

Finally, we're also not testing the instruction listed in the XML right now. There is a huge number of discrepancies between the instruction listed in the XML and the instruction listed in `assert_instr`, and those will need to be taken care of in a future commit.

Closes rust-lang#240
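The "loose" type matching described above can be sketched roughly as follows. This is a hypothetical simplification, not the actual `stdsimd-verify` code: any Rust integer vector type whose total width matches the Intel `__m*i` type is treated as equivalent. The `bits_of`/`matches_intel` names and the `"i32x8"`-style string encoding are illustrative assumptions.

```rust
// Hypothetical sketch of the loose type matching: any 256-bit integer
// vector on the Rust side is considered equivalent to Intel's `__m256i`.
fn bits_of(rust_ty: &str) -> Option<u32> {
    // e.g. "i32x8" -> element width 32, lanes 8 -> 256 bits total
    let (elem, lanes) = rust_ty.split_once('x')?;
    let width: u32 = elem[1..].parse().ok()?;
    let lanes: u32 = lanes.parse().ok()?;
    Some(width * lanes)
}

fn matches_intel(rust_ty: &str, intel_ty: &str) -> bool {
    match intel_ty {
        "__m256i" => bits_of(rust_ty) == Some(256),
        "__m128i" => bits_of(rust_ty) == Some(128),
        "__m64" => bits_of(rust_ty) == Some(64),
        _ => false,
    }
}

fn main() {
    // Both of these match `__m256i` under the loose rule...
    assert!(matches_intel("i32x8", "__m256i"));
    assert!(matches_intel("u8x32", "__m256i"));
    // ...while a 128-bit vector does not.
    assert!(!matches_intel("i32x4", "__m256i"));
    println!("ok");
}
```

This also shows why the rule is "questionable": it cannot distinguish `i32x8` from `u8x32`, which is exactly the slack future commits are meant to remove.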
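The name and CPUID checks listed above can be sketched as below. This is a minimal illustration, not the real `stdsimd-verify` implementation: it assumes the Intel XML has already been reduced to `(name, feature)` pairs, and the `verify` function and the sample data are hypothetical.

```rust
use std::collections::HashMap;

// Check that every Rust-side intrinsic exists upstream and that its
// `target_feature` agrees with the CPUID feature in the XML.
fn verify(upstream: &HashMap<&str, &str>, ours: &[(&str, &str)]) -> Result<(), String> {
    for &(name, feature) in ours {
        match upstream.get(name) {
            // First check: each name in Rust must be defined upstream.
            None => return Err(format!("{} not defined upstream", name)),
            // Second check: the CPUID feature must agree (XML uses
            // uppercase, Rust's `target_feature` uses lowercase).
            Some(expected) if !expected.eq_ignore_ascii_case(feature) => {
                return Err(format!(
                    "{}: feature mismatch ({} vs {})",
                    name, expected, feature
                ))
            }
            _ => {}
        }
    }
    Ok(())
}

fn main() {
    // Hypothetical data standing in for the parsed Intel XML.
    let upstream: HashMap<&str, &str> =
        [("_mm256_extract_epi8", "AVX2"), ("_mm_tzcnt_32", "BMI1")]
            .into_iter()
            .collect();

    // A correctly-listed intrinsic passes...
    assert!(verify(&upstream, &[("_mm256_extract_epi8", "avx2")]).is_ok());
    // ...while one listed under the wrong feature (the AVX/AVX2 bug
    // fixed in this PR) is caught.
    assert!(verify(&upstream, &[("_mm256_extract_epi8", "avx")]).is_err());
    println!("ok");
}
```

A check along these lines is what caught the `_mm256_extract_*` intrinsics being filed under AVX instead of AVX2.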