Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add x86 AES-NI vendor intrinsics #311

Merged
merged 9 commits into from
Feb 5, 2018
Merged

Conversation

ruuda
Copy link
Contributor

@ruuda ruuda commented Jan 30, 2018

Fixes #295.

I have not tested this yet, waiting for a nightly that includes rust-lang/rust@2497d10 to be released.

Also, the MSDN documentation is a bit clearer and more comprehensive than the Intel intrinsics guide (thanks @gnzlbg for pointing this out), but I am not sure if we could copy it into the docs license-wise. In the worst case we could add links to the MSDN docs.

@gnzlbg
Copy link
Contributor

gnzlbg commented Jan 30, 2018

but I am not sure if we could copy it into the docs license-wise.

I would just use that as "inspiration" for the tests. For the docs you can check the clang-docs. IIRC @alexcrichton suggested that we could base our docs on those and then just add attribution on the readme.


use x86::*;

#[simd_test = "aex"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be aes instead of aex.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

#[target_feature(enable = "aes")]
#[cfg_attr(test, assert_instr(aeskeygenassist))]
pub unsafe fn _mm_aeskeygenassist_si128(a: __m128i, imm8: u8) -> __m128i {
aeskeygenassist(a, imm8)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't probably work because imm8 must be a constant but here it is a run-time value.

You need a constify macro for this, from the description I'd guess that you need constify_imm8!.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With constify it does work, indeed.

@alexcrichton
Copy link
Member

Thanks! These are currently in x86_64 but that means they're only compiled in the x86_64-* targets, but I think these can move into i686 to be on 32-bit as well!

@ruuda
Copy link
Contributor Author

ruuda commented Jan 31, 2018

The tests now pass on rustc 1.25.0-nightly (def3269a7 2018-01-30).

Thanks! These are currently in x86_64 but that means they're only compiled in the x86_64-* targets, but I think these can move into i686 to be on 32-bit as well!

Can they? Intel does not list any 32-bit CPUs with AES-NI, unless I am missing something.

I would just use that as "inspiration" for the tests. For the docs you can check the clang-docs. IIRC @alexcrichton suggested that we could base our docs on those and then just add attribution on the readme.

The constants in the tests should not be an issue. The documentation comments in Clang’s __wmmintrin_aes.h do not seem much more useful that what we have now. What I was referring to is the remarks section of MSDN docs, which contains a bit more information on how these instructions are supposed to be used together.

@alexcrichton
Copy link
Member

@ruuda ah that's true, but making making code compatible across x86/x86_64 tends to be easier if it's available everywhere. The cfg_feature_enabled! would probably just always return false on i686 which means they'd never be called anyway.

I think it also matches what clang/gcc do, making the intrinsics available on both platforms.

unsafe fn test_mm_aeskeygenassist_si128() {
// Constants taken from https://msdn.microsoft.com/en-us/library/cc714138.aspx.
let a = __m128i(0x0123456789abcdef, 0x8899aabbccddeeff);
let e = __m128i(0x857c266b7c266e85, 0xeac4eea9c4eeacea);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this use _mm_set_epi64x to avoid making an assumption about the internal layout of __m128i?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@ruuda
Copy link
Contributor Author

ruuda commented Feb 1, 2018

making making code compatible across x86/x86_64 tends to be easier if it's available everywhere

Ah, I see, that makes sense. Moved them into i686.

@gnzlbg
Copy link
Contributor

gnzlbg commented Feb 1, 2018

@ruuda

Can they? Intel does not list any 32-bit CPUs with AES-NI, unless I am missing something.

64-bit CPUs can run 32-bit binaries. AFAIK 32-bit binaries can call 64-bit CPU intrinsics without issues, but they obviously won't be able to run on CPUs that don't have these intrinsics.

@alexcrichton
Copy link
Member

Looks like stdsimd's runtime detection may need to be taught the aes feature?

@gnzlbg
Copy link
Contributor

gnzlbg commented Feb 1, 2018

@alexcrichton yes, I am on it.

@alexcrichton
Copy link
Member

@ruuda want to rebase on #312 and see if the tests are fixed/

ruuda added 5 commits February 2, 2018 17:10
These tests are based on the examples in Microsoft's documentation.
Same input should result in the same output in any case.
Use _mm_set_epi64x instead to construct constants.
Although i686 does not have the AES New Instructions, making code
compatible across x86 and x64_64 tends to be easier if the intrinsics
are available everywhere.
@ruuda
Copy link
Contributor Author

ruuda commented Feb 2, 2018

thread 'x86::i686::aes::assert__mm_aeskeygenassist_si128_aeskeygenassist' panicked at 'instruction found, but the disassembly contains too many instructions: #instructions = 781 >= 20 (limit)', stdsimd-test/src/lib.rs:375:9

Looks like that constify macro expanded to an enormous function ... is there anything we can do about that?

/// a round key for encryption cipher using data from `a` and an 8-bit round constant.
#[inline]
#[target_feature(enable = "aes")]
#[cfg_attr(test, assert_instr(aeskeygenassist))]
Copy link
Contributor

@gnzlbg gnzlbg Feb 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change this

assert_instr(aeskeygenassist)

to (from the top of my head):

assert_instr(aeskeygenassist, imm8 = 0)

That should fix the test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to explain what's going on, imm8 must be a compile-time argument, and while testing the disassembly, we need to pass it a value. If one doesn't specify it in the assert_instr macro, it will pass it a run-time value, and that won't work for this intrinsic.

We don't have a way of expressing that function arguments must be const yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks for the explanation. Should be fixed now.

I was a bit worried that this meant that the entire match statement would end up in the final binary, so I also verified the disassembly of test_mm_aeskeygenassist_si128 in release mode, and fortunately it does get const-folded:

0000000000041720 <coresimd::x86::i686::aes::tests::test_mm_aeskeygenassist_si128::test_mm_aeskeygenassist_si128>:
   41720:       55                      push   %rbp
   41721:       48 89 e5                mov    %rsp,%rbp
   41724:       48 81 ec 80 00 00 00    sub    $0x80,%rsp
   4172b:       66 0f 3a df 05 3b c7    aeskeygenassist $0x5,0x11c73b(%rip),%xmm0        # 15de70 <str.4y+0x20>
   41732:       11 00 05 
   41735:       66 0f 7f 45 90          movdqa %xmm0,-0x70(%rbp)
   4173a:       0f 28 05 9f c7 11 00    movaps 0x11c79f(%rip),%xmm0        # 15dee0 <str.4y+0x90>
   41741:       0f 29 45 80             movaps %xmm0,-0x80(%rbp)

In debug mode it doesn’t though, instead there is a call to _mm_aeskeygenassist_si128 which contains the entire match statement. Something along the lines of rust-lang/rust#47980 would be good to have indeed.

Pass a particular value for the disassembly test, so we end up with one
instruction, instead of the match arm that matches on all 256 values.
@ruuda
Copy link
Contributor Author

ruuda commented Feb 5, 2018

The only thing failing at this point, is that the Intel xml file claims that the aeskeygenassist argument has type const int. The description clearly specifies that it is an “8-bit round constant” though. Other intrinsics with an imm8 argument specify it as i32. Do we want to blindly copy this idiosyncrasy from the C API? Or shall I update the test to allow const int imm8 arguments to be u8? The right instruction is emitted either way.

@alexcrichton
Copy link
Member

@ruuda for now I'd stick exactly to what Intel says, and Intel says int which we translate to i32, so I'd switch to i32.

@alexcrichton
Copy link
Member

Bah it looks like that yield an LLVM error. Maybe the LLVM intrinsic needs a u8?

Intel documentation specifies it as an "8-bit round constant", but then
goes on to give it a type "const int", which translates to i32 in Rust.
The test that verifies the Rust signatures against Intel documentation
failed on this.

For now we will replicate the C API verbatim. Even when Rust could have
a more accurate type signature that makes passing values more than 8
bits impossible, rather than silently mapping out-of-range values to
255.
@ruuda
Copy link
Contributor Author

ruuda commented Feb 5, 2018

It compiles 🎉

@gnzlbg
Copy link
Contributor

gnzlbg commented Feb 5, 2018

Nice! Now you just need to fix the minor formatting and clippy issues and this is good to go! (see the last two build bots!)

@alexcrichton alexcrichton merged commit 0d6b868 into rust-lang:master Feb 5, 2018
@alexcrichton
Copy link
Member

Thanks @ruuda!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants