Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

any() on boolean vectors on 32-bit ARM likely broken #12

Closed
hsivonen opened this issue Jun 9, 2017 · 5 comments
Closed

any() on boolean vectors on 32-bit ARM likely broken #12

hsivonen opened this issue Jun 9, 2017 · 5 comments

Comments

@hsivonen
Copy link
Owner

hsivonen commented Jun 9, 2017

Steps to reproduce

  1. Use and ARMv7 + NEON Linux host.
  2. git clone https://github.com/hsivonen/encoding_rs
  3. cd encoding_rs
  4. git checkout 3049251cd80bb8eebc7d8c96057480d4e84fffef
  5. RUSTFLAGS=' -C target-feature=+neon' cargo test --features simd-accel

Expected results

Expected tests to pass, since encoding_rs contains no 32-bit ARM-specific code and the same code that only uses simd-crate facilities and cross-architecture LLVM shuffles works on Aarch64.

Actual results

Various tests fail. Since it's unlikely that LLVM is broken and unlikely that the rustc-to-LLVM part is broken just for 32-bit ARM, I suspect that the implementation for any() on boolean vectors is broken.

@hsivonen
Copy link
Owner Author

hsivonen commented Jun 9, 2017

The implementation seems to assume that it's OK to transmute a 128-bit vector into a pair of 64-bit vectors (the 128-bit registers are aliased with two 64-bit registers). This is not what clang's arm_neon.h does, so maybe the assumption that the transmute is OK is no longer valid with rustc and LLVM updates.

@hsivonen
Copy link
Owner Author

To use an aliased half-register, arm_neon.h does this:

__ai uint8x8_t vget_high_u8(uint8x16_t __p0) {
  uint8x8_t __ret;
  __ret = __builtin_shufflevector(__p0, __p0, 8, 9, 10, 11, 12, 13, 14, 15);
  return __ret;
}

AFAICT, to fix this, a compiler RFC to extend SIMD shuffles so that the parameter and return value lane number doesn't need to be the same is needed. I'm thinking adding simd_shuffle16to8, etc.

@BurntSushi
Copy link
Contributor

@hsivonen I suspect you don't need an RFC for that. Instead, you can probably just submit a PR. The raw shuffle intrinsics are unstable today and probably will be for the foreseeable future. (So long as the spectre of integer generics looms, I suspect that will be true.) I think the quickest way to stabilization is to provide a layer above the shuffle in std, e.g., by defining vget_high_u8 (assuming that's a vendor intrinsic?) in std (errrmmm, I mean core), which would internally use the appropriate shuffle.

@hsivonen
Copy link
Owner Author

OK. I'll try to go with the direct rustc PR route.

@hsivonen
Copy link
Owner Author

hsivonen commented Sep 6, 2017

AFAICT, to fix this, a compiler RFC to extend SIMD shuffles so that the parameter and return value lane number doesn't need to be the same is needed. I'm thinking adding simd_shuffle16to8, etc.

I was wrong. rustc already supports N to M shuffles. The number is the shuffle name is the output lane count and does not limit the input lanes.

hsivonen added a commit that referenced this issue May 2, 2018
Use shuffles instead of transmutes for accessing aliased
half-registers. Implement the `Simd` trait for more types in
order to satisfy the trait bounds of the shuffle intrinsic
declarations. Bitcast to `u32x2` before extracting data to an
ALU register. In the `all()` case, compare with `0xFFFFFFFF`
instead of zero.

Closes #12.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants