SIMD backend of ARM NEON (#5775)
* in the process of adding a NEON backend

* more work on NEON

* up to double where expressions

* finished first draft!

* working on compiling NEON backend

* got NEON backend to compile

* formatting

* actually test NEON and test conversions

conversions are missing right now

* test and fix vsetq_lane_ usage

* implement and test shifts

* implement and test a mask conversion

* test and implement i32 -> u64

* test and implement u64 -> i32

* test and fix final conversion

* formatting

* consolidate NEON mask types

there are really only two implementations:
masks for 64-bit and 32-bit value types.
use a bit of CRTP to ensure return types
of operators are correct
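
The CRTP arrangement described above can be sketched roughly as follows. This is a minimal illustration and not the code from this commit: `mask_base`, `mask64`, `mask32`, and the plain-array storage are hypothetical stand-ins (the real backend would hold NEON vector types). The point is only that the shared base names the derived type, so each operator returns the derived mask rather than the base.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>

// Hypothetical names throughout; plain arrays stand in for NEON registers.
// The base takes the derived type as a template parameter (CRTP) so that
// the shared operators can return Derived instead of mask_base.
template <class Derived, class Bits>
class mask_base {
  Bits m_lanes[2];

  // All-ones for true, all-zeros for false, mirroring SIMD compare results.
  static Bits to_bits(bool b) { return b ? static_cast<Bits>(~Bits(0)) : Bits(0); }

 public:
  mask_base() : m_lanes{Bits(0), Bits(0)} {}
  mask_base(bool a, bool b) : m_lanes{to_bits(a), to_bits(b)} {}

  [[nodiscard]] bool operator[](int i) const { return m_lanes[i] != 0; }

  [[nodiscard]] Derived operator&&(mask_base const& other) const {
    return Derived((*this)[0] && other[0], (*this)[1] && other[1]);
  }
  [[nodiscard]] Derived operator||(mask_base const& other) const {
    return Derived((*this)[0] || other[0], (*this)[1] || other[1]);
  }
  [[nodiscard]] Derived operator!() const {
    return Derived(!(*this)[0], !(*this)[1]);
  }
};

// One mask type per lane width; both reuse the same implementation.
struct mask64 : mask_base<mask64, std::uint64_t> {
  using mask_base::mask_base;
};
struct mask32 : mask_base<mask32, std::uint32_t> {
  using mask_base::mask_base;
};

// The operators yield the derived type, not the base.
static_assert(
    std::is_same<decltype(std::declval<mask64>() && std::declval<mask64>()),
                 mask64>::value,
    "operator&& on mask64 returns mask64");
static_assert(std::is_same<decltype(!std::declval<mask32>()), mask32>::value,
              "operator! on mask32 returns mask32");

inline void mask_example() {
  mask64 a(true, false);
  mask64 b(true, true);
  mask64 c = a && b;  // lane-wise AND, still a mask64
  assert(c[0] && !c[1]);
}
```

Without the `Derived` parameter, the shared operators would have to return the base type, and every result would need converting back to the concrete mask.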

* formatting

* move converting constructors

* add missing nodiscard

* add unary negation for 32-bit signed integer

* add 64-bit signed addition and move unary negation

* replace all vdup with vmov

as far as I can tell they're exactly identical,
except that there are some vmov intrinsics
that don't have vdup equivalents, so vmov
seems to just be the better one to use
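
For illustration, the equivalence noted above can be checked on one pair of intrinsics. This is a sketch under assumptions: it compares only `vdupq_n_f64` and `vmovq_n_f64` on AArch64, the helper names `broadcast_dup` and `broadcast_mov` are hypothetical, and the non-NEON branch exists only so the snippet builds on other hosts.

```cpp
#include <array>
#include <cassert>
#if defined(__ARM_NEON) && defined(__aarch64__)
#include <arm_neon.h>
#endif

// Broadcast x into both lanes with vdupq_n_f64 and read the lanes back.
inline std::array<double, 2> broadcast_dup(double x) {
#if defined(__ARM_NEON) && defined(__aarch64__)
  float64x2_t v = vdupq_n_f64(x);
  return {vgetq_lane_f64(v, 0), vgetq_lane_f64(v, 1)};
#else
  return {x, x};  // scalar stand-in when NEON is unavailable
#endif
}

// Same broadcast, spelled with vmovq_n_f64.
inline std::array<double, 2> broadcast_mov(double x) {
#if defined(__ARM_NEON) && defined(__aarch64__)
  float64x2_t v = vmovq_n_f64(x);
  return {vgetq_lane_f64(v, 0), vgetq_lane_f64(v, 1)};
#else
  return {x, x};  // scalar stand-in when NEON is unavailable
#endif
}

inline void broadcast_example() {
  // Both spellings should fill every lane with the same value.
  assert(broadcast_dup(1.5) == broadcast_mov(1.5));
}
```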

* ensure all the condition methods are [[nodiscard]]

* add subtraction and addition for 64bit uint

* formatting
ibaned authored Jan 30, 2023
1 parent fb7d9f2 commit 8103d82
Showing 4 changed files with 1,151 additions and 1 deletion.
8 changes: 8 additions & 0 deletions simd/src/Kokkos_SIMD.hpp
@@ -29,6 +29,10 @@
#include <Kokkos_SIMD_AVX512.hpp>
#endif

#ifdef __ARM_NEON
#include <Kokkos_SIMD_NEON.hpp>
#endif

namespace Kokkos {
namespace Experimental {

@@ -40,6 +44,8 @@ namespace Impl {
using host_native = avx512_fixed_size<8>;
#elif defined(KOKKOS_ARCH_AVX2)
using host_native = avx2_fixed_size<4>;
#elif defined(__ARM_NEON)
using host_native = neon_fixed_size<2>;
#else
using host_native = scalar;
#endif
@@ -134,6 +140,8 @@ class abi_set {};
using host_abi_set = abi_set<simd_abi::scalar, simd_abi::avx512_fixed_size<8>>;
#elif defined(KOKKOS_ARCH_AVX2)
using host_abi_set = abi_set<simd_abi::scalar, simd_abi::avx2_fixed_size<4>>;
#elif defined(__ARM_NEON)
using host_abi_set = abi_set<simd_abi::scalar, simd_abi::neon_fixed_size<2>>;
#else
using host_abi_set = abi_set<simd_abi::scalar>;
#endif
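
The fixed sizes chosen above (8 for AVX-512, 4 for AVX2, 2 for NEON) are consistent with the number of `double` lanes that fit in a 512-, 256-, and 128-bit register respectively. A small sketch of that arithmetic, assuming an 8-byte `double`; `lanes_in_register` and `host_native_lanes` are hypothetical names that do not appear in Kokkos, and the sketch ignores the AVX branches:

```cpp
#include <cstddef>

// Number of lanes of type T that fit in a vector register of the given width.
template <class T>
constexpr std::size_t lanes_in_register(std::size_t register_bits) {
  return register_bits / (8 * sizeof(T));
}

static_assert(lanes_in_register<double>(512) == 8, "AVX-512: 8 doubles");
static_assert(lanes_in_register<double>(256) == 4, "AVX2: 4 doubles");
static_assert(lanes_in_register<double>(128) == 2, "NEON: 2 doubles");

// Compile-time selection in the same spirit as host_native above.
#if defined(__ARM_NEON)
constexpr std::size_t host_native_lanes = lanes_in_register<double>(128);
#else
constexpr std::size_t host_native_lanes = 1;  // scalar fallback
#endif
```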
28 changes: 28 additions & 0 deletions simd/src/Kokkos_SIMD_Common.hpp
@@ -136,6 +136,34 @@ template <class T, class Abi>
return simd<T, Abi>([&](std::size_t i) { return lhs[i] * rhs[i]; });
}

// fallback simd shift using generator constructor
// At the time of this writing, these fallbacks are only used
// to shift vectors of 64-bit unsigned integers for the NEON backend

template <class T, class Abi>
[[nodiscard]] KOKKOS_IMPL_HOST_FORCEINLINE_FUNCTION simd<T, Abi> operator>>(
    simd<T, Abi> const& lhs, unsigned int rhs) {
  return simd<T, Abi>([&](std::size_t i) { return lhs[i] >> rhs; });
}

template <class T, class Abi>
[[nodiscard]] KOKKOS_IMPL_HOST_FORCEINLINE_FUNCTION simd<T, Abi> operator<<(
    simd<T, Abi> const& lhs, unsigned int rhs) {
  return simd<T, Abi>([&](std::size_t i) { return lhs[i] << rhs; });
}

template <class T, class U, class Abi>
[[nodiscard]] KOKKOS_IMPL_HOST_FORCEINLINE_FUNCTION simd<T, Abi> operator>>(
    simd<T, Abi> const& lhs, simd<U, Abi> const& rhs) {
  return simd<T, Abi>([&](std::size_t i) { return lhs[i] >> rhs[i]; });
}

template <class T, class U, class Abi>
[[nodiscard]] KOKKOS_IMPL_HOST_FORCEINLINE_FUNCTION simd<T, Abi> operator<<(
    simd<T, Abi> const& lhs, simd<U, Abi> const& rhs) {
  return simd<T, Abi>([&](std::size_t i) { return lhs[i] << rhs[i]; });
}
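
To show how the generator-constructor fallback above behaves, here is a self-contained sketch. `toy_simd` is a hypothetical stand-in for `simd<T, Abi>`, with a lane count in place of the ABI tag and an array in place of a vector register. It is not the Kokkos class. Each shift builds its result by calling a lambda once per lane. In this sketch the scalar-shift overload has no extra template parameter, since one that appears in no function argument cannot be deduced.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for simd<T, Abi>.
template <class T, std::size_t N>
class toy_simd {
  T m_lanes[N];

 public:
  // Generator constructor: lane i is initialized with gen(i).
  template <class G>
  explicit toy_simd(G const& gen) {
    for (std::size_t i = 0; i < N; ++i) m_lanes[i] = gen(i);
  }
  [[nodiscard]] T operator[](std::size_t i) const { return m_lanes[i]; }
};

// Shift every lane right by the same scalar amount.
template <class T, std::size_t N>
[[nodiscard]] toy_simd<T, N> operator>>(toy_simd<T, N> const& lhs,
                                        unsigned int rhs) {
  return toy_simd<T, N>([&](std::size_t i) { return lhs[i] >> rhs; });
}

// Shift each lane left by the amount in the matching lane of rhs.
template <class T, class U, std::size_t N>
[[nodiscard]] toy_simd<T, N> operator<<(toy_simd<T, N> const& lhs,
                                        toy_simd<U, N> const& rhs) {
  return toy_simd<T, N>([&](std::size_t i) { return lhs[i] << rhs[i]; });
}

inline void shift_example() {
  using vec = toy_simd<std::uint64_t, 2>;
  vec v([](std::size_t i) { return std::uint64_t(16) << i; });  // {16, 32}
  vec r = v >> 2u;                                              // {4, 8}
  assert(r[0] == 4 && r[1] == 8);
}
```

A lane-by-lane fallback like this trades speed for coverage: it works for any element type that supports the scalar operator, at the cost of not using a vector shift instruction.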

// The code below provides:
// operator@(simd<T, Abi>, Arithmetic)
// operator@(Arithmetic, simd<T, Abi>)