Enable hash computation from variable length keys #327
Murmur3 doesn't play nice with the dynamic size. For now, I just implemented this idea for xxhash. |
I'm seeing a few build warnings in the tests: In member function ‘constexpr cuco::detail::XXHash_64<Key>::result_type cuco::detail::XXHash_64<Key>::operator()(const Key&, Extent) const [with Extent = long unsigned int; Key = int]’,
inlined from ‘void CATCH2_INTERNAL_TEMPLATE_TEST_8() [with Hash = cuco::detail::XXHash_64<int>]’ at /workspaces/cuCollections/tests/utility/hash_test.cu:170:33:
/workspaces/cuCollections/include/cuco/detail/hash_functions/xxhash.cuh:296:38: warning: ‘<anonymous>’ may be used uninitialized [-Wmaybe-uninitialized]
296 | v2 += blocks8[pipeline_offset + 1] * prime2;
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
/workspaces/cuCollections/tests/utility/hash_test.cu: In function ‘void CATCH2_INTERNAL_TEMPLATE_TEST_8() [with Hash = cuco::detail::XXHash_64<int>]’:
/workspaces/cuCollections/tests/utility/hash_test.cu:170:75: note: ‘<anonymous>’ declared here
170 | CHECK(hash(42) == hash(42, key_size));
| ^
In member function ‘constexpr cuco::detail::XXHash_64<Key>::result_type cuco::detail::XXHash_64<Key>::operator()(const Key&, Extent) const [with Extent = long unsigned int; Key = int]’,
inlined from ‘void CATCH2_INTERNAL_TEMPLATE_TEST_8() [with Hash = cuco::detail::XXHash_64<int>]’ at /workspaces/cuCollections/tests/utility/hash_test.cu:170:33:
/workspaces/cuCollections/include/cuco/detail/hash_functions/xxhash.cuh:299:38: warning: ‘<anonymous>’ may be used uninitialized [-Wmaybe-uninitialized]
299 | v3 += blocks8[pipeline_offset + 2] * prime2;
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
/workspaces/cuCollections/tests/utility/hash_test.cu: In function ‘void CATCH2_INTERNAL_TEMPLATE_TEST_8() [with Hash = cuco::detail::XXHash_64<int>]’:
/workspaces/cuCollections/tests/utility/hash_test.cu:170:75: note: ‘<anonymous>’ declared here
170 | CHECK(hash(42) == hash(42, key_size));
| ^
In member function ‘constexpr cuco::detail::XXHash_64<Key>::result_type cuco::detail::XXHash_64<Key>::operator()(const Key&, Extent) const [with Extent = long unsigned int; Key = int]’,
inlined from ‘void CATCH2_INTERNAL_TEMPLATE_TEST_8() [with Hash = cuco::detail::XXHash_64<int>]’ at /workspaces/cuCollections/tests/utility/hash_test.cu:170:33:
/workspaces/cuCollections/include/cuco/detail/hash_functions/xxhash.cuh:302:38: warning: ‘<anonymous>’ may be used uninitialized [-Wmaybe-uninitialized]
302 | v4 += blocks8[pipeline_offset + 3] * prime2;
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
/workspaces/cuCollections/tests/utility/hash_test.cu: In function ‘void CATCH2_INTERNAL_TEMPLATE_TEST_8() [with Hash = cuco::detail::XXHash_64<int>]’:
/workspaces/cuCollections/tests/utility/hash_test.cu:170:75: note: ‘<anonymous>’ declared here
170 | CHECK(hash(42) == hash(42, key_size)); |
What is the issue with murmurhash? We could take cudf code as a reference (https://github.com/rapidsai/cudf/blob/d14b6cce9cc39793c118f065b113c83d0210ceb6/cpp/include/cudf/detail/utilities/hash_functions.cuh#L195-L285). Eventually, I would like to get rid of all hash function details in libcudf and rely on cuco hashers instead. |
Dynamic murmurhash3 passes the unit test now (see diff 4a95fa8) but I'm getting even more warnings: /workspaces/cuCollections/include/cuco/detail/hash_functions/murmurhash3.cuh:183:13: warning: this statement may fall through [-Wimplicit-fallthrough=]
183 | case 3: k1 ^= tail[2] << 16;
| ~~~^~~~~~~~~~~~~~~~~~~~
/workspaces/cuCollections/include/cuco/detail/hash_functions/murmurhash3.cuh:184:1: note: here
184 | case 2: k1 ^= tail[1] << 8;
| ^
/workspaces/cuCollections/include/cuco/detail/hash_functions/murmurhash3.cuh:184:13: warning: this statement may fall through [-Wimplicit-fallthrough=]
184 | case 2: k1 ^= tail[1] << 8;
| ~~~^~~~~~~~~~~~~~~~~~~
... |
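One common way to silence `-Wimplicit-fallthrough` in a MurmurHash3-style tail switch is the C++17 `[[fallthrough]]` attribute. The sketch below is only an illustration under that assumption: `mix_tail` is a hypothetical helper, not the code in `murmurhash3.cuh`, and the `__host__ __device__` qualifiers are left out so it compiles as plain C++.

```cpp
#include <cstdint>

// Hypothetical helper (not the cuco implementation): folds the trailing
// 1-3 bytes of a key into one 32-bit word, as the MurmurHash3 tail does.
constexpr std::uint32_t mix_tail(std::uint8_t const* tail, std::uint32_t len) noexcept
{
  std::uint32_t k1 = 0;
  switch (len & 3u) {
    case 3:
      k1 ^= static_cast<std::uint32_t>(tail[2]) << 16;
      [[fallthrough]];  // intentional: also fold in the lower bytes
    case 2:
      k1 ^= static_cast<std::uint32_t>(tail[1]) << 8;
      [[fallthrough]];  // intentional
    case 1:
      k1 ^= static_cast<std::uint32_t>(tail[0]);
      break;
    default: break;  // no trailing bytes
  }
  return k1;
}
```

The attribute documents that the fall-through is deliberate, so the compiler has nothing left to warn about and the behavior of the switch is unchanged.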
|
* @return A resulting hash value for `key` | ||
*/ | ||
template <typename Extent> | ||
constexpr result_type __host__ __device__ operator()(Key const& key, Extent size) const noexcept |
As discussed offline, we would provide a member like
constexpr result_type __host__ __device__ compute_hash(Key const& key, Extent size) const noexcept;
instead of overloading the () operator.
Per the std::hash documentation for std::hash<Key>::operator():
Specializations of std::hash should define an operator() that:
Takes a single argument key of type Key.
Technically we still meet the standard requirements/interface. This is just another overload of the operator. The standard does not prohibit additional functionality.
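To make the point about the extra overload concrete, here is a minimal sketch. `toy_hash` is a hypothetical stand-in that uses FNV-1a, not cuco's `XXHash_64`, and it omits `__host__ __device__` so it builds as plain C++. The single-argument call stays available, and the two-argument overload only adds a runtime size.

```cpp
#include <cstddef>
#include <cstdint>

// Toy stand-in hasher (FNV-1a); an assumption for illustration only.
template <typename Key>
struct toy_hash {
  using argument_type = Key;
  using result_type   = std::uint64_t;

  // std::hash-style call: takes a single argument of type Key.
  result_type operator()(Key const& key) const noexcept
  {
    return (*this)(key, sizeof(Key));
  }

  // Additional overload: hash only the first `size` bytes of `key`.
  template <typename Extent>
  result_type operator()(Key const& key, Extent size) const noexcept
  {
    auto const* bytes = reinterpret_cast<unsigned char const*>(&key);
    result_type h     = 14695981039346656037ull;  // FNV-1a offset basis
    for (Extent i = 0; i < size; ++i) {
      h ^= bytes[i];
      h *= 1099511628211ull;  // FNV-1a prime
    }
    return h;
  }
};
```

With `toy_hash<int> hash;`, `hash(42)` and `hash(42, sizeof(int))` return the same value, which mirrors the `CHECK(hash(42) == hash(42, key_size))` test quoted in the warnings above.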
Proof that with the changes introduced by this PR the compiler is still able to apply the same optimizations in case of a static extent (PTX output is identical): |
const uint32_t* const blocks = (const uint32_t*)(data + nblocks * 4); | ||
for (int i = -nblocks; i; i++) { | ||
uint32_t k1 = blocks[i]; // getblock32(blocks,i); | ||
for (std::remove_const_t<decltype(nblocks)> i = 0; size >= 4 && i < nblocks; i++) { | ||
std::uint32_t k1 = load_chunk<std::uint32_t>(data, i); |
@PointKernel Here's the essential change I had to make to eliminate the compiler warnings.
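The body of `load_chunk` is not shown in this excerpt. A plausible shape, assuming it copies the bytes with `std::memcpy` rather than dereferencing a cast pointer, is sketched below. The parameter names and the `std::byte const*` input are assumptions, and `__host__ __device__` is omitted so the sketch compiles as plain C++.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed shape of a chunk loader: read the index-th T-sized chunk
// from a byte buffer without requiring the buffer to be aligned for T.
template <typename T>
T load_chunk(std::byte const* data, std::size_t index) noexcept
{
  T chunk;
  std::memcpy(&chunk, data + index * sizeof(T), sizeof(T));
  return chunk;
}
```

Copying through `std::memcpy` avoids both misaligned loads and strict-aliasing problems. Compilers typically lower a fixed-size copy like this to a plain load, which is why a benchmark is a reasonable way to confirm there is no runtime cost.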
Conflicts to resolve (due to the bug fix in #326)
@sleeepyjack Can you please add a benchmark verifying the memcpy won't affect runtime performance?
Otherwise LGTM
Performance difference between the current [0] NVIDIA RTX A6000
|
rerun tests |
Ok, so it turns out that the commit that introduced the |
With CTK 12.2 everything compiles just fine - no compiler segfaults. Does anyone know of a recently fixed nvcc bug causing this issue? |
* @return A resulting hash value for `key` | ||
*/ | ||
template <typename Extent> | ||
constexpr result_type __host__ __device__ operator()(Key const& key, Extent size) const noexcept |
This feels weird. Normally, I'd expect such an esoteric use case to provide a custom hash function as opposed to trying to twist the standard hash function signature to make it work for this kind of situation.
Just discussed this with @PointKernel. I will change the signature to
template <typename Extent>
constexpr result_type __host__ __device__ compute_bytes(std::byte const* data, Extent size) const noexcept
The idea is that a user can use this function to build a custom hasher around our hashers.
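A rough sketch of that usage pattern follows. Everything here is hypothetical: `toy_byte_hasher` stands in for a cuco hasher (it uses FNV-1a, and its member is named after the `compute_hash` proposal earlier in the thread), `string_view_hasher` is an invented user-side wrapper, and `__host__ __device__` is omitted so the code compiles as plain C++.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Stand-in for a library hasher exposing a byte-level entry point.
struct toy_byte_hasher {
  using result_type = std::uint64_t;

  template <typename Extent>
  result_type compute_hash(std::byte const* data, Extent size) const noexcept
  {
    result_type h = 14695981039346656037ull;  // FNV-1a offset basis
    for (Extent i = 0; i < size; ++i) {
      h ^= std::to_integer<unsigned char>(data[i]);
      h *= 1099511628211ull;  // FNV-1a prime
    }
    return h;
  }
};

// User-defined hasher for keys whose length is only known at runtime.
struct string_view_hasher {
  using result_type = toy_byte_hasher::result_type;

  result_type operator()(std::string_view key) const noexcept
  {
    return toy_byte_hasher{}.compute_hash(
      reinterpret_cast<std::byte const*>(key.data()), key.size());
  }
};
```

The wrapper keeps the conventional single-argument `operator()`, so the runtime size stays an implementation detail of the custom hasher.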
A slightly more modern approach would be
template<class T, std::size_t N>
constexpr result_type __host__ __device__ compute_hash(cuda::std::span<T, N> data) const noexcept;
Edit: We're still waiting for cuda::std::span support: #332
Addressed in 74a3739
Updated benchmarks (reference is current dev branch): [0] NVIDIA RTX A6000
|
8fbd42d to d2a2538
This PR enables hash computation of key types whose sizes are only known at runtime.