thrust::count() fails completely for arrays of 2^31 or more elements #989
Yeah. This is one of the long-standing known bugs; I'm in the process of addressing and verifying fixes for as many of them as I can find. Since it's sometimes hard to write synthetic test cases that detect this without ridiculous amounts of memory, we'd be grateful for a list of algorithms in which you've encountered problems with input sizes like this.
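One way to test sizes like this without allocating the input is to compute each element from its index; in Thrust, fancy iterators such as `thrust::counting_iterator` (optionally wrapped in a `thrust::transform_iterator`) serve that purpose. The host-only C++ sketch below only illustrates the idea and is not a Thrust test: the input is a pure function of a 64-bit index, so nothing is stored, and the expected result has a closed form to check against. The names `synthetic_element`, `count_newlines`, and `expected_newlines` are hypothetical.

```cpp
#include <cstdint>

// Hypothetical synthetic input: element i is '\n' when i is a multiple of k,
// and 'A' otherwise. It is computed from the index, so no buffer is needed.
inline char synthetic_element(std::int64_t i, std::int64_t k) {
  return (i % k == 0) ? '\n' : 'A';
}

// Reference count over [0, n) with a 64-bit index, so n may exceed 2^31.
std::int64_t count_newlines(std::int64_t n, std::int64_t k) {
  std::int64_t cnt = 0;
  for (std::int64_t i = 0; i < n; ++i) {
    if (synthetic_element(i, k) == '\n') ++cnt;
  }
  return cnt;
}

// Closed-form oracle: [0, n) contains ceil(n / k) multiples of k (n >= 0, k > 0).
std::int64_t expected_newlines(std::int64_t n, std::int64_t k) {
  return n == 0 ? 0 : (n - 1) / k + 1;
}
```

On the device, the same input could be expressed as a transform iterator over a `thrust::counting_iterator<long long>`, passed to `thrust::count` with `n` above 2^31, and compared with the closed-form value; that pairing is a suggestion, not an existing Thrust test.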
Michał, as far as I can guess (or suspect), the problem resides (at least) in any procedure that defines
The OffsetT usage seems similar (at first sight) in all these cases, so I suspect they all crash the same way. I am not sure there are no other problems of this kind in the code. Moreover, there are also typedefs:
which also seem suspicious, but I'm not yet sure they cause problems. One idea: create a special edition of all unit tests with every array extended to 2^32 elements, and run a "smoke test" on that modification. I'm not sure it makes sense here, but it is just an idea. Please keep us informed of progress on this issue. Regards,
Are there any updates on this issue? Regards,
This is related to NVIDIA/cccl#744.
I will close this as a duplicate of NVIDIA/cccl#744. We are aware of it and have recently started working on extending algorithms to larger ranges.
For big arrays, it appears that the `OffsetT` typedef, which is widely used in Thrust algorithms, cannot address the whole array. It is hardcoded as `typedef int OffsetT;`, but `int` is normally a 32-bit signed type, whose maximum value is 2^31 - 1.
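The limit can be checked at compile time, and a common remedy is to choose the offset type from the input size: 32-bit offsets when the input fits, 64-bit otherwise. The sketch below assumes an algorithm templated on its offset type, as the `OffsetT` typedef suggests; `fits_in`, `count_impl`, and `count_dispatch` are hypothetical names and this is not Thrust's implementation. A real dispatcher would also need headroom for intermediate offsets, since the reproduction in this issue reports a failure at 2,137,502,016 elements, slightly below 2^31.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// True if n (and therefore every offset in [0, n]) is representable in OffsetT.
template <typename OffsetT>
constexpr bool fits_in(std::size_t n) {
  return n <= static_cast<std::size_t>(std::numeric_limits<OffsetT>::max());
}

// Hypothetical algorithm body templated on its offset type.
template <typename OffsetT>
std::size_t count_impl(const char* data, OffsetT n, char value) {
  std::size_t cnt = 0;
  for (OffsetT i = 0; i < n; ++i) {
    if (data[i] == value) ++cnt;
  }
  return cnt;
}

// Hypothetical dispatcher: 32-bit offsets for inputs that fit, 64-bit otherwise
// (assumes n <= INT64_MAX). *offset_bits reports which instantiation ran.
std::size_t count_dispatch(const char* data, std::size_t n, char value,
                           int* offset_bits) {
  if (fits_in<std::int32_t>(n)) {
    *offset_bits = 32;
    return count_impl<std::int32_t>(data, static_cast<std::int32_t>(n), value);
  }
  *offset_bits = 64;
  return count_impl<std::int64_t>(data, static_cast<std::int64_t>(n), value);
}
```

Keeping the 32-bit path for small inputs preserves the cheaper index arithmetic where it is safe, while large inputs pay for 64-bit offsets only when they need them.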
The following code snippet illustrates the problem as it appears in thrust::count(); the same problem can probably be seen in some other algorithms.
```cpp
//
// This file illustrates a failure case for thrust::count() when the array size is close to 2^31.
//
// nvcc -g -lineinfo -o thrust_big thrust_big.cu
//
// sudo /usr/local/cuda/bin/cuda-memcheck $PWD/thrust_big 2137502016
//
// Result:
//
// Error messages about illegal memory reads in LoadDirectStriped() from file block_load.cuh
// called from thrust::cuda_cub::cub::DeviceReduceKernel<>
//
// Changing OffsetT to long throughout the Thrust library fixes this.
//
#include <iostream>
#include <string>
#include <exception>
#include <assert.h>
#include <unistd.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <thrust/execution_policy.h>
#include <thrust/system/cuda/execution_policy.h>
#include <thrust/count.h>

// Throws the CUDA error code so that main() can report it.
#define CUDA_CALL(X) { cudaError_t err = X; if (err != cudaSuccess) { throw err; } }

// Fills src with characters ending in "\n\0" (with x fixed at 0, every other
// character is '\n' too) and returns the number of '\n' characters, i.e. the
// value thrust::count() should return.
size_t fill(char *src, size_t size)
{
    assert(size > 2);
    size_t cnt = 0;
    for (size_t i = 0; i < size - 2; i++) {
        // unsigned x = (rand() % 16);
        unsigned x = 0;
        src[i] = (x == 0 ? '\n' : 'A' + (char)x - (char)1);
        if (x == 0) cnt++;
    }
    src[size-2] = '\n';
    src[size-1] = '\0';
    return cnt + 1;
}

int main(int argc, char **argv)
{
    try {
        assert(argc > 1);
        char *dev_src = NULL;
        size_t size = (size_t)std::stoull(argv[1]);
        char *src = (char *)malloc(size);
        if (src == NULL) {
            std::cerr << "malloc(" << size << ") failed" << std::endl;
            return 1;
        }
        size_t N = fill(src, size);
        CUDA_CALL(cudaMalloc(&dev_src, size));
        CUDA_CALL(cudaMemcpy(dev_src, src, size, cudaMemcpyHostToDevice));
        // Count the newlines on the device; with a 32-bit OffsetT this fails near 2^31 elements.
        size_t num_rows = (size_t)thrust::count(thrust::device, dev_src, dev_src + size, '\n');
        std::cout << "counted " << num_rows << ", expected " << N << std::endl;
        assert(num_rows == N);
        CUDA_CALL(cudaFree(dev_src));
        free(src);
    }
    catch(cudaError_t &err) {
        std::cerr << "CUDA ERROR: " << cudaGetErrorString(err) << std::endl;
        return 1;
    }
    catch(std::exception &ex) {
        std::cerr << "std::exception: " << ex.what() << std::endl;
        return 1;
    }
    catch (...) {
        std::cerr << "UNKNOWN EXCEPTION" << std::endl;
        return 1;
    }
    return 0;
}
```
Are there any plans to tune or fix this?
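Until a fix lands, one possible workaround is to split the range into chunks that stay well below 2^31 elements and sum the per-chunk counts in a 64-bit accumulator. The sketch below is host-only C++ so it runs anywhere; `count_in_chunks` is a hypothetical helper, and the assumption that chunks of, say, 2^30 elements avoid the overflow has not been verified against Thrust.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical workaround: count `value` in [data, data + size) by calling
// count_fn(first, last, value) on chunks of at most `chunk` elements and
// summing the results in 64 bits. `chunk` must be nonzero.
template <typename CountFn>
std::uint64_t count_in_chunks(const char* data, std::size_t size, char value,
                              std::size_t chunk, CountFn count_fn) {
  std::uint64_t total = 0;
  for (std::size_t off = 0; off < size; off += chunk) {
    const std::size_t n = std::min(chunk, size - off);  // last chunk may be shorter
    total += static_cast<std::uint64_t>(count_fn(data + off, data + off + n, value));
  }
  return total;
}
```

With Thrust, `count_fn` could be a lambda that forwards to `thrust::count(thrust::device, first, last, value)`, called with `dev_src` and `size` from the reproduction above and a chunk size such as `std::size_t{1} << 30`; this is an untested suggestion, not a confirmed fix.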