thrust::count() total malfunction for arrays of 2^31 and more elements #989

alexey-medvedev-fastdata-io · 2019-05-26T19:47:30Z

For big arrays it appears that OffsetT typedef'ed type which is widely used in thrust algorithms, is not capable of addressing large arrays. It is hardcoded as "typedef int OffsetT;", but int is normally a 32-bit type.

The following code snippet illustrates the problem the way it appears on thrust::count(), but the same can probably be seen for some other algorithms.

`//
// This file illustrates failure use case for thrust::count() if the size of array is close to 2^31
//
// nvcc -g -lineinfo -o thrust_big thrust_big.cu
//
// sudo /usr/local/cuda/bin/cuda-memcheck $PWD/thrust_big 2137502016
//
// Result:
//
// Error messages about illegal memory reads in LoadDirectStriped() from file block_load.cuh
// called from thrust::cuda_cub::cub::DeviceReduceKernel<>
//
// Changing the OffsetT to long all around thrust library fixes this.
//

#include
#include <assert.h>
#include <unistd.h>
#include <stdlib.h>
#include <thrust/system/cuda/execution_policy.h>
#include <thrust/count.h>

#define CUDA_CALL(X) { cudaError_t err = X; if (err != cudaSuccess) { throw err; } }

size_t fill(char *src, size_t size)
{
assert(size > 2);
size_t cnt = 0;
for (size_t i = 0; i < size - 2; i++) {
// unsigned x = (rand() % 16);
unsigned x = 0;
src[i] = (x == 0 ? '\n' : 'A' + (char)x - (char)1);
if (x == 0) cnt++;
}
src[size-2] = '\n';
src[size-1] = '\0';
return cnt + 1;
}

int main(int argc, char **argv)
{
try {
assert(argc > 1);
char *dev_src = NULL;
size_t size = (size_t)std::stoull(argv[1]);
char *src = (char *)malloc(size);
size_t N = fill(src, size);
CUDA_CALL(cudaMalloc(&dev_src, size));
CUDA_CALL(cudaMemcpy(dev_src, src, size, cudaMemcpyHostToDevice));
size_t num_rows = (size_t)thrust::count(thrust::device, dev_src, dev_src + size, '\n');
assert(num_rows == N);
}
catch(cudaError_t &err) {
std::cerr << "CUDA ERROR: " << cudaGetErrorString(err) << std::endl;
return 1;
}
catch(std::exception &ex) {
std::cerr << "std::exception: " << ex.what() << std::endl;
return 1;
}
catch (...) {
std::cout << "UNKNOWN EXCEPTION" << std::endl;
return 1;
}
return 0;
}
`

Are there any plans to tune or fix this?

griwes · 2019-05-26T20:48:00Z

Yeah. This is one of the longstanding known bugs; I'm in the process of addressing and verifying the fixes for as many of those as I can find. Since it's sometimes hard to write synthetic test cases that detect this without ridiculous amounts of memory necessary, we'll be grateful for a list of algorithms in which you've encountered a problem with input sizes like this.

alexey-medvedev-fastdata-io · 2019-05-27T18:18:40Z

Michał,

As far as can guess (or suspect), the problem resides (at least) at any procedure which defines typedef int OffsetT;. Searching through the code it gives 40 occurrences in thrust-1.9.5 inside these functions:

cub::DeviceRunLengthEncode::Encode()
cub::DeviceRunLengthEncode::NonTrivialRuns()
cub::DeviceReduce::Reduce()
cub::DeviceReduce::Sum()
cub::DeviceReduce::Min()
cub::DeviceReduce::ArgMin()
cub::DeviceReduce::Max()
cub::DeviceReduce::ArgMax()
cub::DeviceReduce::ReduceByKey()
cub::DeviceSegmentedReduce::Reduce()
cub::DeviceSegmentedReduce::Sum()
cub::DeviceSegmentedReduce::Min()
cub::DeviceSegmentedReduce::ArgMin()
cub::DeviceSegmentedReduce::Max()
cub::DeviceSegmentedReduce::ArgMax()
cub::DeviceScan::ExclusiveSum()
cub::DeviceScan::ExclusiveScan()
cub::DeviceScan::InclusiveSum()
cub::DeviceScan::InclusiveScan()
cub::DevicePartition::Flagged()
cub::DeviceSelect::If()
cub::DeviceSegmentedRadixSort::SortPairs()
cub::DeviceSegmentedRadixSort::SortPairsDescending()
cub::DeviceSegmentedRadixSort::SortKeys()
cub::DeviceSegmentedRadixSort::SortKeysDescending()
cub::DeviceSelect::Flagged()
cub::DeviceSelect::If()
cub::DeviceSelect::Unique()

The OffsetT usage seems similar (at first sight) in all these cases so I suspect they all are crashing the same way.

I am not sure there was no other problems of the kind in code, but typedef int OffsetT; is probably the first candidate for a critical review.

Moreover, there are also typedefs:

system/cuda/detail/copy_if.h:708: typedef int size_type;
system/cuda/detail/reduce_by_key.h:982: typedef int size_type;
system/cuda/detail/util.h:703: typedef int difference_type;
system/cuda/detail/unique_by_key.h:731: typedef int size_type;
system/cuda/detail/unique.h:642: typedef int size_type;
system/cuda/detail/reduce.h:67: typedef int GridSizeType;

which seem also suspicious, but I'm not sure yet that they cause problems.

I can drop in an idea to create a special edition of all unit tests with all the arrays extended up to 2^32, and try to run a "smoke test" on this modification. Not sure it makes sense here, but it is just an idea.

Please keep us informed on the progress with this issue.

Regards,
FASTDATA.IO team

alexey-medvedev-fastdata-io · 2019-11-19T09:18:00Z

Are there any updates on this issue?

Regards,
Alexey Medvedev
FASTDATA.IO team

alliepiper · 2020-10-13T16:29:32Z

This is related to NVIDIA/cccl#744.

miscco · 2023-02-24T10:35:27Z

I will close this as a duplicate of NVIDIA/cccl#744. We are aware and have recently started working on expanding algorithms to larger ranges

jrhemstad added this to CCCL Aug 11, 2022

jrhemstad added the thrust label Feb 22, 2023

miscco closed this as completed Feb 24, 2023

github-project-automation bot moved this to Done in CCCL Feb 24, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

thrust::count() total malfunction for arrays of 2^31 and more elements #989

thrust::count() total malfunction for arrays of 2^31 and more elements #989

alexey-medvedev-fastdata-io commented May 26, 2019 •

edited

Loading

griwes commented May 26, 2019

alexey-medvedev-fastdata-io commented May 27, 2019 •

edited

Loading

alexey-medvedev-fastdata-io commented Nov 19, 2019

alliepiper commented Oct 13, 2020

miscco commented Feb 24, 2023

thrust::count() total malfunction for arrays of 2^31 and more elements #989

thrust::count() total malfunction for arrays of 2^31 and more elements #989

Comments

alexey-medvedev-fastdata-io commented May 26, 2019 • edited Loading

griwes commented May 26, 2019

alexey-medvedev-fastdata-io commented May 27, 2019 • edited Loading

alexey-medvedev-fastdata-io commented Nov 19, 2019

alliepiper commented Oct 13, 2020

miscco commented Feb 24, 2023

alexey-medvedev-fastdata-io commented May 26, 2019 •

edited

Loading

alexey-medvedev-fastdata-io commented May 27, 2019 •

edited

Loading