This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

I did what you suggested and I discovered a very high amount of needed memory from the cub::DeviceHistogram::HistogramEven routine. #46

Closed
dumerrill opened this issue Jun 14, 2016 · 1 comment


@dumerrill
Contributor

I did what you suggested and discovered that the cub::DeviceHistogram::HistogramEven routine requests a very large amount of temporary memory. It seems to want more memory than a size_t can represent (assuming size_t is at most 64 bits wide).

The histogram routine reports that it needs about 18446744073709552000 bytes, which is about 16 exabytes (2^64 bytes). Since this is, up to float rounding, the largest value a 64-bit size_t can hold, I assume an integer overflow occurred somewhere in the calculation of the required temporary storage size.
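To illustrate the overflow hypothesis, here is a minimal sketch assuming a 64-bit size_t. The helper names are hypothetical and this does not reproduce CUB's internal size computation; it only shows that a byte count which goes negative wraps around to SIZE_MAX when converted to size_t, that printing that value through float (as the program below does) yields exactly 2^64 = 18446744073709551616, matching the figure above to its displayed precision, and that 2^64 bytes is 16 EiB.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical helpers for illustration only; they are not part of CUB.

// Converting a negative signed value to an unsigned type is well defined in
// C++: it wraps modulo 2^N, where N is the width of size_t. So a byte count
// that has gone negative (for example -1) turns into SIZE_MAX.
std::size_t wrap_to_size_t(long long bytes) {
    return static_cast<std::size_t>(bytes);
}

// Express a byte count in exbibytes (1 EiB = 2^60 bytes).
double to_exbibytes(std::size_t bytes) {
    return static_cast<double>(bytes) / static_cast<double>(1ULL << 60);
}

// The issue's program prints tmpHistSz after casting it to float, which
// rounds SIZE_MAX (2^64 - 1) up to exactly 2^64.
double as_printed_via_float(std::size_t bytes) {
    return static_cast<double>(static_cast<float>(bytes));
}

// Checks the reasoning above, assuming a 64-bit size_t.
void check_wraparound_reasoning() {
    static_assert(sizeof(std::size_t) == 8, "assumes a 64-bit size_t");
    const std::size_t wrapped = wrap_to_size_t(-1);
    assert(wrapped == std::numeric_limits<std::size_t>::max());
    assert(as_printed_via_float(wrapped) == 18446744073709551616.0);  // 2^64
    assert(to_exbibytes(wrapped) == 16.0);
}
```

Calling check_wraparound_reasoning() on a typical 64-bit host passes all three checks, which is consistent with (though not proof of) a wrapped-around size rather than a genuine 16 EiB request.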

I have attached a complete program this time; I hope you can reproduce the problem.

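#include <cstdio>
#include <cstdlib>
#include <cub/cub.cuh>

// 12-byte value type sorted alongside the keys (three floats per element).
struct floatExp {
    float a, b, c;
};

int main() {
    // Reserves a 1 GiB device-side printf buffer; this program does not
    // call device-side printf itself.
    cudaDeviceSetLimit(cudaLimitPrintfFifoSize, 1024 * 1024 * 1024);

    unsigned int const sz = 65536 * 2;

    unsigned int *idx_d = NULL;
    unsigned int *newIdx_d = NULL;
    void *tmpSort_d = NULL, *tmpHist_d = NULL;
    size_t tmpSortSz = 0, tmpHistSz = 0;

    cudaMalloc(reinterpret_cast<void **>(&idx_d), sizeof(unsigned int) * sz);
    cudaMalloc(reinterpret_cast<void **>(&newIdx_d), sizeof(unsigned int) * sz);

    float *data_d = NULL;
    float *newData_d = NULL;
    cudaMalloc(reinterpret_cast<void **>(&data_d), sizeof(float) * sz * 3);
    cudaMalloc(reinterpret_cast<void **>(&newData_d), sizeof(float) * sz * 3);

    // 801 * 801 * 401 + 2 = 257,282,003 (fits in a 32-bit unsigned int).
    unsigned int totalBlockSz = 801u * 801u * 401u + 2u;

    // Keys in descending order: sz, sz - 1, ..., 1.
    unsigned int *idx_h = new unsigned int[sz];
    for (unsigned int i = 0; i < sz; ++i) {
        idx_h[i] = sz - i;
    }
    cudaMemcpy(idx_d, idx_h, sizeof(unsigned int) * sz, cudaMemcpyHostToDevice);

    // Note: this buffer holds only sz + 1 counters, far fewer than the
    // totalBlockSz + 1 bins requested below. That is harmless here only
    // because the program stops after the temp-storage size query.
    unsigned int *hist_d = NULL;
    cudaMalloc(reinterpret_cast<void **>(&hist_d), sizeof(unsigned int) * (sz + 1));

    // Size query only: with tmpSort_d == NULL, CUB writes the required
    // temp-storage size into tmpSortSz and does no sorting.
    cub::DeviceRadixSort::SortPairs(tmpSort_d, tmpSortSz, idx_d, newIdx_d,
        reinterpret_cast<floatExp *>(data_d), reinterpret_cast<floatExp *>(newData_d), sz);
    cudaDeviceSynchronize();

    printf("TotalBlockSz %u\n", totalBlockSz);
    printf("sz %u\n", sz);

    // Size query only (tmpHist_d == NULL): totalBlockSz + 2 levels evenly
    // spaced over [0, totalBlockSz + 1), i.e. totalBlockSz + 1 unit-width bins.
    cub::DeviceHistogram::HistogramEven(tmpHist_d, tmpHistSz, newIdx_d, hist_d,
        static_cast<int>(totalBlockSz + 2), 0u, totalBlockSz + 1u, static_cast<int>(sz));

    printf("tmpHistSz %f\n", static_cast<float>(tmpHistSz));
    printf("tmpHistSz %zu\n", tmpHistSz);

    cudaDeviceSynchronize();
    cudaError_t err = cudaGetLastError();
    printf("Last CUDA error was: %d (%s)\n", static_cast<int>(err), cudaGetErrorString(err));

    system("pause");  // Windows-only: keeps the console window open.

    delete[] idx_h;
    cudaFree(idx_d);
    cudaFree(newIdx_d);
    cudaFree(data_d);
    cudaFree(newData_d);
    cudaFree(hist_d);

    return 0;
}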
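As a rough cross-check of the parameters in the program above, here is a host-only sketch; the helper names are hypothetical and not part of CUB. HistogramEven treats num_levels as the number of bin boundaries, so totalBlockSz + 2 = 257,282,005 levels means 257,282,004 bins, and the unsigned int counters alone need 1,029,128,016 bytes (about 1 GB), while hist_d holds only sz + 1 = 131,073 counters. The plausibility check compares a requested temp-storage size against free device memory; in real CUDA code the free byte count could come from cudaMemGetInfo(&free_bytes, &total_bytes).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// For cub::DeviceHistogram::HistogramEven, num_levels counts bin boundaries,
// so the histogram has num_levels - 1 bins.
std::int64_t histogram_bins(std::int64_t num_levels) {
    assert(num_levels >= 2);
    return num_levels - 1;
}

// Bytes needed for the output counters, computed in 64-bit unsigned
// arithmetic so the product cannot overflow for inputs of this size.
std::uint64_t counter_bytes(std::int64_t num_levels, std::size_t counter_size) {
    return static_cast<std::uint64_t>(histogram_bins(num_levels)) * counter_size;
}

// Hypothetical sanity check (not a CUB API): treat a temp-storage request
// as implausible if it exceeds the free device memory.
bool temp_storage_plausible(std::size_t requested, std::size_t free_bytes) {
    return requested <= free_bytes;
}
```

In the repro, calling temp_storage_plausible(tmpHistSz, free_bytes) right after the size query would flag the ~2^64-byte request before any cudaMalloc is attempted.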
dumerrill added a commit that referenced this issue Oct 11, 2016
cub::DeviceHistogram::HistogramEven routine.)
@dumerrill
Contributor Author

Fixed in the above commit for v1.6.2

dumerrill added a commit that referenced this issue Aug 11, 2017
cub::DeviceHistogram::HistogramEven routine.)
Former-commit-id: 04c172b07a02b013e468baa90f6a58f8a6926c52