
(cudaErrorInvalidDevice) when trying to perform a thrust::reduce #1371

Closed
ArEsKay3 opened this issue Jan 5, 2021 · 2 comments · Fixed by NVIDIA/cub#259
Labels
type: bug: functional Does not work as intended.

Comments


ArEsKay3 commented Jan 5, 2021

I'm seeing the error below (cudaErrorInvalidDevice) when trying to perform a thrust::reduce. I'm not sure where I should be looking, since I'm not generating any device ordinals myself; I'm just making the call in the snippet below.

This is within a Python C extension, and I do use PyTorch to set the device, but in this case I never set it to anything other than 0. I've checked that cudaGetDevice matches the pointer's device (and that all input pointers are on device 0).
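For reference, the check I mean is roughly the following (a sketch; the helper name is mine, not code from the extension):

#include <cuda_runtime.h>
#include <cstdio>

// Compare the current device against the device that owns a given
// device pointer (sketch of the sanity check described above).
void check_pointer_device(const void* ptr) {
  int current = -1;
  cudaGetDevice(&current);

  cudaPointerAttributes attr{};
  cudaPointerGetAttributes(&attr, ptr);

  // Both printed values are 0 in my setup.
  std::printf("current device = %d, pointer device = %d\n",
              current, attr.device);
}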

This worked with previous versions of cub/CUDA 10.

https://github.com/limbo018/DREAMPlace/blob/0035d8a8a40729d414c84d52464b459d46680db9/dreamplace/ops/global_swap/src/global_swap_cuda_kernel.cu#L1275

int compute_max_num_nodes_per_bin(const DetailedPlaceDB<T>& db) {
  int num_bins = db.num_bins_x * db.num_bins_y;
  // Per-bin node counts, allocated and zeroed on the device.
  int* node_count_map = nullptr;
  allocateCUDA(node_count_map, num_bins, int);
  checkCUDA(cudaMemset(node_count_map, 0, sizeof(int) * num_bins));
  // Count how many movable nodes fall into each bin.
  compute_num_nodes_in_bins<<<ceilDiv(db.num_movable_nodes, 256), 256>>>(
      db, node_count_map);
  checkCUDA(cudaDeviceSynchronize());
  // This is the call that fails with cudaErrorInvalidDevice.
  int max_num_nodes_per_bin =
      thrust::reduce(thrust::device, node_count_map, node_count_map + num_bins,
                     0, thrust::maximum<int>());
  checkCUDA(cudaDeviceSynchronize());
  destroyCUDA(node_count_map);
  return max_num_nodes_per_bin;
}

RuntimeError: after reduction step 1: cudaErrorInvalidDevice: invalid device ordinal

docker run -it gitlab-master.nvidia.com:5005/rkirby/dreamgym:thrustcubdebug /placement/debug_repro.sh

That should be enough to reproduce. I'm on driver 450.51.06 with CUDA 11.0 and a single V100. If you launch into a shell, the script has pointers to the source and instructions for rebuilding. Let me know if you need permission to pull from that registry.
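For context, the reduction in isolation looks like the following standalone program (a sketch, independent of the DREAMPlace build; it may well not reproduce the failure on its own):

#include <thrust/reduce.h>
#include <thrust/execution_policy.h>
#include <thrust/functional.h>
#include <cuda_runtime.h>
#include <cstdio>

int main() {
  const int n = 1 << 20;

  // Device buffer standing in for node_count_map above.
  int* data = nullptr;
  cudaMalloc(&data, n * sizeof(int));
  cudaMemset(data, 0, n * sizeof(int));

  // Same call shape as in compute_max_num_nodes_per_bin.
  int max_val = thrust::reduce(thrust::device, data, data + n,
                               0, thrust::maximum<int>());
  cudaDeviceSynchronize();

  std::printf("max = %d\n", max_val);
  cudaFree(data);
  return 0;
}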


ArEsKay3 commented Jan 14, 2021

Traced the failure to this line. The device variable looks OK (it's zero), but the DeviceCount function was inlined, so I couldn't tell why it's returning what I assume is -1.

https://github.com/NVIDIA/cub/blob/c3be9a94273b5049520aacc7db00c738668aaa3f/cub/util_device.cuh#L299

However, when I remove the __forceinline__ from both DeviceCount and DeviceCountUncached, the failure goes away.

//CUB_RUNTIME_FUNCTION __forceinline__ int DeviceCountUncached()
CUB_RUNTIME_FUNCTION int DeviceCountUncached()
...
// CUB_RUNTIME_FUNCTION __forceinline__ int DeviceCount()
CUB_RUNTIME_FUNCTION int DeviceCount()
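Something like the following host-side comparison shows what I mean (a sketch; cub::DeviceCount comes from cub/util_device.cuh, and a standalone build like this may not reproduce the bad value):

#include <cub/util_device.cuh>
#include <cuda_runtime.h>
#include <cstdio>

int main() {
  // What the runtime reports directly.
  int runtime_count = -1;
  cudaGetDeviceCount(&runtime_count);

  // What CUB's helper reports; this is the value I traced the failure to above.
  int cub_count = cub::DeviceCount();

  std::printf("cudaGetDeviceCount = %d, cub::DeviceCount = %d\n",
              runtime_count, cub_count);
  return 0;
}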

alliepiper (Collaborator) commented

Heh, wow. Thanks for tracking that down.

Would you be able to make a pull request to https://github.com/NVIDIA/cub that removes those annotations? These functions don't really need to be inlined.

Somewhat related to NVIDIA/cccl#754.

alliepiper added the "type: bug: functional (Does not work as intended.)" label Jan 21, 2021
alliepiper added a commit to alliepiper/cub that referenced this issue Jan 28, 2021
These functions started producing invalid results in CUDA 11 under
certain circumstances (see issue NVIDIA/thrust#1371), and removing
these hints fixes the issue.
alliepiper added a commit to alliepiper/cub that referenced this issue Feb 8, 2021
These functions started producing invalid results in CUDA 11 under
certain circumstances (see issue NVIDIA/thrust#1371), and removing
these hints fixes the issue.

NVIDIA#260 reported that other functions in this file were also
causing the same issue.

These methods are not perf critical -- they don't need to be inlined.