This repository has been archived by the owner on Mar 21, 2024. It is now read-only.
(cudaErrorInvalidDevice) when trying to perform a thrust::reduce #1371
Labels
type: bug: functional
Does not work as intended.
Comments
Traced the failure to this line. The However, when I remove the
Heh, wow. Thanks for tracking that down. Would you be able to make a pull request to https://github.com/NVIDIA/cub that removes those annotations? These functions don't really need to be inlined. Somewhat related to NVIDIA/cccl#754.
alliepiper added a commit to alliepiper/cub that referenced this issue Jan 28, 2021
These functions started producing invalid results in CUDA 11 under certain circumstances (see issue NVIDIA/thrust#1371), and removing these hints fixes the issue.
This was referenced Jan 28, 2021
alliepiper added a commit to alliepiper/cub that referenced this issue Feb 8, 2021
These functions started producing invalid results in CUDA 11 under certain circumstances (see issue NVIDIA/thrust#1371), and removing these hints fixes the issue. NVIDIA#260 reported that other functions in this file were also causing the same issue. These methods are not perf critical -- they don't need to be inlined.
I'm seeing the error below (cudaErrorInvalidDevice) when trying to perform a thrust::reduce.
I'm not sure where I should be looking, as I'm not generating any device ordinals myself, just using the call in the snippet below.
This is within a Python C extension, and I do use PyTorch to set the device, but in this case I don't set it to anything other than 0. I've checked that cudaGetDevice matches the pointer device (and that all input pointers are on device '0').
This worked with previous versions of cub/CUDA 10.
https://github.com/limbo018/DREAMPlace/blob/0035d8a8a40729d414c84d52464b459d46680db9/dreamplace/ops/global_swap/src/global_swap_cuda_kernel.cu#L1275
RuntimeError: after reduction step 1: cudaErrorInvalidDevice: invalid device ordinal
docker run -it gitlab-master.nvidia.com:5005/rkirby/dreamgym:thrustcubdebug /placement/debug_repro.sh
Should be enough to reproduce. I'm on
Driver Version: 450.51.06 CUDA: 11.0
with a single V100. If you launch into a shell, you can look in that shell script, which has pointers to the source and how to rebuild. Let me know if you need permission to pull from that registry.