This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Recent change in device_spmv.cuh causing incorrect results #195

Closed
cjnolet opened this issue Aug 21, 2020 · 4 comments
Labels: type: bug: functional (Does not work as intended.), unverified (Cannot be reproduced or confirmed.)

Comments

@cjnolet

cjnolet commented Aug 21, 2020

I've been testing CUDA 11 integration with a recent release (7.8.0) of CuPy and found that CUB's spmv was returning incorrect results. The results look as though the data is being corrupted, possibly by incorrect array indexing. cupy/cupy#3822 contains a detailed summary of the problem.

Here's a simple example of what we're seeing. a uses CUB's device_spmv and b does not:

>>> a.T.dot(cupy.ones(a.shape[0])).reshape(1, a.shape[1])
array([[0.00000000e+000, 0.00000000e+000, 4.24399158e-314, ...,
        1.10383165e-307, 1.10512862e-307, 1.10636362e-307]])
>>> b.T.dot(cupy.ones(b.shape[0])).reshape(1, b.shape[1])
array([[3056.30047519, 3058.09608272, 3009.4495842 , ..., 1505.96878088,
        1460.47145535, 1505.26624145]])

The problem seems isolated to this commit: NVlabs@b2e64cf. When that commit is reverted, the example above returns correct results (again, a uses device_spmv and b does not):

>>> a.T.dot(cupy.ones(a.shape[0])).reshape(1, a.shape[1])
array([[3058.34350657, 3071.2141306 , 3059.92383792, ..., 1466.52647562,
        1441.08203412, 1479.81493357]])
>>> b.T.dot(cupy.ones(b.shape[0])).reshape(1, b.shape[1])
array([[3058.34350657, 3071.2141306 , 3059.92383792, ..., 1466.52647562,
        1441.08203412, 1479.81493357]])
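For reference, a.T.dot(cupy.ones(a.shape[0])) is simply the vector of column sums of a, so there is an easy independent check for either code path. The following is a minimal pure-Python sketch of that reference computation. It assumes the matrix is held in CSR form (indptr, indices, data), and the helper names are hypothetical rather than part of CuPy or CUB:

```python
def csr_transpose_times_ones(indptr, indices, data, n_cols):
    """Compute A.T @ ones(n_rows) from the CSR arrays of A.

    Entry j of the result is the sum of column j of A, so each stored
    value is added into the slot for its column index.
    """
    y = [0.0] * n_cols
    for k in range(indptr[-1]):  # indptr[-1] is the number of stored values
        y[indices[k]] += data[k]
    return y


def dense_column_sums(dense):
    """Column sums of a dense row-major matrix (list of lists)."""
    return [sum(col) for col in zip(*dense)]


# A = [[1, 0, 2],
#      [0, 3, 0]] stored in CSR form (hypothetical example data)
indptr, indices, data = [0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0]
print(csr_transpose_times_ones(indptr, indices, data, 3))     # [1.0, 3.0, 2.0]
print(dense_column_sums([[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]]))  # [1.0, 3.0, 2.0]
```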

I'm not sure whether there are internal automated tests for device_spmv.cuh, but a quick grep of this repository's tests for "spmv" returned no results.
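A CUB-only regression test would need to launch device_spmv on the GPU, and the thread doesn't show that code. The comparison logic such a test needs can still be sketched in pure Python: build a random sparse matrix, compute y = A x with the implementation under test, and check the result against a dense reference within a floating-point tolerance. csr_spmv is a hypothetical stand-in for the code being tested, and none of these helpers are part of CUB or CuPy:

```python
import math
import random


def dense_to_csr(dense):
    """Convert a dense row-major matrix (list of lists) to CSR arrays."""
    indptr, indices, data = [0], [], []
    for row in dense:
        for col, val in enumerate(row):
            if val != 0.0:
                indices.append(col)
                data.append(val)
        indptr.append(len(data))
    return indptr, indices, data


def csr_spmv(indptr, indices, data, x):
    """Row-wise CSR SpMV: y[i] sums data[k] * x[indices[k]] over row i."""
    return [
        sum(data[k] * x[indices[k]] for k in range(indptr[i], indptr[i + 1]))
        for i in range(len(indptr) - 1)
    ]


def dense_matvec(dense, x):
    """Dense reference y = A @ x."""
    return [sum(a * b for a, b in zip(row, x)) for row in dense]


def check_spmv(spmv, n_rows, n_cols, density, seed=0):
    """Compare an SpMV implementation against the dense reference on a random matrix."""
    rng = random.Random(seed)
    dense = [[rng.uniform(-1, 1) if rng.random() < density else 0.0
              for _ in range(n_cols)] for _ in range(n_rows)]
    x = [rng.uniform(-1, 1) for _ in range(n_cols)]
    expected = dense_matvec(dense, x)
    actual = spmv(*dense_to_csr(dense), x)
    return len(actual) == n_rows and all(
        math.isclose(a, e, rel_tol=1e-9, abs_tol=1e-12)
        for a, e in zip(actual, expected))


print(check_spmv(csr_spmv, 50, 40, 0.1))
```

If you pass an implementation that drops the last row, such as lambda ip, ind, d, x: csr_spmv(ip[:-1], ind, d, x), check_spmv returns False. That missing-output failure is exactly the kind a test like this should catch.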

@cjnolet cjnolet changed the title Recent change in device_spmv.cuh causing failures in cupy Recent change in device_spmv.cuh causing incorrect results Aug 21, 2020
@cjnolet
Author

cjnolet commented Aug 21, 2020

Also referencing #161, which appears to report the same problem.

@alliepiper
Collaborator

Marking as unverified. As mentioned on the PR, I'll need a way to reproduce the error with CUB-only code in order to create a test case.

@alliepiper
Collaborator

Closing for lack of response -- feel free to reopen with more information if you still need this.

@alliepiper
Collaborator

For posterity -- see comment #196 (comment) -- looks like this was fixed by the cub::DivideAndRoundUp patch.
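The thread doesn't say what the cub::DivideAndRoundUp patch actually changed in device_spmv.cuh. The following is therefore only a general illustration of why ceiling division matters, not a description of this bug. Consider a kernel that processes work in fixed-size tiles. Its tile count has to be rounded up. If it is rounded down, the final partial tile is dropped, and any outputs that tile owns are never written, so they read back as whatever the buffer already held. The helpers below are hypothetical Python stand-ins, not CUB's API:

```python
def divide_and_round_up(n, d):
    """Ceiling division for n >= 0 and d > 0, using only integer arithmetic."""
    return (n + d - 1) // d


def rows_covered(num_rows, rows_per_tile, num_tiles):
    """Rows processed when each tile handles a contiguous block of rows."""
    return min(num_rows, num_tiles * rows_per_tile)


num_rows, rows_per_tile = 1000, 128
floor_tiles = num_rows // rows_per_tile                    # 7 tiles
ceil_tiles = divide_and_round_up(num_rows, rows_per_tile)  # 8 tiles
print(rows_covered(num_rows, rows_per_tile, floor_tiles))  # 896: last 104 rows skipped
print(rows_covered(num_rows, rows_per_tile, ceil_tiles))   # 1000: every row covered
```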
