Hip test failures with complex_double: rocm/4.2.0 and MI50 arch #1032

Closed
ndellingwood opened this issue Jun 30, 2021 · 3 comments

@ndellingwood
Contributor

This is a summary of the unit tests that fail with the HIP backend when complex_double scalar types are enabled:

Kokkos SHA: kokkos/kokkos@27adabb
KokkosKernels SHA: 2ada47e

BLAS:

1: [  PASSED  ] 129 tests.
1: [  FAILED  ] 2 tests, listed below:
1: [  FAILED  ] hip.gemv_complex_double
1: [  FAILED  ] hip.gemm_complex_double

SPARSE:

3: [  FAILED  ] hip.sparse_spmv_kokkos_complex_double_int_int_TestExecSpace
3: [  FAILED  ] hip.sparse_spmv_kokkos_complex_double_int_size_t_TestExecSpace
3: [  FAILED  ] hip.sparse_spmv_mv_kokkos_complex_double_int_int_LayoutLeft_TestExecSpace
3: [  FAILED  ] hip.sparse_spmv_mv_kokkos_complex_double_int_size_t_LayoutLeft_TestExecSpace
3: [  FAILED  ] hip.sparse_sptrsv_kokkos_complex_double_int_int_TestExecSpace
3: [  FAILED  ] hip.sparse_sptrsv_kokkos_complex_double_int_size_t_TestExecSpace

Sample outputs:

hip.gemv_complex_double

1: /ascldap/users/ndellin/kokkos-kernels/unit_test/blas/Test_Blas2_gemv.hpp:98: Failure
1: Value of: 0
1: Expected: numErrors
1: Which is: 13
1: Nonconst input, 13x13, alpha = (3,0), beta = (5,0), mode N: gemv incorrect
1: /ascldap/users/ndellin/kokkos-kernels/unit_test/blas/Test_Blas2_gemv.hpp:110: Failure
1: Value of: 0
1: Expected: numErrors
1: Which is: 13
1: Const vector input, 13x13, alpha = (3,0), beta = (5,0), mode N: gemv incorrect
1: /ascldap/users/ndellin/kokkos-kernels/unit_test/blas/Test_Blas2_gemv.hpp:123: Failure
1: Value of: 0
1: Expected: numErrors
1: Which is: 13
1: Const matrix/vector input, 13x13, alpha = (3,0), beta = (5,0), mode N: gemv incorrect
...

hip.gemm_complex_double

1: Result: 5.397791e+00 2.949100e-11
1: /ascldap/users/ndellin/kokkos-kernels/unit_test/blas/Test_Blas3_gemm.hpp:150: Failure
1: Value of: (diff_C_average < 1.05*diff_C_expected )
1:   Actual: false
1: Expected: true
1: Result: 6.253012e+00 2.949100e-11
1: /ascldap/users/ndellin/kokkos-kernels/unit_test/blas/Test_Blas3_gemm.hpp:150: Failure
1: Value of: (diff_C_average < 1.05*diff_C_expected )
1:   Actual: false
1: Expected: true

hip.sparse_spmv_kokkos_complex_double_int_int_TestExecSpace

...
3: KokkosSparse::Test::spmv: 1886 errors of 10000 with params: 1.000000 0.000000
3: /ascldap/users/ndellin/kokkos-kernels/unit_test/sparse/Test_Sparse_spmv.hpp:160: Failure
3: Value of: num_errors==0
3:   Actual: false
3: Expected: true
3: KokkosSparse::Test::spmv: 1845 errors of 10000 with params: 1.000000 1.000000
3: /ascldap/users/ndellin/kokkos-kernels/unit_test/sparse/Test_Sparse_spmv.hpp:160: Failure
3: Value of: num_errors==0
3:   Actual: false
3: Expected: true

Similar failures occur in:
hip.sparse_spmv_kokkos_complex_double_int_size_t_TestExecSpace
hip.sparse_spmv_mv_kokkos_complex_double_int_int_LayoutLeft_TestExecSpace
hip.sparse_spmv_mv_kokkos_complex_double_int_size_t_LayoutLeft_TestExecSpace

hip.sparse_sptrsv_kokkos_complex_double_int_int_TestExecSpace

3: SUPERNODAL_DAG
3: Supernode Tri Solve FAILURE : (4.43654,0) vs.5
3: SUPERNODAL_DAG
3: /ascldap/users/ndellin/kokkos-kernels/unit_test/sparse/Test_Sparse_sptrsv.hpp:1031: Failure
3: Value of: sum == scalar_t(X.extent(0))
3:   Actual: false
3: Expected: true

Similar failures occur in:
hip.sparse_sptrsv_kokkos_complex_double_int_size_t_TestExecSpace

Reproducer instructions (Caraway testbed, MI50 nodes):

module load cmake/3.19.3 rocm/4.2.0

$KOKKOSKERNELS_PATH/cm_generate_makefile.bash --with-devices=Hip,Serial --arch=VEGA906 --compiler=hipcc --cxxflags="-O3  " --cxxstandard="14"  --with-hip --kokkos-path=$KOKKOS_PATH --kokkoskernels-path=$KOKKOSKERNELS_PATH --with-scalars='double,complex_double' --with-ordinals=int --with-offsets=int,size_t --with-layouts=LayoutLeft
@brian-kelley
Contributor

brian-kelley commented Jul 7, 2021

See kokkos/kokkos#3974 for more info on the underlying cause in Kokkos (atomics wider than 64 bits just haven't been implemented yet for HIP).

Update: I just re-ran the reproducer for that issue, and atomic_add(complex<double>*, ...) is still giving incorrect results. So I'm pretty sure that's still the reason gemv and spmv are failing here, since both use atomics. I'm no expert in supernodal sptrsv, but it uses atomic-trait views in the solve, so I think it's the same issue there.
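
For readers unfamiliar with why a 16-byte `complex<double>` is awkward for atomics, here is a minimal host-side sketch in standard C++. It is not the Kokkos or HIP implementation and not the fix from any Kokkos PR. The names `atomic_add_double`, `AtomicComplexSum`, and `concurrent_sum` are hypothetical. The sketch assumes only 8-byte atomic compare-and-swap is available, and it relies on complex addition being component-wise, so the real and imaginary parts are accumulated with two independent 8-byte CAS loops. The final sum is correct, but a concurrent reader could observe a half-updated value, and the trick does not extend to operations such as multiplication.

```cpp
#include <atomic>
#include <cassert>
#include <complex>
#include <thread>
#include <vector>

// Hypothetical helper: add `val` to `target` using a compare-and-swap loop,
// i.e. using nothing wider than an 8-byte atomic operation.
inline void atomic_add_double(std::atomic<double>& target, double val) {
  double expected = target.load();
  // On failure, compare_exchange_weak stores the current value in `expected`,
  // so the loop simply retries with the refreshed value.
  while (!target.compare_exchange_weak(expected, expected + val)) {
  }
}

// Hypothetical 16-byte accumulator built from two 8-byte atomics.
struct AtomicComplexSum {
  std::atomic<double> re{0.0};
  std::atomic<double> im{0.0};

  // Component-wise add: each half is updated atomically, but the pair is not
  // updated as a single atomic unit.
  void add(const std::complex<double>& z) {
    atomic_add_double(re, z.real());
    atomic_add_double(im, z.imag());
  }

  std::complex<double> value() const { return {re.load(), im.load()}; }
};

// Launch `num_threads` threads that each add `z` to a shared accumulator
// `adds_per_thread` times, then return the accumulated sum.
inline std::complex<double> concurrent_sum(int num_threads, int adds_per_thread,
                                           std::complex<double> z) {
  assert(num_threads > 0 && adds_per_thread >= 0);
  AtomicComplexSum sum;
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&sum, adds_per_thread, z] {
      for (int i = 0; i < adds_per_thread; ++i) {
        sum.add(z);
      }
    });
  }
  for (auto& w : workers) {
    w.join();
  }
  return sum.value();
}
```

Calling `concurrent_sum(4, 1000, {1.0, 2.0})` should accumulate every contribution without losing updates, whereas replacing the CAS loops with plain `+=` on a shared `std::complex<double>` would be a data race and could drop additions, which is the kind of symptom the gemv and spmv error counts above are consistent with.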

@brian-kelley
Contributor

@ndellingwood With the merge of kokkos/kokkos#4159, all of these are now fixed (just tested develop, caraway MI50, rocm 4.2). I guess we could now add complex to the nightly HIP build.

@ndellingwood
Contributor Author

Thanks @brian-kelley, I'll update the build.
