
HIP backend general issue #806

Closed
lucbv opened this issue Sep 10, 2020 · 8 comments

@lucbv
Contributor

lucbv commented Sep 10, 2020

This issue is meant to centralize the issues and work related to integrating the HIP backend into Kokkos-Kernels.
Ideally, I would like separate issues to be opened for specific technical problems and then referenced here, so that users and developers know what the known issues are and who is working on them.

@lucbv
Contributor Author

lucbv commented Sep 12, 2020

Here is a list of the current issues observed while building with the HIP backend:

Now that the ETI and tests are merged (or are about to be), we can make a list of what still needs to be done to get the backend fully functional.

HIP spot-check enabled tests

  • BLAS
  • batchedDLA
  • Sparse
  • Graph
  • Common

HIP tests currently failing

Issues in batchedDLA

  1. batched_scalar_team_trsm_l_u_nt_n_dcomplex_dcomplex fails with a number of values == 0, which seems to indicate a memory issue with complex types?
  2. batched_scalar_team_trsm_l_u_t_n_dcomplex_dcomplex aborts on Memory access fault by GPU
  3. batched_scalar_team_trsm_l_u_nt_n_dcomplex_double same as the dcomplex_dcomplex version
  4. batched_scalar_team_trsm_l_u_t_n_dcomplex_double same as the dcomplex_dcomplex version
  5. batched_scalar_teamvector_qr_with_columnpivoting_double aborts on Device::callbackQueue aborting with status: 0x29
  6. batched_scalar_teamvector_solve_utv_double aborts on Memory access fault by GPU
  7. batched_scalar_teamvector_solve_utv2_double aborts on Memory access fault by GPU after failing with values == 0
  8. batched_scalar_teamvector_utv_double aborts on Memory access fault by GPU

Issues in Graph (offset==int and offset==size_t fail in the same way)

  1. graph_graph_color_double_int_int_TestExecSpace aborts on Memory access fault by GPU
  2. graph_graph_color_distance2_double_int_int_TestExecSpace aborts on Memory access fault by GPU
  3. graph_graph_color_deterministic_double_int_int_TestExecSpace aborts on Device::callbackQueue aborting with status: 0x1016

Issues in Sparse (offset==int and offset==size_t fail in the same way)

  1. sparse_gauss_seidel_asymmetric_rank1_kokkos_complex_double_int_int_TestExecSpace aborts on Memory access fault by GPU. Note: the same happens with the rank2 and/or symmetric tests
  2. sparse_balloon_clustering_double_int_int_TestExecSpace aborts on Memory access fault by GPU. Note: this happens randomly, so it is quite possibly related to a race condition?
  3. sparse_replaceSumIntoLonger_double_int_int_TestExecSpace fails with values == 0
  4. sparse_replaceSumIntoLonger_kokkos_complex_double_int_int_TestExecSpace aborts on Device::callbackQueue aborting with status: 0x1016
  5. sparse_replaceSumInto_kokkos_complex_double_int_int_TestExecSpace aborts on Device::callbackQueue aborting with status: 0x1016
  6. sparse_spgemm_jacobi_kokkos_complex_double_int_size_t_TestExecSpace aborts on Device::callbackQueue aborting with status: 0x29
  7. sparse_spmv_kokkos_complex_double_int_int_TestExecSpace aborts on Device::callbackQueue aborting with status: 0x1016
  8. sparse_spmv_mv_kokkos_complex_double_int_int_LayoutLeft_TestExecSpace aborts on Device::callbackQueue aborting with status: 0x1016

@ndellingwood
Contributor

@lucbv I'll add amd/caraway options for the testing scripts this week

@lucbv
Contributor Author

lucbv commented Sep 14, 2020

Thanks, I have shared my current configuration on the internal repo (see the Technical tips section on the homepage).
One thing I still need to do is ask which extra flags Kokkos uses for AMD builds; currently I have removed all the warning/error flags, as Kokkos would not build otherwise.

@brian-kelley
Contributor

@lucbv I now have a branch that passes the unit tests for CUDA, Serial, and OpenMP, and that will (hopefully) also work on HIP once the unit tests are built for it. The only things still hardcoded for CUDA are those involving cuSPARSE, cuBLAS, graphs, and streams. There are a couple of places where __CUDA_ARCH__ is used, but that is still defined for HIP, so it should be OK.

@lucbv
Copy link
Contributor Author

lucbv commented Oct 10, 2020

@brian-kelley thanks for looking at this. I am still waiting on the rocm/3.8.0 tests before moving forward with the ETI/tests PR, as I think that version might fix quite a few things. Hopefully I can get that done next week, but I'm not sure.
If your PR is ready, feel free to add me as a reviewer; I will finish my review of the coarsening PR this weekend.

@lucbv
Contributor Author

lucbv commented Apr 28, 2021

Using the latest ROCm LLVM compiler, the list of failing tests is much shorter:

Graph

[ RUN ] hip.graph_graph_color_deterministic_double_int_int_TestExecSpace
:0:rocdevice.cpp :2325: 378970770383 us: Device::callbackQueue aborting with status: 0x1016
Aborted (core dumped)
[ RUN ] hip.graph_graph_color_double_int_size_t_TestExecSpace
:0:rocdevice.cpp :2325: 379268378835 us: Device::callbackQueue aborting with status: 0x1016
Aborted (core dumped)

Sparse

Some failures are related to complex atomics; updates in Kokkos Core should resolve these issues.

@brian-kelley
Contributor

More things are working now: with ROCm 4.5 and MI100 (on Caraway), all tests pass except for structured SpMV (hip.sparse_spmv_struct_double_int_size_t_TestExecSpace).

@lucbv
Contributor Author

lucbv commented Dec 19, 2022

At this point we are testing HIP in our CI, and everything is building correctly :)

@lucbv lucbv closed this as completed Dec 19, 2022
Development

No branches or pull requests

3 participants