-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HIP: CMake work on tests and ETI to truely enable the backend #820
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lucbv Several of the BLAS tests in unit_test/hip have Cuda in the filename and should be renamed with Hip (e.g. unit_test/hip/Test_Cuda_Blas2_gemv.cpp)
569fdd0
to
1f0e85a
Compare
@ndellingwood this was meant to be WIP so I can show @jjwilke where I'm at and fix some CMake stuff. Once this is done we should be about finished with HIP backend though : ) |
@brian-kelley just double checking, you are doing that based by working on top of this PR or are you working on top of develop? |
@lucbv I started out today working from develop, but I can rebase on your branch. Is the plan to only merge HIP work into develop when it all works? If we're ok with having different pieces go in without 100% working, I think this PR could just go in. (by that, I mean that adding files that only compile with HIP enabled shouldn't break anyone's build) |
@brian-kelley |
@lucbv I will just finish the branch I have now, then when Jeremy's ETI stuff goes in, we can combine our stuff and rebase on that. We could talk about this tomorrow. Today I got through batched, blas, graph, common (sparse was going to be the trickiest). |
@ndellingwood this version should be much closer to the final one. |
@ndellingwood thanks for the look, it's actually a good thing, I just used a script to automatically get all the Cuda unit-test into the HIP folder and then changed the header inclusion. Maybe I should actually do a grep for Cuda in the HIP unit-test folder just to see what comes out of it? |
d8229bd
to
7bdfe10
Compare
@srajama1 @ndellingwood @brian-kelley @jjwilke Regarding what works and what does not after this PR:
One thing to do after this PR will be to modify the spot-check so that it can run on caraway. While it cannot check for passing tests, just checking for compilation would already be helpful. Especially so since rocm/3.8.0 is using clang-11 as compiler and we do not test with it (currently kokkos-dev2 tests with clang 8.0 and clang 9.0.0). Please let me know if this makes sense or if you have a strong opinion like: Blas or Sparse should pass before merging? |
I would say, as long as it doesn't break serial/cuda/openmp we should just merge anything that gets closer to HIP working. It's easier than having several diverging branches of develop with the updates. We just shouldn't advertise to users that they should try hip yet :) For question 6 you had, I just double checked in all the wiki examples that they are calling HIP, which is the DefaultExecutionSpace in a HIP+Serial build. For perf tests, all the graph ones seem to work completely. I just tried several of the D1 and D2 coloring algorithms on some pretty big graphs, and mis2 also works. Coloring is producing a plausible color array and mis2 can verify the output. I did miss the D1 coloring when adding HIP support but I have a PR #841 for that. |
Yeah that's my philosophy too, let's see what others say. ####################################################### |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The HIP device should be HIP, not HIPSpace (see 2 comments)
@brian-kelley I find the exec space issue and enable common and graph tests! |
by ETI and tested in the unit-test using the Kokkos::Experimental::HIP and Kokkos::Experimental::HIPSpace execution and memory spaces. Status of the unit-tests: Batched -> does not build Blas -> builds and Blas1 test pass, others fail Common -> does not build Graph -> does not build Sparse -> builds but does few tests run before crash Fix blas and blas3 perf_test: these perf_test use parallel_for to run multiple tests at once while on host and call a host only function to do so. These are now being skipped when KOKKOS_ENABLE_HIP is on. All performance test are building.
TestExecSpace was set to HIPSpace instead of HIP creating problems at build and execution time in unit_test. We can now enable common and graph unit-tests for build. common_hip is passing without failures.
The min launch bound was set to 2 and is now set to 0. This new setting allows the BLAS unit-test to successfully run to completion.
With the new modifications the spot-check can be run on caraway04 where BLAS and Common unit-tests will be built and tested as well as the wiki tests.
With a bit more work done in that last couple commits we now have the spot-check building and testing BLAS and Common unit-tests and wiki examples successfully on caraway04. caraway04 spot-check####################################################### kokkos-dev2 spot-check####################################################### |
All , @lucbv @brian-kelley @jjwilke : First, thanks for all the progress on HIP backend. Let me review the actual changes next. In principle I would like incremental changes over one massive change. As long as we can clearly identify the failures (turn them off) create an issue with all of them so we can make progress on those, we can push this to develop. I do not want to advertise HIP support unless we have reached a point we have clean tests (except when there are external issues from TPLs or compilers and we can disable tests for those). On to code now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code wise looks ok. My major comments :
- please file issues or one master issue for all the problems even if there is a workaround now.
- can we avoid 100+ files for every backend ?
@@ -292,7 +292,7 @@ void __do_trmm_serial_batched(options_t options, trmm_args_t trmm_args) { | |||
return; | |||
} | |||
|
|||
#if !defined(KOKKOS_ENABLE_CUDA) | |||
#if !defined(KOKKOS_ENABLE_CUDA) && !defined(KOKKOS_ENABLE_HIP) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these basically "host only" checks ? Then we can add one check as host only and not worry about this for Intel SYCL and OpenMP target backends.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, we can definitely create a KOKKOSKERNELS_HOST_ONLY macro that stores that information and is used in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see issue #843
// yield an optimal_num_blocks=0 which means no ressources | ||
// are allocated... Switching to LaunchBounds<384,2> fixes | ||
// that problem but I'm not sure if that it a good perf | ||
// parameter or why it is set to 2 for Cuda? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
File a Kokkos issue ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'll talk about it with Damien and Daniel, some parameters are inherently different with AMD like the wrap size so this could be something that needs to be different depending on the architecture?
# COMPONENTS graph | ||
# ) | ||
|
||
#currently float 128 test is not working. So common tests are explicitly added. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This need to be in list of issues to fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@srajama1 Note that float128 tests are also diabled for CUDA, so it's generally just not supported now on GPU. I think because of that this should be lower priority than the other tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, got it.
@@ -0,0 +1,3 @@ | |||
#include "Test_HIP.hpp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these test files needed ? I am worried we are going in the direction of the ETI again. This is going to blow up with OpenMP target and Intel SYCL. Can we do same trick as we did for ETI, where we generate these files if unit tests are ON ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes something like that or having all the headers for the different backends in a single file and use guards to enable them selectively at compile time with different executably names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see issue #845
Can I merge this with the assumption above comments will be filed as an issue ? It would be nice for this to go in the next release. |
Yes feel free to merge, I am filling new issues for the points raised here. |
The goal is to make sure the HIP backend in properly instanciated
by ETI and tested in the unit-test using the Kokkos::Experimental::HIP
and Kokkos::Experimental::HIPSpace execution and memory spaces.