-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial TRMM & TRTRI eti specializations as serial batched routines #697
Conversation
Spot-check on White: #######################################################
PASSED TESTS
#######################################################
cuda-10.1.105-Cuda_OpenMP-release build_time=335 run_time=107
cuda-10.1.105-Cuda_Serial-release build_time=375 run_time=146
cuda-9.2.88-Cuda_OpenMP-release build_time=368 run_time=120
cuda-9.2.88-Cuda_Serial-release build_time=340 run_time=160
gcc-6.4.0-OpenMP_Serial-release build_time=133 run_time=129
gcc-7.2.0-OpenMP-release build_time=99 run_time=44
gcc-7.2.0-OpenMP_Serial-release build_time=138 run_time=128
gcc-7.2.0-Serial-release build_time=91 run_time=80
ibm-16.1.0-Serial-release build_time=602 run_time=89 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@e10harvey They look good to me. Just have one minor suggestion for file names.
And since you have the serial implementations in the batched
directory, should we also add unit tests for the batched mode to unit_test/batched
?
#define __KOKKOSBATCHED_TRMM_SERIAL_IMPL_HPP__ | ||
|
||
#include "KokkosBatched_Util.hpp" | ||
#include "KokkosBatched_trmm_serial_internal.hpp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@e10harvey I suggest the file names follow the naming convention in the src/batched/
directory. For example:
KokkosBatched_trmm_serial_internal.hpp
--> KokkosBatched_Trmm_Serial_Internal.hpp
KokkosBatched_trtri_decl.hpp
--> KokkosBatched_Trtri_Decl.hpp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vqd8a thanks for the feedback. I renamed the files to match the naming convention. I see that the batched unit tests are currently disabled; would you still like to see those batched unit tests added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@e10harvey Thanks for fixing. The batched unit tests were enabled in #686. You should merge the develop
branch to yours.
Spot-checkswhite#######################################################
PASSED TESTS
#######################################################
cuda-10.1.105-Cuda_OpenMP-release build_time=1187 run_time=297
cuda-10.1.105-Cuda_Serial-release build_time=1175 run_time=413
cuda-9.2.88-Cuda_OpenMP-release build_time=1165 run_time=365
cuda-9.2.88-Cuda_Serial-release build_time=1162 run_time=438
gcc-6.4.0-OpenMP_Serial-release build_time=583 run_time=358
gcc-7.2.0-OpenMP-release build_time=521 run_time=149
gcc-7.2.0-OpenMP_Serial-release build_time=614 run_time=364
gcc-7.2.0-Serial-release build_time=458 run_time=210
ibm-16.1.0-Serial-release build_time=4114 run_time=213 kokkos-dev#######################################################
PASSED TESTS
#######################################################
cuda-9.2-Cuda_OpenMP-release build_time=484 run_time=169 kokkos-dev-2, --spot-check#######################################################
PASSED TESTS
#######################################################
clang-8.0-Cuda_OpenMP-release build_time=169 run_time=78
clang-8.0-Pthread_Serial-release build_time=70 run_time=50
clang-9.0.0-Pthread-release build_time=59 run_time=24
clang-9.0.0-Serial-release build_time=54 run_time=24
cuda-10.1-Cuda_OpenMP-release build_time=224 run_time=75
cuda-9.2-Cuda_Serial-release build_time=209 run_time=98
gcc-4.8.4-OpenMP-release build_time=50 run_time=24
gcc-7.3.0-OpenMP-release build_time=58 run_time=23
gcc-7.3.0-Pthread-release build_time=53 run_time=25
gcc-8.3.0-Serial-release build_time=56 run_time=25
gcc-9.1-OpenMP-release build_time=69 run_time=24
gcc-9.1-Serial-release build_time=61 run_time=25
intel-17.0.1-Serial-release build_time=107 run_time=23
intel-18.0.5-OpenMP-release build_time=180 run_time=21
intel-19.0.5-Pthread-release build_time=203 run_time=23 kokkos-dev-2, --spot-check-tpls#######################################################
PASSED TESTS
#######################################################
clang-8.0-Cuda_OpenMP-release build_time=169 run_time=79
clang-8.0-Pthread_Serial-release build_time=71 run_time=52
clang-9.0.0-Pthread-release build_time=56 run_time=25
clang-9.0.0-Serial-release build_time=55 run_time=25
cuda-10.1-Cuda_OpenMP-release build_time=219 run_time=78
gcc-4.8.4-OpenMP-release build_time=49 run_time=25
gcc-7.3.0-OpenMP-release build_time=58 run_time=25
gcc-7.3.0-Pthread-release build_time=55 run_time=26
gcc-8.3.0-Serial-release build_time=54 run_time=26
gcc-9.1-OpenMP-release build_time=69 run_time=24
gcc-9.1-Serial-release build_time=61 run_time=27
intel-17.0.1-Serial-release build_time=109 run_time=23
intel-18.0.5-OpenMP-release build_time=177 run_time=21
intel-19.0.5-Pthread-release build_time=204 run_time=21 |
@vqd8a: I've implemented your feedback. Please review when you have time. |
Thank you @e10harvey . They look good to me. |
Thanks, @e10harvey ! |
This PR adds initial fall-back support for TRMM and TRTRI.
Related to #589