Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial TRMM & TRTRI eti specializations as serial batched routines #697

Merged
merged 28 commits into from
May 1, 2020

Conversation

e10harvey
Copy link
Contributor

This PR adds initial fall-back support for TRMM and TRTRI.

Related to #589

@e10harvey e10harvey self-assigned this Apr 17, 2020
@e10harvey
Copy link
Contributor Author

Spot-check on White:

#######################################################
PASSED TESTS
#######################################################
cuda-10.1.105-Cuda_OpenMP-release build_time=335 run_time=107
cuda-10.1.105-Cuda_Serial-release build_time=375 run_time=146
cuda-9.2.88-Cuda_OpenMP-release build_time=368 run_time=120
cuda-9.2.88-Cuda_Serial-release build_time=340 run_time=160
gcc-6.4.0-OpenMP_Serial-release build_time=133 run_time=129
gcc-7.2.0-OpenMP-release build_time=99 run_time=44
gcc-7.2.0-OpenMP_Serial-release build_time=138 run_time=128
gcc-7.2.0-Serial-release build_time=91 run_time=80
ibm-16.1.0-Serial-release build_time=602 run_time=89

Copy link
Contributor

@vqd8a vqd8a left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@e10harvey They look good to me. Just have one minor suggestion for file names.
And since you have the serial implementations in the batched directory, should we also add unit tests for the batched mode to unit_test/batched?

#define __KOKKOSBATCHED_TRMM_SERIAL_IMPL_HPP__

#include "KokkosBatched_Util.hpp"
#include "KokkosBatched_trmm_serial_internal.hpp"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@e10harvey I suggest the file names follow the naming convention in the src/batched/ directory. For example:
KokkosBatched_trmm_serial_internal.hpp --> KokkosBatched_Trmm_Serial_Internal.hpp

KokkosBatched_trtri_decl.hpp --> KokkosBatched_Trtri_Decl.hpp

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vqd8a thanks for the feedback. I renamed the files to match the naming convention. I see that the batched unit tests are currently disabled; would you still like to see those batched unit tests added?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@e10harvey Thanks for fixing. The batched unit tests were enabled in #686. You should merge the develop branch to yours.

@e10harvey
Copy link
Contributor Author

Spot-checks

white

#######################################################
PASSED TESTS
#######################################################
cuda-10.1.105-Cuda_OpenMP-release build_time=1187 run_time=297
cuda-10.1.105-Cuda_Serial-release build_time=1175 run_time=413
cuda-9.2.88-Cuda_OpenMP-release build_time=1165 run_time=365
cuda-9.2.88-Cuda_Serial-release build_time=1162 run_time=438
gcc-6.4.0-OpenMP_Serial-release build_time=583 run_time=358
gcc-7.2.0-OpenMP-release build_time=521 run_time=149
gcc-7.2.0-OpenMP_Serial-release build_time=614 run_time=364
gcc-7.2.0-Serial-release build_time=458 run_time=210
ibm-16.1.0-Serial-release build_time=4114 run_time=213

kokkos-dev

#######################################################
PASSED TESTS
#######################################################
cuda-9.2-Cuda_OpenMP-release build_time=484 run_time=169

kokkos-dev-2, --spot-check

#######################################################
PASSED TESTS
#######################################################
clang-8.0-Cuda_OpenMP-release build_time=169 run_time=78
clang-8.0-Pthread_Serial-release build_time=70 run_time=50
clang-9.0.0-Pthread-release build_time=59 run_time=24
clang-9.0.0-Serial-release build_time=54 run_time=24
cuda-10.1-Cuda_OpenMP-release build_time=224 run_time=75
cuda-9.2-Cuda_Serial-release build_time=209 run_time=98
gcc-4.8.4-OpenMP-release build_time=50 run_time=24
gcc-7.3.0-OpenMP-release build_time=58 run_time=23
gcc-7.3.0-Pthread-release build_time=53 run_time=25
gcc-8.3.0-Serial-release build_time=56 run_time=25
gcc-9.1-OpenMP-release build_time=69 run_time=24
gcc-9.1-Serial-release build_time=61 run_time=25
intel-17.0.1-Serial-release build_time=107 run_time=23
intel-18.0.5-OpenMP-release build_time=180 run_time=21
intel-19.0.5-Pthread-release build_time=203 run_time=23

kokkos-dev-2, --spot-check-tpls

#######################################################
PASSED TESTS
#######################################################
clang-8.0-Cuda_OpenMP-release build_time=169 run_time=79
clang-8.0-Pthread_Serial-release build_time=71 run_time=52
clang-9.0.0-Pthread-release build_time=56 run_time=25
clang-9.0.0-Serial-release build_time=55 run_time=25
cuda-10.1-Cuda_OpenMP-release build_time=219 run_time=78
gcc-4.8.4-OpenMP-release build_time=49 run_time=25
gcc-7.3.0-OpenMP-release build_time=58 run_time=25
gcc-7.3.0-Pthread-release build_time=55 run_time=26
gcc-8.3.0-Serial-release build_time=54 run_time=26
gcc-9.1-OpenMP-release build_time=69 run_time=24
gcc-9.1-Serial-release build_time=61 run_time=27
intel-17.0.1-Serial-release build_time=109 run_time=23
intel-18.0.5-OpenMP-release build_time=177 run_time=21
intel-19.0.5-Pthread-release build_time=204 run_time=21

@e10harvey e10harvey requested a review from vqd8a April 29, 2020 19:35
@e10harvey
Copy link
Contributor Author

@vqd8a: I've implemented your feedback. Please review when you have time.

@vqd8a
Copy link
Contributor

vqd8a commented May 1, 2020

Thank you @e10harvey . They look good to me.

@srajama1
Copy link
Contributor

srajama1 commented May 1, 2020

Thanks, @e10harvey !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants