Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sptrsv perf_test enhancements #605

Merged
merged 5 commits into from
Feb 24, 2020

Conversation

e10harvey
Copy link
Contributor

@e10harvey e10harvey commented Feb 17, 2020

  • perf_test/sparse: Improve test
    • Improve command line argument parsing for superlu perf test.
    • Conditionally set scalar_t based on configure value of KOKKOSKERNELS_SCALARS.

Related to #589.

Spot-check run in progress.

  - If KOKKOSKERNELS_ENABLE_TPL_SUPERLU is enabled, sptrsv_supernode must
  declare read_supernodal_graphL for sptrsv_superlu.
  - Include KokkosKernels_config.h in superlu and choldmod perf test. This is
  needed for defines that conditionally enable the test at compile time.

  - Improve command line argument parsing for superlu perf test.
@@ -796,8 +796,8 @@ int main(int argc, char **argv) {
}

{
using scalar_t = double;
//using scalar_t = Kokkos::complex<double>;
//using scalar_t = double;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we pick this based on what scalars are defined ?

Copy link
Contributor Author

@e10harvey e10harvey Feb 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will modify this perf_test to exercise both the double and complex types.

EDIT: We should be able to pick this based on the input files.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should also be what is enabled in ETI at compile time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it! Thanks!

#if defined(KOKKOSKERNELS_ENABLE_TPL_LAPACKE) && defined(KOKKOSKERNELS_ENABLE_TPL_CBLAS)
#if (defined(KOKKOSKERNELS_ENABLE_TPL_LAPACKE) && \
defined(KOKKOSKERNELS_ENABLE_TPL_CBLAS)) \
|| defined(KOKKOSKERNELS_ENABLE_TPL_SUPERLU)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this right to include cblas and lapacke when SuperLU is defined ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think it is the correct place, I think these includes should be hidden within KokkosBlas and I intend to do this while resolving #589. These small changes are just to fix compilation errors. || defined(KOKKOSKERNELS_ENABLE_TPL_SUPERLU) makes the definition of read_supernodal_graphL available for sptrsv_superlu.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for looking at this. By the way, this file uses CBLAS/LAPACKE, but not SuperLU (in fact, it is also used with sptrsv_cholmod). Does CBLAS/LAPACKE gets enabled, if SuperLU get enabled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem :)

The problem here is that sptrsv_superlu needs the definition of read_supernodal_graphL -- otherwise, kokkos-kernels will not compile.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR feedback. This is my mistake. It looks like I had a bug with my build makefile generation scripts where CBLAS and LAPACKE were not enabled but SUPERLU was.

I now see that the perf tests pull in KokkosKernels_config.h via KokkosSparse_sptrsv.hpp->KokkosSparse_sptrsv_symbolic_spec.hpp.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, you are trying to build it without CBLAS/LAPACKE enabled?

@iyamazaki: Yes, I was. This compilation issue arises when --with-tpls=cholmod,superlu.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused now. What is the goal here ? If you enable superlu, cholmod then cblas/lapacke are required. Your changes will get rid of cblas/lapacke dependency, but we are not there yet. Correct ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I don't think we can compile the code without enabling CBLAS/LAPACKE, right now..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srajama1: I was confused as well -- if we use --with-tpls=cholmod,superlu, one way to get things compiling is by adding || defined(KOKKOSKERNELS_ENABLE_TPL_SUPERLU) here to fix a seemingly unrelated compilation error. If we use --with-tpls=cholmod,superlu,cblas,blas,lapacke everything works as expected without adding || defined(KOKKOSKERNELS_ENABLE_TPL_SUPERLU) here.

What is the goal here ?

The goal of this PR is to learn the kokkos-kernels PR guidelines and fix the build for the superlu and cholmod perf_tests and in the process.

If you enable superlu, cholmod then cblas/lapacke are required.

OK, is the correct place to handle this dependency resolution in the new cmake build system (which announces whether a TPL is available) and the ETI specialization layer (which determines if a full specialization is available)?

Your changes will get rid of cblas/lapacke dependency, but we are not there yet. Correct ?

I think that's right assuming I am starting to understand the ETI system. I plan on addressing this dependency in a separate PR -- I will update the description in #589 to see if we are on the same page.

These questions were really helpful, thank you.

@e10harvey e10harvey changed the title sptrsv build patches sptrsv perf_test enhancements Feb 17, 2020
@e10harvey
Copy link
Contributor Author

e10harvey commented Feb 19, 2020

The spot-check passed on ride:

nohup ../scripts/test_all_sandia --kokkoskernels-path=$KOKKOSKERNELS_PATH --kokkos-path=$KOKKOS_PATH --arch="Power8,Pascal60" --spot-check &
Running on machine: white
Going to test compilers:  gcc/6.4.0 gcc/7.2.0 gcc/7.4.0 ibm/16.1.0 cuda/9.2.88 cuda/10.0.130
Testing compiler gcc/6.4.0
Testing compiler gcc/7.2.0
  Starting job gcc-7.2.0-OpenMP-release
  Starting job gcc-6.4.0-OpenMP_Serial-release
  PASSED gcc-7.2.0-OpenMP-release
  Starting job gcc-7.2.0-Serial-release
  PASSED gcc-6.4.0-OpenMP_Serial-release
Testing compiler gcc/7.4.0
  Starting job gcc-7.2.0-OpenMP_Serial-release
  PASSED gcc-7.2.0-Serial-release
Testing compiler ibm/16.1.0
  Starting job gcc-7.4.0-OpenMP-release
  PASSED gcc-7.4.0-OpenMP-release
Testing compiler cuda/9.2.88
  Starting job ibm-16.1.0-Serial-release
  PASSED gcc-7.2.0-OpenMP_Serial-release
  PASSED ibm-16.1.0-Serial-release
Testing compiler cuda/10.0.130
  Starting job cuda-9.2.88-Cuda_OpenMP-release
  PASSED cuda-9.2.88-Cuda_OpenMP-release
  Starting job cuda-10.0.130-Cuda_Serial-release
  PASSED cuda-10.0.130-Cuda_Serial-release
#######################################################
PASSED TESTS
#######################################################
cuda-10.0.130-Cuda_Serial-release build_time=1248 run_time=853
cuda-9.2.88-Cuda_OpenMP-release build_time=1318 run_time=663
gcc-6.4.0-OpenMP_Serial-release build_time=649 run_time=841
gcc-7.2.0-OpenMP-release build_time=478 run_time=301
gcc-7.2.0-OpenMP_Serial-release build_time=603 run_time=1080
gcc-7.2.0-Serial-release build_time=257 run_time=462
gcc-7.4.0-OpenMP-release build_time=402 run_time=304
ibm-16.1.0-Serial-release build_time=1484 run_time=779
#######################################################
FAILED TESTS
#######################################################

Copy link
Contributor

@srajama1 srajama1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this looks simple enough to go in. I would like this for float and complex float as well, but that can come in another PR.

If @iyamazaki is fine, I will push this PR in.

@srajama1
Copy link
Contributor

@e10harvey Can you start a spot check on a non-GPU machine as well ?

@e10harvey
Copy link
Contributor Author

@e10harvey Can you start a spot check on a non-GPU machine as well ?

@srajama1, no problem. I started another with [email protected] and kokkos-kernels@topic/sptrsv_build_patches on a sems machine. I won't have access to other non-GPU machines listed in test_all_sandia until the accounts are approved.

@srajama1
Copy link
Contributor

Got it ! Thanks !

@@ -100,7 +102,7 @@ void print_factor_superlu(int n, SuperMatrix *L, SuperMatrix *U, int *perm_r, in
}
printf( "];\n" );

#if 0
#if defined(KOKKOSKERNELS_ENABLE_TPL_SUPERLU)
Copy link
Contributor

@iyamazaki iyamazaki Feb 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm planning to remove these debugging functions in the next PR, but we may get compilation errors/warnings with %e in fprintf using complex scalar type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't encounter any when testing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These "spot-check" are not testing these codes, are they?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@e10harvey Bold of you to assume that tests actually enable complex scalar types ;-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These "spot-check" are not testing these codes, are they?

Correct. The spot-check is not testing these diffs.

@e10harvey Bold of you to assume that tests actually enable complex scalar types ;-)

I ran the cholmod and superlu perf tests manually for both double and complex :-)

We can either close or merge this PR -- it's really just an onboarding task for myself and the only real improvement is the super LU perf test argument processing.

@srajama1
Copy link
Contributor

@lucbv Thank you ! Merging it.

@srajama1
Copy link
Contributor

I merged this when I was trying to merge a PR by @lucbv by switching to the wrong window. I think it is still reasonable to merge this.

@e10harvey @iyamazaki : Do you agree or should I revert it ? I would have guarded the fprintf with complex being defined, but I think you can submit another quick PR for it, right ?

@e10harvey
Copy link
Contributor Author

@e10harvey @iyamazaki : Do you agree or should I revert it ? I would have guarded the fprintf with complex being defined, but I think you can submit another quick PR for it, right ?

I agree but I defer to @iyamazaki. @iyamazaki, shall I submit a PR for the fprintf guarding or are you still planning to remove these in your next PR?

@iyamazaki
Copy link
Contributor

Yes. I've already removed it. I am not quite done cleaning it up, but I am thinking to do a PR today after resolving conflicts. Thank you again for looking at it!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants