Various complex<double> unit tests fail with XL/16.1 OpenMP build #344

Closed
ndellingwood opened this issue Nov 15, 2018 · 13 comments

@ndellingwood
Contributor

Output from Jenkins:

Failed Test 1

06:46:10 [ RUN      ] openmp.gemm_complex_double
06:46:10 /home/jenkins/white/workspace/KokkosKernels_White_XL_16_1_OpenMP_Serial/kokkos-kernels/unit_test/blas/Test_Blas3_gemm.hpp:147: Failure
06:46:10 Value of: (diff_C_average < 1.05*diff_C_expected )
06:46:10   Actual: false
06:46:10 Expected: true
etc.

Failed Test 2

06:46:10 [ RUN      ] openmp.batched_scalar_team_trsm_l_u_nt_n_dcomplex_dcomplex
06:46:10 /home/jenkins/white/workspace/KokkosKernels_White_XL_16_1_OpenMP_Serial/kokkos-kernels/unit_test/../test_common/KokkosKernels_TestUtils.hpp:87: Failure
06:46:10 The difference between double(AT1::abs(val1)) and double(AT2::abs(val2)) is 0.99556106126897947, which exceeds double(AT3::abs(tol)), where
06:46:10 double(AT1::abs(val1)) evaluates to 0.99556106126897947,
06:46:10 double(AT2::abs(val2)) evaluates to 0, and
06:46:10 double(AT3::abs(tol)) evaluates to 2.2204460492503131e-13.

Failed Test 3

06:46:10 [ RUN      ] openmp.batched_scalar_team_trsm_l_u_nt_n_dcomplex_double
06:46:10 /home/jenkins/white/workspace/KokkosKernels_White_XL_16_1_OpenMP_Serial/kokkos-kernels/unit_test/../test_common/KokkosKernels_TestUtils.hpp:87: Failure
06:46:10 The difference between double(AT1::abs(val1)) and double(AT2::abs(val2)) is 0.99580002746667939, which exceeds double(AT3::abs(tol)), where
06:46:10 double(AT1::abs(val1)) evaluates to 0.99580002746667939,
06:46:10 double(AT2::abs(val2)) evaluates to 0, and
06:46:10 double(AT3::abs(tol)) evaluates to 2.2204460492503131e-13.
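The two trsm failures above have the same shape: the magnitudes of two values are compared, and the difference must not exceed the magnitude of a tolerance. A minimal sketch of that comparison follows, using the numbers from the log. The function names are hypothetical and are not taken from KokkosKernels_TestUtils.hpp. The observation that the logged tolerance equals 1000 times double machine epsilon is an inference from the number itself, not from the test source.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Mirrors the shape of the logged check: | |val1| - |val2| | <= |tol|.
// Hypothetical name; inputs are the already-computed magnitudes from the log.
inline bool within_tol(double abs_val1, double abs_val2, double tol) {
  assert(abs_val1 >= 0.0 && abs_val2 >= 0.0);
  return std::fabs(abs_val1 - abs_val2) <= std::fabs(tol);
}

// Assumption: the tolerance printed in the log is 1000 * epsilon for double.
inline double log_tolerance() {
  return 1000.0 * std::numeric_limits<double>::epsilon();
}
```

With the logged values, `within_tol(0.99556106126897947, 0.0, log_tolerance())` is false by roughly 12 orders of magnitude, so one side being exactly 0 looks like a wrong result rather than accumulated rounding error.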
@ndellingwood
Contributor Author

@kyungjoo-kim would you have time to look at the gemm test?
@vqd8a would you have time to look at the trsm tests?

@kyungjoo-kim
Contributor

Due to the prohibitive compile time of the XL compiler, it is very difficult (almost impossible) to debug with it. As the other compilers and platforms are okay, the listed failures are probably related to compiler super-scalar ordering. I am not sure that spending our time on this is meaningful. I suggest disabling the entire complex testing with XL. Since we test with Kokkos::complex, it would be interesting to see whether these failures can be reproduced with std::complex. However, I also think that investigating the difference between Kokkos::complex and std::complex is not meaningful.

@ndellingwood
Contributor Author

@kyungjoo-kim sounds good, compile times are very long with XL, thanks for looking into it.

@mhoemmen
Contributor

@kyungjoo-kim Are there any Kokkos::parallel_reduce or atomic updates on Kokkos::complex<double>? If so, it's possible this is a Kokkos bug, due to POWER's different memory model.

@kyungjoo-kim
Contributor

@mhoemmen I do not use reduce, but I suspect that Kokkos::complex is problematic. When I populate random numbers, I use value_type(1.0) as the max range, where value_type is Kokkos::complex. Lately I found that complex(1.0) populates random numbers with a zero imaginary part, as the imaginary range is zero. The same test fails when it tests complex with a zero imaginary part, but it passes with double. That is why I think it is necessary to test with std::complex to tell whether the issue comes from the different memory model or from complex arithmetic overloading. Testing these takes too much time.

@crtrott
Member

crtrott commented Nov 19, 2018

@kyungjoo-kim As discussed previously, if you ask for a range of (1.0,0.0) (which you do, since you implicitly construct from a real value), your range on the imaginary part is zero, as you asked for it ;-).
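The range point above can be reproduced without Kokkos. The following is a minimal sketch that uses std::complex<double> as a stand-in for Kokkos::complex and a hypothetical per-component generator, `draw_in_range`, which is not the Kokkos random API. It assumes, as the comment describes, that the maximum is applied to the real and imaginary parts separately.

```cpp
#include <cassert>
#include <complex>
#include <random>

// Hypothetical helper (not the Kokkos API): the real part is uniform in
// [0, max.real()) and the imaginary part is uniform in [0, max.imag()).
inline std::complex<double> draw_in_range(std::mt19937_64& gen,
                                          const std::complex<double>& max_val) {
  std::uniform_real_distribution<double> unit(0.0, 1.0);
  const double re = unit(gen) * max_val.real();
  const double im = unit(gen) * max_val.imag();
  return {re, im};
}

// Returns true if all n samples drawn with this maximum have a zero imaginary part.
inline bool all_imag_zero(const std::complex<double>& max_val, int n = 100) {
  assert(n > 0);
  std::mt19937_64 gen(42);  // fixed seed so the sketch is repeatable
  for (int i = 0; i < n; ++i) {
    if (draw_in_range(gen, max_val).imag() != 0.0) return false;
  }
  return true;
}
```

Here `all_imag_zero(std::complex<double>(1.0))` is true because the implicitly constructed maximum is (1.0, 0.0), while passing `std::complex<double>(1.0, 1.0)` gives samples with a non-zero imaginary part.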

@mhoemmen Reductions and Atomics should just work (and they should do the right thing, i.e. proper atomics). I might look into this and figure out what's going on with XL here.

@crtrott crtrott self-assigned this Dec 4, 2018
@crtrott crtrott added the bug label Dec 4, 2018
@ndellingwood ndellingwood mentioned this issue Dec 16, 2019
@brian-kelley
Contributor

brian-kelley commented Dec 16, 2019

@kyungjoo-kim @mhoemmen I was able to replicate the failures on openmp.gemm_complex_double and serial.gemm_complex_double with GCC 6.4.0 on white using the CMake test script, and also with several compilers (intel, gcc, ibm) on white and bowman using the Makefile test script. So I think that one might be an actual bug, not an IBM compiler bug.

@mhoemmen
Contributor

@brian-kelley Is that code calling the system BLAS or is it calling a hand-written matrix-matrix multiply?

@brian-kelley
Contributor

@mhoemmen It's the hand-written KokkosBlas::gemm.

@brian-kelley
Contributor

Hmm, today I'm not seeing gemm_complex_double fail except with IBM compilers. I started these spot checks this morning so they don't have the fixes from #550. Not sure what else changed, but I'm not going to worry about it.

@mhoemmen
Contributor

@brian-kelley Are we perhaps assuming things about alignment of Kokkos::complex and std::complex that only cause issues with IBM compilers? That should only be an issue with CUDA, where std::complex<double> only needs 8-byte alignment (per the Standard) but CUDA's equivalent complex type needs 16-byte alignment.

@brian-kelley
Contributor

@mhoemmen I doubt it's an alignment issue, for that reason. Even if IBM handles std::complex in a special way, I think this test only involves Kokkos::complex, which is just a plain old struct with member-wise alignment requirements.
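The alignment question in this exchange can be checked on any given compiler with `alignof`. The sketch below uses two hypothetical structs as stand-ins: `PlainComplex` for a plain struct with member-wise alignment, and `AlignedComplex` for a type that requires 16-byte alignment, like the CUDA type mentioned earlier. Neither is the actual Kokkos::complex definition, and the value of `alignof(std::complex<double>)` is whatever the implementation provides.

```cpp
#include <complex>
#include <cstddef>

// Hypothetical stand-in: plain struct, alignment follows its double members.
struct PlainComplex {
  double re;
  double im;
};

// Hypothetical stand-in: same layout, but with an explicit 16-byte alignment.
struct alignas(16) AlignedComplex {
  double re;
  double im;
};

constexpr std::size_t plain_align   = alignof(PlainComplex);
constexpr std::size_t aligned_align = alignof(AlignedComplex);
constexpr std::size_t std_align     = alignof(std::complex<double>);

static_assert(sizeof(PlainComplex) == 2 * sizeof(double),
              "two doubles with no padding expected");
static_assert(aligned_align == 16, "alignas(16) must be honored");
```

Printing `plain_align` and `std_align` on the XL build and on a GCC build would show whether the two compilers disagree; if they match, alignment is unlikely to explain the failures.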

@srajama1
Contributor

@crtrott: This is showing up in spot-checks and preventing us from having clean spot-checks to push. We need to resolve this to make progress on other things.
