GCC 4.9.3-SERIAL fails in nightly clean test #3773

Closed
william76 opened this issue Oct 30, 2018 · 12 comments
Labels
Framework tasks (used internally by Framework team), pkg: Kokkos, pkg: KokkosKernels, type: bug (the primary issue is a bug in Trilinos code or tests)

Comments

@william76
Contributor

@trilinos/kokkos-kernels

An error showed up in the gcc 4.9.3 SERIAL build on 10/25, which means the issue likely was introduced sometime on 10/24.

The errors I'm seeing are:

libkokkoskernels_gtest.so.12.13: undefined reference to `pthread_key_create'
libkokkoskernels_gtest.so.12.13: undefined reference to `pthread_getspecific'
libkokkoskernels_gtest.so.12.13: undefined reference to `pthread_key_delete'
libkokkoskernels_gtest.so.12.13: undefined reference to `pthread_setspecific'
collect2: error: ld returned 1 exit status

I was out of town for the past week... @jwillenbring do you know if anything @trilinos/framework related changed on our configurations that would cause an undefined reference to things like pthread_key_create?

@trilinos/kokkos-kernels was anything merged into Trilinos that is using any new features from pthreads that might require a newer version of pthreads on the testing machines?

I also found this StackOverflow issue which was related to someone linking libgtest incorrectly... were any changes that might relate to gtest stuff in KokkosKernels pushed into Trilinos on 10/24?

@ndellingwood
Contributor

@william76 there hasn't been anything from kokkos-kernels pushed into Trilinos since Oct 11, see this SHA 549ca9b

Have you enabled different test options that weren't being used before? Is Pthreads enabled even though this is referencing a serial build?

@william76
Contributor Author

@ndellingwood I'm not 100% sure, I was out of town for a wedding since last Wednesday, we just got back home late last night.

I was having a look at the testing and noticed the test in the Clean track failing. It's odd then that it was passing until the 25th and then started failing every night. Perhaps something on the ASCIC build farm got changed?

It's definitely possible that the "SERIAL" build in the Clean track is pulling in parallel stuff, that test was put together a while back and I've never really dug into what it's doing. The variant of the serial test in the PR testing is new and it's passing the latest SHA1 just fine. I put some effort into the new serial test to keep parallel TPLs from loading and to turn off all the parallel stuff I could find.

@jwillenbring Now that we have the dev->master PR set up and running, what do we want to do with the tests in the "Clean" track? Should we deprecate them in favor of supporting the PR versions? We should discuss this in our stand-up on Wednesday.

@mhoemmen
Contributor

Looks like somebody turned off the Pthread TPL, or isn't linking with that library when they should. Some OpenMP implementations want that.

@srajama1
Contributor

I am going to assume this is a @trilinos/framework issue.

@srajama1 added the Framework tasks label and removed the pkg: KokkosKernels label on Oct 31, 2018
@william76
Contributor Author

I looked more closely at the dates and times of these tests. This error would have shown up on the 25th of October, not the 24th, since the tests that started failing kicked off at 10pm on the 25th.

Here are the PRs that were merged on 10/25. I don't see anything from @trilinos/framework that would have changed the build system.

It looks like @jhux2 changed a couple of CMake settings in #3732 and #3736, but the changes don't look like they'd be causing this error at first glance.

@srajama1 We spoke about this in the Framework meeting this morning, and @prwolfe said that they're seeing this error now as well on their Sierra testing.

Can @trilinos/kokkos have a look at the output from testing and the PRs that were merged on 10/25 to see if anything pushed that day looks like it could be the culprit?

@bartlettroscoe Any suggestions on something in TriBiTS that could be looked into to figure out why @trilinos/kokkos-kernels is trying to link pthreads in a serial build?

@prwolfe Can you add any links to information on this you're seeing on the Sierra side? Any dashboard failures you're seeing? If you're in the ceelan area, can you ask them if anything changed on the configuration of the systems recently?

@mhoemmen yeah, the odd thing to me is that this error just popped up out of the blue. Since the gtest stuff is bundled inside @trilinos/kokkos my initial thought would be to look there or into Kokkos-Kernels for an update to Trilinos... but it looks like there weren't any updates on the 25th. So, the next culprit I'd think of is that something changed on the ASCIC nodes that this test runs on, or something changed in the framework... but nothing from framework was merged that day.

I'll poke around at the build for that test and see if anything jumps out at me...

@prwolfe
Contributor

prwolfe commented Nov 5, 2018

I am seeing timeouts for 2-4 of these tests nightly. Note that we do have pthreads off, I'm not sure why and I will go review that ticket.

@bartlettroscoe added the pkg: KokkosKernels and type: bug labels on Nov 5, 2018
@bartlettroscoe
Member

> @bartlettroscoe Any suggestions on something in TriBiTS that could be looked into to figure out why @trilinos/kokkos-kernels is trying to link pthreads in a serial build?

@william76, for the build Linux-gcc-4.9.3-SERIAL_Release_gcc_4.9.3__DEV shown here, if you look at the build failure output here, you will not find pthread on any of the link lines. And if you look at the configure output here, you can see that the TriBITS Pthread TPL is disabled. Therefore, this is not a TriBITS problem that I can see. The problem is likely KokkosKernels code being hard-coded to call pthreads for some reason. It should not be that hard for @trilinos/kokkos-kernels developers to figure that out.

@ndellingwood
Contributor

@william76 can you try adding this line:

SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DGTEST_HAS_PTHREAD=0")

to the following location in kokkos-kernels:
kokkos-kernels/unit_test/CMakeLists.txt after line 23
https://github.com/kokkos/kokkos-kernels/blob/master/unit_test/CMakeLists.txt

@jhux2
Member

jhux2 commented Nov 5, 2018

> It looks like @jhux2 changed a couple of CMake settings in #3732 and #3736, but the changes don't look like they'd be causing this error at first glance.

These changes are way downstream in MueLu. I don't see how they can be related.

@william76
Contributor Author

@bartlettroscoe Thanks for checking that and providing the links!

@jhux2 Thanks for having a look.

@ndellingwood Thanks, I'll test that change out and see if that disables the flag... from what I see, GTEST_HAS_PTHREAD is used in gtest.h here:

#if GTEST_HAS_PTHREAD
// gtest-port.h guarantees to #include <pthread.h> when GTEST_HAS_PTHREAD is
// true.
# include <pthread.h> // NOLINT
// For timespec and nanosleep, used below.
# include <time.h> // NOLINT
#endif

I'll test this guard out and report back.
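
For anyone following along, here is a rough sketch of how a guard like this keeps the pthread key functions out of the link when the macro is 0. It is not gtest's actual source: `DEMO_HAS_PTHREAD` and `demo_uses_pthread_tls` are made-up names standing in for `GTEST_HAS_PTHREAD` and the library's thread-local storage code, and the idea that the four undefined symbols above come from that kind of code path is my assumption.

```cpp
#include <cassert>

// Hypothetical stand-in for GTEST_HAS_PTHREAD; defaults to 0, which mirrors
// passing -DGTEST_HAS_PTHREAD=0 on the compile line.
#ifndef DEMO_HAS_PTHREAD
# define DEMO_HAS_PTHREAD 0
#endif

#if DEMO_HAS_PTHREAD
# include <pthread.h>
#endif

// Returns true only when thread-local storage is backed by pthread keys.
// With DEMO_HAS_PTHREAD == 0 none of the pthread_* symbols are referenced,
// so the object file needs nothing from the pthread library at link time.
bool demo_uses_pthread_tls() {
#if DEMO_HAS_PTHREAD
  pthread_key_t key;
  int rc = pthread_key_create(&key, nullptr);
  assert(rc == 0);
  (void)rc;  // avoid an unused-variable warning when NDEBUG is defined

  int value = 42;
  pthread_setspecific(key, &value);
  bool ok = (pthread_getspecific(key) == &value);
  pthread_key_delete(key);
  return ok;
#else
  return false;
#endif
}
```

Compiling this with `-DDEMO_HAS_PTHREAD=1` pulls in the same four pthread functions named in the link errors, so on some toolchains that variant would need `-pthread` to link.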

@william76
Contributor Author

william76 commented Nov 5, 2018

I modified the CMakeLists.txt file in Kokkos-Kernels on my test-PR, #3803 (which won't be merged, it's just a test) to see what happens in PR testing.

@srajama1
Contributor

srajama1 commented Nov 6, 2018

@william76 Your PR testing passed, do you want to push this change in ?
