SEACASExodus tests failing in new cuda 9.2 ATDM build on white/ride #3288
Comments
The same two tests are also failing on the waterman builds:
as shown here
Not sure where the "Steps to Reproduce" is generated, but the cmake step should have "-DTrilinos_ENABLE_SEACAS=ON" instead of "-DTrilinos_ENABLE_Seacas=ON" (all uppercase SEACAS).
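CMake variable names are case-sensitive, which is why the casing of the enable flag matters. As a rough illustration only, a small check like the one below could catch this kind of typo in hand-written configure commands. The helper name and the package list are hypothetical and not part of Trilinos; the list is just a small subset chosen for the example.

```python
import re

# Hypothetical subset of package names, for illustration only.
KNOWN_PACKAGES = ["SEACAS", "Kokkos", "Tpetra"]


def find_miscased_enables(cmake_args, known_packages=KNOWN_PACKAGES):
    """Return (given, expected) pairs for enable flags whose package name
    matches a known package except for letter case."""
    by_lower = {name.lower(): name for name in known_packages}
    # Matches -DTrilinos_ENABLE_<name>=... with an optional :TYPE suffix.
    pattern = re.compile(r"^-D\s*Trilinos_ENABLE_([A-Za-z0-9_]+)(?::[A-Z]+)?=")
    problems = []
    for arg in cmake_args:
        match = pattern.match(arg)
        if not match:
            continue
        given = match.group(1)
        expected = by_lower.get(given.lower())
        if expected is not None and given != expected:
            problems.append((given, expected))
    return problems


if __name__ == "__main__":
    print(find_miscased_enables(["-DTrilinos_ENABLE_Seacas=ON"]))
```

Flags that do not correspond to a name in the list are ignored, so the check only reports cases where the spelling is right and the casing is wrong.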
The failure seems to be related to the NetCDF library loaded by the module @nmhamster Would it be possible to rebuild the NetCDF library pointed to by the
@gsjaardema, can we resolve this by just switching to the module
@gsjaardema said:
It was just a typo. These are hand-generated copy-and-paste from:
The hope is that these instructions are so simple that mistakes like that will be easy to spot.
FYI: As shown here these tests are no longer failing on 'waterman' after the switch back to OpenMPI 2.1.2. Perhaps that is what we should do for 'white'/'ride' as well?
I am fine switching to the 2.1.2 version.
@gsjaardema / @bartlettroscoe - where we can, I'd like us to continue to push forward on OpenMPI 3.1.0. Despite some of the challenges, this is where we need to go moving forward for a variety of platforms. I realize that NetCDF 4.6.1 does not require the Exodus changes now but we had continued to perform them in order to match with pNetCDF installs. However, I realize that this may be causing some problems, so we have created a NetCDF 4.6.1 (without Exodus) for testing. Can you try the following:
This will use the standard devpack but change the NetCDF files over to the unmodified (non-Exodus) variants. For now, I'd like to try this on White and Ride and if this works successfully then we can make this a standard change moving forward. Overall this is great news as we can reduce the non-standard code we are running (thanks @gsjaardema for the help).
@bartlettroscoe - just replying to your suggestion to use XL modules instead of GCC - in general we strongly recommend against doing so. This mixture of name mangling and function call dependencies makes this challenging.
Note that as of parallel-netcdf (PNetCDF) 1.9.0 there are no exodus-specific modifications required for that library either. So the recommendations are PNetCDF >= 1.9.0 and NetCDF >= 4.6.1.
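For anyone scripting a check of an installed environment against these minimums, a minimal sketch follows. The function names are hypothetical, and it assumes plain dotted numeric version strings such as "4.6.1". Versions are compared as integer tuples because a string comparison would rank "1.10.0" below "1.9.0".

```python
# Minimum versions quoted in the comment above.
MINIMUM_VERSIONS = {"pnetcdf": (1, 9, 0), "netcdf": (4, 6, 1)}


def parse_version(text):
    """Turn a dotted numeric version string like '4.6.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_recommendation(library, version):
    """Return True if `version` is at or above the recommended minimum."""
    return parse_version(version) >= MINIMUM_VERSIONS[library.lower()]


if __name__ == "__main__":
    for lib, ver in [("NetCDF", "4.6.1"), ("PNetCDF", "1.8.1")]:
        print(lib, ver, meets_recommendation(lib, ver))
```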
The exodus warnings that triggered this issue are fixed by using the new build of netcdf without the exodus mods. Whether the library had or didn't have the mods should make no difference, so maybe the now passing tests are the result of a newer or better build? However, I am now getting failures related to
So, some improvement, but now new issues arising...
@gsjaardema, what is the next step here then? Do we need to get help from the Kokkos team about issues with
@nmhamster said to try:
@gsjaardema, is this what you meant by "new NetCDF" above?
@bartlettroscoe Yes.
@fryeguy52, can we go ahead and update the file
issue: trilinos#3288 switch the netcdf module that is loaded for the builds on ride/white to address some failing tests
Note that my comment above about tests now failing in
Then all tests complete successfully when using the
@gsjaardema, thanks for pointing that out. Indeed these tests and all SEACAS tests are now fully passing in these CUDA 9.2 builds on 'white'/'ride' as shown here and here (see the little
Looks like the problem is solved. We can now close this issue!
All fixed. Closing as complete!
trilinos#3290) This new env also has the correct netcdf build for SEACAS (see trilinos#3288).
FYI: As part of installing a consistent GCC 7.2.0 + OpenMPI 2.1.2 + CUDA 9.2 + TPLs env on 'white' and 'ride' as part of #3549, @nmhamster determined that the SEACAS tests failing as described in this Issue were not due to a bad NetCDF configuration but were actually due to differences in roundoff on HDF5 when going from
@nmhamster and @gsjaardema, does this indicate a defect in HDF5, NetCDF, or SEACAS (or none of the above)?
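As general background on how roundoff alone can turn a passing check into a failing one, here is a generic sketch. It is not taken from the SEACAS tests, and the thread does not say how those tests compare values; the helper name and tolerance are assumptions made for illustration.

```python
import math


def values_match(expected, actual, rel_tol=1e-12, abs_tol=0.0):
    """Compare two floats with a relative tolerance instead of exact equality."""
    return math.isclose(expected, actual, rel_tol=rel_tol, abs_tol=abs_tol)


if __name__ == "__main__":
    computed = 0.1 + 0.2   # differs from 0.3 only in the last bits
    print(computed == 0.3)              # exact comparison is sensitive to roundoff
    print(values_match(0.3, computed))  # tolerance-based comparison is not
```

Whether a tolerance is appropriate depends on what a test is meant to guarantee, so this shows the mechanism only and does not recommend a change to any test.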
CC: @trilinos/seacas , @kddevin (Trilinos Data Services Product Lead), @gsjaardema , @bartlettroscoe
Next Action Status
PR #3418, which switched from module netcdf-exo/4.6.1/openmpi/3.1.0/gcc/7.2.0/cuda/9.2.88 to netcdf/4.6.1/openmpi/3.1.0/gcc/7.2.0/cuda/9.2.88 on white/ride, merged on 9/10/2018 and SEACAS tests fully passed on 9/11/2018.
Description
As shown in this query the tests:
are failing in the builds:
The output looks similar to what we were seeing in #2815
...
In that issue, SEACASExodus_exodus_unit_tests was also failing on mutrino and we ended up disabling the test for that build.
Steps to Reproduce
One should be able to reproduce this failure on the machine white as described in:
More specifically, the commands given for the system white are provided at:
The exact commands to reproduce this issue should be: