Add infrastructure for CUDA PR build #4217
Merged
21 commits
3a32119 Add infrastructure for testing cuda build (ZUUL42)
7993516 Removing sems module init for cuda & fix workspace ref (ZUUL42)
c476288 Fix MPI and sems init calls for cuda (ZUUL42)
04bfef8 MPI_NAME -> MPI_VENDOR (ZUUL42)
6daed62 Setting KOKKOS_ARCH (jwillenbring)
aa161c9 Fixing CUDA configuration (jwillenbring)
a5b2204 Added BLAS and LAPACK options. (jwillenbring)
cb2edca Explicitly disabling Scotch and ParMetis. (jwillenbring)
df416a1 Adding Netcdf library configuration. (jwillenbring)
60b4424 Disabled SuperLU and boostlib (jwillenbring)
969e5a5 Disabled SEACAS for now. (jwillenbring)
915ed4a Disabled Moertel. (jwillenbring)
14c493a Disabling shared libs and Komplex (jwillenbring)
b5aebc2 Disabled debug to match ATDM build (jwillenbring)
1fc7808 Set Tpetra_INST_SERIAL=ON to match atdm build. (jwillenbring)
c504c36 Turning off Debug symbols. (jwillenbring)
559a7e4 Turning off secondary tested code. (jwillenbring)
5fefffa Disable a failing test. (jwillenbring)
1fbde49 Disabled remaining failed tests. (jwillenbring)
26ce0f0 Added check for cuda build for if node is ride (ZUUL42)
d8cad58 Added check for cuda build for if node is ride (ZUUL42)
@@ -0,0 +1,98 @@
# This file contains the options needed to both run the pull request testing
# for Trilinos for the CUDA 9.2 pull request testing builds, and to reproduce
# the errors reported by those builds. Prior to using this file, the
# appropriate set of modules must be loaded and the path must be augmented.
# (See the sems/PullRequestCuda9.2TestingEnv.sh files.)

# Usage: cmake -C PullRequestLinuxCUDA9.2TestingSettings.cmake

# Misc options typically added by CI testing mode in TriBITS

# Use the below option only when submitting to the dashboard
set (CTEST_USE_LAUNCHERS ON CACHE BOOL "Set by default for PR testing")

# Options necessary for CUDA build
set (TPL_ENABLE_MPI ON CACHE BOOL "Set by default for CUDA PR testing")
set (TPL_ENABLE_CUDA ON CACHE BOOL "Set by default for CUDA PR testing")
set (Kokkos_ENABLE_Cuda ON CACHE BOOL "Set by default for CUDA PR testing")
set (Kokkos_ENABLE_Cuda_UVM ON CACHE BOOL "Set by default for CUDA PR testing")
set (KOKKOS_ARCH Power8 CACHE STRING "Set by default for CUDA PR testing")

# TPL settings specific to CUDA build
set (TPL_BLAS_LIBRARIES "-L${BLAS_ROOT}/lib -lblas -lgfortran -lgomp -lm" CACHE STRING "Set by default for CUDA PR testing")
set (TPL_LAPACK_LIBRARIES "-L${LAPACK_ROOT}/lib -llapack -lgfortran -lgomp" CACHE STRING "Set by default for CUDA PR testing")
set (BUILD_SHARED_LIBS OFF CACHE BOOL "Set by default for CUDA PR testing")
set (Tpetra_INST_SERIAL ON CACHE BOOL "Set by default for CUDA PR testing")
set (Trilinos_ENABLE_SECONDARY_TESTED_CODE OFF CACHE BOOL "Set by default for CUDA PR testing")
set (TPL_ENABLE_Scotch OFF CACHE BOOL "Set by default for CUDA PR testing")
# ParMETIS is available on ride and could be enabled for the CUDA PR build
set (TPL_ENABLE_ParMETIS OFF CACHE BOOL "Set by default for CUDA PR testing")
set (TPL_Netcdf_LIBRARIES "-L${BOOST_ROOT}/lib;-L${NETCDF_ROOT}/lib;-L${NETCDF_ROOT}/lib;-L${PNETCDF_ROOT}/lib;-L${HDF5_ROOT}/lib;${BOOST_ROOT}/lib/libboost_program_options.a;${BOOST_ROOT}/lib/libboost_system.a;${NETCDF_ROOT}/lib/libnetcdf.a;${PNETCDF_ROOT}/lib/libpnetcdf.a;${HDF5_ROOT}/lib/libhdf5_hl.a;${HDF5_ROOT}/lib/libhdf5.a;-lz;-ldl" CACHE STRING "Set by default for CUDA PR testing")
# SuperLU is available on ride and could be enabled for the CUDA PR build
set (TPL_ENABLE_SuperLU OFF CACHE BOOL "Set by default for CUDA PR testing")
set (TPL_ENABLE_BoostLib OFF CACHE BOOL "Set by default for CUDA PR testing")
set (Trilinos_ENABLE_Moertel OFF CACHE BOOL "Disable for CUDA PR testing")
set (Trilinos_ENABLE_Komplex OFF CACHE BOOL "Disable for CUDA PR testing")

# Temporary options to clean up build
set (Trilinos_ENABLE_SEACAS OFF CACHE BOOL "Temporary disable for CUDA PR testing")
set (Trilinos_ENABLE_DEBUG OFF CACHE BOOL "Temporary disable for CUDA PR testing")
set (Trilinos_ENABLE_DEBUG_SYMBOLS OFF CACHE BOOL "Temporary disable for CUDA PR testing")
set (STK_ENABLE_TESTS OFF CACHE BOOL "Temporary disable for CUDA PR testing")
set (ShyLU_DDFROSch_test_interfacepartitionofunity_DIM2_EPETRA_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ShyLU_DDFROSch_test_interfacesets_DIM2_EPETRA_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ShyLU_DDFROSch_test_localpartitionofunitybasis_EPETRA_MPI_1_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ShyLU_DDFROSch_test_thyra_xpetra_laplace_GDSWP_DIM2_DPN1_ORD0_EPETRA_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ShyLU_DDFROSch_test_thyra_xpetra_laplace_GDSWP_DIM2_DPN1_ORD1_EPETRA_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ShyLU_DDFROSch_test_thyra_xpetra_laplace_GDSWP_DIM2_DPN2_ORD0_EPETRA_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ShyLU_DDFROSch_test_thyra_xpetra_laplace_GDSWP_DIM2_DPN2_ORD1_EPETRA_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ShyLU_DDFROSch_test_thyra_xpetra_laplace_RGDSWP_DIM2_DPN1_ORD0_EPETRA_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ShyLU_DDFROSch_test_thyra_xpetra_laplace_RGDSWP_DIM2_DPN1_ORD1_EPETRA_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ShyLU_DDFROSch_test_thyra_xpetra_laplace_RGDSWP_DIM2_DPN2_ORD0_EPETRA_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ShyLU_DDFROSch_test_thyra_xpetra_laplace_RGDSWP_DIM2_DPN2_ORD1_EPETRA_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ShyLU_DDFROSch_test_thyra_xpetra_laplace_TLBP_NB2_GDSW_DIM2_DPN1_ORD0_EPETRA_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ShyLU_DDFROSch_test_thyra_xpetra_laplace_TLBP_NB2_GDSW_DIM2_DPN1_ORD1_EPETRA_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ShyLU_DDFROSch_test_thyra_xpetra_laplace_TLBP_NB2_GDSW_DIM2_DPN2_ORD0_EPETRA_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ShyLU_DDFROSch_test_thyra_xpetra_laplace_TLBP_NB2_GDSW_DIM2_DPN2_ORD1_EPETRA_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ShyLU_DDFROSch_test_thyra_xpetra_laplace_TLBP_NB3_GDSW_DIM2_DPN1_ORD0_EPETRA_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ShyLU_DDFROSch_test_thyra_xpetra_laplace_TLBP_NB3_GDSW_DIM2_DPN1_ORD1_EPETRA_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ShyLU_DDFROSch_test_thyra_xpetra_laplace_TLBP_NB3_GDSW_DIM2_DPN2_ORD0_EPETRA_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ShyLU_DDFROSch_test_thyra_xpetra_laplace_TLBP_NB3_GDSW_DIM2_DPN2_ORD1_EPETRA_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ShyLU_DDFROSch_test_thyra_xpetra_laplace_TLP_GDSW_DIM2_DPN1_ORD0_EPETRA_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ShyLU_DDFROSch_test_thyra_xpetra_laplace_TLP_GDSW_DIM2_DPN1_ORD1_EPETRA_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ShyLU_DDFROSch_test_thyra_xpetra_laplace_TLP_GDSW_DIM2_DPN2_ORD0_EPETRA_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ShyLU_DDFROSch_test_thyra_xpetra_laplace_TLP_GDSW_DIM2_DPN2_ORD1_EPETRA_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ShyLU_DDFROSch_test_thyra_xpetra_laplace_TLP_IPOUHarmonic_DIM2_DPN1_ORD0_EPETRA_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ShyLU_DDFROSch_test_thyra_xpetra_laplace_TLP_IPOUHarmonic_DIM2_DPN1_ORD1_EPETRA_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ShyLU_DDFROSch_test_thyra_xpetra_laplace_TLP_IPOUHarmonic_DIM2_DPN2_ORD0_EPETRA_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ShyLU_DDFROSch_test_thyra_xpetra_laplace_TLP_IPOUHarmonic_DIM2_DPN2_ORD1_EPETRA_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ShyLU_DDFROSch_test_thyra_xpetra_laplace_TLP_RGDSW_DIM2_DPN1_ORD0_EPETRA_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ShyLU_DDFROSch_test_thyra_xpetra_laplace_TLP_RGDSW_DIM2_DPN1_ORD1_EPETRA_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ShyLU_DDFROSch_test_thyra_xpetra_laplace_TLP_RGDSW_DIM2_DPN2_ORD0_EPETRA_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ShyLU_DDFROSch_test_thyra_xpetra_laplace_TLP_RGDSW_DIM2_DPN2_ORD1_EPETRA_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ShyLU_DDFROSch_test_thyra_xpetra_stokes_hdf5_TLBP_GDSW_O0_EPETRA_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ShyLU_DDFROSch_test_thyra_xpetra_stokes_hdf5_TLBP_GDSW_O1_EPETRA_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ROL_example_PDE-OPT_0ld_adv-diff-react_example_01_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ROL_example_PDE-OPT_0ld_adv-diff-react_example_02_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ROL_example_PDE-OPT_0ld_poisson_example_01_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ROL_example_PDE-OPT_0ld_stefan-boltzmann_example_03_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ROL_example_PDE-OPT_navier-stokes_example_01_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ROL_example_PDE-OPT_navier-stokes_example_02_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ROL_example_PDE-OPT_nonlinear-elliptic_example_01_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ROL_example_PDE-OPT_nonlinear-elliptic_example_02_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ROL_example_PDE-OPT_obstacle_example_01_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ROL_example_PDE-OPT_topo-opt_poisson_example_01_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (ROL_test_elementwise_TpetraMultiVector_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (TpetraCore_Core_initialize_where_tpetra_initializes_kokkos_MPI_1_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (TpetraCore_Core_initialize_where_tpetra_initializes_mpi_and_user_initializes_kokkos_MPI_2_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (TpetraCore_Core_initialize_where_user_initializes_kokkos_MPI_1_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (TpetraCore_Core_initialize_where_user_initializes_mpi_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (TpetraCore_Core_ScopeGuard_where_tpetra_initializes_kokkos_MPI_1_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (TpetraCore_Core_ScopeGuard_where_tpetra_initializes_mpi_and_user_initializes_kokkos_MPI_2_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (TpetraCore_Core_ScopeGuard_where_user_initializes_kokkos_MPI_1_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (TpetraCore_Core_ScopeGuard_where_user_initializes_mpi_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (TrilinosCouplings_Example_Maxwell_MueLu_MPI_1_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")
set (TrilinosCouplings_Example_Maxwell_MueLu_MPI_4_DISABLE ON CACHE BOOL "Temporary disable for CUDA PR testing")

include("${CMAKE_CURRENT_LIST_DIR}/PullRequestLinuxCommonTestingSettings.cmake")
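The long run of `_DISABLE` entries in the settings file follows a single pattern: the full test name with a `_DISABLE` suffix, set to `ON` in the CMake cache. As a rough sketch only (it is not part of this PR), such lines could be generated from a list of test names. The function name `emit_disable_lines` and its argument order are hypothetical, chosen for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not part of the PR): print one CMake cache line per
# test name, mirroring the "<full test name>_DISABLE" entries in the
# settings file.
emit_disable_lines() {
  local reason="$1"   # doc string stored alongside the cache entry
  shift
  local test_name
  for test_name in "$@"; do
    printf 'set (%s_DISABLE ON CACHE BOOL "%s")\n' "$test_name" "$reason"
  done
}

# Example: two of the test names that appear in the settings file.
emit_disable_lines "Temporary disable for CUDA PR testing" \
  TrilinosCouplings_Example_Maxwell_MueLu_MPI_1 \
  TrilinosCouplings_Example_Maxwell_MueLu_MPI_4
```

Keeping the names in a plain list would make it easier to see which tests are still disabled once the temporary entries start being removed.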
@@ -0,0 +1,23 @@
#!/bin/bash -l

if [ "${BSUB_CTEST_TIME_LIMIT}" == "" ] ; then
export BSUB_CTEST_TIME_LIMIT=12:00
fi

if [ "${Trilinos_CTEST_DO_ALL_AT_ONCE}" == "" ] ; then
export Trilinos_CTEST_DO_ALL_AT_ONCE=TRUE
fi

# comment out sh and add what we need individually.
#source $WORKSPACE/Trilinos/cmake/std/atdm/load-env.sh $JOB_NAME

set -x

# TODO: review appropriate job size
bsub -x -Is -q rhel7F -n 16 -J $JOB_NAME -W $BSUB_CTEST_TIME_LIMIT \
$WORKSPACE/Trilinos/cmake/std/PullRequestLinuxDriver.sh

# NOTE: Above, this bsub command should grab a single rhel7F (Firestone,
# Dual-Socket POWER8, 8 cores per socket, K80 GPUs) node. The option '-x'
# makes sure that only this job runs on that node. The options '-n 16' and
# '-q rhel7F' should make bsub allocate a single one of these nodes.
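To see how the job name and time limit feed into the submission, the command can be assembled and printed rather than run. This is a minimal dry-run sketch, assuming the same flags as the script above; `build_bsub_command`, the job name, and the driver path are placeholders introduced here and are not part of the PR.

```shell
#!/bin/bash
# Hypothetical dry-run helper: prints the bsub command line instead of
# submitting it, so the arguments can be inspected on any machine.
build_bsub_command() {
  local job_name="$1"
  local time_limit="${2:-12:00}"   # same fallback value as the driver script
  local driver="$3"
  # Flags copied from the driver script: exclusive node (-x), interactive
  # (-Is), queue rhel7F, 16 slots.
  local cmd=(bsub -x -Is -q rhel7F -n 16 -J "$job_name" -W "$time_limit" "$driver")
  printf '%s\n' "${cmd[*]}"
}

# Placeholder job name and path, for illustration only.
build_bsub_command "example-job" "" "/path/to/PullRequestLinuxDriver.sh"
```

Building the command as an array keeps each argument intact even if a job name contains spaces, which the unquoted `$JOB_NAME` in the original would not.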
@@ -0,0 +1,20 @@
# This script can be used to load the appropriate environment for the
# PR build on ride using CUDA.

# usage: $ source PullRequestCUDA9.2TestingEnv.sh

# No SEMS NFS mount on ride
#source /projects/sems/modulefiles/utils/sems-modules-init.sh
module load git/2.10.1
module load devpack/20180521/openmpi/2.1.2/gcc/7.2.0/cuda/9.2.88
module swap openblas/0.2.20/gcc/7.2.0 netlib/3.8.0/gcc/7.2.0
#export OMPI_CXX=`which g++`
export OMPI_CXX=$WORKSPACE/Trilinos/packages/kokkos/bin/nvcc_wrapper
export OMPI_CC=`which gcc`
export OMPI_FC=`which gfortran`
export NVCC_WRAPPER_DEFAULT_COMPILER=`which g++`
export CUDA_LAUNCH_BLOCKING=1

# Use manually installed cmake and ninja to try to avoid module loading
# problems (see TRIL-208)
export PATH=/ascldap/users/rabartl/install/white-ride/cmake-3.11.2/bin:/ascldap/users/rabartl/install/white-ride/ninja-1.8.2/bin:$PATH
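When reproducing a failure by hand, it can help to confirm that sourcing the environment script actually exported everything the build expects. The following is a small sanity check, sketched under the assumption that the five variables exported above are the ones that matter; `missing_env_vars` is a hypothetical helper, not something the PR provides.

```shell
#!/bin/bash
# Hypothetical sanity check: list which of the variables exported by the
# environment script are unset or empty in the current shell.
missing_env_vars() {
  local name
  for name in OMPI_CXX OMPI_CC OMPI_FC NVCC_WRAPPER_DEFAULT_COMPILER CUDA_LAUNCH_BLOCKING; do
    # ${!name:-} is bash indirect expansion with an empty fallback.
    if [ -z "${!name:-}" ]; then
      printf '%s\n' "$name"
    fi
  done
}

missing="$(missing_env_vars)"
if [ -n "$missing" ]; then
  echo "Unset after sourcing the env script:"
  echo "$missing"
else
  echo "All expected variables are set."
fi
```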
Is BSUB_CTEST_TIME_LIMIT a system-defined environment variable, or is this a new Jenkins parameter that's being introduced? If this doesn't exist, can it break things?
@bartlettroscoe @ZUUL42 @william76
I am not certain, but I think it is just an environment variable that can be defined by a user.
@bartlettroscoe may be able to confirm or correct this.
If it is not defined, it is set to a default value, so it shouldn't break things. The variable is specific to this build as far as current PR tests go (because ride uses bsub).
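The fallback behaviour described here can be checked in isolation. Below is a minimal sketch of the same "set only if unset or empty" logic, written with bash's `:=` expansion as a shorthand for the if-block in the driver script. The function name `apply_default_time_limit` and the `4:00` override are illustrative assumptions.

```shell
#!/bin/bash
# Sketch of the defaulting logic: assign 12:00 only when the variable is
# unset or empty, otherwise leave the caller's value alone.
apply_default_time_limit() {
  : "${BSUB_CTEST_TIME_LIMIT:=12:00}"
  export BSUB_CTEST_TIME_LIMIT
}

# Case 1: nothing defined beforehand, so the default applies.
unset BSUB_CTEST_TIME_LIMIT
apply_default_time_limit
echo "unset beforehand: ${BSUB_CTEST_TIME_LIMIT}"

# Case 2: a caller (for example a Jenkins job) exported its own value first.
export BSUB_CTEST_TIME_LIMIT=4:00
apply_default_time_limit
echo "set beforehand:   ${BSUB_CTEST_TIME_LIMIT}"
```

Because the variable is only read by this driver, builds that never define it see the default and other PR builds are unaffected.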
This was just an env var that allowed specialization in the individual driver scripts. This came from the file:
which has:
Any questions about this?