From 00065844949511d0f7003a969de5283f9d4b30a2 Mon Sep 17 00:00:00 2001 From: Wayne Franz Date: Wed, 26 Feb 2025 08:43:14 -0500 Subject: [PATCH] Mergeback 6.4 Fixes (#523) * Enable HIP language (#493) If the HIP language can be used, the HIP_USECXX variable sets rocThrust to use the CMake HIP language rather than CXX. Co-authored-by: Lauren Wrubleski * Made TBB optional for hipstdpar tests (#507) * added TBB in dependencies cmake * updated changelog * removed duplicate dependencies declaration in test cmake * Create optional flag to run tests hipstdpar with TBB * Add TBB optional flag to CHANGELOG * Add rocprim location to hipstdpar * Add default and test for DONWLOAD_ROCRAND * Build with TBB when available or when FLAG is set --------- Co-authored-by: NguyenNhuDi * Separate CMake BUILD_TEST and BUILD_HIPSTDPAR_TEST options (#508) * Separate CMake BUILD_TEST and BUILD_HIPSTDPAR_TEST options Previously, enabling BUILD_TEST would also enable hipstdpar tests if we detected that a c++17-capable compiler was present. However, this caused build issues on systems with a c++17 compiler but an outdated version of libstdc++ that didn't support c++17 (RHEL 8.x). Currently, we require a minimum cmake version of 3.10.2. There's no real robust way of detecting the libstdc++ version that will work that far back. To workaround this problem for now, this change splits the BUILD_TEST and BUILD_HIPSTDPAR_TEST cmake options so that they are independent. This means that in order to enable hipstdpar tests, the user must explicitly enable the BUILD_HIPSTDPAR_TEST option. Update the readme to reflect this. * Update README.md Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com> --------- Co-authored-by: Di Nguyen Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com> * Updated known issue for inclusive_scan_by_key compiler bug (#513) * updated the known issue * Update CHANGELOG.md Co-authored-by: spolifroni-amd * moved known issue to 6.4 section * updated to include that the issue has been fixed --------- Co-authored-by: spolifroni-amd * Extend fallback coverage for copy_if (#512) We recently added a "fallback" implementation for thrust::copy_if that is invoked when copying a custom type that's too large to fit in shared memory. This change extends the fallback slightly so that it can be used with an overload of copy_if that accepts a stencil buffer (to copy by key). It also adds a unit test to cover this case. It also fixes a small bug in the fallback implementation that could cause the scan accumulator type to overflow when the results are compacted. --------- Co-authored-by: Lauren Wrubleski Co-authored-by: Nick Breed <78807921+NB4444@users.noreply.github.com> Co-authored-by: NguyenNhuDi Co-authored-by: Di Nguyen Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com> Co-authored-by: spolifroni-amd --- CHANGELOG.md | 6 ++ CMakeLists.txt | 99 ++++++++++++--------- README.md | 6 +- cmake/Benchmarks.cmake | 15 +++- cmake/Dependencies.cmake | 24 ++++- cmake/DownloadProject.CMakeLists.cmake.in | 2 - cmake/Summary.cmake | 28 ++++-- cmake/VerifyCompiler.cmake | 10 ++- examples/CMakeLists.txt | 15 +++- extra/CMakeLists.txt | 9 ++ test/CMakeLists.txt | 6 +- test/hipstdpar/CMakeLists.txt | 32 +++---- test/test_copy.cpp | 60 +++++++++++++ testing/CMakeLists.txt | 44 ++++++--- testing/async/exclusive_scan/CMakeLists.txt | 13 ++- testing/async/inclusive_scan/CMakeLists.txt | 13 ++- thrust/system/hip/detail/copy_if.h | 69 +++++++++++--- 17 files changed, 338 insertions(+), 113 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index c877f217a..cb7bdfeaf 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -13,6 +13,8 @@ Documentation for rocThrust available at ### Added +* Added a section to install Thread Building Block (TBB) inside `cmake/Dependencies.cmake` if TBB is not already available. +* Made Thread Building Block (TBB) an optional dependency with the new `BUILD_HIPSTDPAR_TEST_WITH_TBB` flag, default is `OFF`. When the flag is `OFF` and TBB is not already on the machine it will compile without TBB. Otherwise is will compile it with TBB. * Added extended tests to `rtest.py`. These tests are extra tests that did not fit the criteria of smoke and regression tests. These tests will take much longer to run relative to smoke and regression tests. Use `python rtest.py [--emulation|-e|--test|-t]=extended` to run these tests. * Added regression tests to `rtest.py`. These tests recreate scenarios that have caused hardware problems in past emulation environments. Use `python rtest.py [--emulation|-e|--test|-t]=regression` to run these tests. * Added smoke test options, which runs a subset of the unit tests and ensures that less than 2gb of VRAM will be used. Use `python rtest.py [--emulation|-e|--test|-t]=smoke` to run these tests. @@ -31,6 +33,10 @@ Documentation for rocThrust available at * Split the contents of HIPSTDPAR's forwarding header into several implementation headers. * Fixed `copy_if` to work with large data types (512 bytes) +### Known Issues +* `thrust::inclusive_scan_by_key` might produce incorrect results when it's used with -O2 or -O3 optimization. + - The error is caused by a recent compiler change. There is a fix available that will be released at a later date. + ## rocThrust 3.2.0 for ROCm 6.3 ### Added diff --git a/CMakeLists.txt b/CMakeLists.txt index fe40edcbe..56933c451 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -15,6 +15,39 @@ endif() # Note: C is required here for dependencies project(rocthrust LANGUAGES CXX C) +# Set CXX flags +if (NOT DEFINED CMAKE_CXX_STANDARD) + set(CMAKE_CXX_STANDARD 17) +endif() +set(CMAKE_CXX_STANDARD_REQUIRED ON) +set(CMAKE_CXX_EXTENSIONS OFF) + +# Set HIP flags +set(CMAKE_HIP_STANDARD 17) +set(CMAKE_HIP_STANDARD_REQUIRED ON) +set(CMAKE_HIP_EXTENSIONS OFF) + +include(CheckLanguage) +include(CMakeDependentOption) + +# Build options +# Disable -Werror +option(DISABLE_WERROR "Disable building with Werror" ON) +option(BUILD_TEST "Build tests" OFF) +option(BUILD_HIPSTDPAR_TEST "Build hipstdpar tests" OFF) +option(BUILD_HIPSTDPAR_TEST_WITH_TBB "Build hipstdpar tests with TBB" OFF) +option(BUILD_EXAMPLES "Build examples" OFF) +option(BUILD_BENCHMARKS "Build benchmarks" OFF) +option(DOWNLOAD_ROCPRIM "Download rocPRIM and do not search for rocPRIM package" OFF) +option(DOWNLOAD_ROCRAND "Download rocRAND and do not search for rocRAND package" OFF) +option(BUILD_ADDRESS_SANITIZER "Build with address sanitizer enabled" OFF) +cmake_dependent_option(ENABLE_UPSTREAM_TESTS "Enable upstream (thrust) tests" ON BUILD_TEST OFF) +#Set the header wrapper OFF by default. +option(BUILD_FILE_REORG_BACKWARD_COMPATIBILITY "Build with file/folder reorg with backward compatibility enabled" OFF) + +check_language(HIP) +cmake_dependent_option(USE_HIPCXX "Use CMake HIP language support" OFF CMAKE_HIP_COMPILER OFF) + #Adding CMAKE_PREFIX_PATH list( APPEND CMAKE_PREFIX_PATH /opt/rocm/llvm /opt/rocm ${ROCM_PATH} ) @@ -34,7 +67,6 @@ endif() set(CMAKE_INSTALL_RPATH_USE_LINK_PATH TRUE CACHE BOOL "Add paths to linker search and installed rpath") -include(CMakeDependentOption) # rocm-cmake contains common cmake code for rocm projects to help # setup and install include( cmake/FindROCMCmake.cmake ) @@ -47,26 +79,30 @@ include( ROCMHeaderWrapper ) include( ROCMCheckTargetIds ) include( ROCMClients ) -# Use target ID syntax if supported for GPU_TARGETS -if (NOT DEFINED AMDGPU_TARGETS) - set(GPU_TARGETS "all" CACHE STRING "GPU architectures to compile for") +if(USE_HIPCXX) + enable_language(HIP) else() - set(GPU_TARGETS "${AMDGPU_TARGETS}" CACHE STRING "GPU architectures to compile for") -endif() -set_property(CACHE GPU_TARGETS PROPERTY STRINGS "all") - -if(GPU_TARGETS STREQUAL "all") - if(BUILD_ADDRESS_SANITIZER) - # ASAN builds require xnack - rocm_check_target_ids(DEFAULT_AMDGPU_TARGETS - TARGETS "gfx908:xnack+;gfx90a:xnack+;gfx942:xnack+;gfx950:xnack+" - ) + # Use target ID syntax if supported for GPU_TARGETS + if (NOT DEFINED AMDGPU_TARGETS) + set(GPU_TARGETS "all" CACHE STRING "GPU architectures to compile for") else() - rocm_check_target_ids(DEFAULT_AMDGPU_TARGETS - TARGETS "gfx803;gfx900:xnack-;gfx906:xnack-;gfx908:xnack-;gfx90a:xnack-;gfx90a:xnack+;gfx942;gfx950;gfx1030;gfx1100;gfx1101;gfx1102;gfx1151;gfx1200;gfx1201" - ) - endif() - set(GPU_TARGETS "${DEFAULT_AMDGPU_TARGETS}" CACHE STRING "GPU architectures to compile for" FORCE) + set(GPU_TARGETS "${AMDGPU_TARGETS}" CACHE STRING "GPU architectures to compile for") + endif() + set_property(CACHE GPU_TARGETS PROPERTY STRINGS "all") + + if(GPU_TARGETS STREQUAL "all") + if(BUILD_ADDRESS_SANITIZER) + # ASAN builds require xnack + rocm_check_target_ids(DEFAULT_AMDGPU_TARGETS + TARGETS "gfx908:xnack+;gfx90a:xnack+;gfx942:xnack+;gfx950:xnack+" + ) + else() + rocm_check_target_ids(DEFAULT_AMDGPU_TARGETS + TARGETS "gfx803;gfx900:xnack-;gfx906:xnack-;gfx908:xnack-;gfx90a:xnack-;gfx90a:xnack+;gfx942;gfx950;gfx1030;gfx1100;gfx1101;gfx1102;gfx1151;gfx1200;gfx1201" + ) + endif() + set(GPU_TARGETS "${DEFAULT_AMDGPU_TARGETS}" CACHE STRING "GPU architectures to compile for" FORCE) + endif() endif() # Get dependencies @@ -76,18 +112,6 @@ include(cmake/Dependencies.cmake) if (NOT WIN32) include(cmake/VerifyCompiler.cmake) endif() -# Build options -# Disable -Werror -option(DISABLE_WERROR "Disable building with Werror" ON) -option(BUILD_TEST "Build tests" OFF) -option(BUILD_HIPSTDPAR_TEST "Build hipstdpar tests" OFF) -option(BUILD_EXAMPLES "Build examples" OFF) -option(BUILD_BENCHMARKS "Build benchmarks" OFF) -option(DOWNLOAD_ROCPRIM "Download rocPRIM and do not search for rocPRIM package" OFF) -option(BUILD_ADDRESS_SANITIZER "Build with address sanitizer enabled" OFF) -cmake_dependent_option(ENABLE_UPSTREAM_TESTS "Enable upstream (thrust) tests" ON BUILD_TEST OFF) -#Set the header wrapper OFF by default. -option(BUILD_FILE_REORG_BACKWARD_COMPATIBILITY "Build with file/folder reorg with backward compatibility enabled" OFF) set(RNG_SEED_COUNT 0 CACHE STRING "Number of true random sequences to test each input size for") set(PRNG_SEEDS 1 CACHE STRING "Seeds of pseudo random sequences to test each input size for") @@ -106,17 +130,10 @@ if (NOT THRUST_HOST_SYSTEM IN_LIST THRUST_HOST_SYSTEM_OPTIONS) ) endif () -# Set CXX flags -if (NOT DEFINED CMAKE_CXX_STANDARD) - set(CMAKE_CXX_STANDARD 17) -endif() -set(CMAKE_CXX_STANDARD_REQUIRED ON) -set(CMAKE_CXX_EXTENSIONS OFF) - if(DISABLE_WERROR) - set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -Wextra ") + add_compile_options(-Wall -Wextra) else() - set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -Wextra -Werror") + add_compile_options(-Wall -Wextra -Werror) endif() if (CMAKE_CXX_STANDARD EQUAL 14) @@ -147,7 +164,7 @@ print_configuration_summary() # Thrust (with HIP backend) add_subdirectory(thrust) -if(BUILD_TEST OR BUILD_BENCHMARKS) +if(BUILD_TEST OR BUILD_BENCHMARKS OR BUILD_HIPSTDPAR_TEST) rocm_package_setup_component(clients) endif() diff --git a/README.md b/README.md index 3093c297a..83325749f 100644 --- a/README.md +++ b/README.md @@ -282,8 +282,7 @@ When compiling with the proper flags (see [LLVM (AMD's fork) docs](https://githu HIPSTDPAR is currently packaged along rocThrust. The `hipstdpar` package is set up as a virtual package provided by `rocthrust`, so the latter needs to be installed entirely for getting HIPSTDPAR's headers. Conversely, installing the `rocthrust` package will also include HIPSTDPAR's headers in the system. ### Tests - -rocThrust also includes some tests for checking the correct building of HIPSTDPAR implementations. These are located under the [tests/hipstdpar](/test/hipstdpar/) folder. When configuring the project with the `BUILD_TEST` option on, these tests will also be enabled. Additionally, one can configure **only** HIPSTDPAR's tests by disabling `BUILD_TEST` and enabling `BUILD_HIPSTDPAR_TEST`. In general, the following steps can be followed for building and running the tests: +rocThrust also includes tests to check the correct building of HIPSTDPAR implementations. They are located in the [tests/hipstdpar](/test/hipstdpar/) folder. When configuring the project with the `BUILD_TEST` option, these tests will not be enabled by default. To enable them, set `BUILD_HIPSTDPAR_TEST=ON`. Additionally, you can configure only HIPSTDPAR's tests by disabling `BUILD_TEST` and enabling `BUILD_HIPSTDPAR_TEST`. In general, the following steps can be followed for building and running the tests: ```sh git clone https://github.com/ROCm/rocThrust @@ -292,7 +291,8 @@ git clone https://github.com/ROCm/rocThrust cd rocThrust; mkdir build; cd build # Configure rocThrust. -[CXX=hipcc] cmake ../. -D BUILD_TEST=ON # Configure rocThrust's and HIPSTDPAR's tests. +[CXX=hipcc] cmake ../. -D BUILD_TEST=ON # Configure rocThrust's tests. +[CXX=hipcc] cmake ../. -D BUILD_TEST=ON -D BUILD_HIPSTDPAR_TEST=ON # Configure both rocThrust's tests and HIPSTDPAR's tests. [CXX=hipcc] cmake ../. -D BUILD_TEST=OFF -D BUILD_HIPSTDPAR_TEST=ON # Only configure HIPSTDPAR's tests. # Build diff --git a/cmake/Benchmarks.cmake b/cmake/Benchmarks.cmake index 7e9a3d5bd..e74f9bb43 100644 --- a/cmake/Benchmarks.cmake +++ b/cmake/Benchmarks.cmake @@ -37,10 +37,17 @@ endfunction() # Registers a .cu as C++ rocThrust benchmark function(add_thrust_benchmark BENCHMARK_NAME BENCHMARK_SOURCE NOT_INTERNAL) set(BENCHMARK_TARGET "benchmark_thrust_${BENCHMARK_NAME}") - set_source_files_properties(${BENCHMARK_SOURCE} - PROPERTIES - LANGUAGE CXX - ) + if(USE_HIPCXX) + set_source_files_properties(${BENCHMARK_SOURCE} + PROPERTIES + LANGUAGE HIP + ) + else() + set_source_files_properties(${BENCHMARK_SOURCE} + PROPERTIES + LANGUAGE CXX + ) + endif() add_executable(${BENCHMARK_TARGET} ${BENCHMARK_SOURCE}) target_link_libraries(${BENCHMARK_TARGET} diff --git a/cmake/Dependencies.cmake b/cmake/Dependencies.cmake index 54225eed3..b0bd252f2 100644 --- a/cmake/Dependencies.cmake +++ b/cmake/Dependencies.cmake @@ -35,10 +35,11 @@ if(NOT rocprim_FOUND) endif() # Test dependencies -if(BUILD_TEST) +if(BUILD_TEST OR BUILD_HIPSTDPAR_TEST) if(NOT DEPENDENCIES_FORCE_DOWNLOAD) # Google Test (https://github.com/google/googletest) find_package(GTest QUIET) + find_package(TBB QUIET) else() message(STATUS "Force installing GTest.") endif() @@ -63,6 +64,27 @@ if(BUILD_TEST) find_package(GTest REQUIRED CONFIG PATHS ${GTEST_ROOT}) endif() + if (NOT TARGET TBB::tbb AND NOT TARGET tbb AND BUILD_HIPSTDPAR_TEST_WITH_TBB) + message(STATUS "TBB not found or force download TBB on. Downloading and building TBB.") + set(TBB_ROOT ${CMAKE_CURRENT_BINARY_DIR}/deps/tbb CACHE PATH "" FORCE) + + download_project( + PROJ TBB + GIT_REPOSITORY https://github.com/oneapi-src/oneTBB.git + GIT_TAG 1c4c93fc5398c4a1acb3492c02db4699f3048dea # v2021.13.0 + INSTALL_DIR ${TBB_ROOT} + CMAKE_ARGS -DCMAKE_CXX_COMPILER=g++ -DTBB_TEST=OFF -DTBB_BUILD=ON -DTBB_INSTALL=ON -DTBBMALLOC_PROXY_BUILD=OFF -DCMAKE_INSTALL_PREFIX= + LOG_DOWNLOAD TRUE + LOG_CONFIGURE TRUE + LOG_BUILD TRUE + LOG_INSTALL TRUE + BUILD_PROJECT TRUE + UPDATE_DISCONNECTED TRUE + ) + find_package(TBB REQUIRED CONFIG PATHS ${TBB_ROOT}) + + endif() + # SQlite (for run-to-run bitwise-reproducibility tests) # Note: SQLite 3.36.0 enabled the backup API by default, which we need # for cache serialization. We also want to use a static SQLite, diff --git a/cmake/DownloadProject.CMakeLists.cmake.in b/cmake/DownloadProject.CMakeLists.cmake.in index 5546c03a3..8a90d2ab1 100644 --- a/cmake/DownloadProject.CMakeLists.cmake.in +++ b/cmake/DownloadProject.CMakeLists.cmake.in @@ -10,14 +10,12 @@ if(${DL_ARGS_BUILD_PROJECT}) ExternalProject_Add(${DL_ARGS_PROJ}-download ${DL_ARGS_UNPARSED_ARGUMENTS} SOURCE_DIR "${DL_ARGS_SOURCE_DIR}" - BUILD_IN_SOURCE TRUE TEST_COMMAND "" ) else() ExternalProject_Add(${DL_ARGS_PROJ}-download ${DL_ARGS_UNPARSED_ARGUMENTS} SOURCE_DIR "${DL_ARGS_SOURCE_DIR}" - BUILD_IN_SOURCE TRUE TEST_COMMAND "" UPDATE_COMMAND "" CONFIGURE_COMMAND "" diff --git a/cmake/Summary.cmake b/cmake/Summary.cmake index 62ccfbc3f..6c061238e 100644 --- a/cmake/Summary.cmake +++ b/cmake/Summary.cmake @@ -26,22 +26,34 @@ function (print_configuration_summary) message(STATUS "General:") message(STATUS " System : ${CMAKE_SYSTEM_NAME}") message(STATUS " HIP ROOT : ${HIP_ROOT_DIR}") +if(USE_HIPCXX) + message(STATUS " HIP compiler : ${CMAKE_HIP_COMPILER}") + message(STATUS " HIP compiler version : ${CMAKE_HIP_COMPILER_VERSION}") + string(STRIP "${CMAKE_HIP_FLAGS}" CMAKE_HIP_FLAGS_STRIP) + message(STATUS " HIP flags : ${CMAKE_HIP_FLAGS_STRIP}") +else() message(STATUS " C++ compiler : ${CMAKE_CXX_COMPILER}") message(STATUS " C++ compiler version : ${CMAKE_CXX_COMPILER_VERSION}") string(STRIP "${CMAKE_CXX_FLAGS}" CMAKE_CXX_FLAGS_STRIP) message(STATUS " CXX flags : ${CMAKE_CXX_FLAGS_STRIP}") +endif() message(STATUS " Build type : ${CMAKE_BUILD_TYPE}") message(STATUS " Install prefix : ${CMAKE_INSTALL_PREFIX}") if(HIP_COMPILER STREQUAL "clang") +if(USE_HIPCXX) + message(STATUS " Device targets : ${CMAKE_HIP_ARCHITECTURES}") +else() message(STATUS " Device targets : ${GPU_TARGETS}") +endif() endif() message(STATUS "") - message(STATUS " DISABLE_WERROR : ${DISABLE_WERROR}") - message(STATUS " DOWNLOAD_ROCPRIM : ${DOWNLOAD_ROCPRIM}") - message(STATUS " DOWNLOAD_ROCRAND : ${DOWNLOAD_ROCRAND}") - message(STATUS " BUILD_TEST : ${BUILD_TEST}") - message(STATUS " BUILD_HIPSTDPAR_TEST : ${BUILD_HIPSTDPAR_TEST}") - message(STATUS " BUILD_EXAMPLES : ${BUILD_EXAMPLES}") - message(STATUS " BUILD_BENCHMARKS : ${BUILD_BENCHMARKS}") - message(STATUS " BUILD_ADDRESS_SANITIZER : ${BUILD_ADDRESS_SANITIZER}") + message(STATUS " DISABLE_WERROR : ${DISABLE_WERROR}") + message(STATUS " DOWNLOAD_ROCPRIM : ${DOWNLOAD_ROCPRIM}") + message(STATUS " DOWNLOAD_ROCRAND : ${DOWNLOAD_ROCRAND}") + message(STATUS " BUILD_TEST : ${BUILD_TEST}") + message(STATUS " BUILD_HIPSTDPAR_TEST : ${BUILD_HIPSTDPAR_TEST}") + message(STATUS " BUILD_HIPSTDPAR_TEST_WITH_TBB : ${BUILD_HIPSTDPAR_TEST_WITH_TBB}") + message(STATUS " BUILD_EXAMPLES : ${BUILD_EXAMPLES}") + message(STATUS " BUILD_BENCHMARKS : ${BUILD_BENCHMARKS}") + message(STATUS " BUILD_ADDRESS_SANITIZER : ${BUILD_ADDRESS_SANITIZER}") endfunction() diff --git a/cmake/VerifyCompiler.cmake b/cmake/VerifyCompiler.cmake index 13749152d..0930d6132 100644 --- a/cmake/VerifyCompiler.cmake +++ b/cmake/VerifyCompiler.cmake @@ -26,8 +26,14 @@ find_package(hip REQUIRED CONFIG PATHS /opt/rocm) if(HIP_COMPILER STREQUAL "nvcc") message(FATAL_ERROR "rocThrust does not support the CUDA backend.") elseif(HIP_COMPILER STREQUAL "clang") - if(NOT (CMAKE_CXX_COMPILER MATCHES ".*hipcc$" OR CMAKE_CXX_COMPILER MATCHES ".*clang\\+\\+")) - message(FATAL_ERROR "On ROCm platform 'hipcc' or HIP-aware Clang must be used as C++ compiler.") + if(USE_HIPCXX) + if(NOT (CMAKE_HIP_COMPILER MATCHES ".*hipcc$" OR CMAKE_HIP_COMPILER MATCHES ".*clang\\+\\+")) + message(FATAL_ERROR "On ROCm platform 'hipcc' or HIP-aware Clang must be used as HIP compiler.") + endif() + else() + if(NOT (CMAKE_CXX_COMPILER MATCHES ".*hipcc$" OR CMAKE_CXX_COMPILER MATCHES ".*clang\\+\\+")) + message(FATAL_ERROR "On ROCm platform 'hipcc' or HIP-aware Clang must be used as C++ compiler.") + endif() endif() else() message(FATAL_ERROR "HIP_COMPILER must be `clang` (AMD ROCm platform)") diff --git a/examples/CMakeLists.txt b/examples/CMakeLists.txt index 27c959fad..b36063ce4 100644 --- a/examples/CMakeLists.txt +++ b/examples/CMakeLists.txt @@ -9,10 +9,17 @@ endif() function(add_thrust_example EXAMPLE) set(EXAMPLE_SOURCE "${EXAMPLE}.cu") set(EXAMPLE_TARGET "example_thrust_${EXAMPLE}") - set_source_files_properties(${EXAMPLE_SOURCE} + if(USE_HIPCXX) + set_source_files_properties(${EXAMPLE_SOURCE} PROPERTIES - LANGUAGE CXX - ) + LANGUAGE HIP + ) + else() + set_source_files_properties(${EXAMPLE_SOURCE} + PROPERTIES + LANGUAGE CXX + ) + endif() add_executable(${EXAMPLE_TARGET} ${EXAMPLE_SOURCE}) target_link_libraries(${EXAMPLE_TARGET} @@ -45,7 +52,7 @@ function(add_thrust_example EXAMPLE) foreach( file_i ${third_party_dlls}) add_custom_command( TARGET ${EXAMPLE_TARGET} POST_BUILD COMMAND ${CMAKE_COMMAND} ARGS -E copy_if_different ${file_i} ${PROJECT_BINARY_DIR}/examples ) endforeach( file_i ) - endif() + endif() endfunction() diff --git a/extra/CMakeLists.txt b/extra/CMakeLists.txt index 3ef4a5886..6949b7230 100644 --- a/extra/CMakeLists.txt +++ b/extra/CMakeLists.txt @@ -68,6 +68,15 @@ enable_testing() function(add_rocthrust_test TEST_NAME TEST_SOURCES) list(GET TEST_SOURCES 0 TEST_MAIN_SOURCE) get_filename_component(TEST_TARGET ${TEST_MAIN_SOURCE} NAME_WE) + + foreach(test_file ${TEST_SOURCES}) + if(USE_HIPCXX) + set_source_files_properties(${test_file} PROPERTIES LANGUAGE HIP) + else() + set_source_files_properties(${test_file} PROPERTIES LANGUAGE CXX) + endif() + endforeach() + add_executable(${TEST_TARGET} ${TEST_SOURCES}) target_link_libraries(${TEST_TARGET} PRIVATE diff --git a/test/CMakeLists.txt b/test/CMakeLists.txt index 1bd8a8bdd..ec5145908 100644 --- a/test/CMakeLists.txt +++ b/test/CMakeLists.txt @@ -49,6 +49,10 @@ endfunction() function(add_rocthrust_test TEST) set(TEST_SOURCE "test_${TEST}.cpp") + if(USE_HIPCXX) + set_source_files_properties(${TEST_SOURCE} PROPERTIES LANGUAGE HIP) + endif() + set(TEST_TARGET "${TEST}.hip") add_executable(${TEST_TARGET} ${TEST_SOURCE}) target_include_directories(${TEST_TARGET} SYSTEM BEFORE @@ -245,7 +249,7 @@ if(BUILD_TEST) endif() # hipstdpar tests -if(BUILD_TEST OR BUILD_HIPSTDPAR_TEST) +if(BUILD_HIPSTDPAR_TEST) if(WIN32) message( STATUS diff --git a/test/hipstdpar/CMakeLists.txt b/test/hipstdpar/CMakeLists.txt index c93ca7222..fa9479f0f 100644 --- a/test/hipstdpar/CMakeLists.txt +++ b/test/hipstdpar/CMakeLists.txt @@ -4,6 +4,9 @@ function(add_hipstdpar_test TEST TEST_TYPE INTERPOSE_ALLOC) set(TEST_SOURCE "test_${TEST}.cpp") + if(USE_HIPCXX) + set_source_files_properties(${TEST_SOURCE} PROPERTIES LANGUAGE HIP) + endif() set(TEST_TARGET "${TEST}_${TEST_TYPE}.hip") add_executable(${TEST_TARGET} ${TEST_SOURCE}) @@ -11,7 +14,8 @@ function(add_hipstdpar_test TEST TEST_TYPE INTERPOSE_ALLOC) PRIVATE --hipstdpar --hipstdpar-path=${HIPSTDPAR_LOCATION} - --hipstdpar-thrust-path=${THRUST_LOCATION}) + --hipstdpar-thrust-path=${THRUST_LOCATION} + --hipstdpar-prim-path=${ROCPRIM_LOCATION}) if(INTERPOSE_ALLOC) target_compile_options(${TEST_TARGET} PRIVATE @@ -21,9 +25,15 @@ function(add_hipstdpar_test TEST TEST_TYPE INTERPOSE_ALLOC) target_link_libraries(${TEST_TARGET} PRIVATE --hipstdpar - TBB::tbb Threads::Threads ) + + if(TARGET TBB::tbb OR TARGET tbb) + target_link_libraries(${TEST_TARGET} + PRIVATE + TBB::tbb + ) + endif() if (NOT WIN32) foreach(gpu_target ${GPU_TARGETS}) @@ -76,26 +86,12 @@ set(ROCTHRUST_CMAKE_CXX_STANDARD ${CMAKE_CXX_STANDARD}) set(CMAKE_CXX_STANDARD 17) # Dependencies -find_package(TBB QUIET) -if(NOT TARGET TBB::tbb AND NOT TARGET tbb) - message(STATUS "Thread Building Blocks not found. Fetching...") - FetchContent_Declare( - thread-building-blocks - GIT_REPOSITORY https://github.com/oneapi-src/oneTBB.git - GIT_TAG 1c4c93fc5398c4a1acb3492c02db4699f3048dea # v2021.13.0 - ) - # Disable tests for TBB - set(TBB_TEST OFF CACHE BOOL "Disable TBB tests" FORCE) - - FetchContent_MakeAvailable(thread-building-blocks) -else() - find_package(TBB REQUIRED) -endif() find_package(Threads REQUIRED) -# Define where to find rocThrust and hipstdpar headers +# Define where to find rocThrust, hipstdpar and rocPRIM headers set(THRUST_LOCATION ${PROJECT_SOURCE_DIR}) set(HIPSTDPAR_LOCATION ${THRUST_LOCATION}/thrust/system/hip/hipstdpar) +set(ROCPRIM_LOCATION ${ROCPRIM_INCLUDE_DIR}) # Add tests add_hipstdpar_test("algorithms" "compile" OFF) diff --git a/test/test_copy.cpp b/test/test_copy.cpp index ac9935cf7..d92227bf8 100644 --- a/test/test_copy.cpp +++ b/test/test_copy.cpp @@ -444,6 +444,66 @@ TYPED_TEST(CopyIntegerTests, TestCopyIf) } } +TEST(CopyLargeTypesTests, TestCopyIfStencilLargeType) +{ + using T = large_data; + + SCOPED_TRACE(testing::Message() << "with device_id= " << test::set_device_from_ctest()); + + for(auto size : get_sizes()) + { + SCOPED_TRACE(testing::Message() << "with size= " << size); + + thrust::host_vector h_data(size); + thrust::sequence(h_data.begin(), h_data.end()); + thrust::device_vector d_data(size); + thrust::sequence(d_data.begin(), d_data.end()); + + for(auto seed : get_seeds()) + { + SCOPED_TRACE(testing::Message() << "with seed= " << seed); + + thrust::host_vector h_stencil = get_random_data(size, std::numeric_limits::min(), std::numeric_limits::max(), seed);; + thrust::device_vector d_stencil = h_stencil; + + typename thrust::host_vector::iterator h_new_end; + typename thrust::device_vector::iterator d_new_end; + + // test with Predicate that returns a bool + { + thrust::host_vector h_result(size); + thrust::device_vector d_result(size); + + h_new_end + = thrust::copy_if(h_data.begin(), h_data.end(), h_stencil.begin(), h_result.begin(), is_even()); + d_new_end + = thrust::copy_if(d_data.begin(), d_data.end(), d_stencil.begin(), d_result.begin(), is_even()); + + h_result.resize(h_new_end - h_result.begin()); + d_result.resize(d_new_end - d_result.begin()); + + ASSERT_EQ(h_result, d_result); + } + + // test with Predicate that returns a non-bool + { + thrust::host_vector h_result(size); + thrust::device_vector d_result(size); + + h_new_end + = thrust::copy_if(h_data.begin(), h_data.end(), h_stencil.begin(), h_result.begin(), mod_3()); + d_new_end + = thrust::copy_if(d_data.begin(), d_data.end(), d_stencil.begin(), d_result.begin(), mod_3()); + + h_result.resize(h_new_end - h_result.begin()); + d_result.resize(d_new_end - d_result.begin()); + + ASSERT_EQ(h_result, d_result); + } + } + } +} + TYPED_TEST(CopyIntegerTests, TestCopyIfStencil) { using T = typename TestFixture::input_type; diff --git a/testing/CMakeLists.txt b/testing/CMakeLists.txt index 03aa0a39b..d89d35cfc 100644 --- a/testing/CMakeLists.txt +++ b/testing/CMakeLists.txt @@ -32,17 +32,21 @@ foreach(gpu_target ${GPU_TARGETS}) ) endforeach() -if(NOT CMAKE_VERSION VERSION_LESS 3.13) +if(USE_HIPCXX) set_source_files_properties(unittest/testframework.cu PROPERTIES - LANGUAGE CXX + LANGUAGE HIP ) - add_library(testframework OBJECT unittest/testframework.cu) -else() # Workaround +else() set_source_files_properties(unittest/testframework.cu PROPERTIES - LANGUAGE CXX + LANGUAGE CXX ) +endif() + +if(NOT CMAKE_VERSION VERSION_LESS 3.13) + add_library(testframework OBJECT unittest/testframework.cu) +else() # Workaround add_library(testframework STATIC unittest/testframework.cu) endif() @@ -53,10 +57,17 @@ if(HIP_COMPILER STREQUAL "nvcc") ) target_sources(testframework PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/unittest/cuda/testframework.cu) elseif(HIP_COMPILER STREQUAL "clang") - set_source_files_properties(unittest/hip/testframework.cu - PROPERTIES - LANGUAGE CXX - ) + if(USE_HIPCXX) + set_source_files_properties(unittest/hip/testframework.cu + PROPERTIES + LANGUAGE HIP + ) + else() + set_source_files_properties(unittest/hip/testframework.cu + PROPERTIES + LANGUAGE CXX + ) + endif() target_sources(testframework PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/unittest/hip/testframework.cu) endif() @@ -76,10 +87,17 @@ function(add_thrust_test TEST) # This code path used to add "-x c++" to the compiler flags too, but that was # superfluous as "-x hip" was already present on the command-line due to linking to # hip::device transitively from roc::rocprim_hip. - set_source_files_properties(${TEST_SOURCE} - PROPERTIES - LANGUAGE CXX - ) + if(USE_HIPCXX) + set_source_files_properties(${TEST_SOURCE} + PROPERTIES + LANGUAGE HIP + ) + else() + set_source_files_properties(${TEST_SOURCE} + PROPERTIES + LANGUAGE CXX + ) + endif() if(NOT CMAKE_VERSION VERSION_LESS 3.13) add_executable(${TEST_TARGET} ${TEST_SOURCE} $) target_link_libraries(${TEST_TARGET} diff --git a/testing/async/exclusive_scan/CMakeLists.txt b/testing/async/exclusive_scan/CMakeLists.txt index 10fa820dc..f9e3ae7fe 100644 --- a/testing/async/exclusive_scan/CMakeLists.txt +++ b/testing/async/exclusive_scan/CMakeLists.txt @@ -9,10 +9,17 @@ function(add_thrust_test_exclusive TEST) # This code path used to add "-x c++" to the compiler flags too, but that was # superfluous as "-x hip" was already present on the command-line due to linking to # hip::device transitively from roc::rocprim_hip. - set_source_files_properties(${TEST_SOURCE} - PROPERTIES + if(USE_HIPCXX) + set_source_files_properties(${TEST_SOURCE} + PROPERTIES + LANGUAGE HIP + ) + else() + set_source_files_properties(${TEST_SOURCE} + PROPERTIES LANGUAGE CXX - ) + ) + endif() if(NOT CMAKE_VERSION VERSION_LESS 3.13) add_executable(${TEST_TARGET} ${TEST_SOURCE} $) target_link_libraries(${TEST_TARGET} diff --git a/testing/async/inclusive_scan/CMakeLists.txt b/testing/async/inclusive_scan/CMakeLists.txt index 1a419b2f7..9b771fb5e 100644 --- a/testing/async/inclusive_scan/CMakeLists.txt +++ b/testing/async/inclusive_scan/CMakeLists.txt @@ -9,10 +9,17 @@ function(add_thrust_test_inclusive TEST) # This code path used to add "-x c++" to the compiler flags too, but that was # superfluous as "-x hip" was already present on the command-line due to linking to # hip::device transitively from roc::rocprim_hip. - set_source_files_properties(${TEST_SOURCE} - PROPERTIES + if(USE_HIPCXX) + set_source_files_properties(${TEST_SOURCE} + PROPERTIES + LANGUAGE HIP + ) + else() + set_source_files_properties(${TEST_SOURCE} + PROPERTIES LANGUAGE CXX - ) + ) + endif() if(NOT CMAKE_VERSION VERSION_LESS 3.13) add_executable(${TEST_TARGET} ${TEST_SOURCE} $) target_link_libraries(${TEST_TARGET} diff --git a/thrust/system/hip/detail/copy_if.h b/thrust/system/hip/detail/copy_if.h index 4e6d25e35..bbf121390 100644 --- a/thrust/system/hip/detail/copy_if.h +++ b/thrust/system/hip/detail/copy_if.h @@ -156,21 +156,29 @@ namespace __copy_if return output + num_selected; } - template + template THRUST_HIP_RUNTIME_FUNCTION auto - copy_if(execution_policy& policy, InputIt first, InputIt last, OutputIt output, Predicate predicate) + copy_if_common(execution_policy& policy, InputIt first, InputIt last, OutputIt output, Predicate predicate, PredicateInputIt predicate_input) -> std::enable_if_t::value_type) < 512), OutputIt> { using namespace thrust::system::hip_rocprim::temp_storage; using size_type = typename iterator_traits::difference_type; + using pos_type = thrust::detail::uint32_t; + using flag_type = thrust::detail::uint8_t; size_type num_items = thrust::distance(first, last); hipStream_t stream = hip_rocprim::stream(policy); bool debug_sync = THRUST_HIP_DEBUG_SYNC_FLAG; - thrust::detail::temporary_array flags(policy, num_items); + if(num_items == 0) + return output; + + // Note: although flags can be stored in a uint8_t, in the inclusive scan performed on flags below, + // the scan accumulator type to something larger (flag_type) to prevent overflow. + // For this reason, we call rocprim::inclusive_scan directly here and pass in the accumulator type as template argument. + thrust::detail::temporary_array flags(policy, num_items); - hip_rocprim::throw_on_error(rocprim::transform(first, + hip_rocprim::throw_on_error(rocprim::transform(predicate_input, flags.begin(), num_items, [predicate] __host__ __device__ (auto const & val){ return predicate(val) ? 1 : 0; }, @@ -178,27 +186,69 @@ namespace __copy_if debug_sync), "copy_if failed on transform"); - thrust::detail::temporary_array pos(policy, num_items); + thrust::detail::temporary_array pos(policy, num_items); + + // Get the required temporary storage size. + size_t storage_size = 0; + hip_rocprim::throw_on_error(rocprim::inclusive_scan::iterator, + typename thrust::detail::temporary_array::iterator, + rocprim::plus, + pos_type>(nullptr, storage_size, flags.begin(), pos.begin(), num_items, rocprim::plus{}, stream, debug_sync), + "copy_if failed while determining inclusive scan storage size"); - thrust::inclusive_scan(policy, flags.begin(), flags.end(), pos.begin()); + // Allocate temporary storage. + thrust::detail::temporary_array tmp(policy, storage_size); + void *ptr = static_cast(tmp.data().get()); + // Perform a scan on the positions. + hip_rocprim::throw_on_error(rocprim::inclusive_scan::iterator, + typename thrust::detail::temporary_array::iterator, + rocprim::plus, + pos_type>(ptr, storage_size, flags.begin(), pos.begin(), num_items, rocprim::plus{}, stream, debug_sync), + "copy_if failed on inclusive scan"); + + // Pull out the values for which the predicate evaluated to true and compact them into the output array. constexpr static size_t items_per_thread = 16; constexpr static size_t threads_per_block = 256; - const size_t block_size = std::ceil(static_cast(num_items) / 16 / threads_per_block); + const size_t block_size = std::ceil(static_cast(num_items) / items_per_thread / threads_per_block); copy_if_kernel<<>>(first, flags.begin(), pos.begin(), num_items, output); - return output + pos[num_items-1]; + return output + pos[num_items - 1]; + } + + template + THRUST_HIP_RUNTIME_FUNCTION auto + copy_if(execution_policy& policy, InputIt first, InputIt last, OutputIt output, Predicate predicate) + -> std::enable_if_t::value_type) < 512), OutputIt> + { + return copy_if_common(policy, first, last, output, predicate, first); + } + + template + THRUST_HIP_RUNTIME_FUNCTION auto + copy_if(execution_policy& policy, + InputIt first, + InputIt last, + StencilIt stencil, + OutputIt output, + Predicate predicate) + -> std::enable_if_t::value_type) < 512), OutputIt> + { + return copy_if_common(policy, first, last, output, predicate, stencil); } template - THRUST_HIP_RUNTIME_FUNCTION OutputIt + THRUST_HIP_RUNTIME_FUNCTION auto copy_if(execution_policy& policy, InputIt first, InputIt last, StencilIt stencil, OutputIt output, Predicate predicate) + -> std::enable_if_t<(sizeof(typename std::iterator_traits::value_type) < 512), OutputIt> { using namespace thrust::system::hip_rocprim::temp_storage; typedef typename iterator_traits::difference_type size_type; @@ -264,7 +314,6 @@ namespace __copy_if return output + num_selected; } - } // namespace __copy_if //-------------------------