Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add documentation for HIP backend policies (par, par_nosync) #421

Merged
merged 1 commit into from
Jun 12, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
87 changes: 87 additions & 0 deletions docs/hip-execution-policies.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
.. meta::
:description: rocThrust documentation and API reference
:keywords: rocThrust, ROCm, API, reference, execution policy

.. _hip-execution-policies:

******************************************
Execution Policies
******************************************

In addition to the standard Thrust execution policies (eg. ``thrust::host``, ``thrust::device``, ``thrust::seq``),
rocThrust's HIP backend provides the following:

* ``hip_rocprim::par`` - This policy causes algorithms to be launched in a parallel configuration.
API calls are blocking (synchronous with respect to the host).

* ``hip_rocprim::par_nosync`` - This policy tells Thrust that algorithms may avoid synchronization
barriers when it is possible to do so. As a result, algorithms may be launched asynchronously with
respect to the host. This can allow you to perform other host-side work while the algorithms
are running on the device. If you use this policy, you must synchronize before accessing results
on the host side.

The example below illustrates the behaviour of these two policies.

.. code-block:: cpp

#include <hip/hip_runtime_api.h>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/random.h>
#include <thrust/count.h>
#include <thrust/reduce.h>
#include <thrust/system/hip/execution_policy.h>
#include <ctime>
#include <iostream>

int main(int argc, char* argv[])
{
// Allocate host and device vectors.
const size_t size = 100;
thrust::host_vector<int> h_vec(size);
thrust::device_vector<int> d_vec1(size);
thrust::device_vector<int> d_vec2(size);

// Fill host vector with random values.
const int limit = 100;
auto seed = std::time(nullptr);
thrust::default_random_engine rng(seed);
for (int i = 0; i < size; i++)
h_vec[i] = rng() % limit;

// Copy data to device vectors.
d_vec1 = h_vec;
d_vec2 = h_vec;

// Launch some algorithms using the hip_rocprim::par policy.
// The calls below are blocking with respect to the host.
// However, internally, each algorithm will run in parallel.
auto par_policy = thrust::hip_rocprim::par;
int count = thrust::count(par_policy, d_vec1.begin(), d_vec1.end(), 50);
int reduction = thrust::reduce(par_policy, d_vec2.begin(), d_vec2.end());

// Print out the results.
std::cout << "par results:" << std::endl;
std::cout << "count: " << count << std::endl;
std::cout << "reduction: " << reduction << std::endl;

// Launch the algorithms using the hip_rocprim::par_nosync policy.
// These calls may not be blocking with respect to the host.
auto nosync_policy = thrust::hip_rocprim::par_nosync;
int count2 = thrust::count(nosync_policy, d_vec1.begin(), d_vec1.end(), 50);
int reduction2 = thrust::reduce(nosync_policy, d_vec2.begin(), d_vec2.end());

// We can perform other host-side work here, and it may overlap with the
// algorithms launched above.
DoHostSideWork();

// We must synchronize before accessing the results on the host.
hipDeviceSynchronize();

// Print out the results.
std::cout << "par_nosync results:" << std::endl;
std::cout << "count: " << count2 << std::endl;
std::cout << "reduction: " << reduction2 << std::endl;

return 0;
}
1 change: 1 addition & 0 deletions docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@ The documentation is structured as follows:
* :ref:`data-type-support`
* :ref:`bitwise-repro`
* :ref:`api-reference`
* :ref:`hip-execution-policies`
* :ref:`genindex`

To contribute to the documentation, refer to
Expand Down