Docs: Replace terms.md page with page that provides example of API syntax mapping #3726

Merged 1 commit on Feb 5, 2025.
35 changes: 2 additions & 33 deletions docs/faq.rst
@@ -65,39 +65,8 @@ platforms.
Additional porting might be required to deal with architecture feature
queries or CUDA capabilities that HIP doesn't support.

How does HIP compare with OpenCL?
---------------------------------

HIP offers several benefits over OpenCL:

* Device code can be written in modern C++, including templates, lambdas,
  classes and so on.
* Host and device code can be mixed in the source files.
* The HIP API is less verbose than OpenCL and is familiar to CUDA developers.
* Porting from CUDA to HIP is significantly easier than from CUDA to OpenCL.
* HIP uses development tools specialized for each platform: :doc:`amdclang++ <llvm-project:index>`
  for AMD GPUs or `nvcc <https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html>`_
  for NVIDIA GPUs, and profilers like :doc:`ROCm Compute Profiler <rocprofiler-compute:index>` or
  `Nsight Systems <https://developer.nvidia.com/nsight-systems>`_.
* HIP provides:

  * pointers and host-side pointer arithmetic.
  * device-level control over memory allocation and placement.
  * an offline compilation model.

How does porting CUDA to HIP compare to porting CUDA to OpenCL?
---------------------------------------------------------------

OpenCL differs from HIP and CUDA in its host runtime API, and even more so
in its kernel code. HIP device code is a C++ dialect, while OpenCL kernel
code is based on C99. OpenCL also does not support single-source compilation,
so host and kernel code must live in separate translation units.

As a result, the OpenCL syntax differs significantly from HIP, and porting tools
must perform complex transformations, especially regarding templates or other
C++ features in kernels.

-To better understand the syntax differences, see :doc:`here<reference/terms>` or
-the :doc:`HIP porting guide <how-to/hip_porting_guide>`.
+To better understand the syntax differences, see :doc:`CUDA to HIP API Function Comparison <reference/api_syntax>`
+or the :doc:`HIP porting guide <how-to/hip_porting_guide>`.

Can I install CUDA and ROCm on the same machine?
------------------------------------------------
2 changes: 1 addition & 1 deletion docs/index.md
@@ -45,7 +45,7 @@ The HIP documentation is organized into the following categories:
* [HSA runtime API for ROCm](./reference/virtual_rocr)
* [HIP math API](./reference/math_api)
* [HIP environment variables](./reference/env_variables)
-* [Comparing syntax for different APIs](./reference/terms)
+* [CUDA to HIP API Function Comparison](./reference/api_syntax)
* [List of deprecated APIs](./reference/deprecated_api_list)
* [FP8 numbers in HIP](./reference/fp8_numbers)
* {doc}`./reference/hardware_features`
176 changes: 176 additions & 0 deletions docs/reference/api_syntax.rst
@@ -0,0 +1,176 @@
.. meta::
:description: Maps CUDA API syntax to HIP API syntax with an example
:keywords: AMD, ROCm, HIP, CUDA, syntax, HIP syntax

********************************************************************************
CUDA to HIP API Function Comparison
********************************************************************************

This page introduces key syntax differences between CUDA and HIP APIs with a focused code
example and comparison table. For a complete list of mappings, visit :ref:`HIPIFY <HIPIFY:index>`.
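
In practice, these mappings are applied automatically rather than by hand. For
example, HIPIFY's ``hipify-perl`` script translates the CUDA runtime calls in a
single source file to HIP (a sketch; ``reduction.cu`` is a placeholder file name,
and the script must be available from a HIPIFY installation):

.. code-block:: shell

   # Translate CUDA runtime calls in reduction.cu to HIP,
   # writing the converted source to stdout.
   hipify-perl reduction.cu > reduction_hip.cpp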

The following CUDA code example illustrates several commonly used CUDA runtime API functions.

.. code-block:: cpp

   #include <iostream>
   #include <vector>
   #include <cstdlib>

   #include <cuda_runtime.h>

   __global__ void block_reduction(const float* input, float* output, int num_elements)
   {
       extern __shared__ float s_data[];

       int tid = threadIdx.x;
       int global_id = blockDim.x * blockIdx.x + tid;

       if (global_id < num_elements)
       {
           s_data[tid] = input[global_id];
       }
       else
       {
           s_data[tid] = 0.0f;
       }
       __syncthreads();

       for (int stride = blockDim.x / 2; stride > 0; stride >>= 1)
       {
           if (tid < stride)
           {
               s_data[tid] += s_data[tid + stride];
           }
           __syncthreads();
       }

       if (tid == 0)
       {
           output[blockIdx.x] = s_data[0];
       }
   }

   int main()
   {
       int threads = 256;
       const int num_elements = 50000;

       std::vector<float> h_a(num_elements);
       std::vector<float> h_b((num_elements + threads - 1) / threads);

       for (int i = 0; i < num_elements; ++i)
       {
           h_a[i] = rand() / static_cast<float>(RAND_MAX);
       }

       float *d_a, *d_b;
       cudaMalloc(&d_a, h_a.size() * sizeof(float));
       cudaMalloc(&d_b, h_b.size() * sizeof(float));

       cudaStream_t stream;
       cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking);

       cudaEvent_t start_event, stop_event;
       cudaEventCreate(&start_event);
       cudaEventCreate(&stop_event);

       cudaMemcpyAsync(d_a, h_a.data(), h_a.size() * sizeof(float), cudaMemcpyHostToDevice, stream);

       cudaEventRecord(start_event, stream);

       int blocks = (num_elements + threads - 1) / threads;
       block_reduction<<<blocks, threads, threads * sizeof(float), stream>>>(d_a, d_b, num_elements);

       cudaMemcpyAsync(h_b.data(), d_b, h_b.size() * sizeof(float), cudaMemcpyDeviceToHost, stream);

       cudaEventRecord(stop_event, stream);
       cudaEventSynchronize(stop_event);

       float milliseconds = 0.0f;
       cudaEventElapsedTime(&milliseconds, start_event, stop_event);
       std::cout << "Kernel execution time: " << milliseconds << " ms\n";

       cudaFree(d_a);
       cudaFree(d_b);

       cudaEventDestroy(start_event);
       cudaEventDestroy(stop_event);
       cudaStreamDestroy(stream);

       return 0;
   }

The following table maps CUDA API functions to corresponding HIP API functions, as demonstrated in the
preceding code example.

.. list-table::
   :header-rows: 1
   :name: syntax-mapping-table

   *
     - CUDA
     - HIP

   *
     - ``#include <cuda_runtime.h>``
     - ``#include <hip/hip_runtime.h>``

   *
     - ``cudaError_t``
     - ``hipError_t``

   *
     - ``cudaEvent_t``
     - ``hipEvent_t``

   *
     - ``cudaStream_t``
     - ``hipStream_t``

   *
     - ``cudaMalloc``
     - ``hipMalloc``

   *
     - ``cudaStreamCreateWithFlags``
     - ``hipStreamCreateWithFlags``

   *
     - ``cudaStreamNonBlocking``
     - ``hipStreamNonBlocking``

   *
     - ``cudaEventCreate``
     - ``hipEventCreate``

   *
     - ``cudaMemcpyAsync``
     - ``hipMemcpyAsync``

   *
     - ``cudaMemcpyHostToDevice``
     - ``hipMemcpyHostToDevice``

   *
     - ``cudaMemcpyDeviceToHost``
     - ``hipMemcpyDeviceToHost``

   *
     - ``cudaEventRecord``
     - ``hipEventRecord``

   *
     - ``cudaEventSynchronize``
     - ``hipEventSynchronize``

   *
     - ``cudaEventElapsedTime``
     - ``hipEventElapsedTime``

   *
     - ``cudaFree``
     - ``hipFree``

   *
     - ``cudaEventDestroy``
     - ``hipEventDestroy``

   *
     - ``cudaStreamDestroy``
     - ``hipStreamDestroy``

In summary, most CUDA runtime symbols map to HIP by replacing the ``cuda``
prefix with ``hip``; only the header include changes its path. This is what
makes porting CUDA runtime code to HIP largely mechanical.
44 changes: 0 additions & 44 deletions docs/reference/terms.md

This file was deleted.

3 changes: 1 addition & 2 deletions docs/sphinx/_toc.yml.in
@@ -111,8 +111,7 @@ subtrees:
- file: reference/virtual_rocr
- file: reference/math_api
- file: reference/env_variables
-  - file: reference/terms
-    title: Comparing syntax for different APIs
+  - file: reference/api_syntax
- file: reference/deprecated_api_list
title: List of deprecated APIs
- file: reference/fp8_numbers