Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add HIP-RT support for rendering on AMD GPUs #473

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

jammm
Copy link
Contributor

@jammm jammm commented Dec 30, 2024

This PR adds AMD GPU rendering support for PBRTv4 via HIP-RT. HIP-RT also runs on NVIDIA GPUs and also provides the H-PLOC BVH builder. More details can be found on HIP-RT website and its paper.

It also supports interactive mode for AMD GPUs via. HIP-OpenGL interop. Instructions on how to build on Windows and Linux can be found in the README.

FWIW, the build seems to break on my windows laptop if I use the master branch's ptex commit. However, it seems to build fine with the older commit 4cd8e9a6db2b06e478dfbbd8c26eb6df97f84483. This PR might fix it #472

cc @meistdan @mmp @wjakob

meistdan and others added 2 commits December 30, 2024 13:37
@jammm jammm changed the title Add HIP-RT support for AMD and NVIDIA GPUs Add HIP-RT support for rendering on AMD GPUs Dec 30, 2024
Copy link
Owner

@mmp mmp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial comments. Looks good in general! More to come in the coming days (though feel free to start addressing these!)

@@ -601,7 +600,8 @@ struct : public PtexErrorHandler {
PtexTextureBase::PtexTextureBase(const std::string &filename, ColorEncoding encoding,
Float scale)
: filename(filename), encoding(encoding), scale(scale) {
ptexMutex.lock();
std::mutex mutex;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work: each thread calling into this will get its own value of mutex and so no mutual exclusion around will happen. (What was the intent here?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember this change. Let me check whether it's actually needed.

@@ -199,9 +205,9 @@ inline constexpr Float gamma(int n) {
inline PBRT_CPU_GPU Float AddRoundUp(Float a, Float b) {
#ifdef PBRT_IS_GPU_CODE
#ifdef PBRT_FLOAT_AS_DOUBLE
return __dadd_ru(a, b);
return __dadd_rn(a, b);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switching to round-nearest versus round-up breaks these functions, which are used to be able to compute conservative floating-point intervals. If this is because AMD GPUs don't support this rounding mode, then the CPU path below should be used there (which doesn't give as tight bounds but is still conservative.)

(This applies to the corresponding changes in following functions as well.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll check with others whether there are some equivalents in HIP for these functions, but the ones they are in PBRT now are not supported. If there are not such functions, we will use the ones in the CPU path.

@@ -180,7 +180,7 @@ std::string SpectrumBilerpTexture::ToString() const {
}

// CheckerboardTexture Function Definitions
Float Checkerboard(TextureEvalContext ctx, TextureMapping2D map2D,
PBRT_CPU_GPU Float Checkerboard(TextureEvalContext ctx, TextureMapping2D map2D,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As much as possible I'd like to have these sorts of changes that clean up declarations but don't change functionality in a separate PR. This is motivated by 2bae563, which fixed an inadvertent bug in the first merge of the HIP-RT foundations in 5374fba that broke the CUDA build. It's just easier to chase those down if commits are more clearly broken into foundations that (shouldn't) change functionality and changes that are expected to add/change functionality.

SampledWavelengths lambda) const {
#ifdef PBRT_IS_GPU_CODE
assert(!"Should not be called in GPU code");
CHECK(!"Should not be called in GPU code");
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's great to have these asserts promoted to CHECKs for consistency, but again would be good to have in a "foundations" PR.

Copy link
Owner

@mmp mmp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done with the review. Directionally all looks good! I would like to break it into one more "foundations" PR that includes things like the PBRT_CPU_GPU declaration changes and changes that don't change functionality like the TextureArray type. Then otherwise just a few open questions for the main PR in the above comments.

.gitignore Outdated
@@ -2,6 +2,7 @@
.#*
#*#
src/build
src/ext/hiprtSdk/hiprt
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove? (Not sure if this is generally applicable.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this should not be included.

@@ -0,0 +1,24 @@
From 56933471af36147e1032fbbc7912ca0088797b78 Mon Sep 17 00:00:00 2001
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intended to be included?

Copy link
Contributor Author

@jammm jammm Feb 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm IIRC it was because libdeflate wasn't compiling due to some issue with BitScanReverse not being declared. Perhaps the CUDA SDK included the relevant intrin.h but not in the HIPCC path. It's applied here

COMMAND cd ${CMAKE_CURRENT_SOURCE_DIR}/src/ext/libdeflate && git apply ${CMAKE_CURRENT_SOURCE_DIR}/0001-MSVC-HIP.patch > nul 2> nul & exit 0

A newer libdeflate commit seems to include the intrin.h though. Should we update the submodule to this commit 8f3c3f0000c6a09943e34908654f0489489b6047 then ? ebiggers/libdeflate@8f3c3f0#diff-cad417885447534a89122739a27dcf3c0e4b4629f37befb66117c31dddb50f0aR35

#ifdef __HIPCC__
using TextureArray = cudaArray_t;
#else
using TextureArray = cudaMipmappedArray_t;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again as far as breaking things into smaller steps: could we have a separate PR that just rewrites the existing CUDA functionality to be in terms of the new TextureArray type, without any of the HIPCC additions, then have the HIPCC changes subsequently come in and generalize things?

PBRT_CONST RGBColorSpace *RGBColorSpace_DCI_P3;
PBRT_CONST RGBColorSpace *RGBColorSpace_Rec2020;
PBRT_CONST RGBColorSpace *RGBColorSpace_ACES2065_1;
PBRT_GPU __constant__ RGBColorSpace *RGBColorSpace_sRGB;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent with PBRT_CONST defined in pbrt.h is to handle the messiness of device-side constant buffers. Does that not work here? In general I'd rather keep using PBRT_CONST but potentially #define it differently for AMD targets in pbrt.h. (Related, does changing it to effectively __device__ __constant__ as done here work on NVIDIA? I assume so but we should be sure.)

Copy link
Contributor

@meistdan meistdan Feb 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PBRT_CONST is defined as __device__ const in pbrt.h not __constant__. I need to check the actual error message, but I remember HIP was complaining in hipMemcpyToSymbol and hipMemcpyFromSymbol functions, expecting the constant memory.

constexpr unsigned int type = TypeIndex<T>();
#if defined(PBRT_IS_WINDOWS) && defined(__HIPCC__)
uint64_t aptr = (iptr & tagMask) >> 5ull;
bits = (iptr & ptrMask) | aptr | ((uint64_t)type << (tagShift + 2ull));
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little unclear to me what's going on with the tricky pointer tagging stuff here. You're generally using tagShift + 2 here for HIPCC, though have decreased tagShift by 2 when it's defined below. Can this be cleaned up? (I'm not sure if there are other places where you're depending on it having a smaller value though.)

Copy link
Contributor

@meistdan meistdan Feb 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the HIP pointers uses internally the top bits for some purposes on Windows. This shifts PBRT tags to lower bits to avoid conflicts.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Generally speaking, let's just try to centralize the bit operations in a small number of well-commented blocks of code, since it's all pretty tricky stuff.

void *ptr() { return reinterpret_cast<void *>(bits & ptrMask); }
void *ptr() {
#if defined(PBRT_IS_WINDOWS) && defined(__HIPCC__)
unsigned int aptr = (bits >> tagShift) & 3;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this has gotten complicated with the workaround, how about not duplicating the code but changing the non-const ptr() method to just be something like return const_cast<void *>(((const TaggedPointer *)this)->ptr(); (Or an equivalent cleanup so that the code isn't duplicated.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants