-
Notifications
You must be signed in to change notification settings - Fork 490
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add HIP-RT support for rendering on AMD GPUs #473
base: master
Are you sure you want to change the base?
Conversation
Co-authored-by: Aaryaman Vasishta <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial comments. Looks good in general! More to come in the coming days (though feel free to start addressing these!)
@@ -601,7 +600,8 @@ struct : public PtexErrorHandler { | |||
PtexTextureBase::PtexTextureBase(const std::string &filename, ColorEncoding encoding, | |||
Float scale) | |||
: filename(filename), encoding(encoding), scale(scale) { | |||
ptexMutex.lock(); | |||
std::mutex mutex; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work: each thread calling into this will get its own value of mutex
and so no mutual exclusion around will happen. (What was the intent here?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember this change. Let me check whether it's actually needed.
@@ -199,9 +205,9 @@ inline constexpr Float gamma(int n) { | |||
inline PBRT_CPU_GPU Float AddRoundUp(Float a, Float b) { | |||
#ifdef PBRT_IS_GPU_CODE | |||
#ifdef PBRT_FLOAT_AS_DOUBLE | |||
return __dadd_ru(a, b); | |||
return __dadd_rn(a, b); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switching to round-nearest versus round-up breaks these functions, which are used to be able to compute conservative floating-point intervals. If this is because AMD GPUs don't support this rounding mode, then the CPU path below should be used there (which doesn't give as tight bounds but is still conservative.)
(This applies to the corresponding changes in following functions as well.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll check with others whether there are some equivalents in HIP for these functions, but the ones they are in PBRT now are not supported. If there are not such functions, we will use the ones in the CPU path.
@@ -180,7 +180,7 @@ std::string SpectrumBilerpTexture::ToString() const { | |||
} | |||
|
|||
// CheckerboardTexture Function Definitions | |||
Float Checkerboard(TextureEvalContext ctx, TextureMapping2D map2D, | |||
PBRT_CPU_GPU Float Checkerboard(TextureEvalContext ctx, TextureMapping2D map2D, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As much as possible I'd like to have these sorts of changes that clean up declarations but don't change functionality in a separate PR. This is motivated by 2bae563, which fixed an inadvertent bug in the first merge of the HIP-RT foundations in 5374fba that broke the CUDA build. It's just easier to chase those down if commits are more clearly broken into foundations that (shouldn't) change functionality and changes that are expected to add/change functionality.
SampledWavelengths lambda) const { | ||
#ifdef PBRT_IS_GPU_CODE | ||
assert(!"Should not be called in GPU code"); | ||
CHECK(!"Should not be called in GPU code"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's great to have these assert
s promoted to CHECK
s for consistency, but again would be good to have in a "foundations" PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done with the review. Directionally all looks good! I would like to break it into one more "foundations" PR that includes things like the PBRT_CPU_GPU
declaration changes and changes that don't change functionality like the TextureArray
type. Then otherwise just a few open questions for the main PR in the above comments.
.gitignore
Outdated
@@ -2,6 +2,7 @@ | |||
.#* | |||
#*# | |||
src/build | |||
src/ext/hiprtSdk/hiprt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove? (Not sure if this is generally applicable.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this should not be included.
@@ -0,0 +1,24 @@ | |||
From 56933471af36147e1032fbbc7912ca0088797b78 Mon Sep 17 00:00:00 2001 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this intended to be included?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm IIRC it was because libdeflate wasn't compiling due to some issue with BitScanReverse not being declared. Perhaps the CUDA SDK included the relevant intrin.h but not in the HIPCC path. It's applied here
Line 1033 in 1b53728
COMMAND cd ${CMAKE_CURRENT_SOURCE_DIR}/src/ext/libdeflate && git apply ${CMAKE_CURRENT_SOURCE_DIR}/0001-MSVC-HIP.patch > nul 2> nul & exit 0 |
A newer libdeflate commit seems to include the intrin.h though. Should we update the submodule to this commit 8f3c3f0000c6a09943e34908654f0489489b6047
then ? ebiggers/libdeflate@8f3c3f0#diff-cad417885447534a89122739a27dcf3c0e4b4629f37befb66117c31dddb50f0aR35
#ifdef __HIPCC__ | ||
using TextureArray = cudaArray_t; | ||
#else | ||
using TextureArray = cudaMipmappedArray_t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again as far as breaking things into smaller steps: could we have a separate PR that just rewrites the existing CUDA functionality to be in terms of the new TextureArray
type, without any of the HIPCC additions, then have the HIPCC changes subsequently come in and generalize things?
PBRT_CONST RGBColorSpace *RGBColorSpace_DCI_P3; | ||
PBRT_CONST RGBColorSpace *RGBColorSpace_Rec2020; | ||
PBRT_CONST RGBColorSpace *RGBColorSpace_ACES2065_1; | ||
PBRT_GPU __constant__ RGBColorSpace *RGBColorSpace_sRGB; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intent with PBRT_CONST
defined in pbrt.h
is to handle the messiness of device-side constant buffers. Does that not work here? In general I'd rather keep using PBRT_CONST
but potentially #define it differently for AMD targets in pbrt.h
. (Related, does changing it to effectively __device__ __constant__
as done here work on NVIDIA? I assume so but we should be sure.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PBRT_CONST
is defined as __device__ const
in pbrt.h
not __constant__
. I need to check the actual error message, but I remember HIP was complaining in hipMemcpyToSymbol
and hipMemcpyFromSymbol
functions, expecting the constant memory.
constexpr unsigned int type = TypeIndex<T>(); | ||
#if defined(PBRT_IS_WINDOWS) && defined(__HIPCC__) | ||
uint64_t aptr = (iptr & tagMask) >> 5ull; | ||
bits = (iptr & ptrMask) | aptr | ((uint64_t)type << (tagShift + 2ull)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a little unclear to me what's going on with the tricky pointer tagging stuff here. You're generally using tagShift + 2
here for HIPCC, though have decreased tagShift
by 2 when it's defined below. Can this be cleaned up? (I'm not sure if there are other places where you're depending on it having a smaller value though.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the HIP pointers uses internally the top bits for some purposes on Windows. This shifts PBRT tags to lower bits to avoid conflicts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Generally speaking, let's just try to centralize the bit operations in a small number of well-commented blocks of code, since it's all pretty tricky stuff.
void *ptr() { return reinterpret_cast<void *>(bits & ptrMask); } | ||
void *ptr() { | ||
#if defined(PBRT_IS_WINDOWS) && defined(__HIPCC__) | ||
unsigned int aptr = (bits >> tagShift) & 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this has gotten complicated with the workaround, how about not duplicating the code but changing the non-const ptr()
method to just be something like return const_cast<void *>(((const TaggedPointer *)this)->ptr();
(Or an equivalent cleanup so that the code isn't duplicated.)
This PR adds AMD GPU rendering support for PBRTv4 via HIP-RT. HIP-RT also runs on NVIDIA GPUs and also provides the H-PLOC BVH builder. More details can be found on HIP-RT website and its paper.
It also supports interactive mode for AMD GPUs via. HIP-OpenGL interop. Instructions on how to build on Windows and Linux can be found in the README.
FWIW, the build seems to break on my windows laptop if I use the master branch's ptex commit. However, it seems to build fine with the older commit 4cd8e9a6db2b06e478dfbbd8c26eb6df97f84483. This PR might fix it #472
cc @meistdan @mmp @wjakob