toytag/CUDA-Path-Tracer

 
 


CUDA Path Tracer

University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 3

  • Zhenzhong Tang
  • Tested on: Windows 11 Pro 22H2, AMD EPYC 7V12 64-Core Processor (4 vCPU cores) @ 2.44GHz 28GiB, Tesla T4 16GiB (Azure)

Overview

Path tracing is a rendering technique that simulates the behavior of light in a scene. It uses the Monte Carlo method to estimate the radiance at each pixel of an image by tracing paths of light through the scene. The algorithm is iterative and parallel in nature, so it maps naturally onto CUDA and runs fairly well there. It can also simulate many effects that are difficult with other rendering techniques, such as soft shadows, depth of field, caustics, ambient occlusion, and indirect lighting.
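For reference, the quantity being estimated can be written with the standard textbook Monte Carlo estimator for outgoing radiance. The notation here ($f_r$ for the BSDF, $p$ for the sampling density, $N$ for the number of samples) is the usual convention and is not taken from this project's code.

$$
L_o(x, \omega_o) \approx L_e(x, \omega_o) + \frac{1}{N} \sum_{i=1}^{N} \frac{f_r(x, \omega_i, \omega_o)\, L_i(x, \omega_i)\, |\cos\theta_i|}{p(\omega_i)}
$$

Each sample per pixel contributes one term to the sum, which is why noise falls as the sample count grows.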

[Images: Coffee Shop, Stanford Bunny, Cow, Gear]

All the above scenes were rendered in $2000 \times 2000$ resolution with $1000$ samples per pixel and $8$ light bounces.

We also have an interesting mirror scene, where a glossy sphere is placed inside a cube whose sides are all mirrors, rendered in $2000 \times 2000$ resolution with $200$ samples per pixel and different numbers of light bounces.

[Images: 1 Bounce, 8 Bounces, 64 Bounces]

Visual Features

Material System

The material system used for the renders above is adapted from the glTF Specification.

  • Albedo: The color of the material.
  • Metallic: The ratio of diffuse and specular reflection. A value of $0$ means the material is dielectric, and a value of $1$ means the material is metal.
  • Roughness: The roughness of the material. A value of $0$ means the material is perfectly smooth, and a value of $1$ means the material is pure diffuse reflection.
  • IOR: The index of refraction of the material. A value of $1$ means the material is vacuum, and a value of $1.5$ is a good compromise for most opaque, dielectric materials.
  • Opacity: The opacity of the material. A value of $0$ means the material is fully transparent, and a value of $1$ means the material is fully opaque.
  • Emittance: The emittance of the material. A value of $0$ means the material is not emissive, and a value greater than $0$ means the material is emissive, controlling the brightness of the material.

Using the metallic and roughness parameters, a material can be either dielectric or metal, and its reflection model can be either diffuse or specular. Combined with multiple importance sampling, the path tracer is able to render imperfect specular materials and produce a better roughness effect. Also, by controlling the IOR and opacity of dielectrics, a material can produce glass-like refraction with the Fresnel effect.
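The parameters above could be laid out roughly as follows. This is a minimal sketch, not the project's actual code: the `Material` struct, `f0FromIor`, `specularF0`, and `schlickFresnel` are hypothetical names, and the use of Schlick's approximation for the Fresnel term is an assumption (it is a common choice, and the glTF reference BxDF uses the same base-reflectance blend).

```cpp
#include <cassert>
#include <cmath>

// Hypothetical material record mirroring the parameters listed above.
struct Material {
    float albedo[3];  // base color
    float metallic;   // 0 = dielectric, 1 = metal
    float roughness;  // 0 = perfectly smooth, 1 = pure diffuse
    float ior;        // index of refraction (1 = vacuum)
    float opacity;    // 0 = fully transparent, 1 = fully opaque
    float emittance;  // 0 = not emissive
};

// Reflectance at normal incidence for a dielectric surrounded by vacuum.
inline float f0FromIor(float ior) {
    assert(ior >= 1.0f);
    float r = (ior - 1.0f) / (ior + 1.0f);
    return r * r;
}

// Base reflectance for one color channel: dielectrics use the IOR-derived
// value, metals use their albedo, and metallic blends between the two.
inline float specularF0(const Material& m, int channel) {
    assert(channel >= 0 && channel < 3);
    float dielectric = f0FromIor(m.ior);
    return dielectric + (m.albedo[channel] - dielectric) * m.metallic;
}

// Schlick's approximation of Fresnel reflectance (assumed here).
// cosTheta is the cosine between the view direction and the surface normal.
inline float schlickFresnel(float cosTheta, float f0) {
    float m = 1.0f - cosTheta;
    return f0 + (1.0f - f0) * m * m * m * m * m;
}
```

With an IOR of $1.5$ the base reflectance comes out to $0.04$, which matches the default dielectric reflectance used by glTF.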

[Image grid: columns Diffuse, Imperfect Specular, Pure Specular; rows Dielectric, Metal, Glass]

Using the material system, we can mimic many real-world materials. For example, the following renders show aluminum, titanium, stainless steel, and different kinds of glass.

[Images: More of Metal, More of Glass]

And many Suzannes:

[Images: Glass, Aluminum, Yellow Plastic, Stainless Steel]

All scenes were rendered in $800 \times 800$ resolution with $2000$ spp and $8$ light bounces.

Anti-Aliasing

Anti-aliasing can be achieved by jittering rays within a pixel. In the following example, the image is rendered in low resolution to exaggerate the effect.

[Images: AA OFF, AA ON]

All scenes were rendered in $200 \times 200$ (up-sampled to $800 \times 800$) resolution with $2000$ spp and $8$ light bounces.
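Jittering within a pixel can be sketched as below. The function `samplePixel` and the `PixelSample` struct are hypothetical, and `std::mt19937` stands in for whatever per-thread random number generator the GPU kernel actually uses.

```cpp
#include <cassert>
#include <random>

// Continuous image-plane position of one camera sample (hypothetical type).
struct PixelSample { float x, y; };

// With jitter enabled, returns a uniformly random position inside pixel
// (px, py); with jitter disabled, always returns the pixel center.
inline PixelSample samplePixel(int px, int py, bool jitter, std::mt19937& rng) {
    assert(px >= 0 && py >= 0);
    std::uniform_real_distribution<float> u01(0.0f, 1.0f);
    float dx = jitter ? u01(rng) : 0.5f;
    float dy = jitter ? u01(rng) : 0.5f;
    return {static_cast<float>(px) + dx, static_cast<float>(py) + dy};
}
```

Because each iteration samples a different position, averaging the iterations integrates over the whole pixel footprint rather than a single point, which is what smooths the edges.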

Physically-Based Depth-of-Field

Depth-of-field can be achieved by jittering rays within an aperture. In the following example, the aperture is modeled as a circle with a radius of $0.5$ and the focal length is $10$.

[Images: DoF OFF, DoF ON]

All scenes were rendered in $800 \times 800$ resolution with $2000$ spp and $8$ light bounces.
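A thin-lens camera is one common way to implement aperture jittering, sketched here. All names (`thinLensRay`, `Vec3`, `Ray`) are hypothetical, and the sketch treats the "focal length" above as the distance from the lens to the plane of sharp focus, which is an assumption about how the parameter is used.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Ray { Vec3 origin, dir; };

inline Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 mul(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 normalize(Vec3 a) { return mul(a, 1.0f / std::sqrt(dot(a, a))); }

// Turns a pinhole camera ray into a thin-lens ray.
// right/up/forward form the orthonormal camera basis; u1 and u2 are
// uniform random numbers in [0, 1).
inline Ray thinLensRay(Ray pinhole, Vec3 right, Vec3 up, Vec3 forward,
                       float apertureRadius, float focalDistance,
                       float u1, float u2) {
    assert(apertureRadius >= 0.0f && focalDistance > 0.0f);
    // Point where the pinhole ray crosses the plane of focus.
    float t = focalDistance / dot(pinhole.dir, forward);
    Vec3 focus = add(pinhole.origin, mul(pinhole.dir, t));
    // Uniformly sample a point on the circular aperture.
    const float kTwoPi = 6.28318530718f;
    float r = apertureRadius * std::sqrt(u1);
    float phi = kTwoPi * u2;
    Vec3 offset = add(mul(right, r * std::cos(phi)), mul(up, r * std::sin(phi)));
    Vec3 lensPoint = add(pinhole.origin, offset);
    // The new ray starts on the lens and still passes through the focus point.
    return {lensPoint, normalize(sub(focus, lensPoint))};
}
```

Every ray for a pixel passes through the same point on the plane of focus, so objects at that distance stay sharp while nearer and farther objects blur.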

Mesh Loading

With the help of tinyobjloader and tinygltf libraries, the path tracer is able to load .obj and .gltf files (partially). Thus, we can render more complex scenes, and put more stress on the path tracer.

Procedural Textures

Procedural textures can be evaluated from the barycentrically interpolated UV coordinates of the intersection point. There is hardly any performance impact. Check out the following example.

[Images: Gradient Mario, Checkerboard Mario]

All scenes were rendered in $800 \times 800$ resolution with $1000$ spp and $8$ light bounces.
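The two textures shown could be computed along these lines. This is a sketch with hypothetical names (`interpolateUv`, `checkerboard`, `gradient`); the tile count and the color choices are illustrative assumptions, not values from the project.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { float u, v; };
struct Color { float r, g, b; };

// Interpolates per-vertex UVs with barycentric weights; b1 and b2 belong to
// the second and third vertex, and the first vertex gets 1 - b1 - b2.
inline Vec2 interpolateUv(Vec2 uv0, Vec2 uv1, Vec2 uv2, float b1, float b2) {
    float b0 = 1.0f - b1 - b2;
    return {b0 * uv0.u + b1 * uv1.u + b2 * uv2.u,
            b0 * uv0.v + b1 * uv1.v + b2 * uv2.v};
}

// Black-and-white checkerboard with `tiles` squares per unit of UV space.
inline Color checkerboard(Vec2 uv, float tiles) {
    assert(tiles > 0.0f);
    int cu = static_cast<int>(std::floor(uv.u * tiles));
    int cv = static_cast<int>(std::floor(uv.v * tiles));
    float c = ((cu + cv) % 2 == 0) ? 1.0f : 0.0f;
    return {c, c, c};
}

// Simple gradient that maps u to red and v to green.
inline Color gradient(Vec2 uv) {
    return {uv.u, uv.v, 0.0f};
}
```

Both functions are a handful of arithmetic operations with no memory lookups, which is consistent with the negligible performance impact noted above.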

Open Image Denoise

Open Image Denoise is a high-performance, high-quality denoising library for ray tracing. It is able to remove noise from rendered images without losing much detail. Auxiliary inputs such as albedo and normal maps are fed to the denoiser's pre-filter to improve the quality of the denoised image.

The denoiser is integrated into the system as a post-processing step. It is triggered at a fixed interval of iterations, and the denoised image is merged into the original image using an exponential moving average. Although it has a small impact on performance, the image quality improves significantly and we get a much cleaner image with far fewer samples.

The following example shows the effect of the denoiser with $200$ samples per pixel, a relatively low sample rate.

[Images: Denoiser OFF, Denoiser ON]

All scenes were rendered in $800 \times 800$ resolution with $200$ spp and $8$ light bounces.
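The merge step can be sketched as a per-channel exponential moving average. The names `shouldDenoise` and `blendDenoised`, the interval, and the blend weight `alpha` are hypothetical; the call into Open Image Denoise itself is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// True on iterations where the denoiser should run (every `interval` iterations).
inline bool shouldDenoise(int iteration, int interval) {
    assert(interval > 0);
    return iteration % interval == 0;
}

// Exponential moving average: moves each channel of `image` toward the
// denoised result by a fraction `alpha` (0 keeps the original, 1 replaces it).
inline void blendDenoised(std::vector<float>& image,
                          const std::vector<float>& denoised,
                          float alpha) {
    assert(image.size() == denoised.size());
    assert(alpha >= 0.0f && alpha <= 1.0f);
    for (std::size_t i = 0; i < image.size(); ++i) {
        image[i] = (1.0f - alpha) * image[i] + alpha * denoised[i];
    }
}
```

A small `alpha` keeps the accumulated image dominated by real samples, so the denoiser cleans up noise without overriding detail that later iterations will resolve.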

Performance Features

Stream Compaction

When a ray hits a light source, escapes into the void, or exceeds the maximum number of bounces, it is terminated. The terminated rays are removed from the ray pool using stream compaction. Luckily, stream compaction is already implemented in the CUDA Thrust library, so we can use thrust::remove_if or, in this case, thrust::partition to remove the terminated rays from the ray pool. Any custom work-efficient stream compaction implementation with shared memory optimization and bank conflict avoidance, like Project2-Stream-Compaction, would do just as well.
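The compaction step can be illustrated on the CPU with `std::partition`, which takes the same (first, last, predicate) arguments as `thrust::partition`. This is only a host-side analogue under assumed names: `PathSegment`, `IsAlive`, and `compactPaths` are hypothetical and not the project's actual types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-ray state; a path is finished when no bounces remain.
struct PathSegment {
    int pixelIndex;
    int remainingBounces;
};

// Predicate selecting paths that still need to be traced.
struct IsAlive {
    bool operator()(const PathSegment& p) const { return p.remainingBounces > 0; }
};

// Moves live paths to the front of the first `numActive` entries and
// returns how many are still alive. Terminated paths stay in the tail of
// the buffer, so their results remain available for the final gather.
inline std::size_t compactPaths(std::vector<PathSegment>& paths,
                                std::size_t numActive) {
    assert(numActive <= paths.size());
    auto firstDead = std::partition(paths.begin(), paths.begin() + numActive,
                                    IsAlive{});
    return static_cast<std::size_t>(firstDead - paths.begin());
}
```

After each bounce, only the first `compactPaths(...)` entries need to be launched in the next kernel, so the amount of work shrinks as paths terminate.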

First Bounce Caching

When anti-aliasing is not enabled, the first ray from the camera is always the same for every iteration. So we can cache the first ray bounce and reuse it for every iteration. However, this optimization is not particularly useful when more advanced visual features like anti-aliasing, depth-of-field, and motion blur are enabled.

Material Sorting

Additionally, we could sort the rays by material type to improve performance. The idea is that rays with the same material type have similar processing time, so sorting them reduces warp divergence. However, this optimization later proved to be not very useful and even harmful to performance. The reason is that the sorting process itself is very expensive compared to the performance gain. There is no significant performance improvement to compensate for the cost.
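A host-side sketch of the sort is shown below, reordering path indices by material ID in the style of a key-value sort. The function name `sortPathsByMaterial` and the two-array layout are assumptions made for illustration; on the GPU this would be a device-wide key-value sort run every bounce, which is where the cost comes from.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Reorders both arrays so that entries with the same material ID are
// adjacent. materialIds[i] is the material hit by path pathIndices[i].
inline void sortPathsByMaterial(std::vector<int>& materialIds,
                                std::vector<int>& pathIndices) {
    assert(materialIds.size() == pathIndices.size());
    // Compute the sorted order first, then apply it to both arrays.
    std::vector<std::size_t> order(materialIds.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) {
                         return materialIds[a] < materialIds[b];
                     });
    std::vector<int> sortedIds, sortedPaths;
    sortedIds.reserve(order.size());
    sortedPaths.reserve(order.size());
    for (std::size_t i : order) {
        sortedIds.push_back(materialIds[i]);
        sortedPaths.push_back(pathIndices[i]);
    }
    materialIds.swap(sortedIds);
    pathIndices.swap(sortedPaths);
}
```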

Bounding Volume Hierarchy

Bounding volume hierarchy (BVH) is a tree structure on top of the scene geometry to accelerate ray tracing. The idea is to group the scene geometry into a hierarchy of bounding volumes, and the ray tracer can quickly discard the entire group of primitives if the ray does not intersect with the bounding volume.

The image from PBRT 4.3 is a good illustration of a BVH tree. The BVH is built using the equal-count partition method, which tries to split the primitives into two equally sized groups. The BVH is built on the CPU in a linear buffer (a heap-like structure) and then copied to the GPU for ray tracing. The BVH could potentially be optimized by using SAH (Surface Area Heuristic) and by building it directly on the GPU.
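An equal-count build into a flat node array, together with the standard slab test for ray-box intersection, might look like the sketch below. All names (`Aabb`, `Prim`, `BvhNode`, `buildBvh`, `hitAabb`) are hypothetical. As a simplification, the sketch stores explicit child indices and splits along the longest axis of the node's box; the project's heap-like layout and its axis choice may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Aabb { float min[3]; float max[3]; };
struct Prim { Aabb box; float centroid[3]; };
// Leaf nodes have count > 0; interior nodes have count == 0.
struct BvhNode { Aabb box; int left; int right; int first; int count; };

inline Aabb unionBox(const Aabb& a, const Aabb& b) {
    Aabb r;
    for (int k = 0; k < 3; ++k) {
        r.min[k] = std::min(a.min[k], b.min[k]);
        r.max[k] = std::max(a.max[k], b.max[k]);
    }
    return r;
}

// Recursively builds a BVH over prims[first, first + count) and appends the
// nodes to `nodes`. Returns the index of the subtree root.
inline int buildBvh(std::vector<Prim>& prims, int first, int count,
                    std::vector<BvhNode>& nodes) {
    assert(count > 0);
    Aabb box = prims[first].box;
    for (int i = first + 1; i < first + count; ++i) box = unionBox(box, prims[i].box);
    int index = static_cast<int>(nodes.size());
    nodes.push_back({box, -1, -1, first, count});
    if (count == 1) return index;  // one primitive per leaf
    // Pick the longest axis of the node's bounding box.
    int axis = 0;
    float best = box.max[0] - box.min[0];
    for (int k = 1; k < 3; ++k) {
        float extent = box.max[k] - box.min[k];
        if (extent > best) { best = extent; axis = k; }
    }
    // Equal-count split: the median centroid divides the range in half.
    int half = count / 2;
    std::nth_element(prims.begin() + first, prims.begin() + first + half,
                     prims.begin() + first + count,
                     [axis](const Prim& a, const Prim& b) {
                         return a.centroid[axis] < b.centroid[axis];
                     });
    int left = buildBvh(prims, first, half, nodes);
    int right = buildBvh(prims, first + half, count - half, nodes);
    nodes[index].left = left;
    nodes[index].right = right;
    nodes[index].count = 0;
    return index;
}

// Slab test: true if the ray origin + t * dir hits the box for some t in
// [0, tMax]. invDir holds the per-component reciprocal of the direction.
inline bool hitAabb(const Aabb& b, const float origin[3],
                    const float invDir[3], float tMax) {
    float t0 = 0.0f, t1 = tMax;
    for (int k = 0; k < 3; ++k) {
        float tNear = (b.min[k] - origin[k]) * invDir[k];
        float tFar = (b.max[k] - origin[k]) * invDir[k];
        if (tNear > tFar) std::swap(tNear, tFar);
        t0 = std::max(t0, tNear);
        t1 = std::min(t1, tFar);
        if (t0 > t1) return false;
    }
    return true;
}
```

During traversal, a failed `hitAabb` on an interior node skips every primitive beneath it, which is the source of the speedup on large meshes.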

Performance Analysis

Let's take a look at the performance of the path tracer with different features enabled. Stream compaction plays an important role in the correctness of the algorithm in addition to its performance benefits. So stream compaction is enabled in all tests, and we use the path tracer with only stream compaction enabled as the baseline.

Cornell-Metal and Cornell-Glass are simple scenes with metal or glass balls inside the Cornell box. Those spheres are not part of the mesh system, so BVH has no effect on performance in these scenes.

More complex scenes like Mario-Metal and Mario-Glass are the same as the previous two scenes except that the spheres are replaced with Mario mesh. The mesh system is able to load .obj files or .gltf files (partially). The number of triangles in the Mario mesh is about 5,000.

Lastly, the Teapot-Complex scene consists of 5 teapots with different materials. The teapots are loaded from an .obj file and placed uniformly in the scene, and the total number of triangles is about 50,000.

Observations

  • Material Sorting is not a good optimization. It slows down the path tracer. The reason, as hinted before, is that the sorting process itself is very expensive compared to the performance gain. There is no significant performance improvement to compensate for the cost.
  • First Bounce Caching has limited performance improvement. The reason is that the first bounce is only a small part of the entire ray tracing process. Besides, when more advanced visual features like anti-aliasing, depth-of-field, and motion blur are enabled, the first bounce is no longer the same for every iteration.
  • BVH is mind-blowing. It improves performance by a factor of about 15x, which cuts the rendering time by more than 90%! BVH enables quick discard of groups of primitives if the ray does not intersect with the bounding volume. This is especially useful when the scene is complex and the number of primitives is large. Although BVH traversal requires additional global memory access, the performance gain is still significant.

Possible Improvements

  • Subsurface scattering
  • Wavelength dependent refraction
  • Volumetric rendering
  • Texture and normal map
  • Motion blur
  • Environment map
  • Better random number generator
  • BVH with SAH and BVH on GPU
  • Occupancy optimization
  • Shared memory optimization

References

  1. Physically Based Rendering: From Theory To Implementation
  2. glTF Specification and Example BxDF Implementation
  3. GPU-based Importance Sampling
  4. Axis-Aligned Bounding Box (AABB) intersection algorithm
  5. Iterative BVH Traversal with near $O(1)$ Memory
  6. Open Image Denoise
