Deferred Path Tracing By Enscape

Posted on December 6, 2017 (updated January 16, 2018) by Thomas Schander

ProRender, Radeon ProRender, Radeon Rays, ray tracing

Here at Enscape we would like to share some insights into how we designed a renderer that produces path-traced, real-time global illumination and can also converge to offline-rendered image quality.

The challenge

Enscape is a plugin for architectural design software like Autodesk® Revit®, SketchUp or Rhino. It enables architects to get a high-quality real-time rendering from within the planning stage – without exporting or importing. Changes in the underlying BIM (Building Information Modeling) data, like a modification of the floor plan or the activation of a so-called “design option”, are reflected immediately.

Since CAD data is not specially prepared for real time rendering, global illumination is of high importance. Even for undetailed geometry without a lot of lights, it enables the viewer to grasp the underlying ideas and scales.

Ideally, we want a GI solution that can be scaled across different hardware capabilities and can even produce photorealistic, crisp images at offline quality – if given a bit more time.

Existing approaches

Since we want to change the time of day instantaneously and want very short loading times, light map baking is not an option.

Additionally, glass is a design element of high importance – which means we need sharp and correct reflections. Therefore, solutions like Light Propagation Volumes are not suitable. Even Voxel Cone Tracing struggles with pixel-sharp off-screen reflections.

For the previous version of our renderer, we used an automated cubemap placement algorithm. It placed cubemaps at positions where the average length of random rays cast from the cubemap was locally maximized. This was done at runtime and the cubemaps were updated when the lighting (or scene) had changed in their radius of influence. Combined with screen-space diffuse rays, it gave plausible results, but the number of cubemaps had to be enormous to cover medium-frequency GI phenomena.

Real time path tracing

The core algorithm of classic path tracing (as with Whitted-style ray tracing) is brute force in nature, so we need to optimize here. We faced the following problems after naively implementing it:

High demand for memory bandwidth during traversal
Noise with varying spatial frequencies (like fireflies)
Large data storage demand on the GPU
Long BVH construction times
Risk of cache incoherence, thus destroying SIMT efficiency
BVH unfriendly geometry with a high variation of polygon sizes

Radeon Rays

We use Radeon Rays (formerly AMD FireRays) for the BVH construction and traversal. We vary between different tracing kernels across different hardware setups to achieve the best possible performance. We ported the stackless traversal algorithm to run on OpenGL® 4.2 hardware, so that the kernel runs in a plain fragment or vertex shader without the need for Compute Shader or OpenCL™.

Deferred path tracing

We completely avoid casting primary rays by using our G Buffer as a starting point.

We then accumulate ray samples across multiple frames to solve each fragment’s BRDF. A mapping function defines a distinctive ray direction for a group of four fragments (half resolution). We use a global low-discrepancy seed per frame and a local random value which comes from a noise texture. Plain pseudo-random sampling must be avoided, since it leads to visible artifacts.
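As a rough illustration, combining a per-frame low-discrepancy value with a per-pixel noise value might look like the following C++ sketch. The Halton sequence, the wrap-around shift (a Cranley-Patterson rotation) and the cosine-weighted hemisphere mapping are assumptions chosen for the example; the article does not say which sequence or mapping function Enscape uses, and all names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct Vec3 { float x, y, z; };

// Radical inverse of `index` in the given base (one dimension of a Halton sequence).
float halton(std::uint32_t index, std::uint32_t base) {
    float result = 0.0f;
    float fraction = 1.0f / static_cast<float>(base);
    while (index > 0) {
        result += static_cast<float>(index % base) * fraction;
        index /= base;
        fraction /= static_cast<float>(base);
    }
    return result;
}

// Shift the global per-frame sample by a per-pixel noise value and wrap into [0, 1).
float rotateSample(float globalSample, float pixelNoise) {
    float shifted = globalSample + pixelNoise;
    return shifted - std::floor(shifted);
}

// Map two values in [0, 1) to a cosine-weighted direction around +Z
// (the tangent space of the surface normal).
Vec3 cosineSampleHemisphere(float u1, float u2) {
    assert(u1 >= 0.0f && u1 < 1.0f && u2 >= 0.0f && u2 < 1.0f);
    const float kTwoPi = 6.28318530718f;
    float r = std::sqrt(u1);
    float phi = kTwoPi * u2;
    return { r * std::cos(phi), r * std::sin(phi), std::sqrt(1.0f - u1) };
}

// Ray direction for one fragment group in one frame.
// noiseU / noiseV stand in for values read from a tiled noise texture.
Vec3 sampleDirection(std::uint32_t frameIndex, float noiseU, float noiseV) {
    float u1 = rotateSample(halton(frameIndex + 1, 2), noiseU);
    float u2 = rotateSample(halton(frameIndex + 1, 3), noiseV);
    return cosineSampleHemisphere(u1, u2);
}
```

Every pixel shares the same well-distributed per-frame sequence, while the noise offset decorrelates neighboring pixels, so the accumulated samples cover the hemisphere evenly over time.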

Figure 1: White equals a SS hit, black is a BVH trace request

First, we try to cast the diffuse rays in screen space. If we’re able to detect a hit in the last frame’s irradiance buffer, we even get a local multi-bounce reflection for free. If a screen space intersection wasn’t found, we path trace the ray in our BVH (Fig 1). This optimization alone saves roughly 30% of the first-bounce secondary rays, depending on the scene.
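The control flow of this two-stage lookup can be sketched as follows. This is only an illustration of the fallback order: the tracer signatures, the `Ray` and `Radiance` types and the statistics struct are hypothetical, and the real tracers run on the GPU rather than as C++ callbacks.

```cpp
#include <cassert>
#include <functional>
#include <optional>

struct Radiance { float r, g, b; };
struct Ray { float origin[3]; float direction[3]; };

// Hypothetical tracer signatures. The screen-space tracer returns a value only
// when the ray hits something visible in the previous frame's irradiance buffer.
using ScreenSpaceTracer = std::function<std::optional<Radiance>(const Ray&)>;
using BvhTracer = std::function<Radiance(const Ray&)>;

struct TraceStats {
    int screenSpaceHits = 0;  // white pixels in Figure 1
    int bvhRequests = 0;      // black pixels in Figure 1
};

// Try the cheap screen-space march first; fall back to the BVH only on a miss.
Radiance traceDiffuse(const Ray& ray,
                      const ScreenSpaceTracer& traceScreenSpace,
                      const BvhTracer& traceBvh,
                      TraceStats& stats) {
    assert(traceScreenSpace && traceBvh);
    if (std::optional<Radiance> hit = traceScreenSpace(ray)) {
        ++stats.screenSpaceHits;
        return *hit;
    }
    ++stats.bvhRequests;
    return traceBvh(ray);
}
```

The ratio of `screenSpaceHits` to total rays corresponds to the share of BVH traces saved by the screen-space pass.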

For specular, we basically do the same and vary the number of local samples based on the material’s roughness and metallic value.

Ray bundling

In order to get coherent data access, we bundle our rays into separate workgroups (or in terms of the OpenGL 4.2 implementation: different draw calls).

We bundle 12 world space direction segments into separate buckets, based on their generated ray direction. The usage of a tiled noise lookup texture during ray creation ensures that those buckets are roughly equally sized.

Tracing the directions separately both in screen space and for path tracing improves cache coherency.
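One way to picture the bucketing is the sketch below. The article does not say how the 12 direction segments are defined; here they are assumed to be the cells around the 12 vertices of an icosahedron, which is purely an illustrative choice, and all names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

constexpr std::size_t kBucketCount = 12;
constexpr float kPhi = 1.61803398875f;  // golden ratio

// Assumption: one bucket per icosahedron vertex. All axes have equal length,
// so they do not need to be normalized for the comparison below.
const std::array<Vec3, kBucketCount> kBucketAxes = {{
    {0.0f, 1.0f, kPhi}, {0.0f, 1.0f, -kPhi}, {0.0f, -1.0f, kPhi}, {0.0f, -1.0f, -kPhi},
    {1.0f, kPhi, 0.0f}, {1.0f, -kPhi, 0.0f}, {-1.0f, kPhi, 0.0f}, {-1.0f, -kPhi, 0.0f},
    {kPhi, 0.0f, 1.0f}, {kPhi, 0.0f, -1.0f}, {-kPhi, 0.0f, 1.0f}, {-kPhi, 0.0f, -1.0f},
}};

// Index of the bucket axis that is most closely aligned with the ray direction.
std::size_t bucketIndex(const Vec3& dir) {
    assert(dir.x != 0.0f || dir.y != 0.0f || dir.z != 0.0f);
    std::size_t best = 0;
    float bestDot = -1e30f;
    for (std::size_t i = 0; i < kBucketCount; ++i) {
        const Vec3& axis = kBucketAxes[i];
        float d = dir.x * axis.x + dir.y * axis.y + dir.z * axis.z;
        if (d > bestDot) {
            bestDot = d;
            best = i;
        }
    }
    return best;
}

// Group ray indices by bucket so that each bucket can be traced in its own
// workgroup or draw call.
std::array<std::vector<std::size_t>, kBucketCount>
bundleRays(const std::vector<Vec3>& directions) {
    std::array<std::vector<std::size_t>, kBucketCount> buckets;
    for (std::size_t i = 0; i < directions.size(); ++i) {
        buckets[bucketIndex(directions[i])].push_back(i);
    }
    return buckets;
}
```

Rays in the same bucket point in similar directions, so they tend to visit the same BVH nodes and the same screen regions, which is what improves cache coherency.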

BVH streaming

Building the BVH for a complete architectural scene will fail very quickly under real-time constraints and hardware limitations. Therefore, we only store a fraction of the scene at a time. We determine which objects to include in the BVH based on their estimated visual importance weighted against their BVH cost (Fig 2).

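// Score: visual importance relative to BVH cost (higher is better).
float objectScore = lightingRelevance * visibleVolume / polycount;
// Evict the weakest objects until the BVH fits the complexity budget.
while (sumOfObjects > BVH_COMPLEXITY_THRESHOLD)
    deleteObject(getWeakestScoreObject());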

This BVH update is done on the CPU and continuously uploaded to the GPU. The update is divided into smaller chunks to avoid lags during memory transfer.
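A self-contained version of the selection step could look like the sketch below. It assumes that the complexity measure is the summed polygon count of the kept objects; the struct layout and function names are hypothetical and not taken from the Enscape code base.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct SceneObject {
    std::string name;
    float lightingRelevance;  // estimated contribution to lighting and reflections
    float visibleVolume;      // estimated visible extent of the object
    std::size_t polycount;    // assumed here to be the BVH cost
};

// Visual importance weighted against BVH cost, as in the pseudocode above.
float objectScore(const SceneObject& object) {
    assert(object.polycount > 0);
    return object.lightingRelevance * object.visibleVolume /
           static_cast<float>(object.polycount);
}

// Keep the highest-scoring objects and evict the weakest ones until the
// summed polygon count fits the budget.
std::vector<SceneObject> selectBvhObjects(std::vector<SceneObject> objects,
                                          std::size_t complexityBudget) {
    // Sort by descending score so the weakest object is always at the back.
    std::sort(objects.begin(), objects.end(),
              [](const SceneObject& a, const SceneObject& b) {
                  return objectScore(a) > objectScore(b);
              });
    std::size_t total = 0;
    for (const SceneObject& object : objects) {
        total += object.polycount;
    }
    while (!objects.empty() && total > complexityBudget) {
        total -= objects.back().polycount;
        objects.pop_back();
    }
    return objects;
}
```

Sorting once and popping from the back avoids searching for the weakest object on every iteration of the eviction loop.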

Mesh Preprocessing

Mesh preprocessing is usually necessary because high polygon objects occur pretty frequently and can slow down the BVH traversal. It’s important to only “shrink” objects during simplification to avoid self-occlusion.

Special objects like leaves are converted into procedurals, which are compactly stored in the BVH. In terms of vegetation, the self-occlusion is not too noticeable, but the overall shape and density have to be maintained to look plausible in reflections.

Figure 2: The generated BVH geometry to capture elements that are not in screen space

Direct light

For every ray intersection in our BVH, we calculate incoming sunlight using a shadow map lookup. For artificial lights or emissive surfaces (which can be thousands per scene), lighting calculation during traversal is not feasible. Therefore, we bake the direct light (except the sunlight) into the BVH on a per-vertex basis. We re-tessellate the geometry anyway, so it’s easy to increase the tessellation density at points where we expect direct light detail.

This gives us the advantage of reduced memory fetches for direct lights and also allows us to change the sunlight with no special precomputation or update time, other than the usual sun shadow maps.
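The shading at a BVH hit point can be sketched as below: baked per-vertex light is interpolated with the hit's barycentric coordinates, and the sun term is added separately using a visibility value that would come from the shadow map lookup. This is a simplified illustration that ignores the material response; the types, names and barycentric convention are assumptions.

```cpp
#include <cassert>

struct Color { float r, g, b; };

Color scale(Color c, float s) { return {c.r * s, c.g * s, c.b * s}; }
Color add(Color a, Color b) { return {a.r + b.r, a.g + b.g, a.b + b.b}; }

// Direct light from artificial and emissive sources, baked per vertex.
struct BakedTriangle { Color vertexLight[3]; };

// u weights vertex 1, v weights vertex 2, and (1 - u - v) weights vertex 0.
// sunVisibility is the shadow map result: 0 = fully shadowed, 1 = fully lit.
Color directLightAtHit(const BakedTriangle& tri, float u, float v,
                       Color sunRadiance, float cosSunAngle, float sunVisibility) {
    assert(u >= 0.0f && v >= 0.0f && u + v <= 1.0f);
    assert(sunVisibility >= 0.0f && sunVisibility <= 1.0f);
    float w = 1.0f - u - v;
    Color baked = add(add(scale(tri.vertexLight[0], w), scale(tri.vertexLight[1], u)),
                      scale(tri.vertexLight[2], v));
    // Surfaces facing away from the sun receive no sunlight.
    float sunWeight = (cosSunAngle > 0.0f ? cosSunAngle : 0.0f) * sunVisibility;
    return add(baked, scale(sunRadiance, sunWeight));
}
```

Because the sun term is evaluated at lookup time, changing the time of day only changes `sunRadiance`, `cosSunAngle` and the shadow map, while the baked vertex data stays untouched.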

Specular

For the material’s reflective component, we sample at half resolution and use previous sampling results to combine them into a high-resolution image. The filtering is BRDF-aware, to keep smearing and blurring artifacts to a minimum. For high-quality outputs, we even create a refinement queue based on unexpected variance in a 3×3 pixel neighborhood to get full, high-resolution image quality.
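A variance-driven refinement queue might be built along the lines of the following sketch. It works on a single luminance channel and flags a pixel when the variance of its 3×3 neighborhood exceeds a fixed threshold; the threshold, the use of luminance and all names are assumptions, since the article does not define what counts as "unexpected" variance.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct PixelCoord { int x, y; };

// Variance of the luminance in the 3x3 neighborhood around (x, y).
// Neighbors outside the image are skipped.
float localVariance(const std::vector<float>& luminance,
                    int width, int height, int x, int y) {
    float sum = 0.0f;
    float sumSq = 0.0f;
    int count = 0;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            int nx = x + dx;
            int ny = y + dy;
            if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
            float value = luminance[static_cast<std::size_t>(ny) * width + nx];
            sum += value;
            sumSq += value * value;
            ++count;
        }
    }
    float mean = sum / count;
    // Guard against small negative results caused by rounding.
    return std::max(0.0f, sumSq / count - mean * mean);
}

// Collect all pixels whose neighborhood variance exceeds the threshold.
std::vector<PixelCoord> buildRefinementQueue(const std::vector<float>& luminance,
                                             int width, int height,
                                             float varianceThreshold) {
    assert(width > 0 && height > 0);
    assert(luminance.size() ==
           static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    std::vector<PixelCoord> queue;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            if (localVariance(luminance, width, height, x, y) > varianceThreshold) {
                queue.push_back({x, y});
            }
        }
    }
    return queue;
}
```

Pixels in the queue would then receive additional full-resolution samples, while smooth regions keep the cheaper upsampled result.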

Alpha Reflections

We support order-independent transparency and want to use the path traced reflections on those surfaces as well. The challenge is the unpredictable layer depth and the required performance budget. We therefore render every layer in a deferred shading style and run our specular tracing in upsampled half resolution. We do not store a separate history buffer for each layer, so we have to accept a little blur to hide the missing history, which would be required for proper temporal upsampling at higher quality.

Filtering

We use several temporal accumulation buffers to keep the neighborhood clamp window as small as possible. The new results are first combined with the accumulation buffer (Fig 3) before filtering to avoid smearing. Before filtering, we compute the expected radius in a local neighborhood to keep the number of texture reads to a minimum.
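The accumulation step with a neighborhood clamp can be illustrated for a single channel as follows. This is a generic sketch of temporal accumulation, not Enscape's actual filter: the exponential blend, the min/max clamp window and the parameter names are assumptions.

```cpp
#include <algorithm>
#include <cassert>

// Blend a new sample into the history value for one channel of one pixel.
// The history is first clamped to the value range of the current frame's
// local neighborhood, which rejects stale values after lighting changes.
float accumulate(float history, float current,
                 float neighborhoodMin, float neighborhoodMax,
                 float blendFactor) {
    assert(neighborhoodMin <= neighborhoodMax);
    assert(blendFactor >= 0.0f && blendFactor <= 1.0f);
    float clampedHistory = std::min(std::max(history, neighborhoodMin), neighborhoodMax);
    // blendFactor = 1 keeps only the new sample, 0 keeps only the history.
    return clampedHistory + (current - clampedHistory) * blendFactor;
}
```

A tighter clamp window rejects outdated history faster but also discards more accumulated samples, which is why keeping the window small matters for both ghosting and noise.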

Figure 3: Close-Up of unfiltered radiance

Performance

The critical point, besides overall rendering performance, is a content-agnostic ray traversal cost. The screen space ray traversal performance is not dependent on the scene complexity and can be scaled in terms of sample count and march length. In most cases in architecture, the number of polygons in the BVH correlates with the traversal cost. Keeping the BVH complexity constant is therefore key. We measure the tracing and overall rendering performance at run time and adjust the allowed BVH complexity. We even adjust the image resolution to keep a steady frame rate. Multi-bounce diffuse lighting, however, is currently only enabled on our Ultra profile.
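A simple feedback loop for the BVH budget could look like the sketch below. It assumes that traversal cost scales roughly linearly with polygon count and rescales the budget by the ratio of target to measured trace time; the damping range, the limits and all names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

struct BudgetLimits {
    std::size_t minPolygons;
    std::size_t maxPolygons;
};

// Rescale the allowed BVH complexity from the measured trace time.
std::size_t adjustBvhBudget(std::size_t currentBudget,
                            float measuredTraceMs, float targetTraceMs,
                            BudgetLimits limits) {
    assert(measuredTraceMs > 0.0f && targetTraceMs > 0.0f);
    assert(limits.minPolygons <= limits.maxPolygons);
    float ratio = targetTraceMs / measuredTraceMs;
    // Limit the change per update so the budget does not oscillate.
    ratio = std::min(std::max(ratio, 0.5f), 1.5f);
    std::size_t next =
        static_cast<std::size_t>(static_cast<double>(currentBudget) * ratio);
    return std::min(std::max(next, limits.minPolygons), limits.maxPolygons);
}
```

Called once per measurement interval, this shrinks the budget when tracing runs over its time slice and grows it again when there is headroom.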

Next steps

We continuously improve our real-time path tracer and intend to publish more details about it in the future. If you want to join us at our office in Karlsruhe (Germany), send an email to jobs@enscape3d.com.

Thomas Schander is the founder of Enscape and part of the rendering team. Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.
